summaryrefslogtreecommitdiffstats
path: root/Documentation/admin-guide
AgeCommit message (Collapse)AuthorFilesLines
2019-09-28Merge branch 'next-lockdown' of ↵Linus Torvalds1-0/+9
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull kernel lockdown mode from James Morris: "This is the latest iteration of the kernel lockdown patchset, from Matthew Garrett, David Howells and others. From the original description: This patchset introduces an optional kernel lockdown feature, intended to strengthen the boundary between UID 0 and the kernel. When enabled, various pieces of kernel functionality are restricted. Applications that rely on low-level access to either hardware or the kernel may cease working as a result - therefore this should not be enabled without appropriate evaluation beforehand. The majority of mainstream distributions have been carrying variants of this patchset for many years now, so there's value in providing a doesn't meet every distribution requirement, but gets us much closer to not requiring external patches. There are two major changes since this was last proposed for mainline: - Separating lockdown from EFI secure boot. Background discussion is covered here: https://lwn.net/Articles/751061/ - Implementation as an LSM, with a default stackable lockdown LSM module. This allows the lockdown feature to be policy-driven, rather than encoding an implicit policy within the mechanism. The new locked_down LSM hook is provided to allow LSMs to make a policy decision around whether kernel functionality that would allow tampering with or examining the runtime state of the kernel should be permitted. The included lockdown LSM provides an implementation with a simple policy intended for general purpose use. This policy provides a coarse level of granularity, controllable via the kernel command line: lockdown={integrity|confidentiality} Enable the kernel lockdown feature. If set to integrity, kernel features that allow userland to modify the running kernel are disabled. If set to confidentiality, kernel features that allow userland to extract confidential information from the kernel are also disabled. This may also be controlled via /sys/kernel/security/lockdown and overriden by kernel configuration. New or existing LSMs may implement finer-grained controls of the lockdown features. Refer to the lockdown_reason documentation in include/linux/security.h for details. The lockdown feature has had signficant design feedback and review across many subsystems. This code has been in linux-next for some weeks, with a few fixes applied along the way. Stephen Rothwell noted that commit 9d1f8be5cf42 ("bpf: Restrict bpf when kernel lockdown is in confidentiality mode") is missing a Signed-off-by from its author. Matthew responded that he is providing this under category (c) of the DCO" * 'next-lockdown' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (31 commits) kexec: Fix file verification on S390 security: constify some arrays in lockdown LSM lockdown: Print current->comm in restriction messages efi: Restrict efivar_ssdt_load when the kernel is locked down tracefs: Restrict tracefs when the kernel is locked down debugfs: Restrict debugfs when the kernel is locked down kexec: Allow kexec_file() with appropriate IMA policy when locked down lockdown: Lock down perf when in confidentiality mode bpf: Restrict bpf when kernel lockdown is in confidentiality mode lockdown: Lock down tracing and perf kprobes when in confidentiality mode lockdown: Lock down /proc/kcore x86/mmiotrace: Lock down the testmmiotrace module lockdown: Lock down module params that specify hardware parameters (eg. ioport) lockdown: Lock down TIOCSSERIAL lockdown: Prohibit PCMCIA CIS storage when the kernel is locked down acpi: Disable ACPI table override if the kernel is locked down acpi: Ignore acpi_rsdp kernel param when the kernel has been locked down ACPI: Limit access to custom_method when the kernel is locked down x86/msr: Restrict MSR access when the kernel is locked down x86: Lock down IO port access when the kernel is locked down ...
2019-09-24Merge branch 'akpm' (patches from Andrew)Linus Torvalds2-1/+5
Merge updates from Andrew Morton: - a few hot fixes - ocfs2 updates - almost all of -mm (slab-generic, slab, slub, kmemleak, kasan, cleanups, debug, pagecache, memcg, gup, pagemap, memory-hotplug, sparsemem, vmalloc, initialization, z3fold, compaction, mempolicy, oom-kill, hugetlb, migration, thp, mmap, madvise, shmem, zswap, zsmalloc) * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (132 commits) mm/zsmalloc.c: fix a -Wunused-function warning zswap: do not map same object twice zswap: use movable memory if zpool support allocate movable memory zpool: add malloc_support_movable to zpool_driver shmem: fix obsolete comment in shmem_getpage_gfp() mm/madvise: reduce code duplication in error handling paths mm: mmap: increase sockets maximum memory size pgoff for 32bits mm/mmap.c: refine find_vma_prev() with rb_last() riscv: make mmap allocation top-down by default mips: use generic mmap top-down layout and brk randomization mips: replace arch specific way to determine 32bit task with generic version mips: adjust brk randomization offset to fit generic version mips: use STACK_TOP when computing mmap base address mips: properly account for stack randomization and stack guard gap arm: use generic mmap top-down layout and brk randomization arm: use STACK_TOP when computing mmap base address arm: properly account for stack randomization and stack guard gap arm64, mm: make randomization selected by generic topdown mmap layout arm64, mm: move generic mmap layout functions to mm arm64: consider stack randomization for mmap base only when necessary ...
2019-09-24memcg, kmem: deprecate kmem.limit_in_bytesMichal Hocko1-1/+3
Cgroup v1 memcg controller has exposed a dedicated kmem limit to users which turned out to be really a bad idea because there are paths which cannot shrink the kernel memory usage enough to get below the limit (e.g. because the accounted memory is not reclaimable). There are cases when the failure is even not allowed (e.g. __GFP_NOFAIL). This means that the kmem limit is in excess to the hard limit without any way to shrink and thus completely useless. OOM killer cannot be invoked to handle the situation because that would lead to a premature oom killing. As a result many places might see ENOMEM returning from kmalloc and result in unexpected errors. E.g. a global OOM killer when there is a lot of free memory because ENOMEM is translated into VM_FAULT_OOM in #PF path and therefore pagefault_out_of_memory would result in OOM killer. Please note that the kernel memory is still accounted to the overall limit along with the user memory so removing the kmem specific limit should still allow to contain kernel memory consumption. Unlike the kmem one, though, it invokes memory reclaim and targeted memcg oom killing if necessary. Start the deprecation process by crying to the kernel log. Let's see whether there are relevant usecases and simply return to EINVAL in the second stage if nobody complains in few releases. [akpm@linux-foundation.org: tweak documentation text] Link: http://lkml.kernel.org/r/20190911151612.GI4023@dhcp22.suse.cz Signed-off-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Thomas Lindroth <thomas.lindroth@gmail.com> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-24mm, page_owner, debug_pagealloc: save and dump freeing stack traceVlastimil Babka1-0/+2
The debug_pagealloc functionality is useful to catch buggy page allocator users that cause e.g. use after free or double free. When page inconsistency is detected, debugging is often simpler by knowing the call stack of process that last allocated and freed the page. When page_owner is also enabled, we record the allocation stack trace, but not freeing. This patch therefore adds recording of freeing process stack trace to page owner info, if both page_owner and debug_pagealloc are configured and enabled. With only page_owner enabled, this info is not useful for the memory leak debugging use case. dump_page() is adjusted to print the info. An example result of calling __free_pages() twice may look like this (note the page last free stack trace): BUG: Bad page state in process bash pfn:13d8f8 page:ffffc31984f63e00 refcount:-1 mapcount:0 mapping:0000000000000000 index:0x0 flags: 0x1affff800000000() raw: 01affff800000000 dead000000000100 dead000000000122 0000000000000000 raw: 0000000000000000 0000000000000000 ffffffffffffffff 0000000000000000 page dumped because: nonzero _refcount page_owner tracks the page as freed page last allocated via order 0, migratetype Unmovable, gfp_mask 0xcc0(GFP_KERNEL) prep_new_page+0x143/0x150 get_page_from_freelist+0x289/0x380 __alloc_pages_nodemask+0x13c/0x2d0 khugepaged+0x6e/0xc10 kthread+0xf9/0x130 ret_from_fork+0x3a/0x50 page last free stack trace: free_pcp_prepare+0x134/0x1e0 free_unref_page+0x18/0x90 khugepaged+0x7b/0xc10 kthread+0xf9/0x130 ret_from_fork+0x3a/0x50 Modules linked in: CPU: 3 PID: 271 Comm: bash Not tainted 5.3.0-rc4-2.g07a1a73-default+ #57 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack+0x85/0xc0 bad_page.cold+0xba/0xbf rmqueue_pcplist.isra.0+0x6c5/0x6d0 rmqueue+0x2d/0x810 get_page_from_freelist+0x191/0x380 __alloc_pages_nodemask+0x13c/0x2d0 __get_free_pages+0xd/0x30 __pud_alloc+0x2c/0x110 copy_page_range+0x4f9/0x630 dup_mmap+0x362/0x480 dup_mm+0x68/0x110 copy_process+0x19e1/0x1b40 _do_fork+0x73/0x310 __x64_sys_clone+0x75/0x80 do_syscall_64+0x6e/0x1e0 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7f10af854a10 ... Link: http://lkml.kernel.org/r/20190820131828.22684-5-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-23Merge tag 'pci-v5.4-changes' of ↵Linus Torvalds1-2/+3
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "Enumeration: - Consolidate _HPP/_HPX stuff in pci-acpi.c and simplify it (Krzysztof Wilczynski) - Fix incorrect PCIe device types and remove dev->has_secondary_link to simplify code that deals with upstream/downstream ports (Mika Westerberg) - After suspend, restore Resizable BAR size bits correctly for 1MB BARs (Sumit Saxena) - Enable PCI_MSI_IRQ_DOMAIN support for RISC-V (Wesley Terpstra) Virtualization: - Add ACS quirks for iProc PAXB (Abhinav Ratna), Amazon Annapurna Labs (Ali Saidi) - Move sysfs SR-IOV functions to iov.c (Kelsey Skunberg) - Remove group write permissions from sysfs sriov_numvfs, sriov_drivers_autoprobe (Kelsey Skunberg) Hotplug: - Simplify pciehp indicator control (Denis Efremov) Peer-to-peer DMA: - Allow P2P DMA between root ports for whitelisted bridges (Logan Gunthorpe) - Whitelist some Intel host bridges for P2P DMA (Logan Gunthorpe) - DMA map P2P DMA requests that traverse host bridge (Logan Gunthorpe) Amazon Annapurna Labs host bridge driver: - Add DT binding and controller driver (Jonathan Chocron) Hyper-V host bridge driver: - Fix hv_pci_dev->pci_slot use-after-free (Dexuan Cui) - Fix PCI domain number collisions (Haiyang Zhang) - Use instance ID bytes 4 & 5 as PCI domain numbers (Haiyang Zhang) - Fix build errors on non-SYSFS config (Randy Dunlap) i.MX6 host bridge driver: - Limit DBI register length (Stefan Agner) Intel VMD host bridge driver: - Fix config addressing issues (Jon Derrick) Layerscape host bridge driver: - Add bar_fixed_64bit property to endpoint driver (Xiaowei Bao) - Add CONFIG_PCI_LAYERSCAPE_EP to build EP/RC drivers separately (Xiaowei Bao) Mediatek host bridge driver: - Add MT7629 controller support (Jianjun Wang) Mobiveil host bridge driver: - Fix CPU base address setup (Hou Zhiqiang) - Make "num-lanes" property optional (Hou Zhiqiang) Tegra host bridge driver: - Fix OF node reference leak (Nishka Dasgupta) - Disable MSI for root ports to work around design problem (Vidya Sagar) - Add Tegra194 DT binding and controller support (Vidya Sagar) - Add support for sideband pins and slot regulators (Vidya Sagar) - Add PIPE2UPHY support (Vidya Sagar) Misc: - Remove unused pci_block_cfg_access() et al (Kelsey Skunberg) - Unexport pci_bus_get(), etc (Kelsey Skunberg) - Hide PM, VC, link speed, ATS, ECRC, PTM constants and interfaces in the PCI core (Kelsey Skunberg) - Clean up sysfs DEVICE_ATTR() usage (Kelsey Skunberg) - Mark expected switch fall-through (Gustavo A. R. Silva) - Propagate errors for optional regulators and PHYs (Thierry Reding) - Fix kernel command line resource_alignment parameter issues (Logan Gunthorpe)" * tag 'pci-v5.4-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (112 commits) PCI: Add pci_irq_vector() and other stubs when !CONFIG_PCI arm64: tegra: Add PCIe slot supply information in p2972-0000 platform arm64: tegra: Add configuration for PCIe C5 sideband signals PCI: tegra: Add support to enable slot regulators PCI: tegra: Add support to configure sideband pins PCI: vmd: Fix shadow offsets to reflect spec changes PCI: vmd: Fix config addressing when using bus offsets PCI: dwc: Add validation that PCIe core is set to correct mode PCI: dwc: al: Add Amazon Annapurna Labs PCIe controller driver dt-bindings: PCI: Add Amazon's Annapurna Labs PCIe host bridge binding PCI: Add quirk to disable MSI-X support for Amazon's Annapurna Labs Root Port PCI/VPD: Prevent VPD access for Amazon's Annapurna Labs Root Port PCI: Add ACS quirk for Amazon Annapurna Labs root ports PCI: Add Amazon's Annapurna Labs vendor ID MAINTAINERS: Add PCI native host/endpoint controllers designated reviewer PCI: hv: Use bytes 4 and 5 from instance ID as the PCI domain numbers dt-bindings: PCI: tegra: Add PCIe slot supplies regulator entries dt-bindings: PCI: tegra: Add sideband pins configuration entries PCI: tegra: Add Tegra194 PCIe support PCI: Get rid of dev->has_secondary_link flag ...
2019-09-21Merge tag 'for-5.4/dm-changes' of ↵Linus Torvalds2-0/+340
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mike Snitzer: - crypto and DM crypt advances that allow the crypto API to reclaim implementation details that do not belong in DM crypt. The wrapper template for ESSIV generation that was factored out will also be used by fscrypt in the future. - Add root hash pkcs#7 signature verification to the DM verity target. - Add a new "clone" DM target that allows for efficient remote replication of a device. - Enhance DM bufio's cache to be tailored to each client based on use. Clients that make heavy use of the cache get more of it, and those that use less have reduced cache usage. - Add a new DM_GET_TARGET_VERSION ioctl to allow userspace to query the version number of a DM target (even if the associated module isn't yet loaded). - Fix invalid memory access in DM zoned target. - Fix the max_discard_sectors limit advertised by the DM raid target; it was mistakenly storing the limit in bytes rather than sectors. - Small optimizations and cleanups in DM writecache target. - Various fixes and cleanups in DM core, DM raid1 and space map portion of DM persistent data library. * tag 'for-5.4/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (22 commits) dm: introduce DM_GET_TARGET_VERSION dm bufio: introduce a global cache replacement dm bufio: remove old-style buffer cleanup dm bufio: introduce a global queue dm bufio: refactor adjust_total_allocated dm bufio: call adjust_total_allocated from __link_buffer and __unlink_buffer dm: add clone target dm raid: fix updating of max_discard_sectors limit dm writecache: skip writecache_wait for pmem mode dm stats: use struct_size() helper dm crypt: omit parsing of the encapsulated cipher dm crypt: switch to ESSIV crypto API template crypto: essiv - create wrapper template for ESSIV generation dm space map common: remove check for impossible sm_find_free() return value dm raid1: use struct_size() with kzalloc() dm writecache: optimize performance by sorting the blocks for writeback_all dm writecache: add unlikely for getting two block with same LBA dm writecache: remove unused member pointer in writeback_struct dm zoned: fix invalid memory access dm verity: add root hash pkcs#7 signature verification ...
2019-09-20Merge tag 'powerpc-5.4-1' of ↵Linus Torvalds1-0/+28
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "This is a bit late, partly due to me travelling, and partly due to a power outage knocking out some of my test systems *while* I was travelling. - Initial support for running on a system with an Ultravisor, which is software that runs below the hypervisor and protects guests against some attacks by the hypervisor. - Support for building the kernel to run as a "Secure Virtual Machine", ie. as a guest capable of running on a system with an Ultravisor. - Some changes to our DMA code on bare metal, to allow devices with medium sized DMA masks (> 32 && < 59 bits) to use more than 2GB of DMA space. - Support for firmware assisted crash dumps on bare metal (powernv). - Two series fixing bugs in and refactoring our PCI EEH code. - A large series refactoring our exception entry code to use gas macros, both to make it more readable and also enable some future optimisations. As well as many cleanups and other minor features & fixups. Thanks to: Adam Zerella, Alexey Kardashevskiy, Alistair Popple, Andrew Donnellan, Aneesh Kumar K.V, Anju T Sudhakar, Anshuman Khandual, Balbir Singh, Benjamin Herrenschmidt, Cédric Le Goater, Christophe JAILLET, Christophe Leroy, Christopher M. Riedl, Christoph Hellwig, Claudio Carvalho, Daniel Axtens, David Gibson, David Hildenbrand, Desnes A. Nunes do Rosario, Ganesh Goudar, Gautham R. Shenoy, Greg Kurz, Guerney Hunt, Gustavo Romero, Halil Pasic, Hari Bathini, Joakim Tjernlund, Jonathan Neuschafer, Jordan Niethe, Leonardo Bras, Lianbo Jiang, Madhavan Srinivasan, Mahesh Salgaonkar, Mahesh Salgaonkar, Masahiro Yamada, Maxiwell S. Garcia, Michael Anderson, Nathan Chancellor, Nathan Lynch, Naveen N. Rao, Nicholas Piggin, Oliver O'Halloran, Qian Cai, Ram Pai, Ravi Bangoria, Reza Arbab, Ryan Grimm, Sam Bobroff, Santosh Sivaraj, Segher Boessenkool, Sukadev Bhattiprolu, Thiago Bauermann, Thiago Jung Bauermann, Thomas Gleixner, Tom Lendacky, Vasant Hegde" * tag 'powerpc-5.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (264 commits) powerpc/mm/mce: Keep irqs disabled during lockless page table walk powerpc: Use ftrace_graph_ret_addr() when unwinding powerpc/ftrace: Enable HAVE_FUNCTION_GRAPH_RET_ADDR_PTR ftrace: Look up the address of return_to_handler() using helpers powerpc: dump kernel log before carrying out fadump or kdump docs: powerpc: Add missing documentation reference powerpc/xmon: Fix output of XIVE IPI powerpc/xmon: Improve output of XIVE interrupts powerpc/mm/radix: remove useless kernel messages powerpc/fadump: support holes in kernel boot memory area powerpc/fadump: remove RMA_START and RMA_END macros powerpc/fadump: update documentation about option to release opalcore powerpc/fadump: consider f/w load area powerpc/opalcore: provide an option to invalidate /sys/firmware/opal/core file powerpc/opalcore: export /sys/firmware/opal/core for analysing opal crashes powerpc/fadump: update documentation about CONFIG_PRESERVE_FA_DUMP powerpc/fadump: add support to preserve crash data on FADUMP disabled kernel powerpc/fadump: improve how crashed kernel's memory is reserved powerpc/fadump: consider reserved ranges while releasing memory powerpc/fadump: make crash memory ranges array allocation generic ...
2019-09-18Merge tag 'tty-5.4-rc1' of ↵Linus Torvalds1-0/+6
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial driver updates from Greg KH: "Even in this age, people are still making new serial port silicon, why... Anyway, here's the TTY and Serial driver update for 5.4-rc1. Lots of changes in here for a number of embedded serial port devices that are being worked on because people really like to see those console logs... Other than that, nothing major here, no core tty changes that anyone should care about. All of these have been in linux-next for a while with no reported issues" * tag 'tty-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (125 commits) serial: tegra: Add PIO mode support serial: tegra: report clk rate errors serial: tegra: add support to adjust baud rate serial: tegra: DT for Adjusted baud rates serial: tegra: add support to use 8 bytes trigger serial: tegra: set maximum num of uart ports to 8 serial: tegra: check for FIFO mode enabled status dt-binding: serial: tegra: add new chips serial: tegra: report error to upper tty layer serial: tegra: flush the RX fifo on frame error serial: tegra: avoid reg access when clk disabled serial: tegra: add support to ignore read serial: sprd: correct the wrong sequence of arguments dt-bindings: serial: Convert riscv,sifive-serial to json-schema serial: max310x: turn off transmitter before activating AutoCTS or auto transmitter flow control serial: max310x: Properly set flags in AutoCTS mode tty: serial: fix platform_no_drv_owner.cocci warnings dt-bindings: serial: Document Freescale LINFlexD UART serial: fsl_linflexuart: Update compatible string tty: n_gsm: avoid recursive locking with async port hangup ...
2019-09-17Merge tag 'for-5.4/block-2019-09-16' of git://git.kernel.dk/linux-blockLinus Torvalds3-11/+100
Pull block updates from Jens Axboe: - Two NVMe pull requests: - ana log parse fix from Anton - nvme quirks support for Apple devices from Ben - fix missing bio completion tracing for multipath stack devices from Hannes and Mikhail - IP TOS settings for nvme rdma and tcp transports from Israel - rq_dma_dir cleanups from Israel - tracing for Get LBA Status command from Minwoo - Some nvme-tcp cleanups from Minwoo, Potnuri and Myself - Some consolidation between the fabrics transports for handling the CAP register - reset race with ns scanning fix for fabrics (move fabrics commands to a dedicated request queue with a different lifetime from the admin request queue)." - controller reset and namespace scan races fixes - nvme discovery log change uevent support - naming improvements from Keith - multiple discovery controllers reject fix from James - some regular cleanups from various people - Series fixing (and re-fixing) null_blk debug printing and nr_devices checks (André) - A few pull requests from Song, with fixes from Andy, Guoqing, Guilherme, Neil, Nigel, and Yufen. - REQ_OP_ZONE_RESET_ALL support (Chaitanya) - Bio merge handling unification (Christoph) - Pick default elevator correctly for devices with special needs (Damien) - Block stats fixes (Hou) - Timeout and support devices nbd fixes (Mike) - Series fixing races around elevator switching and device add/remove (Ming) - sed-opal cleanups (Revanth) - Per device weight support for BFQ (Fam) - Support for blk-iocost, a new model that can properly account cost of IO workloads. (Tejun) - blk-cgroup writeback fixes (Tejun) - paride queue init fixes (zhengbin) - blk_set_runtime_active() cleanup (Stanley) - Block segment mapping optimizations (Bart) - lightnvm fixes (Hans/Minwoo/YueHaibing) - Various little fixes and cleanups * tag 'for-5.4/block-2019-09-16' of git://git.kernel.dk/linux-block: (186 commits) null_blk: format pr_* logs with pr_fmt null_blk: match the type of parameter nr_devices null_blk: do not fail the module load with zero devices block: also check RQF_STATS in blk_mq_need_time_stamp() block: make rq sector size accessible for block stats bfq: Fix bfq linkage error raid5: use bio_end_sector in r5_next_bio raid5: remove STRIPE_OPS_REQ_PENDING md: add feature flag MD_FEATURE_RAID0_LAYOUT md/raid0: avoid RAID0 data corruption due to layout confusion. raid5: don't set STRIPE_HANDLE to stripe which is in batch list raid5: don't increment read_errors on EILSEQ return nvmet: fix a wrong error status returned in error log page nvme: send discovery log page change events to userspace nvme: add uevent variables for controller devices nvme: enable aen regardless of the presence of I/O queues nvme-fabrics: allow discovery subsystems accept a kato nvmet: Use PTR_ERR_OR_ZERO() in nvmet_init_discovery() nvme: Remove redundant assignment of cq vector nvme: Assign subsys instance from first ctrl ...
2019-09-17Merge tag 'docs-5.4' of git://git.lwn.net/linuxLinus Torvalds21-20/+1935
Pull documentation updates from Jonathan Corbet: "It's a somewhat calmer cycle for docs this time, as the churn of the mass RST conversion is happily mostly behind us. - A new document on reproducible builds. - We finally got around to zapping the documentation for hardware support that was removed in 2004; one doesn't want to rush these things. - The usual assortment of fixes, typo corrections, etc" * tag 'docs-5.4' of git://git.lwn.net/linux: (67 commits) Documentation: kbuild: Add document about reproducible builds docs: printk-formats: Stop encouraging use of unnecessary %h[xudi] and %hh[xudi] Documentation: Add "earlycon=sbi" to the admin guide doc:lock: remove reference to clever use of read-write lock devices.txt: improve entry for comedi (char major 98) docs: mtd: Update spi nor reference driver doc: arm64: fix grammar dtb placed in no attributes region Documentation: sysrq: don't recommend 'S' 'U' before 'B' mailmap: Update email address for Quentin Perret docs: ftrace: clarify when tracing is disabled by the trace file docs: process: fix broken link Documentation/arm/samsung-s3c24xx: Remove stray U+FEFF character to fix title Documentation/arm/sa1100/assabet: Fix 'make assabet_defconfig' command Documentation/arm/sa1100: Remove some obsolete documentation docs/zh_CN: update Chinese howto.rst for latexdocs making Documentation: virt: Fix broken reference to virt tree's index docs: Fix typo on pull requests guide kernel-doc: Allow anonymous enum Documentation: sphinx: Don't parse socket() as identifier reference Documentation: sphinx: Add missing comma to list of strings ...
2019-09-16Merge tag 'platform-drivers-x86-v5.4-1' of ↵Linus Torvalds1-0/+23
git://git.infradead.org/linux-platform-drivers-x86 Pull x86 platform-drivers updates from Andy Shevchenko: - ASUS WMI driver got a couple of updates, i.e. support of FAN is fixed for recent products and the charge threshold support has been added - Two uknown key events for Dell laptops are being ignored now to avoid spamming users with harmless messages - HP ZBook 17 G5 and ASUS Zenbook UX430UNR got accelerometer support. - Intel CherryTrail platforms had a regression with wake up. Now it's fixed - Intel PMC driver got fixed in order to work nicely in Xen environment - Intel Speed Select driver provides bucket vs core count relationship. Besides that the tools has been updated for better output - The PrivacyGuard is enabled on Lenovo ThinkPad laptops - Three tablets - Trekstor Primebook C11B 2-in-1, Irbis TW90 and Chuwi Surbook Mini - got touchscreen support * tag 'platform-drivers-x86-v5.4-1' of git://git.infradead.org/linux-platform-drivers-x86: (53 commits) MAINTAINERS: Switch PDx86 subsystem status to Odd Fixes platform/x86: asus-wmi: Refactor charge threshold to use the battery hooking API platform/x86: asus-wmi: Rename CHARGE_THRESHOLD to RSOC platform/x86: asus-wmi: Reorder ASUS_WMI_CHARGE_THRESHOLD tools/power/x86/intel-speed-select: Display core count for bucket platform/x86: ISST: Allow additional TRL MSRs tools/power/x86/intel-speed-select: Fix memory leak tools/power/x86/intel-speed-select: Output success/failed for command output tools/power/x86/intel-speed-select: Output human readable CPU list tools/power/x86/intel-speed-select: Change turbo ratio output to maximum turbo frequency tools/power/x86/intel-speed-select: Switch output to MHz tools/power/x86/intel-speed-select: Simplify output for turbo-freq and base-freq tools/power/x86/intel-speed-select: Fix cpu-count output tools/power/x86/intel-speed-select: Fix help option typo tools/power/x86/intel-speed-select: Fix package typo tools/power/x86/intel-speed-select: Fix a read overflow in isst_set_tdp_level_msr() platform/x86: intel_int0002_vgpio: Use device_init_wakeup platform/x86: intel_int0002_vgpio: Fix wakeups not working on Cherry Trail platform/x86: compal-laptop: Initialize "value" in ec_read_u8() platform/x86: touchscreen_dmi: Add info for the Trekstor Primebook C11B 2-in-1 ...
2019-09-16Merge branch 'sched-core-for-linus' of ↵Linus Torvalds1-0/+34
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: - MAINTAINERS: Add Mark Rutland as perf submaintainer, Juri Lelli and Vincent Guittot as scheduler submaintainers. Add Dietmar Eggemann, Steven Rostedt, Ben Segall and Mel Gorman as scheduler reviewers. As perf and the scheduler is getting bigger and more complex, document the status quo of current responsibilities and interests, and spread the review pain^H^H^H^H fun via an increase in the Cc: linecount generated by scripts/get_maintainer.pl. :-) - Add another series of patches that brings the -rt (PREEMPT_RT) tree closer to mainline: split the monolithic CONFIG_PREEMPT dependencies into a new CONFIG_PREEMPTION category that will allow the eventual introduction of CONFIG_PREEMPT_RT. Still a few more hundred patches to go though. - Extend the CPU cgroup controller with uclamp.min and uclamp.max to allow the finer shaping of CPU bandwidth usage. - Micro-optimize energy-aware wake-ups from O(CPUS^2) to O(CPUS). - Improve the behavior of high CPU count, high thread count applications running under cpu.cfs_quota_us constraints. - Improve balancing with SCHED_IDLE (SCHED_BATCH) tasks present. - Improve CPU isolation housekeeping CPU allocation NUMA locality. - Fix deadline scheduler bandwidth calculations and logic when cpusets rebuilds the topology, or when it gets deadline-throttled while it's being offlined. - Convert the cpuset_mutex to percpu_rwsem, to allow it to be used from setscheduler() system calls without creating global serialization. Add new synchronization between cpuset topology-changing events and the deadline acceptance tests in setscheduler(), which were broken before. - Rework the active_mm state machine to be less confusing and more optimal. - Rework (simplify) the pick_next_task() slowpath. - Improve load-balancing on AMD EPYC systems. - ... and misc cleanups, smaller fixes and improvements - please see the Git log for more details. * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits) sched/psi: Correct overly pessimistic size calculation sched/fair: Speed-up energy-aware wake-ups sched/uclamp: Always use 'enum uclamp_id' for clamp_id values sched/uclamp: Update CPU's refcount on TG's clamp changes sched/uclamp: Use TG's clamps to restrict TASK's clamps sched/uclamp: Propagate system defaults to the root group sched/uclamp: Propagate parent clamps sched/uclamp: Extend CPU's cgroup controller sched/topology: Improve load balancing on AMD EPYC systems arch, ia64: Make NUMA select SMP sched, perf: MAINTAINERS update, add submaintainers and reviewers sched/fair: Use rq_lock/unlock in online_fair_sched_group cpufreq: schedutil: fix equation in comment sched: Rework pick_next_task() slow-path sched: Allow put_prev_task() to drop rq->lock sched/fair: Expose newidle_balance() sched: Add task_struct pointer to sched_class::set_curr_task sched: Rework CPU hotplug task selection sched/{rt,deadline}: Fix set_next_task vs pick_next_task sched: Fix kerneldoc comment for ia64_set_curr_task ...
2019-09-16Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds1-6/+11
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "This cycle's RCU changes were: - A few more RCU flavor consolidation cleanups. - Updates to RCU's list-traversal macros improving lockdep usability. - Forward-progress improvements for no-CBs CPUs: Avoid ignoring incoming callbacks during grace-period waits. - Forward-progress improvements for no-CBs CPUs: Use ->cblist structure to take advantage of others' grace periods. - Also added a small commit that avoids needlessly inflicting scheduler-clock ticks on callback-offloaded CPUs. - Forward-progress improvements for no-CBs CPUs: Reduce contention on ->nocb_lock guarding ->cblist. - Forward-progress improvements for no-CBs CPUs: Add ->nocb_bypass list to further reduce contention on ->nocb_lock guarding ->cblist. - Miscellaneous fixes. - Torture-test updates. - minor LKMM updates" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (86 commits) MAINTAINERS: Update from paulmck@linux.ibm.com to paulmck@kernel.org rcu: Don't include <linux/ktime.h> in rcutiny.h rcu: Allow rcu_do_batch() to dynamically adjust batch sizes rcu/nocb: Don't wake no-CBs GP kthread if timer posted under overload rcu/nocb: Reduce __call_rcu_nocb_wake() leaf rcu_node ->lock contention rcu/nocb: Reduce nocb_cb_wait() leaf rcu_node ->lock contention rcu/nocb: Advance CBs after merge in rcutree_migrate_callbacks() rcu/nocb: Avoid synchronous wakeup in __call_rcu_nocb_wake() rcu/nocb: Print no-CBs diagnostics when rcutorture writer unduly delayed rcu/nocb: EXP Check use and usefulness of ->nocb_lock_contended rcu/nocb: Add bypass callback queueing rcu/nocb: Atomic ->len field in rcu_segcblist structure rcu/nocb: Unconditionally advance and wake for excessive CBs rcu/nocb: Reduce ->nocb_lock contention with separate ->nocb_gp_lock rcu/nocb: Reduce contention at no-CBs invocation-done time rcu/nocb: Reduce contention at no-CBs registry-time CB advancement rcu/nocb: Round down for number of no-CBs grace-period kthreads rcu/nocb: Avoid ->nocb_lock capture by corresponding CPU rcu/nocb: Avoid needless wakeups of no-CBs grace-period kthread rcu/nocb: Make __call_rcu_nocb_wake() safe for many callbacks ...
2019-09-16Merge tag 'please-pull-ia64_for_5.4' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux Pull ia64 updates from Tony Luck: "The big change here is removal of support for SGI Altix" * tag 'please-pull-ia64_for_5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux: (33 commits) genirq: remove the is_affinity_mask_valid hook ia64: remove CONFIG_SWIOTLB ifdefs ia64: remove support for machvecs ia64: move the screen_info setup to common code ia64: move the ROOT_DEV setup to common code ia64: rework iommu probing ia64: remove the unused sn_coherency_id symbol ia64: remove the SGI UV simulator support ia64: remove the zx1 swiotlb machvec ia64: remove CONFIG_ACPI ifdefs ia64: remove CONFIG_PCI ifdefs ia64: remove the hpsim platform ia64: remove now unused machvec indirections ia64: remove support for the SGI SN2 platform drivers: remove the SGI SN2 IOC4 base support drivers: remove the SGI SN2 IOC3 base support qla2xxx: remove SGI SN2 support qla1280: remove SGI SN2 support misc/sgi-xp: remove SGI SN2 support char/mspec: remove SGI SN2 support ...
2019-09-16Merge tag 'arm64-upstream' of ↵Linus Torvalds1-0/+52
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Will Deacon: "Although there isn't tonnes of code in terms of line count, there are a fair few headline features which I've noted both in the tag and also in the merge commits when I pulled everything together. The part I'm most pleased with is that we had 35 contributors this time around, which feels like a big jump from the usual small group of core arm64 arch developers. Hopefully they all enjoyed it so much that they'll continue to contribute, but we'll see. It's probably worth highlighting that we've pulled in a branch from the risc-v folks which moves our CPU topology code out to where it can be shared with others. Summary: - 52-bit virtual addressing in the kernel - New ABI to allow tagged user pointers to be dereferenced by syscalls - Early RNG seeding by the bootloader - Improve robustness of SMP boot - Fix TLB invalidation in light of recent architectural clarifications - Support for i.MX8 DDR PMU - Remove direct LSE instruction patching in favour of static keys - Function error injection using kprobes - Support for the PPTT "thread" flag introduced by ACPI 6.3 - Move PSCI idle code into proper cpuidle driver - Relaxation of implicit I/O memory barriers - Build with RELR relocations when toolchain supports them - Numerous cleanups and non-critical fixes" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (114 commits) arm64: remove __iounmap arm64: atomics: Use K constraint when toolchain appears to support it arm64: atomics: Undefine internal macros after use arm64: lse: Make ARM64_LSE_ATOMICS depend on JUMP_LABEL arm64: asm: Kill 'asm/atomic_arch.h' arm64: lse: Remove unused 'alt_lse' assembly macro arm64: atomics: Remove atomic_ll_sc compilation unit arm64: avoid using hard-coded registers for LSE atomics arm64: atomics: avoid out-of-line ll/sc atomics arm64: Use correct ll/sc atomic constraints jump_label: Don't warn on __exit jump entries docs/perf: Add documentation for the i.MX8 DDR PMU perf/imx_ddr: Add support for AXI ID filtering arm64: kpti: ensure patched kernel text is fetched from PoU arm64: fix fixmap copy for 16K pages and 48-bit VA perf/smmuv3: Validate groups for global filtering perf/smmuv3: Validate group size arm64: Relax Documentation/arm64/tagged-pointers.rst arm64: kvm: Replace hardcoded '1' with SYS_PAR_EL1_F arm64: mm: Ignore spurious translation faults taken from the kernel ...
2019-09-16Merge branch 'sched/rt' into sched/core, to pick up -rt changesIngo Molnar3-12/+2
Pick up the first couple of patches working towards PREEMPT_RT. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-09-14Documentation: Add "earlycon=sbi" to the admin guidePalmer Dabbelt1-0/+4
This argument is supported on RISC-V systems and widely used, but was not documented here. Signed-off-by: Palmer Dabbelt <palmer@sifive.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2019-09-14devices.txt: improve entry for comedi (char major 98)Ian Abbott1-1/+10
Describe how the comedi minor device numbers are split across comedi devices and comedi subdevices. Replace the current, long dead URL with an official URL for the Comedi project. Signed-off-by: Ian Abbott <abbotti@mev.co.uk> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2019-09-12dm: add clone targetNikos Tsironis1-0/+333
Add the dm-clone target, which allows cloning of arbitrary block devices. dm-clone produces a one-to-one copy of an existing, read-only source device into a writable destination device: It presents a virtual block device which makes all data appear immediately, and redirects reads and writes accordingly. The main use case of dm-clone is to clone a potentially remote, high-latency, read-only, archival-type block device into a writable, fast, primary-type device for fast, low-latency I/O. The cloned device is visible/mountable immediately and the copy of the source device to the destination device happens in the background, in parallel with user I/O. When the cloning completes, the dm-clone table can be removed altogether and be replaced, e.g., by a linear table, mapping directly to the destination device. For further information and examples of how to use dm-clone, please read Documentation/admin-guide/device-mapper/dm-clone.rst Suggested-by: Vangelis Koukis <vkoukis@arrikto.com> Co-developed-by: Ilias Tsitsimpis <iliastsi@arrikto.com> Signed-off-by: Ilias Tsitsimpis <iliastsi@arrikto.com> Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-09-11Merge branches 'arm/omap', 'arm/exynos', 'arm/smmu', 'arm/mediatek', ↵Joerg Roedel1-1/+6
'arm/qcom', 'arm/renesas', 'x86/amd', 'x86/vt-d' and 'core' into next
2019-09-11iommu/vt-d: Check whether device requires bounce bufferLu Baolu1-0/+5
This adds a helper to check whether a device needs to use bounce buffer. It also provides a boot time option to disable the bounce buffer. Users can use this to prevent the iommu driver from using the bounce buffer for performance gain. Cc: Ashok Raj <ashok.raj@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Tested-by: Xu Pengfei <pengfei.xu@intel.com> Tested-by: Mika Westerberg <mika.westerberg@intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-09-07platform/x86: thinkpad_acpi: Add ThinkPad PrivacyGuardAlexander Schremmer1-0/+23
This feature is found optionally in T480s, T490, T490s. The feature is called lcdshadow and visible via /proc/acpi/ibm/lcdshadow. The ACPI methods \_SB.PCI0.LPCB.EC.HKEY.{GSSS,SSSS,TSSS,CSSS} are available in these machines. They get, set, toggle or change the state apparently. The patch was tested on a 5.0 series kernel on a T480s. Signed-off-by: Alexander Schremmer <alex@alexanderweb.de> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2019-09-06Documentation: sysrq: don't recommend 'S' 'U' before 'B'Adam Borowski1-11/+9
This advice is obsolete and slightly harmful for filesystems from this millenium: any modern filesystem can handle unexpected crashes without requiring fsck -- and on the other hand, trying to write to the disk when the kernel is in a bad state risks introducing corruption. For ext2, any unsafe shutdown meant widespread breakage, but it's no longer a reasonable filesystem for any non-special use. Signed-off-by: Adam Borowski <kilobyte@angband.pl> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2019-09-05powerpc/64s/radix: introduce options to disable use of the tlbie instructionNicholas Piggin1-0/+4
Introduce two options to control the use of the tlbie instruction. A boot time option which completely disables the kernel using the instruction, this is currently incompatible with HASH MMU, KVM, and coherent accelerators. And a debugfs option can be switched at runtime and avoids using tlbie for invalidating CPU TLBs for normal process and kernel address mappings. Coherent accelerators are still managed with tlbie, as will KVM partition scope translations. Cross-CPU TLB flushing is implemented with IPIs and tlbiel. This is a basic implementation which does not attempt to make any optimisation beyond the tlbie implementation. This is useful for performance testing among other things. For example in certain situations on large systems, using IPIs may be faster than tlbie as they can be directed rather than broadcast. Later we may also take advantage of the IPIs to do more interesting things such as trim the mm cpumask more aggressively. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190902152931.17840-7-npiggin@gmail.com
2019-09-04tty: serial: Add linflexuart driver for S32V234Stefan-gabriel Mirea1-0/+6
Introduce support for LINFlex driver, based on: - the version of Freescale LPUART driver after commit b3e3bf2ef2c7 ("Merge 4.0-rc7 into tty-next"); - commit abf1e0a98083 ("tty: serial: fsl_lpuart: lock port on console write"). In this basic version, the driver can be tested using initramfs and relies on the clocks and pin muxing set up by U-Boot. Remarks concerning the earlycon support: - LinFlexD does not allow character transmissions in the INIT mode (see section 47.4.2.1 in the reference manual[1]). Therefore, a mutual exclusion between the first linflex_setup_watermark/linflex_set_termios executions and linflex_earlycon_putchar was employed and the characters normally sent to earlycon during initialization are kept in a buffer and sent afterwards. - Empirically, character transmission is also forbidden within the last 1-2 ms before entering the INIT mode, so we use an explicit timeout (PREINIT_DELAY) between linflex_earlycon_putchar and the first call to linflex_setup_watermark. - U-Boot currently uses the UART FIFO mode, while this driver makes the transition to the buffer mode. Therefore, the earlycon putchar function matches the U-Boot behavior before initializations and the Linux behavior after. [1] https://www.nxp.com/webapp/Download?colCode=S32V234RM Signed-off-by: Stoica Cosmin-Stefan <cosmin.stoica@nxp.com> Signed-off-by: Adrian.Nitu <adrian.nitu@freescale.com> Signed-off-by: Larisa Grigore <Larisa.Grigore@nxp.com> Signed-off-by: Ana Nedelcu <B56683@freescale.com> Signed-off-by: Mihaela Martinas <Mihaela.Martinas@freescale.com> Signed-off-by: Matthew Nunez <matthew.nunez@nxp.com> [stefan-gabriel.mirea@nxp.com: Reduced for upstreaming and implemented earlycon support] Signed-off-by: Stefan-Gabriel Mirea <stefan-gabriel.mirea@nxp.com> Link: https://lore.kernel.org/r/20190809112853.15846-6-stefan-gabriel.mirea@nxp.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-03Documentation:kernel-per-CPU-kthreads.txt: Remove reference to elevator=Marcos Paulo de Souza1-5/+3
This argument was not being considered since blk-mq was set by default, so removed this documentation to avoid confusion. Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Marcos Paulo de Souza <marcos.souza.org@gmail.com> .txt file is now .rst Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-03block: elevator.c: Remove now unused elevator= argumentMarcos Paulo de Souza1-6/+0
Since the inclusion of blk-mq, elevator argument was not being considered anymore, and it's utility died long with the legacy IO path, now removed too. Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Bob Liu <bob.liu@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Marcos Paulo de Souza <marcos.souza.org@gmail.com> Fold with doc removal patch. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-03sched/uclamp: Extend CPU's cgroup controllerPatrick Bellasi1-0/+34
The cgroup CPU bandwidth controller allows to assign a specified (maximum) bandwidth to the tasks of a group. However this bandwidth is defined and enforced only on a temporal base, without considering the actual frequency a CPU is running on. Thus, the amount of computation completed by a task within an allocated bandwidth can be very different depending on the actual frequency the CPU is running that task. The amount of computation can be affected also by the specific CPU a task is running on, especially when running on asymmetric capacity systems like Arm's big.LITTLE. With the availability of schedutil, the scheduler is now able to drive frequency selections based on actual task utilization. Moreover, the utilization clamping support provides a mechanism to bias the frequency selection operated by schedutil depending on constraints assigned to the tasks currently RUNNABLE on a CPU. Giving the mechanisms described above, it is now possible to extend the cpu controller to specify the minimum (or maximum) utilization which should be considered for tasks RUNNABLE on a cpu. This makes it possible to better defined the actual computational power assigned to task groups, thus improving the cgroup CPU bandwidth controller which is currently based just on time constraints. Extend the CPU controller with a couple of new attributes uclamp.{min,max} which allow to enforce utilization boosting and capping for all the tasks in a group. Specifically: - uclamp.min: defines the minimum utilization which should be considered i.e. the RUNNABLE tasks of this group will run at least at a minimum frequency which corresponds to the uclamp.min utilization - uclamp.max: defines the maximum utilization which should be considered i.e. the RUNNABLE tasks of this group will run up to a maximum frequency which corresponds to the uclamp.max utilization These attributes: a) are available only for non-root nodes, both on default and legacy hierarchies, while system wide clamps are defined by a generic interface which does not depends on cgroups. This system wide interface enforces constraints on tasks in the root node. b) enforce effective constraints at each level of the hierarchy which are a restriction of the group requests considering its parent's effective constraints. Root group effective constraints are defined by the system wide interface. This mechanism allows each (non-root) level of the hierarchy to: - request whatever clamp values it would like to get - effectively get only up to the maximum amount allowed by its parent c) have higher priority than task-specific clamps, defined via sched_setattr(), thus allowing to control and restrict task requests. Add two new attributes to the cpu controller to collect "requested" clamp values. Allow that at each non-root level of the hierarchy. Keep it simple by not caring now about "effective" values computation and propagation along the hierarchy. Update sysctl_sched_uclamp_handler() to use the newly introduced uclamp_mutex so that we serialize system default updates with cgroup relate updates. Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Michal Koutny <mkoutny@suse.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: Alessio Balsini <balsini@android.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Morten Rasmussen <morten.rasmussen@arm.com> Cc: Paul Turner <pjt@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Perret <quentin.perret@arm.com> Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com> Cc: Steve Muckle <smuckle@google.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Todd Kjos <tkjos@google.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Viresh Kumar <viresh.kumar@linaro.org> Link: https://lkml.kernel.org/r/20190822132811.31294-2-patrick.bellasi@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-08-30Merge branches 'for-next/52-bit-kva', 'for-next/cpu-topology', ↵Will Deacon1-0/+52
'for-next/error-injection', 'for-next/perf', 'for-next/psci-cpuidle', 'for-next/rng', 'for-next/smpboot', 'for-next/tbi' and 'for-next/tlbi' into for-next/core * for-next/52-bit-kva: (25 commits) Support for 52-bit virtual addressing in kernel space * for-next/cpu-topology: (9 commits) Move CPU topology parsing into core code and add support for ACPI 6.3 * for-next/error-injection: (2 commits) Support for function error injection via kprobes * for-next/perf: (8 commits) Support for i.MX8 DDR PMU and proper SMMUv3 group validation * for-next/psci-cpuidle: (7 commits) Move PSCI idle code into a new CPUidle driver * for-next/rng: (4 commits) Support for 'rng-seed' property being passed in the devicetree * for-next/smpboot: (3 commits) Reduce fragility of secondary CPU bringup in debug configurations * for-next/tbi: (10 commits) Introduce new syscall ABI with relaxed requirements for pointer tags * for-next/tlbi: (6 commits) Handle spurious page faults arising from kernel space
2019-08-30powerpc/prom_init: Add the ESM call to prom_initRam Pai1-0/+5
Make the Enter-Secure-Mode (ESM) ultravisor call to switch the VM to secure mode. Pass kernel base address and FDT address so that the Ultravisor is able to verify the integrity of the VM using information from the ESM blob. Add "svm=" command line option to turn on switching to secure mode. Signed-off-by: Ram Pai <linuxram@us.ibm.com> [ andmike: Generate an RTAS os-term hcall when the ESM ucall fails. ] Signed-off-by: Michael Anderson <andmike@linux.ibm.com> [ bauerman: Cleaned up the code a bit. ] Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190820021326.6884-5-bauerman@linux.ibm.com
2019-08-28blkcg: add tools/cgroup/iocost_coef_gen.pyTejun Heo1-0/+3
Add a script which can be used to generate device-specific iocost linear model coefficients. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-28blkcg: implement blk-iocostTejun Heo1-0/+94
This patchset implements IO cost model based work-conserving proportional controller. While io.latency provides the capability to comprehensively prioritize and protect IOs depending on the cgroups, its protection is binary - the lowest latency target cgroup which is suffering is protected at the cost of all others. In many use cases including stacking multiple workload containers in a single system, it's necessary to distribute IO capacity with better granularity. One challenge of controlling IO resources is the lack of trivially observable cost metric. The most common metrics - bandwidth and iops - can be off by orders of magnitude depending on the device type and IO pattern. However, the cost isn't a complete mystery. Given several key attributes, we can make fairly reliable predictions on how expensive a given stream of IOs would be, at least compared to other IO patterns. The function which determines the cost of a given IO is the IO cost model for the device. This controller distributes IO capacity based on the costs estimated by such model. The more accurate the cost model the better but the controller adapts based on IO completion latency and as long as the relative costs across differents IO patterns are consistent and sensible, it'll adapt to the actual performance of the device. Currently, the only implemented cost model is a simple linear one with a few sets of default parameters for different classes of device. This covers most common devices reasonably well. All the infrastructure to tune and add different cost models is already in place and a later patch will also allow using bpf progs for cost models. Please see the top comment in blk-iocost.c and documentation for more details. v2: Rebased on top of RQ_ALLOC_TIME changes and folded in Rik's fix for a divide-by-zero bug in current_hweight() triggered by zero inuse_sum. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Andy Newell <newella@fb.com> Cc: Josef Bacik <jbacik@fb.com> Cc: Rik van Riel <riel@surriel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-28docs/perf: Add documentation for the i.MX8 DDR PMUJoakim Zhang1-0/+52
Add some documentation describing the DDR PMU residing in the Freescale i.MDX SoC and its perf driver implementation in Linux. Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> Signed-off-by: Will Deacon <will@kernel.org>
2019-08-25Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds1-0/+7
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A few fixes for x86: - Fix a boot regression caused by the recent bootparam sanitizing change, which escaped the attention of all people who reviewed that code. - Address a boot problem on machines with broken E820 tables caused by an underflow which ended up placing the trampoline start at physical address 0. - Handle machines which do not advertise a legacy timer of any form, but need calibration of the local APIC timer gracefully by making the calibration routine independent from the tick interrupt. Marked for stable as well as there seems to be quite some new laptops rolled out which expose this. - Clear the RDRAND CPUID bit on AMD family 15h and 16h CPUs which are affected by broken firmware which does not initialize RDRAND correctly after resume. Add a command line parameter to override this for machine which either do not use suspend/resume or have a fixed BIOS. Unfortunately there is no way to detect this on boot, so the only safe decision is to turn it off by default. - Prevent RFLAGS from being clobbers in CALL_NOSPEC on 32bit which caused fast KVM instruction emulation to break. - Explain the Intel CPU model naming convention so that the repeating discussions come to an end" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/retpoline: Don't clobber RFLAGS during CALL_NOSPEC on i386 x86/boot: Fix boot regression caused by bootparam sanitizing x86/CPU/AMD: Clear RDRAND CPUID bit on AMD family 15h/16h x86/boot/compressed/64: Fix boot on machines with broken E820 table x86/apic: Handle missing global clockevent gracefully x86/cpu: Explain Intel model naming convention
2019-08-23dm verity: add root hash pkcs#7 signature verificationJaskaran Khurana1-0/+7
The verification is to support cases where the root hash is not secured by Trusted Boot, UEFI Secureboot or similar technologies. One of the use cases for this is for dm-verity volumes mounted after boot, the root hash provided during the creation of the dm-verity volume has to be secure and thus in-kernel validation implemented here will be used before we trust the root hash and allow the block device to be created. The signature being provided for verification must verify the root hash and must be trusted by the builtin keyring for verification to succeed. The hash is added as a key of type "user" and the description is passed to the kernel so it can look it up and use it for verification. Adds CONFIG_DM_VERITY_VERIFY_ROOTHASH_SIG which can be turned on if root hash verification is needed. Kernel commandline dm_verity module parameter 'require_signatures' will indicate whether to force root hash signature verification (for all dm verity volumes). Signed-off-by: Jaskaran Khurana <jaskarankhurana@linux.microsoft.com> Tested-and-Reviewed-by: Milan Broz <gmazyland@gmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-08-23Documentation: Update Documentation for iommu.passthroughJoerg Roedel1-1/+1
This kernel parameter now takes also effect on X86. Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-08-22Merge branch 'for-mingo' of ↵Ingo Molnar1-6/+11
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull RCU and LKMM changes from Paul E. McKenney: - A few more RCU flavor consolidation cleanups. - Miscellaneous fixes. - Updates to RCU's list-traversal macros improving lockdep usability. - Torture-test updates. - Forward-progress improvements for no-CBs CPUs: Avoid ignoring incoming callbacks during grace-period waits. - Forward-progress improvements for no-CBs CPUs: Use ->cblist structure to take advantage of others' grace periods. - Also added a small commit that avoids needlessly inflicting scheduler-clock ticks on callback-offloaded CPUs. - Forward-progress improvements for no-CBs CPUs: Reduce contention on ->nocb_lock guarding ->cblist. - Forward-progress improvements for no-CBs CPUs: Add ->nocb_bypass list to further reduce contention on ->nocb_lock guarding ->cblist. - LKMM updates. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-08-22powerpc: Document xmon optionsGustavo Romero1-0/+19
Document all options currently supported by xmon debugger. Signed-off-by: Gustavo Romero <gromero@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190814205638.25322-1-gromero@linux.ibm.com
2019-08-19security: Add a static lockdown policy LSMMatthew Garrett1-0/+9
While existing LSMs can be extended to handle lockdown policy, distributions generally want to be able to apply a straightforward static policy. This patch adds a simple LSM that can be configured to reject either integrity or all lockdown queries, and can be configured at runtime (through securityfs), boot time (via a kernel parameter) or build time (via a kconfig option). Based on initial code by David Howells. Signed-off-by: Matthew Garrett <mjg59@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
2019-08-19x86/CPU/AMD: Clear RDRAND CPUID bit on AMD family 15h/16hTom Lendacky1-0/+7
There have been reports of RDRAND issues after resuming from suspend on some AMD family 15h and family 16h systems. This issue stems from a BIOS not performing the proper steps during resume to ensure RDRAND continues to function properly. RDRAND support is indicated by CPUID Fn00000001_ECX[30]. This bit can be reset by clearing MSR C001_1004[62]. Any software that checks for RDRAND support using CPUID, including the kernel, will believe that RDRAND is not supported. Update the CPU initialization to clear the RDRAND CPUID bit for any family 15h and 16h processor that supports RDRAND. If it is known that the family 15h or family 16h system does not have an RDRAND resume issue or that the system will not be placed in suspend, the "rdrand=force" kernel parameter can be used to stop the clearing of the RDRAND CPUID bit. Additionally, update the suspend and resume path to save and restore the MSR C001_1004 value to ensure that the RDRAND CPUID setting remains in place after resuming from suspend. Note, that clearing the RDRAND CPUID bit does not prevent a processor that normally supports the RDRAND instruction from executing it. So any code that determined the support based on family and model won't #UD. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Chen Yu <yu.c.chen@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: "linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org> Cc: "linux-pm@vger.kernel.org" <linux-pm@vger.kernel.org> Cc: Nathan Chancellor <natechancellor@gmail.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: <stable@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "x86@kernel.org" <x86@kernel.org> Link: https://lkml.kernel.org/r/7543af91666f491547bd86cebb1e17c66824ab9f.1566229943.git.thomas.lendacky@amd.com
2019-08-16ia64: remove the zx1 swiotlb machvecChristoph Hellwig1-1/+1
The aim of this machvec is to support devices with < 32-bit dma masks. But given that ia64 only has a ZONE_DMA32 and not a ZONE_DMA that isn't supported by swiotlb either. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lkml.kernel.org/r/20190813072514.23299-21-hch@lst.de Signed-off-by: Tony Luck <tony.luck@intel.com>
2019-08-13rcu/nocb: Rename rcu_nocb_leader_stride kernel boot parameterPaul E. McKenney1-6/+7
This commit changes the name of the rcu_nocb_leader_stride kernel boot parameter to rcu_nocb_gp_stride in order to account for the new distinction between callback and grace-period no-CBs kthreads. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-08docs: admin-guide: remove references to IPX and token-ringStephen Hemminger1-28/+1
Both IPX and TR have not been supported for a while now. Remove them from the /proc/sys/net documentation. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08PCI: Correct pci=resource_alignment parameter exampleAlexey Kardashevskiy1-2/+3
The "pci=resource_alignment" parameter is described as requiring an order (not a size) and the code in pci_specified_resource_alignment() expects an order. But the example wrongly shows a size. Convert the example to an order. Fixes: 8b078c603249 ("PCI: Update "pci=resource_alignment" documentation") Link: https://lore.kernel.org/r/20190606032557.107542-1-aik@ozlabs.ru Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2019-08-03Documentation: Add swapgs description to the Spectre v1 documentationJosh Poimboeuf1-8/+80
Add documentation to the Spectre document about the new swapgs variant of Spectre v1. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2019-08-01rcu: Add kernel parameter to dump trace after RCU CPU stall warningPaul E. McKenney1-0/+4
This commit adds a rcu_cpu_stall_ftrace_dump kernel boot parameter, that, when set, causes the trace buffer to be dumped after an RCU CPU stall warning is printed. This kernel boot parameter is disabled by default, maintaining compatibility with previous behavior. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-07-31docs: fs: cifs: convert to ReST and add to admin-guide bookMauro Carvalho Chehab8-0/+1209
The filenames for cifs documentation is not using the same convention as almost all Kernel documents is using. So, rename them to a more appropriate name. Then, manually convert the documentation files for CIFS to ReST. By doing a manual conversion, we can preserve the original author's style, while making it to look more like the other Kernel documents. Most of the conversion here is trivial. The most complex one was the README file (which was renamed to usage.rst). Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2019-07-31docs: wimax: convert to ReST and add to admin-guideMauro Carvalho Chehab4-0/+392
Manually convert wimax documentation to ReST and add theit to the Kernel doc body, inside the admin-guide. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2019-07-31docs: admin-guide: add auxdisplay files to it after conversion to ReSTMauro Carvalho Chehab4-0/+165
Those two files describe userspace-faced information. While part of it might fit on uAPI, it sounds to me that the admin guide is the best place for them. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2019-07-31Documentation: filesystems: Convert ufs.txt to reStructuredText formatShobhit Kukreti2-0/+69
This converts the plain text documentation of ufs.txt to reStructuredText format. Added to documentation build process and verified with make htmldocs Signed-off-by: Shobhit Kukreti <shobhitkukreti@gmail.com> Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net>