path: root/kernel
Age | Commit message | Author | Files | Lines
2020-06-03 | mm: allow swappiness that prefers reclaiming anon over the file workingset | Johannes Weiner | 1 | -1/+2
With the advent of fast random IO devices (SSDs, PMEM) and in-memory swap devices such as zswap, it's possible for swap to be much faster than filesystems, and for swapping to be preferable over thrashing filesystem caches. Allow setting swappiness - which defines the rough relative IO cost of cache misses between page cache and swap-backed pages - to reflect such situations by making the swap-preferred range configurable. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Rik van Riel <riel@surriel.com> Link: http://lkml.kernel.org/r/20200520232525.798933-4-hannes@cmpxchg.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
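The "rough relative IO cost" idea above can be illustrated with a small sketch. It assumes, and this is not stated in the commit message, that reclaim weighs anon against file pages as `swappiness : (200 - swappiness)` and that the sysctl's upper bound becomes 200; the type and function names are hypothetical and are not kernel code.

```c
/* Hypothetical illustration, not kernel code. */
struct reclaim_weights {
    unsigned int anon; /* relative pressure on swap-backed pages */
    unsigned int file; /* relative pressure on page cache */
};

/*
 * Assumption: the weights are swappiness : (200 - swappiness), so 100 means
 * "equal IO cost" and values above 100 prefer reclaiming anon pages.
 */
struct reclaim_weights swappiness_weights(unsigned int swappiness)
{
    struct reclaim_weights w;

    if (swappiness > 200)
        swappiness = 200; /* clamp to the assumed upper bound */
    w.anon = swappiness;
    w.file = 200 - swappiness;
    return w;
}
```

Under these assumptions, a value such as 150 gives anon pages three times the reclaim weight of file pages, which is the situation the commit describes for fast swap devices.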
2020-06-03 | mm: memcontrol: delete unused lrucare handling | Johannes Weiner | 1 | -2/+1
Swapin faults were the last event to charge pages after they had already been put on the LRU list. Now that we charge directly on swapin, the lrucare portion of the charge code is unused. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Alex Shi <alex.shi@linux.alibaba.com> Cc: Hugh Dickins <hughd@google.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Michal Hocko <mhocko@suse.com> Cc: Roman Gushchin <guro@fb.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Shakeel Butt <shakeelb@google.com> Link: http://lkml.kernel.org/r/20200508183105.225460-19-hannes@cmpxchg.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-03 | mm: memcontrol: convert anon and file-thp to new mem_cgroup_charge() API | Johannes Weiner | 1 | -8/+3
With the page->mapping requirement gone from memcg, we can charge anon and file-thp pages in one single step, right after they're allocated. This removes two out of three API calls - especially the tricky commit step that needed to happen at just the right time between when the page is "set up" and when it's "published" - somewhat vague and fluid concepts that varied by page type. All we need is a freshly allocated page and a memcg context to charge. v2: prevent double charges on pre-allocated hugepages in khugepaged [hannes@cmpxchg.org: Fix crash - *hpage could be ERR_PTR instead of NULL] Link: http://lkml.kernel.org/r/20200512215813.GA487759@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Alex Shi <alex.shi@linux.alibaba.com> Cc: Hugh Dickins <hughd@google.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Michal Hocko <mhocko@suse.com> Cc: Roman Gushchin <guro@fb.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Qian Cai <cai@lca.pw> Link: http://lkml.kernel.org/r/20200508183105.225460-13-hannes@cmpxchg.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-03 | mm: memcontrol: switch to native NR_ANON_MAPPED counter | Johannes Weiner | 1 | -1/+1
Memcg maintains a private MEMCG_RSS counter. This divergence from the generic VM accounting means unnecessary code overhead, and creates a dependency for memcg that page->mapping is set up at the time of charging, so that page types can be told apart. Convert the generic accounting sites to mod_lruvec_page_state and friends to maintain the per-cgroup vmstat counter of NR_ANON_MAPPED. We use lock_page_memcg() to stabilize page->mem_cgroup during rmap changes, the same way we do for NR_FILE_MAPPED. With the previous patch removing MEMCG_CACHE and the private NR_SHMEM counter, this patch finally eliminates the need to have page->mapping set up at charge time. However, we need to have page->mem_cgroup set up by the time rmap runs and does the accounting, so switch the commit and the rmap callbacks around. v2: fix temporary accounting bug by switching rmap<->commit (Joonsoo) Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Alex Shi <alex.shi@linux.alibaba.com> Cc: Hugh Dickins <hughd@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Michal Hocko <mhocko@suse.com> Cc: Roman Gushchin <guro@fb.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Balbir Singh <bsingharora@gmail.com> Link: http://lkml.kernel.org/r/20200508183105.225460-11-hannes@cmpxchg.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-03 | mm: memcontrol: drop @compound parameter from memcg charging API | Johannes Weiner | 1 | -3/+3
The memcg charging API carries a boolean @compound parameter that tells whether the page we're dealing with is a hugepage. mem_cgroup_commit_charge() has another boolean @lrucare that indicates whether the page needs LRU locking or not while charging. The majority of callsites know those parameters at compile time, which results in a lot of naked "false, false" argument lists. This makes for cryptic code and is a breeding ground for subtle mistakes. Thankfully, the huge page state can be inferred from the page itself and doesn't need to be passed along. This is safe because charging completes before the page is published and somebody may split it. Simplify the callsites by removing @compound, and let memcg infer the state by using hpage_nr_pages() unconditionally. That function does PageTransHuge() to identify huge pages, which also helpfully asserts that nobody passes in tail pages by accident. The following patches will introduce a new charging API, best not to carry over unnecessary weight. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com> Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Michal Hocko <mhocko@suse.com> Cc: Roman Gushchin <guro@fb.com> Cc: Balbir Singh <bsingharora@gmail.com> Link: http://lkml.kernel.org/r/20200508183105.225460-4-hannes@cmpxchg.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
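The point about inferring the huge page state from the page itself can be sketched in a toy model. Everything here is hypothetical: `struct toy_page`, `toy_nr_pages()` and `toy_charge()` only stand in for the kernel's page struct, `hpage_nr_pages()` and the charge path, and the value 512 assumes 2 MiB huge pages built from 4 KiB base pages.

```c
#include <stdbool.h>

#define TOY_HPAGE_NR 512 /* assumption: 2 MiB huge page / 4 KiB base page */

/* Hypothetical stand-in for struct page. */
struct toy_page {
    bool trans_huge; /* head of a transparent huge page */
    bool is_tail;    /* tail page of a compound page */
};

/* Stand-in for hpage_nr_pages(): the size comes from the page itself. */
unsigned int toy_nr_pages(const struct toy_page *page)
{
    return page->trans_huge ? TOY_HPAGE_NR : 1;
}

/*
 * Charge without a "compound" argument. Returns 0 on success and -1 if a
 * tail page is passed in by accident (the kernel asserts instead).
 */
int toy_charge(unsigned long *counter, const struct toy_page *page)
{
    if (page->is_tail)
        return -1;
    *counter += toy_nr_pages(page);
    return 0;
}
```

Callers no longer pass a bare `false` or `true`, so a mismatch between the argument and the actual page cannot occur.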
2020-06-03 | padata: add basic support for multithreaded jobs | Daniel Jordan | 1 | -3/+149
Sometimes the kernel doesn't take full advantage of system memory bandwidth, leading to a single CPU spending excessive time in initialization paths where the data scales with memory size. Multithreading naturally addresses this problem. Extend padata, a framework that handles many parallel yet singlethreaded jobs, to also handle multithreaded jobs by adding support for splitting up the work evenly, specifying a minimum amount of work that's appropriate for one helper thread to do, load balancing between helpers, and coordinating them. This is inspired by work from Pavel Tatashin and Steve Sistare. Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Josh Triplett <josh@joshtriplett.org> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Pavel Machek <pavel@ucw.cz> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Robert Elliott <elliott@hpe.com> Cc: Shile Zhang <shile.zhang@linux.alibaba.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Steven Sistare <steven.sistare@oracle.com> Cc: Tejun Heo <tj@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Link: http://lkml.kernel.org/r/20200527173608.2885243-5-daniel.m.jordan@oracle.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
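The combination of "splitting up the work evenly" and "a minimum amount of work that's appropriate for one helper thread" can be sketched as follows. This is only an illustration under stated assumptions: `struct toy_mt_job` and both functions are hypothetical names, not the padata API, and the sketch leaves out load balancing and thread coordination entirely.

```c
#include <stddef.h>

/* Hypothetical job description; not the padata structure. */
struct toy_mt_job {
    size_t size;              /* total units of work */
    size_t min_chunk;         /* least work worth giving one helper */
    unsigned int max_threads; /* upper bound on helpers */
};

/* Use as many helpers as min_chunk allows, capped at max_threads. */
unsigned int toy_nr_helpers(const struct toy_mt_job *job)
{
    size_t n;

    if (job->size == 0 || job->max_threads == 0)
        return 0;
    n = job->min_chunk ? job->size / job->min_chunk : job->size;
    if (n == 0)
        n = 1; /* less than one min_chunk of work: a single thread does it */
    if (n > job->max_threads)
        n = job->max_threads;
    return (unsigned int)n;
}

/* Even split; the remainder is spread over the first helpers. */
size_t toy_share(const struct toy_mt_job *job, unsigned int i)
{
    unsigned int n = toy_nr_helpers(job);

    if (i >= n)
        return 0;
    return job->size / n + (i < job->size % n ? 1 : 0);
}
```

For example, 1000 units with a minimum chunk of 300 and at most 8 threads would use 3 helpers with shares of 334, 333 and 333 units in this model.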
2020-06-03 | padata: allocate work structures for parallel jobs from a pool | Daniel Jordan | 1 | -41/+77
padata allocates per-CPU, per-instance work structs for parallel jobs. A do_parallel call assigns a job to a sequence number and hashes the number to a CPU, where the job will eventually run using the corresponding work. This approach fit with how padata used to bind a job to each CPU round-robin, makes less sense after commit bfde23ce200e6 ("padata: unbind parallel jobs from specific CPUs") because a work isn't bound to a particular CPU anymore, and isn't needed at all for multithreaded jobs because they don't have sequence numbers. Replace the per-CPU works with a preallocated pool, which allows sharing them between existing padata users and the upcoming multithreaded user. The pool will also facilitate setting NUMA-aware concurrency limits with later users. The pool is sized according to the number of possible CPUs. With this limit, MAX_OBJ_NUM no longer makes sense, so remove it. If the global pool is exhausted, a parallel job is run in the current task instead to throttle a system trying to do too much in parallel. 
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Josh Triplett <josh@joshtriplett.org> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Pavel Machek <pavel@ucw.cz> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Robert Elliott <elliott@hpe.com> Cc: Shile Zhang <shile.zhang@linux.alibaba.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Steven Sistare <steven.sistare@oracle.com> Cc: Tejun Heo <tj@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Link: http://lkml.kernel.org/r/20200527173608.2885243-4-daniel.m.jordan@oracle.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
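The pool behaviour described in this commit message (preallocated work items, with a fallback to running the job in the current task when the pool is empty) can be sketched as a simple free list. All names are hypothetical, the pool size of 2 merely stands in for "number of possible CPUs", and the sketch is single-threaded, so it omits the locking a real implementation would need.

```c
#include <stddef.h>

#define TOY_POOL_SIZE 2 /* assumption: stands in for the number of possible CPUs */

/* Hypothetical work item; not the padata structure. */
struct toy_work {
    struct toy_work *next;
};

struct toy_pool {
    struct toy_work slots[TOY_POOL_SIZE];
    struct toy_work *free; /* singly linked free list */
};

enum toy_mode { TOY_QUEUED, TOY_INLINE };

void toy_pool_init(struct toy_pool *p)
{
    size_t i;

    p->free = NULL;
    for (i = 0; i < TOY_POOL_SIZE; i++) {
        p->slots[i].next = p->free;
        p->free = &p->slots[i];
    }
}

/* Return a work item to the pool once its job has finished. */
void toy_work_free(struct toy_pool *p, struct toy_work *w)
{
    w->next = p->free;
    p->free = w;
}

/*
 * Take a work item if one is available. An empty pool means the caller
 * runs the job itself, which throttles an overloaded submitter.
 */
enum toy_mode toy_submit(struct toy_pool *p, struct toy_work **out)
{
    *out = p->free;
    if (*out == NULL)
        return TOY_INLINE;
    p->free = (*out)->next;
    return TOY_QUEUED;
}
```

A fixed-size pool bounds the amount of queued parallel work without any separate per-instance limit, which matches the stated reason for removing MAX_OBJ_NUM.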
2020-06-03 | padata: initialize earlier | Daniel Jordan | 1 | -9/+8
padata will soon initialize the system's struct pages in parallel, so it needs to be ready by page_alloc_init_late(). The error return from padata_driver_init() triggers an initcall warning, so add a warning to padata_init() to avoid silent failure. Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Josh Triplett <josh@joshtriplett.org> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Pavel Machek <pavel@ucw.cz> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Robert Elliott <elliott@hpe.com> Cc: Shile Zhang <shile.zhang@linux.alibaba.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Steven Sistare <steven.sistare@oracle.com> Cc: Tejun Heo <tj@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Link: http://lkml.kernel.org/r/20200527173608.2885243-3-daniel.m.jordan@oracle.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-03 | padata: remove exit routine | Daniel Jordan | 1 | -6/+0
Patch series "padata: parallelize deferred page init", v3. Deferred struct page init is a bottleneck in kernel boot--the biggest for us and probably others. Optimizing it maximizes availability for large-memory systems and allows spinning up short-lived VMs as needed without having to leave them running. It also benefits bare metal machines hosting VMs that are sensitive to downtime. In projects such as VMM Fast Restart[1], where guest state is preserved across kexec reboot, it helps prevent application and network timeouts in the guests. So, multithread deferred init to take full advantage of system memory bandwidth. Extend padata, a framework that handles many parallel singlethreaded jobs, to handle multithreaded jobs as well by adding support for splitting up the work evenly, specifying a minimum amount of work that's appropriate for one helper thread to do, load balancing between helpers, and coordinating them. More documentation in patches 4 and 8. This series is the first step in a project to address other memory proportional bottlenecks in the kernel such as pmem struct page init, vfio page pinning, hugetlb fallocate, and munmap. Deferred page init doesn't require concurrency limits, resource control, or priority adjustments like these other users will because it happens during boot when the system is otherwise idle and waiting for page init to finish. This has been run on a variety of x86 systems and speeds up kernel boot by 4% to 49%, saving up to 1.6 out of 4 seconds. Patch 6 has more numbers. This patch (of 8): padata_driver_exit() is unnecessary because padata isn't built as a module and doesn't exit. padata's init routine will soon allocate memory, so getting rid of the exit function now avoids pointless code to free it. 
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Josh Triplett <josh@joshtriplett.org> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Pavel Machek <pavel@ucw.cz> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Robert Elliott <elliott@hpe.com> Cc: Shile Zhang <shile.zhang@linux.alibaba.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Steven Sistare <steven.sistare@oracle.com> Cc: Tejun Heo <tj@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Link: http://lkml.kernel.org/r/20200527173608.2885243-1-daniel.m.jordan@oracle.com Link: http://lkml.kernel.org/r/20200527173608.2885243-2-daniel.m.jordan@oracle.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02 | Merge tag 'audit-pr-20200601' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit | Linus Torvalds | 4 | -36/+113
Pull audit updates from Paul Moore: "Summary of the significant patches:
- Record information about binds/unbinds to the audit multicast socket. This helps identify which processes have/had access to the information in the audit stream.
- Cleanup and add some additional information to the netfilter configuration events collected by audit.
- Fix some of the audit error handling code so we don't leak network namespace references"
* tag 'audit-pr-20200601' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
  audit: add subj creds to NETFILTER_CFG record to
  audit: Replace zero-length array with flexible-array
  audit: make symbol 'audit_nfcfgs' static
  netfilter: add audit table unregister actions
  audit: tidy and extend netfilter_cfg x_tables
  audit: log audit netlink multicast bind and unbind
  audit: fix a net reference leak in audit_list_rules_send()
  audit: fix a net reference leak in audit_send_reply()
2020-06-02 | Merge tag 'for-5.8/block-2020-06-01' of git://git.kernel.dk/linux-block | Linus Torvalds | 1 | -2/+2
Pull block updates from Jens Axboe: "Core block changes that have been queued up for this release: - Remove dead blk-throttle and blk-wbt code (Guoqing) - Include pid in blktrace note traces (Jan) - Don't spew I/O errors on wouldblock termination (me) - Zone append addition (Johannes, Keith, Damien) - IO accounting improvements (Konstantin, Christoph) - blk-mq hardware map update improvements (Ming) - Scheduler dispatch improvement (Salman) - Inline block encryption support (Satya) - Request map fixes and improvements (Weiping) - blk-iocost tweaks (Tejun) - Fix for timeout failing with error injection (Keith) - Queue re-run fixes (Douglas) - CPU hotplug improvements (Christoph) - Queue entry/exit improvements (Christoph) - Move DMA drain handling to the few drivers that use it (Christoph) - Partition handling cleanups (Christoph)" * tag 'for-5.8/block-2020-06-01' of git://git.kernel.dk/linux-block: (127 commits) block: mark bio_wouldblock_error() bio with BIO_QUIET blk-wbt: rename __wbt_update_limits to wbt_update_limits blk-wbt: remove wbt_update_limits blk-throttle: remove tg_drain_bios blk-throttle: remove blk_throtl_drain null_blk: force complete for timeout request blk-mq: drain I/O when all CPUs in a hctx are offline blk-mq: add blk_mq_all_tag_iter blk-mq: open code __blk_mq_alloc_request in blk_mq_alloc_request_hctx blk-mq: use BLK_MQ_NO_TAG in more places blk-mq: rename BLK_MQ_TAG_FAIL to BLK_MQ_NO_TAG blk-mq: move more request initialization to blk_mq_rq_ctx_init blk-mq: simplify the blk_mq_get_request calling convention blk-mq: remove the bio argument to ->prepare_request nvme: force complete cancelled requests blk-mq: blk-mq: provide forced completion method block: fix a warning when blkdev.h is included for !CONFIG_BLOCK builds block: blk-crypto-fallback: remove redundant initialization of variable err block: reduce part_stat_lock() scope block: use __this_cpu_add() instead of access by smp_processor_id() ...
2020-06-02 | Merge tag 'pm-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm | Linus Torvalds | 5 | -13/+48
Pull power management updates from Rafael Wysocki: "These rework the system-wide PM driver flags, make runtime switching of cpuidle governors easier, improve the user space hibernation interface code, add intel-speed-select interface documentation, add more debug messages to the ACPI code handling suspend to idle, update the cpufreq core and drivers, fix a minor issue in the cpuidle core and update two cpuidle drivers, improve the PM-runtime framework, update the Intel RAPL power capping driver, update devfreq core and drivers, and clean up the cpupower utility. Specifics: - Rework the system-wide PM driver flags to make them easier to understand and use and update their documentation (Rafael Wysocki, Alan Stern). - Allow cpuidle governors to be switched at run time regardless of the kernel configuration and update the related documentation accordingly (Hanjun Guo). - Improve the resume device handling in the user space hibernation interface code (Domenico Andreoli). - Document the intel-speed-select sysfs interface (Srinivas Pandruvada). - Make the ACPI code handling suspend to idle print more debug messages to help diagnose issues with it (Rafael Wysocki). - Fix a helper routine in the cpufreq core and correct a typo in the struct cpufreq_driver kerneldoc comment (Rafael Wysocki, Wang Wenhu). - Update cpufreq drivers: - Make the intel_pstate driver start in the passive mode by default on systems without HWP (Rafael Wysocki). - Add i.MX7ULP support to the imx-cpufreq-dt driver and add i.MX7ULP to the cpufreq-dt-platdev blacklist (Peng Fan). - Convert the qoriq cpufreq driver to a platform one, make the platform code create a suitable device object for it and add platform dependencies to it (Mian Yousaf Kaukab, Geert Uytterhoeven). - Fix wrong compatible binding in the qcom driver (Ansuel Smith). - Build the omap driver by default for ARCH_OMAP2PLUS (Anders Roxell).
- Add r8a7742 SoC support to the dt cpufreq driver (Lad Prabhakar). - Update cpuidle core and drivers: - Fix three reference count leaks in error code paths in the cpuidle core (Qiushi Wu). - Convert Qualcomm SPM to a generic cpuidle driver (Stephan Gerhold). - Fix up the execution order when entering a domain idle state in the PSCI driver (Ulf Hansson). - Fix a reference counting issue related to clock management and clean up two oddities in the PM-runtime framework (Rafael Wysocki, Andy Shevchenko). - Add ElkhartLake support to the Intel RAPL power capping driver and remove an unused local MSR definition from it (Jacob Pan, Sumeet Pawnikar). - Update devfreq core and drivers: - Replace strncpy() with strscpy() in the devfreq core and use lockdep asserts instead of manual checks for a locked mutex in it (Dmitry Osipenko, Krzysztof Kozlowski). - Add a generic imx bus scaling driver and make it register an interconnect device (Leonard Crestez, Gustavo A. R. Silva). - Make the cpufreq notifier in the tegra30 driver take boosting into account and delete an unuseful error message from that driver (Dmitry Osipenko, Markus Elfring). 
- Remove unneeded semicolon from the cpupower code (Zou Wei)" * tag 'pm-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (51 commits) cpuidle: Fix three reference count leaks PM: runtime: Replace pm_runtime_callbacks_present() PM / devfreq: Use lockdep asserts instead of manual checks for locked mutex PM / devfreq: imx-bus: Fix inconsistent IS_ERR and PTR_ERR PM / devfreq: Replace strncpy with strscpy PM / devfreq: imx: Register interconnect device PM / devfreq: Add generic imx bus scaling driver PM / devfreq: tegra30: Delete an error message in tegra_devfreq_probe() PM / devfreq: tegra30: Make CPUFreq notifier to take into account boosting PM: hibernate: Restrict writes to the resume device PM: runtime: clk: Fix clk_pm_runtime_get() error path cpuidle: Convert Qualcomm SPM driver to a generic CPUidle driver ACPI: EC: PM: s2idle: Extend GPE dispatching debug message ACPI: PM: s2idle: Print type of wakeup debug messages powercap: RAPL: remove unused local MSR define PM: runtime: Make clear what we do when conditions are wrong in rpm_suspend() Documentation: admin-guide: pm: Document intel-speed-select PM: hibernate: Split off snapshot dev option PM: hibernate: Incorporate concurrency handling Documentation: ABI: make current_governer_ro as a candidate for removal ...
2020-06-02 | Merge branch 'akpm' (patches from Andrew) | Linus Torvalds | 8 | -67/+32
Merge updates from Andrew Morton: "A few little subsystems and a start of a lot of MM patches. Subsystems affected by this patch series: squashfs, ocfs2, parisc, vfs. With mm subsystems: slab-generic, slub, debug, pagecache, gup, swap, memcg, pagemap, memory-failure, vmalloc, kasan" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (128 commits) kasan: move kasan_report() into report.c mm/mm_init.c: report kasan-tag information stored in page->flags ubsan: entirely disable alignment checks under UBSAN_TRAP kasan: fix clang compilation warning due to stack protector x86/mm: remove vmalloc faulting mm: remove vmalloc_sync_(un)mappings() x86/mm/32: implement arch_sync_kernel_mappings() x86/mm/64: implement arch_sync_kernel_mappings() mm/ioremap: track which page-table levels were modified mm/vmalloc: track which page-table levels were modified mm: add functions to track page directory modifications s390: use __vmalloc_node in stack_alloc powerpc: use __vmalloc_node in alloc_vm_stack arm64: use __vmalloc_node in arch_alloc_vmap_stack mm: remove vmalloc_user_node_flags mm: switch the test_vmalloc module to use __vmalloc_node mm: remove __vmalloc_node_flags_caller mm: remove both instances of __vmalloc_node_flags mm: remove the prot argument to __vmalloc_node mm: remove the pgprot argument to __vmalloc ...
2020-06-02 | mm: remove vmalloc_sync_(un)mappings() | Joerg Roedel | 2 | -13/+0
These functions are not needed anymore because the vmalloc and ioremap mappings are now synchronized when they are created or torn down. Remove all callers and function definitions. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Acked-by: Andy Lutomirski <luto@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Link: http://lkml.kernel.org/r/20200515140023.25469-7-joro@8bytes.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02 | mm: remove vmalloc_user_node_flags | Christoph Hellwig | 1 | -10/+14
Open code it in __bpf_map_area_alloc, which is the only caller. Also clean up __bpf_map_area_alloc to have a single vmalloc call with slightly different flags instead of the current two different calls. For this to compile for the nommu case add a __vmalloc_node_range stub to nommu.c. [akpm@linux-foundation.org: fix nommu.c build] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Gao Xiang <xiang@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Kelley <mikelley@microsoft.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Wei Liu <wei.liu@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Link: http://lkml.kernel.org/r/20200414131348.444715-27-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02 | mm: remove __vmalloc_node_flags_caller | Christoph Hellwig | 1 | -3/+2
Just use __vmalloc_node instead, which gets an extra argument. To be able to use __vmalloc_node in all callers, make it available outside of vmalloc and implement it in nommu.c. [akpm@linux-foundation.org: fix nommu build] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Gao Xiang <xiang@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Kelley <mikelley@microsoft.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Wei Liu <wei.liu@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Link: http://lkml.kernel.org/r/20200414131348.444715-25-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02 | mm: remove the pgprot argument to __vmalloc | Christoph Hellwig | 3 | -6/+5
The pgprot argument to __vmalloc is always PAGE_KERNEL now, so remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Michael Kelley <mikelley@microsoft.com> [hyperv] Acked-by: Gao Xiang <xiang@kernel.org> [erofs] Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Wei Liu <wei.liu@kernel.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/20200414131348.444715-22-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02 | dma-mapping: use vmap insted of reimplementing it | Christoph Hellwig | 1 | -36/+12
Replace the open coded instance of vmap with the actual function. In the non-contiguous (IOMMU) case this requires an extra find_vm_area, but given that this isn't a fast path function that is a small price to pay. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Gao Xiang <xiang@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Kelley <mikelley@microsoft.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Wei Liu <wei.liu@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/20200414131348.444715-6-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02 | mm/writeback: replace PF_LESS_THROTTLE with PF_LOCAL_THROTTLE | NeilBrown | 1 | -1/+1
PF_LESS_THROTTLE exists for loop-back nfsd (and a similar need in the loop block driver and callers of prctl(PR_SET_IO_FLUSHER)), where a daemon needs to write to one bdi (the final bdi) in order to free up writes queued to another bdi (the client bdi). The daemon sets PF_LESS_THROTTLE and gets a larger allowance of dirty pages, so that it can still dirty pages after other processes have been throttled. The purpose of this is to avoid the deadlock that happens when the PF_LESS_THROTTLE process must write for any dirty pages to be freed, but it is being throttled and cannot write. This approach was designed when all threads were blocked equally, independently of which device they were writing to, or how fast it was. Since that time the writeback algorithm has changed substantially, with different threads getting different allowances based on non-trivial heuristics. This means the simple "add 25%" heuristic is no longer reliable. The important issue is not that the daemon needs a *larger* dirty page allowance, but that it needs a *private* dirty page allowance, so that dirty pages for the "client" bdi that it is helping to clear (the bdi for an NFS filesystem or loop block device etc) do not affect the throttling of the daemon writing to the "final" bdi. This patch changes the heuristic so that the task is not throttled when the bdi it is writing to has a dirty page count below (or equal to) the free-run threshold for that bdi. This ensures it will always be able to have some pages in flight, and so will not deadlock. In a steady-state, it is expected that PF_LOCAL_THROTTLE tasks might still be throttled by global threshold, but that is acceptable as it is only the deadlock state that is interesting for this flag. This approach of "only throttle when target bdi is busy" is consistent with the other use of PF_LESS_THROTTLE in current_may_throttle(), where it causes attention to be focussed only on the target bdi.
So this patch - renames PF_LESS_THROTTLE to PF_LOCAL_THROTTLE, - removes the 25% bonus that that flag gives, and - If PF_LOCAL_THROTTLE is set, don't delay at all unless the global and the local free-run thresholds are exceeded. Note that previously realtime threads were treated the same as PF_LESS_THROTTLE threads. This patch does *not* change the behaviour for real-time threads, so it is now different from the behaviour of nfsd and loop tasks. I don't know what is wanted for realtime. [akpm@linux-foundation.org: coding style fixes] Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Chuck Lever <chuck.lever@oracle.com> [nfsd] Cc: Christoph Hellwig <hch@lst.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Link: http://lkml.kernel.org/r/87ftbf7gs3.fsf@notabene.neil.brown.name Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
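The throttling rule described in this commit message can be sketched as a small userspace model. This is only an illustration of the stated rule, not the kernel's writeback code: the struct, the function names, and the choice of the midpoint between the two thresholds as the "free-run ceiling" are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one dirty-throttling domain (global or per-bdi). */
struct dirty_state {
	unsigned long dirty;     /* pages currently dirty */
	unsigned long thresh;    /* hard dirty threshold */
	unsigned long bg_thresh; /* background writeback threshold */
};

/* Assumption: the free-run ceiling is the midpoint of the two thresholds. */
static unsigned long freerun_ceiling(const struct dirty_state *s)
{
	return (s->thresh + s->bg_thresh) / 2;
}

/*
 * Returns true if the writer should be delayed.
 * local_throttle models a task that has PF_LOCAL_THROTTLE set: it is
 * delayed only when BOTH the global and the local (target bdi) free-run
 * ceilings are exceeded.
 */
static bool must_throttle(const struct dirty_state *global,
			  const struct dirty_state *local,
			  bool local_throttle)
{
	assert(global != NULL && local != NULL);

	/* Nobody is delayed while the global count is in the free-run range. */
	if (global->dirty <= freerun_ceiling(global))
		return false;

	/* A local-throttle task also runs freely while its own bdi is quiet. */
	if (local_throttle && local->dirty <= freerun_ceiling(local))
		return false;

	return true;
}
```

With a busy global domain and a quiet target bdi, an ordinary task is delayed while a local-throttle task is not, which is the deadlock-avoidance property the message describes.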
2020-06-01Merge branch 'work.set_fs-exec' of ↵Linus Torvalds1-53/+53
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull uaccess/coredump updates from Al Viro: "set_fs() removal in coredump-related area - mostly Christoph's stuff..." * 'work.set_fs-exec' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: binfmt_elf_fdpic: remove the set_fs(KERNEL_DS) in elf_fdpic_core_dump binfmt_elf: remove the set_fs(KERNEL_DS) in elf_core_dump binfmt_elf: remove the set_fs in fill_siginfo_note signal: refactor copy_siginfo_to_user32 powerpc/spufs: simplify spufs core dumping powerpc/spufs: stop using access_ok powerpc/spufs: fix copy_to_user while atomic
2020-06-01Merge branch 'uaccess.__put_user' of ↵Linus Torvalds1-16/+17
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull uaccess/__put-user updates from Al Viro: "Removal of __put_user() calls - misc patches that don't fit into any other series" * 'uaccess.__put_user' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: pcm_native: result of put_user() needs to be checked scsi_ioctl.c: switch SCSI_IOCTL_GET_IDLUN to copy_to_user() compat sysinfo(2): don't bother with field-by-field copyout
2020-06-01Merge branch 'uaccess.readdir' of ↵Linus Torvalds2-12/+12
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull uaccess/readdir updates from Al Viro: "Finishing the conversion of readdir.c to unsafe_... API. This includes the uaccess_{read,write}_begin series by Christophe Leroy" * 'uaccess.readdir' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: readdir.c: get rid of the last __put_user(), drop now-useless access_ok() readdir.c: get compat_filldir() more or less in sync with filldir() switch readdir(2) to unsafe_copy_dirent_name() drm/i915/gem: Replace user_access_begin by user_write_access_begin uaccess: Selectively open read or write user access uaccess: Add user_read_access_begin/end and user_write_access_begin/end
2020-06-01Merge tag 'docs-5.8' of git://git.lwn.net/linuxLinus Torvalds2-1/+4
Pull documentation updates from Jonathan Corbet: "A fair amount of stuff this time around, dominated by yet another massive set from Mauro toward the completion of the RST conversion. I *really* hope we are getting close to the end of this. Meanwhile, those patches reach pretty far afield to update document references around the tree; there should be no actual code changes there. There will be, alas, more of the usual trivial merge conflicts. Beyond that we have more translations, improvements to the sphinx scripting, a number of additions to the sysctl documentation, and lots of fixes" * tag 'docs-5.8' of git://git.lwn.net/linux: (130 commits) Documentation: fixes to the maintainer-entry-profile template zswap: docs/vm: Fix typo accept_threshold_percent in zswap.rst tracing: Fix events.rst section numbering docs: acpi: fix old http link and improve document format docs: filesystems: add info about efivars content Documentation: LSM: Correct the basic LSM description mailmap: change email for Ricardo Ribalda docs: sysctl/kernel: document unaligned controls Documentation: admin-guide: update bug-hunting.rst docs: sysctl/kernel: document ngroups_max nvdimm: fixes to maintainter-entry-profile Documentation/features: Correct RISC-V kprobes support entry Documentation/features: Refresh the arch support status files Revert "docs: sysctl/kernel: document ngroups_max" docs: move locking-specific documents to locking/ docs: move digsig docs to the security book docs: move the kref doc into the core-api book docs: add IRQ documentation at the core-api book docs: debugging-via-ohci1394.txt: add it to the core-api book docs: fix references for ipmi.rst file ...
2020-06-01Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-armLinus Torvalds1-2/+7
Pull ARM updates from Russell King: - remove a now unnecessary usage of the KERNEL_DS for sys_oabi_epoll_ctl() - update my email address in a number of drivers - decompressor EFI updates from Ard Biesheuvel - module unwind section handling updates - sparsemem Kconfig cleanups - make act_mm macro respect THREAD_SIZE * tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm: ARM: 8980/1: Allow either FLATMEM or SPARSEMEM on the multiplatform build ARM: 8979/1: Remove redundant ARCH_SPARSEMEM_DEFAULT setting ARM: 8978/1: mm: make act_mm() respect THREAD_SIZE ARM: decompressor: run decompressor in place if loaded via UEFI ARM: decompressor: move GOT into .data for EFI enabled builds ARM: decompressor: defer loading of the contents of the LC0 structure ARM: decompressor: split off _edata and stack base into separate object ARM: decompressor: move headroom variable out of LC0 ARM: 8976/1: module: allow arch overrides for .init section names ARM: 8975/1: module: fix handling of unwind init sections ARM: 8974/1: use SPARSMEM_STATIC when SPARSEMEM is enabled ARM: 8971/1: replace the sole use of a symbol with its definition ARM: 8969/1: decompressor: simplify libfdt builds Update rmk's email address in various drivers ARM: compat: remove KERNEL_DS usage in sys_oabi_epoll_ctl()
2020-06-01Merge tag 'arm64-upstream' of ↵Linus Torvalds4-0/+119
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Will Deacon: "A sizeable pile of arm64 updates for 5.8. Summary below, but the big two features are support for Branch Target Identification and Clang's Shadow Call stack. The latter is currently arm64-only, but the high-level parts are all in core code so it could easily be adopted by other architectures pending toolchain support Branch Target Identification (BTI): - Support for ARMv8.5-BTI in both user- and kernel-space. This allows branch targets to limit the types of branch from which they can be called and additionally prevents branching to arbitrary code, although kernel support requires a very recent toolchain. - Function annotation via SYM_FUNC_START() so that assembly functions are wrapped with the relevant "landing pad" instructions. - BPF and vDSO updates to use the new instructions. - Addition of a new HWCAP and exposure of BTI capability to userspace via ID register emulation, along with ELF loader support for the BTI feature in .note.gnu.property. - Non-critical fixes to CFI unwind annotations in the sigreturn trampoline. Shadow Call Stack (SCS): - Support for Clang's Shadow Call Stack feature, which reserves platform register x18 to point at a separate stack for each task that holds only return addresses. This protects function return control flow from buffer overruns on the main stack. - Save/restore of x18 across problematic boundaries (user-mode, hypervisor, EFI, suspend, etc). - Core support for SCS, should other architectures want to use it too. - SCS overflow checking on context-switch as part of the existing stack limit check if CONFIG_SCHED_STACK_END_CHECK=y. CPU feature detection: - Removed numerous "SANITY CHECK" errors when running on a system with mismatched AArch32 support at EL1. This is primarily a concern for KVM, which disabled support for 32-bit guests on such a system. - Addition of new ID registers and fields as the architecture has been extended. 
Perf and PMU drivers: - Minor fixes and cleanups to system PMU drivers. Hardware errata: - Unify KVM workarounds for VHE and nVHE configurations. - Sort vendor errata entries in Kconfig. Secure Monitor Call Calling Convention (SMCCC): - Update to the latest specification from Arm (v1.2). - Allow PSCI code to query the SMCCC version. Software Delegated Exception Interface (SDEI): - Unexport a bunch of unused symbols. - Minor fixes to handling of firmware data. Pointer authentication: - Add support for dumping the kernel PAC mask in vmcoreinfo so that the stack can be unwound by tools such as kdump. - Simplification of key initialisation during CPU bringup. BPF backend: - Improve immediate generation for logical and add/sub instructions. vDSO: - Minor fixes to the linker flags for consistency with other architectures and support for LLVM's unwinder. - Clean up logic to initialise and map the vDSO into userspace. ACPI: - Work around for an ambiguity in the IORT specification relating to the "num_ids" field. - Support _DMA method for all named components rather than only PCIe root complexes. - Minor other IORT-related fixes. Miscellaneous: - Initialise debug traps early for KGDB and fix KDB cacheflushing deadlock. - Minor tweaks to early boot state (documentation update, set TEXT_OFFSET to 0x0, increase alignment of PE/COFF sections). 
- Refactoring and cleanup" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (148 commits) KVM: arm64: Move __load_guest_stage2 to kvm_mmu.h KVM: arm64: Check advertised Stage-2 page size capability arm64/cpufeature: Add get_arm64_ftr_reg_nowarn() ACPI/IORT: Remove the unused __get_pci_rid() arm64/cpuinfo: Add ID_MMFR4_EL1 into the cpuinfo_arm64 context arm64/cpufeature: Add remaining feature bits in ID_AA64PFR1 register arm64/cpufeature: Add remaining feature bits in ID_AA64PFR0 register arm64/cpufeature: Add remaining feature bits in ID_AA64ISAR0 register arm64/cpufeature: Add remaining feature bits in ID_MMFR4 register arm64/cpufeature: Add remaining feature bits in ID_PFR0 register arm64/cpufeature: Introduce ID_MMFR5 CPU register arm64/cpufeature: Introduce ID_DFR1 CPU register arm64/cpufeature: Introduce ID_PFR2 CPU register arm64/cpufeature: Make doublelock a signed feature in ID_AA64DFR0 arm64/cpufeature: Drop TraceFilt feature exposure from ID_DFR0 register arm64/cpufeature: Add explicit ftr_id_isar0[] for ID_ISAR0 register arm64: mm: Add asid_gen_match() helper firmware: smccc: Fix missing prototype warning for arm_smccc_version_init arm64: vdso: Fix CFI directives in sigreturn trampoline arm64: vdso: Don't prefix sigreturn trampoline with a BTI C instruction ...
2020-06-01Merge tag 'x86-cleanups-2020-06-01' of ↵Linus Torvalds1-6/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Ingo Molnar: "Misc cleanups, with an emphasis on removing obsolete/dead code" * tag 'x86-cleanups-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/spinlock: Remove obsolete ticket spinlock macros and types x86/mm: Drop deprecated DISCONTIGMEM support for 32-bit x86/apb_timer: Drop unused declaration and macro x86/apb_timer: Drop unused TSC calibration x86/io_apic: Remove unused function mp_init_irq_at_boot() x86/mm: Stop printing BRK addresses x86/audit: Fix a -Wmissing-prototypes warning for ia32_classify_syscall() x86/nmi: Remove edac.h include leftover mm: Remove MPX leftovers x86/mm/mmap: Fix -Wmissing-prototypes warnings x86/early_printk: Remove unused includes crash_dump: Remove no longer used saved_max_pfn x86/smpboot: Remove the last ICPU() macro
2020-06-01Merge tag 'smp-core-2020-06-01' of ↵Linus Torvalds2-11/+11
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull SMP updates from Ingo Molnar: "Misc cleanups in the SMP hotplug and cross-call code" * tag 'smp-core-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: cpu/hotplug: Remove __freeze_secondary_cpus() cpu/hotplug: Remove disable_nonboot_cpus() cpu/hotplug: Fix a typo in comment "broadacasted"->"broadcasted" smp: Use smp_call_func_t in on_each_cpu()
2020-06-01Merge tag 'perf-core-2020-06-01' of ↵Linus Torvalds4-15/+20
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "Kernel side changes: - Add AMD Fam17h RAPL support - Introduce CAP_PERFMON to kernel and user space - Add Zhaoxin CPU support - Misc fixes and cleanups Tooling changes: - perf record: Introduce '--switch-output-event' to use arbitrary events to be setup and read from a side band thread and, when they take place a signal be sent to the main 'perf record' thread, reusing the core for '--switch-output' to take perf.data snapshots from the ring buffer used for '--overwrite', e.g.: # perf record --overwrite -e sched:* \ --switch-output-event syscalls:*connect* \ workload will take perf.data.YYYYMMDDHHMMSS snapshots up to around the connect syscalls. Add '--num-synthesize-threads' option to control degree of parallelism of the synthesize_mmap() code which is scanning /proc/PID/task/PID/maps and can be time consuming. This mimics pre-existing behaviour in 'perf top'. - perf bench: Add a multi-threaded synthesize benchmark and kallsyms parsing benchmark. - Intel PT support: Stitch LBR records from multiple samples to get deeper backtraces, there are caveats, see the csets for details. Allow using Intel PT to synthesize callchains for regular events. Add support for synthesizing branch stacks for regular events (cycles, instructions, etc) from Intel PT data. Misc changes: - Updated perf vendor events for power9 and Coresight. 
- Add flamegraph.py script via 'perf flamegraph' - Misc other changes, fixes and cleanups - see the Git log for details Also, since over the last couple of years perf tooling has matured and decoupled from the kernel perf changes to a large degree, going forward Arnaldo is going to send perf tooling changes via direct pull requests" * tag 'perf-core-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (163 commits) perf/x86/rapl: Add AMD Fam17h RAPL support perf/x86/rapl: Make perf_probe_msr() more robust and flexible perf/x86/rapl: Flip logic on default events visibility perf/x86/rapl: Refactor to share the RAPL code between Intel and AMD CPUs perf/x86/rapl: Move RAPL support to common x86 code perf/core: Replace zero-length array with flexible-array perf/x86: Replace zero-length array with flexible-array perf/x86/intel: Add more available bits for OFFCORE_RESPONSE of Intel Tremont perf/x86/rapl: Add Ice Lake RAPL support perf flamegraph: Use /bin/bash for report and record scripts perf cs-etm: Move definition of 'traceid_list' global variable from header file libsymbols kallsyms: Move hex2u64 out of header libsymbols kallsyms: Parse using io api perf bench: Add kallsyms parsing perf: cs-etm: Update to build with latest opencsd version. perf symbol: Fix kernel symbol address display perf inject: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*() perf annotate: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*() perf trace: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*() perf script: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*() ...
2020-06-01Merge tag 'locking-core-2020-06-01' of ↵Linus Torvalds2-3/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "The biggest change to core locking facilities in this cycle is the introduction of local_lock_t - this primitive comes from the -rt project and identifies CPU-local locking dependencies normally handled opaquely behind preempt_disable() or local_irq_save/disable() critical sections. The generated code on mainline kernels doesn't change as a result, but still there are benefits: improved debugging and better documentation of data structure accesses. The new local_lock_t primitives are introduced and then utilized in a couple of kernel subsystems. No change in functionality is intended. There's also other smaller changes and cleanups" * tag 'locking-core-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: zram: Use local lock to protect per-CPU data zram: Allocate struct zcomp_strm as per-CPU memory connector/cn_proc: Protect send_msg() with a local lock squashfs: Make use of local lock in multi_cpu decompressor mm/swap: Use local_lock for protection radix-tree: Use local_lock for protection locking: Introduce local_lock() locking/lockdep: Replace zero-length array with flexible-array locking/rtmutex: Remove unused rt_mutex_cmpxchg_relaxed()
2020-06-01Merge tag 'core-rcu-2020-06-01' of ↵Linus Torvalds20-638/+1995
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "The RCU updates for this cycle were: - RCU-tasks update, including addition of RCU Tasks Trace for BPF use and TASKS_RUDE_RCU - kfree_rcu() updates. - Remove scheduler locking restriction - RCU CPU stall warning updates. - Torture-test updates. - Miscellaneous fixes and other updates" * tag 'core-rcu-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (103 commits) rcu: Allow for smp_call_function() running callbacks from idle rcu: Provide rcu_irq_exit_check_preempt() rcu: Abstract out rcu_irq_enter_check_tick() from rcu_nmi_enter() rcu: Provide __rcu_is_watching() rcu: Provide rcu_irq_exit_preempt() rcu: Make RCU IRQ enter/exit functions rely on in_nmi() rcu/tree: Mark the idle relevant functions noinstr x86: Replace ist_enter() with nmi_enter() x86/mce: Send #MC singal from task work x86/entry: Get rid of ist_begin/end_non_atomic() sched,rcu,tracing: Avoid tracing before in_nmi() is correct sh/ftrace: Move arch_ftrace_nmi_{enter,exit} into nmi exception lockdep: Always inline lockdep_{off,on}() hardirq/nmi: Allow nested nmi_enter() arm64: Prepare arch_nmi_enter() for recursion printk: Disallow instrumenting print_nmi_enter() printk: Prepare for nested printk_nmi_enter() rcutorture: Convert ULONG_CMP_LT() to time_before() torture: Add a --kasan argument torture: Save a few lines by using config_override_param initially ...
2020-06-01Merge tag 'core-kprobes-2020-06-01' of ↵Linus Torvalds2-1/+94
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull kprobes updates from Ingo Molnar: "Various kprobes updates, mostly centered around cleaning up the no-instrumentation logic. Instead of the current per debug facility blacklist, use the more generic .noinstr.text approach, combined with a 'noinstr' marker for functions. Also add instrumentation_begin()/end() to better manage the exact place in entry code where instrumentation may be used. And add a kprobes blacklist for modules" * tag 'core-kprobes-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: kprobes: Prevent probes in .noinstr.text section vmlinux.lds.h: Create section for protection against instrumentation samples/kprobes: Add __kprobes and NOKPROBE_SYMBOL() for handlers. kprobes: Support NOKPROBE_SYMBOL() in modules kprobes: Support __kprobes blacklist in modules kprobes: Lock kprobe_mutex while showing kprobe_blacklist
2020-06-01Merge tag 'printk-for-5.8' of ↵Linus Torvalds2-49/+97
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk updates from Petr Mladek: - Benjamin Herrenschmidt solved a problem with non-matched console aliases by first checking consoles defined on the command line. It is a more conservative approach than the previous attempts. - Benjamin also made sure that the console accessible via /dev/console always has CON_CONSDEV flag. - Andy Shevchenko added the %ptT modifier for printing struct time64_t. It extends the existing %ptR handling for struct rtc_time. - Bruno Meneguele fixed /dev/kmsg error value returned by unsupported SEEK_CUR. - Tetsuo Handa removed unused pr_cont_once(). ... and a few small fixes. * tag 'printk-for-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: printk: Remove pr_cont_once() printk: handle blank console arguments passed in. kernel/printk: add kmsg SEEK_CUR handling printk: Fix a typo in comment "interator"->"iterator" usb: pulse8-cec: Switch to use %ptT ARM: bcm2835: Switch to use %ptT lib/vsprintf: Print time64_t in human readable format lib/vsprintf: update comment about simple_strto<foo>() functions printk: Correctly set CON_CONSDEV even when preferred console was not registered printk: Fix preferred console selection with multiple matches printk: Move console matching logic into a separate function printk: Convert a use of sprintf to snprintf in console_unlock
2020-06-01Merge tag 'pstore-v5.8-rc1' of ↵Linus Torvalds2-7/+31
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull pstore updates from Kees Cook: "Fixes and new features for pstore. This is a pretty big set of changes (relative to past pstore pulls), but it has been in -next for a while. The biggest change here is the ability to support a block device as a pstore backend, which has been desired for a while. A lot of additional fixes and refactorings are also included, mostly in support of the new features. - refactor pstore locking for safer module unloading (Kees Cook) - remove orphaned records from pstorefs when backend unloaded (Kees Cook) - refactor dump_oops parameter into max_reason (Pavel Tatashin) - introduce pstore/zone for common code for contiguous storage (WeiXiong Liao) - introduce pstore/blk for block device backend (WeiXiong Liao) - introduce mtd backend (WeiXiong Liao)" * tag 'pstore-v5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (35 commits) mtd: Support kmsg dumper based on pstore/blk pstore/blk: Introduce "best_effort" mode pstore/blk: Support non-block storage devices pstore/blk: Provide way to query pstore configuration pstore/zone: Provide way to skip "broken" zone for MTD devices Documentation: Add details for pstore/blk pstore/zone,blk: Add ftrace frontend support pstore/zone,blk: Add console frontend support pstore/zone,blk: Add support for pmsg frontend pstore/blk: Introduce backend for block devices pstore/zone: Introduce common layer to manage storage zones ramoops: Add "max-reason" optional field to ramoops DT node pstore/ram: Introduce max_reason and convert dump_oops pstore/platform: Pass max_reason to kmesg dump printk: Introduce kmsg_dump_reason_str() printk: honor the max_reason field in kmsg_dumper printk: Collapse shutdown types into a single dump reason pstore/ftrace: Provide ftrace log merging routine pstore/ram: Refactor ftrace buffer merging pstore/ram: Refactor DT size parsing ...
2020-06-01Merge branch 'linus' of ↵Linus Torvalds2-15/+17
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "API: - Introduce crypto_shash_tfm_digest() and use it wherever possible. - Fix use-after-free and race in crypto_spawn_alg. - Add support for parallel and batch requests to crypto_engine. Algorithms: - Update jitter RNG for SP800-90B compliance. - Always use jitter RNG as seed in drbg. Drivers: - Add Arm CryptoCell driver cctrng. - Add support for SEV-ES to the PSP driver in ccp" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (114 commits) crypto: hisilicon - fix driver compatibility issue with different versions of devices crypto: engine - do not requeue in case of fatal error crypto: cavium/nitrox - Fix a typo in a comment crypto: hisilicon/qm - change debugfs file name from qm_regs to regs crypto: hisilicon/qm - add DebugFS for xQC and xQE dump crypto: hisilicon/zip - add debugfs for Hisilicon ZIP crypto: hisilicon/hpre - add debugfs for Hisilicon HPRE crypto: hisilicon/sec2 - add debugfs for Hisilicon SEC crypto: hisilicon/qm - add debugfs to the QM state machine crypto: hisilicon/qm - add debugfs for QM crypto: stm32/crc32 - protect from concurrent accesses crypto: stm32/crc32 - don't sleep in runtime pm crypto: stm32/crc32 - fix multi-instance crypto: stm32/crc32 - fix run-time self test issue. crypto: stm32/crc32 - fix ext4 chksum BUG_ON() crypto: hisilicon/zip - Use temporary sqe when doing work crypto: hisilicon - add device error report through abnormal irq crypto: hisilicon - remove codes of directly report device errors through MSI crypto: hisilicon - QM memory management optimization crypto: hisilicon - unify initial value assignment into QM ...
2020-06-01Merge branches 'pm-core' and 'pm-sleep'Rafael J. Wysocki5-13/+48
* pm-core: PM: runtime: Replace pm_runtime_callbacks_present() PM: runtime: clk: Fix clk_pm_runtime_get() error path PM: runtime: Make clear what we do when conditions are wrong in rpm_suspend() * pm-sleep: PM: hibernate: Restrict writes to the resume device PM: hibernate: Split off snapshot dev option PM: hibernate: Incorporate concurrency handling PM: sleep: Helpful edits for devices.rst documentation Documentation: PM: sleep: Update driver flags documentation PM: sleep: core: Rename DPM_FLAG_LEAVE_SUSPENDED PM: sleep: core: Rename DPM_FLAG_NEVER_SKIP PM: sleep: core: Rename dev_pm_smart_suspend_and_suspended() PM: sleep: core: Rename dev_pm_may_skip_resume() PM: sleep: core: Rework the power.may_skip_resume handling PM: sleep: core: Do not skip callbacks in the resume phase PM: sleep: core: Fold functions into their callers PM: sleep: core: Simplify the SMART_SUSPEND flag handling
2020-06-01Merge branch 'WIP.core/rcu' into core/rcu, to pick up two x86/entry dependenciesIngo Molnar1-20/+80
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-06-01Merge branch 'for-5.8' into for-linusPetr Mladek196-4926/+11824
2020-06-01Merge branch 'for-5.7-preferred-console' into for-linusPetr Mladek2-44/+79
2020-05-31Merge tag 'sched-urgent-2020-05-31' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Thomas Gleixner: "A single scheduler fix preventing a crash in NUMA balancing. The current->mm check is not reliable as the mm might be temporary due to use_mm() in a kthread. Check for PF_KTHREAD explicitly" * tag 'sched-urgent-2020-05-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/fair: Don't NUMA balance for kthreads
2020-05-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds1-18/+16
Pull networking fixes from David Miller: "Another week, another set of bug fixes: 1) Fix pskb_pull length in __xfrm_transport_prep(), from Xin Long. 2) Fix double xfrm_state put in esp{4,6}_gro_receive(), also from Xin Long. 3) Re-arm discovery timer properly in mac80211 mesh code, from Linus Lüssing. 4) Prevent buffer overflows in nf_conntrack_pptp debug code, from Pablo Neira Ayuso. 5) Fix race in ktls code between tls_sw_recvmsg() and tls_decrypt_done(), from Vinay Kumar Yadav. 6) Fix crashes on TCP fallback in MPTCP code, from Paolo Abeni. 7) More validation is necessary of untrusted GSO packets coming from virtualization devices, from Willem de Bruijn. 8) Fix endianness of bnxt_en firmware message length accesses, from Edwin Peer. 9) Fix infinite loop in sch_fq_pie, from Davide Caratti. 10) Fix lockdep splat in DSA by setting lockless TX in netdev features for slave ports, from Vladimir Oltean. 11) Fix suspend/resume crashes in mlx5, from Mark Bloch. 12) Fix use after free in bpf fmod_ret, from Alexei Starovoitov. 13) ARP retransmit timer guard uses wrong offset, from Hongbin Liu. 14) Fix leak in inetdev_init(), from Yang Yingliang. 15) Don't try to use inet hash and unhash in l2tp code, results in crashes. From Eric Dumazet" * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (77 commits) l2tp: add sk_family checks to l2tp_validate_socket l2tp: do not use inet_hash()/inet_unhash() net: qrtr: Allocate workqueue before kernel_bind mptcp: remove msk from the token container at destruction time. 
mptcp: fix race between MP_JOIN and close mptcp: fix unblocking connect() net/sched: act_ct: add nat mangle action only for NAT-conntrack devinet: fix memleak in inetdev_init() virtio_vsock: Fix race condition in virtio_transport_recv_pkt drivers/net/ibmvnic: Update VNIC protocol version reporting NFC: st21nfca: add missed kfree_skb() in an error path neigh: fix ARP retransmit timer guard bpf, selftests: Add a verifier test for assigning 32bit reg states to 64bit ones bpf, selftests: Verifier bounds tests need to be updated bpf: Fix a verifier issue when assigning 32bit reg states to 64bit ones bpf: Fix use-after-free in fmod_ret check net/mlx5e: replace EINVAL in mlx5e_flower_parse_meta() net/mlx5e: Fix MLX5_TC_CT dependencies net/mlx5e: Properly set default values when disabling adaptive moderation net/mlx5e: Fix arch depending casting issue in FEC ...
2020-05-30printk: Introduce kmsg_dump_reason_str()Kees Cook1-0/+17
The pstore subsystem already had a private version of this function. With the coming addition of the pstore/zone driver, this needs to be shared. As it really should live with printk, move it there instead. Link: https://lore.kernel.org/lkml/20200515184434.8470-4-keescook@chromium.org/ Acked-by: Petr Mladek <pmladek@suse.com> Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2020-05-30printk: honor the max_reason field in kmsg_dumperPavel Tatashin1-4/+11
kmsg_dump() allows dumping the kmsg buffer on various system events: oops, panic, reboot, etc. It provides an interface for clients to register a callback, and that callback interface has a "max_reason" field, but the field was ignored when set to any "reason" higher than KMSG_DUMP_OOPS unless "always_kmsg_dump" was passed as a kernel parameter. Allow clients to actually control their "max_reason", and keep the current behavior when "max_reason" is not set. Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com> Link: https://lore.kernel.org/lkml/20200515184434.8470-3-keescook@chromium.org/ Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Kees Cook <keescook@chromium.org>
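The filtering rule this message describes can be modelled in userspace roughly as follows. The enum and struct here are hypothetical stand-ins chosen for the example (they only mirror the ordering idea behind the KMSG_DUMP_* reasons), and the function is a sketch of the described behaviour rather than the kernel's kmsg_dump() itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ordered reasons; a larger value means a less severe event. */
enum dump_reason {
	DUMP_UNDEF,    /* client did not set a max_reason */
	DUMP_PANIC,
	DUMP_OOPS,
	DUMP_EMERG,
	DUMP_SHUTDOWN,
	DUMP_MAX
};

/* Hypothetical registered client. */
struct dumper {
	enum dump_reason max_reason;
};

/*
 * Decide whether a client's callback should run for this event.
 * always_dump models the "always_kmsg_dump" kernel parameter.
 */
static bool dumper_wants(const struct dumper *d, enum dump_reason reason,
			 bool always_dump)
{
	enum dump_reason max_reason;

	assert(d != NULL);
	max_reason = d->max_reason;

	/* Unset max_reason keeps the old default: oops and worse only. */
	if (max_reason == DUMP_UNDEF)
		max_reason = always_dump ? DUMP_MAX : DUMP_OOPS;

	return reason <= max_reason;
}
```

A client that leaves max_reason unset behaves as before, while a client that sets it to the shutdown reason receives shutdown dumps without needing the global parameter.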
2020-05-30printk: Collapse shutdown types into a single dump reasonKees Cook1-3/+3
To turn the KMSG_DUMP_* reasons into a more ordered list, collapse the redundant KMSG_DUMP_(RESTART|HALT|POWEROFF) reasons into KMSG_DUMP_SHUTDOWN. The current users already don't meaningfully distinguish between them, so there's no need to, as discussed here: https://lore.kernel.org/lkml/CA+CK2bAPv5u1ih5y9t5FUnTyximtFCtDYXJCpuyjOyHNOkRdqw@mail.gmail.com/ Link: https://lore.kernel.org/lkml/20200515184434.8470-2-keescook@chromium.org/ Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Signed-off-by: Kees Cook <keescook@chromium.org>
2020-05-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller1-18/+16
Alexei Starovoitov says: ==================== pull-request: bpf 2020-05-29 The following pull-request contains BPF updates for your *net* tree. We've added 6 non-merge commits during the last 7 day(s) which contain a total of 4 files changed, 55 insertions(+), 34 deletions(-). The main changes are: 1) minor verifier fix for fmod_ret progs, from Alexei. 2) af_xdp overflow check, from Bjorn. 3) minor verifier fix for 32bit assignment, from John. 4) powerpc has non-overlapping addr space, from Petr. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-29bpf: Fix a verifier issue when assigning 32bit reg states to 64bit onesJohn Fastabend1-5/+5
With the latest trunk llvm (llvm 11), I hit a verifier issue for test_prog subtest test_verif_scale1. The following simplified example illustrates the issue: w9 = 0 /* R9_w=inv0 */ r8 = *(u32 *)(r1 + 80) /* __sk_buff->data_end */ r7 = *(u32 *)(r1 + 76) /* __sk_buff->data */ ...... w2 = w9 /* R2_w=inv0 */ r6 = r7 /* R6_w=pkt(id=0,off=0,r=0,imm=0) */ r6 += r2 /* R6_w=inv(id=0) */ r3 = r6 /* R3_w=inv(id=0) */ r3 += 14 /* R3_w=inv(id=0) */ if r3 > r8 goto end r5 = *(u32 *)(r6 + 0) /* R6_w=inv(id=0) */ <== error here: R6 invalid mem access 'inv' ... end: In real test_verif_scale1 code, "w9 = 0" and "w2 = w9" are in different basic blocks. In the above, after "r6 += r2", r6 becomes a scalar, which eventually caused the memory access error. The correct register state should be a pkt pointer. The imprecise register state starts at "w2 = w9". The 32bit register w9 is 0, in __reg_assign_32_into_64(), the 64bit reg->smax_value is assigned to be U32_MAX. The 64bit reg->smin_value is 0 and the 64bit register itself remains constant based on reg->var_off. In adjust_ptr_min_max_vals(), the verifier checks for a known constant, smin_val must be equal to smax_val. Since they are not equal, the verifier decides r6 is an unknown scalar, which caused the later failure. llvm10 does not have this issue as it generates different code: w9 = 0 /* R9_w=inv0 */ r8 = *(u32 *)(r1 + 80) /* __sk_buff->data_end */ r7 = *(u32 *)(r1 + 76) /* __sk_buff->data */ ...... r6 = r7 /* R6_w=pkt(id=0,off=0,r=0,imm=0) */ r6 += r9 /* R6_w=pkt(id=0,off=0,r=0,imm=0) */ r3 = r6 /* R3_w=pkt(id=0,off=0,r=0,imm=0) */ r3 += 14 /* R3_w=pkt(id=0,off=14,r=0,imm=0) */ if r3 > r8 goto end ... To fix the above issue, we can include zero in the test condition for assigning the s32_max_value and s32_min_value to their 64-bit equivalents smax_value and smin_value. Further, fix the condition to avoid doing zero extension bounds checks when s32_min_value <= 0.
This could allow for the case where bounds 32-bit bounds (-1,1) get incorrectly translated to (0,1) 64-bit bounds. When in-fact the -1 min value needs to force U32_MAX bound. Fixes: 3f50f132d840 ("bpf: Verifier, do explicit ALU32 bounds tracking") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/159077331983.6014.5758956193749002737.stgit@john-Precision-5820-Tower
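The bounds rule described in this message can be illustrated with a small standalone sketch. It is not the kernel's code: struct reg_bounds, s32_bound_fits_s64() and assign_32_into_64() are hypothetical names, and the struct keeps only the four fields the message mentions. It assumes the fixed rule is "copy the signed 32-bit bounds only when both are non-negative (zero included), otherwise fall back to the worst case [0, U32_MAX]".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced register state: only the fields named in the message. */
struct reg_bounds {
	int32_t s32_min_value;
	int32_t s32_max_value;
	int64_t smin_value;
	int64_t smax_value;
};

/* A non-negative signed 32-bit bound keeps its value under zero extension. */
static bool s32_bound_fits_s64(int32_t a)
{
	return a >= 0; /* ">= 0", not "> 0": zero must be included */
}

/* Derive the signed 64-bit bounds from the signed 32-bit bounds (sketch). */
static void assign_32_into_64(struct reg_bounds *reg)
{
	assert(reg->s32_min_value <= reg->s32_max_value);

	if (s32_bound_fits_s64(reg->s32_min_value) &&
	    s32_bound_fits_s64(reg->s32_max_value)) {
		/* Both bounds are non-negative: they carry over unchanged. */
		reg->smin_value = reg->s32_min_value;
		reg->smax_value = reg->s32_max_value;
	} else {
		/* A negative bound zero-extends to a large value: worst case. */
		reg->smin_value = 0;
		reg->smax_value = UINT32_MAX;
	}
}
```

Under these assumptions, 32-bit bounds (0,0) stay (0,0), so the register is still a known constant, while (-1,1) widen to (0,U32_MAX) instead of being narrowed to (0,1).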
2020-05-29bpf: Fix use-after-free in fmod_ret checkAlexei Starovoitov1-13/+11
Fix the following issue:

[ 436.749342] BUG: KASAN: use-after-free in bpf_trampoline_put+0x39/0x2a0
[ 436.749995] Write of size 4 at addr ffff8881ef38b8a0 by task kworker/3:5/2243
[ 436.750712]
[ 436.752677] Workqueue: events bpf_prog_free_deferred
[ 436.753183] Call Trace:
[ 436.756483] bpf_trampoline_put+0x39/0x2a0
[ 436.756904] bpf_prog_free_deferred+0x16d/0x3d0
[ 436.757377] process_one_work+0x94a/0x15b0
[ 436.761969]
[ 436.762130] Allocated by task 2529:
[ 436.763323] bpf_trampoline_lookup+0x136/0x540
[ 436.763776] bpf_check+0x2872/0xa0a8
[ 436.764144] bpf_prog_load+0xb6f/0x1350
[ 436.764539] __do_sys_bpf+0x16d7/0x3720
[ 436.765825]
[ 436.765988] Freed by task 2529:
[ 436.767084] kfree+0xc6/0x280
[ 436.767397] bpf_trampoline_put+0x1fd/0x2a0
[ 436.767826] bpf_check+0x6832/0xa0a8
[ 436.768197] bpf_prog_load+0xb6f/0x1350
[ 436.768594] __do_sys_bpf+0x16d7/0x3720

prog->aux->trampoline = tr should be set only when prog is valid. Otherwise prog freeing will try to put trampoline via prog->aux->trampoline, but it may not point to a valid trampoline.

Fixes: 6ba43b761c41 ("bpf: Attachment verification for BPF_MODIFY_RETURN")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: KP Singh <kpsingh@google.com>
Link: https://lore.kernel.org/bpf/20200529043839.15824-2-alexei.starovoitov@gmail.com
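The ownership rule stated in this message (store the trampoline pointer in the program only once the program is known to be valid) can be sketched in userspace C. Everything here is hypothetical and much simpler than the kernel: struct tramp, struct prog, tramp_get(), tramp_put(), check_attach() and the live_tramps counter are illustration-only names, and the `valid` flag stands in for whatever checks the verifier performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for a trampoline. */
struct tramp { int refcnt; };
/* Hypothetical program that may own one trampoline reference. */
struct prog { struct tramp *trampoline; };

static int live_tramps; /* number of trampolines currently allocated */

static struct tramp *tramp_get(void)
{
	struct tramp *tr = calloc(1, sizeof(*tr));

	if (tr) {
		tr->refcnt = 1;
		live_tramps++;
	}
	return tr;
}

static void tramp_put(struct tramp *tr)
{
	if (!tr)
		return;
	assert(tr->refcnt > 0);
	if (--tr->refcnt == 0) {
		free(tr);
		live_tramps--;
	}
}

/* Attach check: publish the pointer only after every check has passed. */
static int check_attach(struct prog *p, bool valid)
{
	struct tramp *tr = tramp_get();

	if (!tr)
		return -1;
	if (!valid) {
		tramp_put(tr); /* drop our reference; p->trampoline stays NULL */
		return -1;
	}
	p->trampoline = tr; /* ownership moves to the program here */
	return 0;
}

/* Program teardown puts whatever reference the program still holds. */
static void prog_free(struct prog *p)
{
	tramp_put(p->trampoline);
	p->trampoline = NULL;
}
```

If check_attach() stored the pointer before the validity check and then freed it on failure, prog_free() would later put an already freed object, which is the kind of use-after-free the KASAN report shows.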
2020-05-28rcu: Allow for smp_call_function() running callbacks from idlePeter Zijlstra1-6/+19
RCU currently relies heavily on smp_call_function() callbacks running from interrupt context. A pending optimization is going to break that: it will allow idle CPUs to run the callbacks from the idle loop. This avoids raising the IPI on the requesting CPU and avoids handling an exception on the receiving CPU.

Change rcu_is_cpu_rrupt_from_idle() to also accept task context, provided it is the idle task.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Link: https://lore.kernel.org/r/20200527171236.GC706495@hirez.programming.kicks-ass.net
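The relaxed condition can be modeled as a small predicate. This is a simplified sketch, not the kernel function: struct cpu_ctx, its fields and from_idle() are hypothetical, and the real rcu_is_cpu_rrupt_from_idle() reads per-CPU RCU state and may handle the rejected cases differently (for example by warning rather than returning false).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the state such a check would consult. */
struct cpu_ctx {
	int irq_nesting;           /* 0 = task context, 1 = first-level interrupt */
	bool rcu_sees_idle;        /* RCU regards the underlying context as idle */
	bool current_is_idle_task; /* the running task is this CPU's idle task */
};

/* Was the callback invoked on top of an idle CPU? (sketch) */
static bool from_idle(const struct cpu_ctx *c)
{
	assert(c->irq_nesting >= 0);

	if (c->irq_nesting > 1)
		return false; /* nested: this interrupted a handler, not idle */
	if (c->irq_nesting == 0 && !c->current_is_idle_task)
		return false; /* task context is accepted only for the idle task */
	return c->rcu_sees_idle;
}
```

In this model, a callback run directly from the idle loop (irq_nesting == 0, idle task) is accepted in the same way as one run from a first-level interrupt that arrived while the CPU was idle.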
2020-05-28Merge tag 'v5.7-rc7' into WIP.locking/core, to refresh the treeIngo Molnar17-126/+250
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-05-28Merge tag 'v5.7-rc7' into perf/core, to pick up fixesIngo Molnar26-154/+321
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-05-27Merge branch 'for-5.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroupLinus Torvalds1-13/+3
Pull cgroup fixes from Tejun Heo:

 - Reverted stricter synchronization for cgroup recursive stats, which was preparing it for event counter usage that never got merged. The change was causing performance regressions in some cases.
 - Restore bpf-based device-cgroup operation even when cgroup1 device cgroup is disabled.
 - An out-param init fix.

* 'for-5.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  device_cgroup: Cleanup cgroup eBPF device filter code
  xattr: fix uninitialized out-param
  Revert "cgroup: Add memory barriers to plug cgroup_rstat_updated() race window"