summaryrefslogtreecommitdiffstats
path: root/mm/slab.c
AgeCommit message (Collapse)AuthorFilesLines
2023-01-02mm, slab: periodically resched in drain_freelist()David Rientjes1-0/+2
drain_freelist() can be called with a very large number of slabs to free, such as for kmem_cache_shrink(), or depending on various settings of the slab cache when doing periodic reaping. If there is a potentially long list of slabs to drain, periodically schedule to ensure we aren't saturating the cpu for too long. Signed-off-by: David Rientjes <rientjes@google.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-11-21Merge branch 'slab/for-6.2/kmalloc_redzone' into slab/for-nextVlastimil Babka1-3/+4
kmalloc() redzone improvements by Feng Tang From cover letter [1]: kmalloc's API family is critical for mm, and one of its nature is that it will round up the request size to a fixed one (mostly power of 2). When user requests memory for '2^n + 1' bytes, actually 2^(n+1) bytes could be allocated, so there is an extra space than what is originally requested. This patchset tries to extend the redzone sanity check to the extra kmalloced buffer than requested, to better detect un-legitimate access to it. (depends on SLAB_STORE_USER & SLAB_RED_ZONE) [1] https://lore.kernel.org/all/20221021032405.1825078-1-feng.tang@intel.com/
2022-11-21Merge branch 'slab/for-6.2/fit_rcu_head' into slab/for-nextVlastimil Babka1-1/+5
A series by myself to reorder fields in struct slab to allow the embedded rcu_head to grow (for debugging purposes). Requires changes to isolate_movable_page() to skip slab pages which can otherwise become false-positive __PageMovable due to its use of low bits in page->mapping.
2022-11-21Merge branch 'slab/for-6.2/locking' into slab/for-nextVlastimil Babka1-45/+45
A patch from Jiri Kosina that makes SLAB's list_lock a raw_spinlock_t. While there are no plans to make SLAB actually compatible with PREEMPT_RT or any other future, it makes !PREEMPT_RT lockdep happy.
2022-11-21mm/migrate: make isolate_movable_page() skip slab pagesVlastimil Babka1-1/+5
In the next commit we want to rearrange struct slab fields to allow a larger rcu_head. Afterwards, the page->mapping field will overlap with SLUB's "struct list_head slab_list", where the value of prev pointer can become LIST_POISON2, which is 0x122 + POISON_POINTER_DELTA. Unfortunately the bit 1 being set can confuse PageMovable() to be a false positive and cause a GPF as reported by lkp [1]. To fix this, make isolate_movable_page() skip pages with the PageSlab flag set. This is a bit tricky as we need to add memory barriers to SLAB and SLUB's page allocation and freeing, and their counterparts to isolate_movable_page(). Based on my RFC from [2]. Added a comment update from Matthew's variant in [3] and, as done there, moved the PageSlab checks to happen before trying to take the page lock. [1] https://lore.kernel.org/all/208c1757-5edd-fd42-67d4-1940cc43b50f@intel.com/ [2] https://lore.kernel.org/all/aec59f53-0e53-1736-5932-25407125d4d4@suse.cz/ [3] https://lore.kernel.org/all/YzsVM8eToHUeTP75@casper.infradead.org/ Reported-by: kernel test robot <yujie.liu@intel.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
2022-11-21mm/slab: move and adjust kernel-doc for kmem_cache_allocVlastimil Babka1-10/+0
Alexander reports an issue with the kmem_cache_alloc() comment in mm/slab.c: > The current comment mentioned that the flags only matters if the > cache has no available objects. It's different for the __GFP_ZERO > flag which will ensure that the returned object is always zeroed > in any case. > I have the feeling I run into this question already two times if > the user need to zero the object or not, but the user does not need > to zero the object afterwards. However another use of __GFP_ZERO > and only zero the object if the cache has no available objects would > also make no sense. and suggests thus mentioning __GFP_ZERO as the exception. But on closer inspection, the part about flags being only relevant if cache has no available objects is misleading. The slab user has no reliable way to determine if there are available objects, and e.g. the might_sleep() debug check can be performed even if objects are available, so passing correct flags given the allocation context always matters. Thus remove that sentence completely, and while at it, move the comment to from SLAB-specific mm/slab.c to the common include/linux/slab.h The comment otherwise refers flags description for kmalloc(), so add __GFP_ZERO comment there and remove a very misleading GFP_HIGHUSER (not applicable to slab) description from there. Mention kzalloc() and kmem_cache_zalloc() shortcuts. Reported-by: Alexander Aring <aahringo@redhat.com> Link: https://lore.kernel.org/all/20221011145413.8025-1-aahringo@redhat.com/ Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-11-10mm/slub: only zero requested size of buffer for kzalloc when debug enabledFeng Tang1-3/+4
kzalloc/kmalloc will round up the request size to a fixed size (mostly power of 2), so the allocated memory could be more than requested. Currently kzalloc family APIs will zero all the allocated memory. To detect out-of-bound usage of the extra allocated memory, only zero the requested part, so that redzone sanity check could be added to the extra space later. For kzalloc users who will call ksize() later and utilize this extra space, please be aware that the space is not zeroed any more when debug is enabled. (Thanks to Kees Cook's effort to sanitize all ksize() user cases [1], this won't be a big issue). [1]. https://lore.kernel.org/all/20220922031013.2150682-1-keescook@chromium.org/#r Signed-off-by: Feng Tang <feng.tang@intel.com> Acked-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-10-24mm/slab: Annotate kmem_cache_node->list_lock as rawJiri Kosina1-45/+45
The list_lock can be taken in hardirq context when do_drain() is being called via IPI on all cores, and therefore lockdep complains about it, because it can't be preempted on PREEMPT_RT. That's not a real issue, as SLAB can't be built on PREEMPT_RT anyway, but we still want to get rid of the warning on non-PREEMPT_RT builds. Annotate it therefore as a raw lock in order to get rid of he lockdep warning below. ============================= [ BUG: Invalid wait context ] 6.1.0-rc1-00134-ge35184f32151 #4 Not tainted ----------------------------- swapper/3/0 is trying to lock: ffff8bc88086dc18 (&parent->list_lock){..-.}-{3:3}, at: do_drain+0x57/0xb0 other info that might help us debug this: context-{2:2} no locks held by swapper/3/0. stack backtrace: CPU: 3 PID: 0 Comm: swapper/3 Not tainted 6.1.0-rc1-00134-ge35184f32151 #4 Hardware name: LENOVO 20K5S22R00/20K5S22R00, BIOS R0IET38W (1.16 ) 05/31/2017 Call Trace: <IRQ> dump_stack_lvl+0x6b/0x9d __lock_acquire+0x1519/0x1730 ? build_sched_domains+0x4bd/0x1590 ? __lock_acquire+0xad2/0x1730 lock_acquire+0x294/0x340 ? do_drain+0x57/0xb0 ? sched_clock_tick+0x41/0x60 _raw_spin_lock+0x2c/0x40 ? do_drain+0x57/0xb0 do_drain+0x57/0xb0 __flush_smp_call_function_queue+0x138/0x220 __sysvec_call_function+0x4f/0x210 sysvec_call_function+0x4b/0x90 </IRQ> <TASK> asm_sysvec_call_function+0x16/0x20 RIP: 0010:mwait_idle+0x5e/0x80 Code: 31 d2 65 48 8b 04 25 80 ed 01 00 48 89 d1 0f 01 c8 48 8b 00 a8 08 75 14 66 90 0f 00 2d 0b 78 46 00 31 c0 48 89 c1 fb 0f 01 c9 <eb> 06 fb 0f 1f 44 00 00 65 48 8b 04 25 80 ed 01 00 f0 80 60 02 df RSP: 0000:ffffa90940217ee0 EFLAGS: 00000246 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff9bb9f93a RBP: 0000000000000003 R08: 0000000000000001 R09: 0000000000000001 R10: ffffa90940217ea8 R11: 0000000000000000 R12: ffffffffffffffff R13: 0000000000000000 R14: ffff8bc88127c500 R15: 0000000000000000 ? default_idle_call+0x1a/0xa0 default_idle_call+0x4b/0xa0 do_idle+0x1f1/0x2c0 ? _raw_spin_unlock_irqrestore+0x56/0x70 cpu_startup_entry+0x19/0x20 start_secondary+0x122/0x150 secondary_startup_64_no_verify+0xce/0xdb </TASK> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-10-16Merge tag 'random-6.1-rc1-for-linus' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/crng/random Pull more random number generator updates from Jason Donenfeld: "This time with some large scale treewide cleanups. The intent of this pull is to clean up the way callers fetch random integers. The current rules for doing this right are: - If you want a secure or an insecure random u64, use get_random_u64() - If you want a secure or an insecure random u32, use get_random_u32() The old function prandom_u32() has been deprecated for a while now and is just a wrapper around get_random_u32(). Same for get_random_int(). - If you want a secure or an insecure random u16, use get_random_u16() - If you want a secure or an insecure random u8, use get_random_u8() - If you want secure or insecure random bytes, use get_random_bytes(). The old function prandom_bytes() has been deprecated for a while now and has long been a wrapper around get_random_bytes() - If you want a non-uniform random u32, u16, or u8 bounded by a certain open interval maximum, use prandom_u32_max() I say "non-uniform", because it doesn't do any rejection sampling or divisions. Hence, it stays within the prandom_*() namespace, not the get_random_*() namespace. I'm currently investigating a "uniform" function for 6.2. We'll see what comes of that. By applying these rules uniformly, we get several benefits: - By using prandom_u32_max() with an upper-bound that the compiler can prove at compile-time is ≤65536 or ≤256, internally get_random_u16() or get_random_u8() is used, which wastes fewer batched random bytes, and hence has higher throughput. - By using prandom_u32_max() instead of %, when the upper-bound is not a constant, division is still avoided, because prandom_u32_max() uses a faster multiplication-based trick instead. - By using get_random_u16() or get_random_u8() in cases where the return value is intended to indeed be a u16 or a u8, we waste fewer batched random bytes, and hence have higher throughput. This series was originally done by hand while I was on an airplane without Internet. Later, Kees and I worked on retroactively figuring out what could be done with Coccinelle and what had to be done manually, and then we split things up based on that. So while this touches a lot of files, the actual amount of code that's hand fiddled is comfortably small" * tag 'random-6.1-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: prandom: remove unused functions treewide: use get_random_bytes() when possible treewide: use get_random_u32() when possible treewide: use get_random_{u8,u16}() when possible, part 2 treewide: use get_random_{u8,u16}() when possible, part 1 treewide: use prandom_u32_max() when possible, part 2 treewide: use prandom_u32_max() when possible, part 1
2022-10-15mm/slab: use kmalloc_node() for off slab freelist_idx_t array allocationHyeonggon Yoo1-18/+19
After commit d6a71648dbc0 ("mm/slab: kmalloc: pass requests larger than order-1 page to page allocator"), SLAB passes large ( > PAGE_SIZE * 2) requests to buddy like SLUB does. SLAB has been using kmalloc caches to allocate freelist_idx_t array for off slab caches. But after the commit, freelist_size can be bigger than KMALLOC_MAX_CACHE_SIZE. Instead of using pointer to kmalloc cache, use kmalloc_node() and only check if the kmalloc cache is off slab during calculate_slab_order(). If freelist_size > KMALLOC_MAX_CACHE_SIZE, no looping condition happens as it allocates freelist_idx_t array directly from buddy. Link: https://lore.kernel.org/all/20221014205818.GA1428667@roeck-us.net/ Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net> Fixes: d6a71648dbc0 ("mm/slab: kmalloc: pass requests larger than order-1 page to page allocator") Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-10-11treewide: use get_random_u32() when possibleJason A. Donenfeld1-1/+1
The prandom_u32() function has been a deprecated inline wrapper around get_random_u32() for several releases now, and compiles down to the exact same code. Replace the deprecated wrapper with a direct call to the real function. The same also applies to get_random_int(), which is just a wrapper around get_random_u32(). This was done as a basic find and replace. Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Yury Norov <yury.norov@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> # for ext4 Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> # for sch_cake Acked-by: Chuck Lever <chuck.lever@oracle.com> # for nfsd Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> # for thunderbolt Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs Acked-by: Helge Deller <deller@gmx.de> # for parisc Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390 Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-09-01mm/slab_common: drop kmem_alloc & avoid dereferencing fields when not usingHyeonggon Yoo1-5/+3
Drop kmem_alloc event class, and define kmalloc and kmem_cache_alloc using TRACE_EVENT() macro. And then this patch does: - Do not pass pointer to struct kmem_cache to trace_kmalloc. gfp flag is enough to know if it's accounted or not. - Avoid dereferencing s->object_size and s->size when not using kmem_cache_alloc event. - Avoid dereferencing s->name in when not using kmem_cache_free event. - Adjust s->size to SLOB_UNITS(s->size) * SLOB_UNIT in SLOB Cc: Vasily Averin <vasily.averin@linux.dev> Suggested-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-09-01mm/slab_common: unify NUMA and UMA version of tracepointsHyeonggon Yoo1-5/+4
Drop kmem_alloc event class, rename kmem_alloc_node to kmem_alloc, and remove _node postfix for NUMA version of tracepoints. This will break some tools that depend on {kmem_cache_alloc,kmalloc}_node, but at this point maintaining both kmem_alloc and kmem_alloc_node event classes does not makes sense at all. Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-09-01mm/sl[au]b: cleanup kmem_cache_alloc[_node]_trace()Hyeonggon Yoo1-35/+0
Despite its name, kmem_cache_alloc[_node]_trace() is hook for inlined kmalloc. So rename it to kmalloc[_node]_trace(). Move its implementation to slab_common.c by using __kmem_cache_alloc_node(), but keep CONFIG_TRACING=n varients to save a function call when CONFIG_TRACING=n. Use __assume_kmalloc_alignment for kmalloc[_node]_trace instead of __assume_slab_alignement. Generally kmalloc has larger alignment requirements. Suggested-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-09-01mm/sl[au]b: generalize kmalloc subsystemHyeonggon Yoo1-108/+0
Now everything in kmalloc subsystem can be generalized. Let's do it! Generalize __do_kmalloc_node(), __kmalloc_node_track_caller(), kfree(), __ksize(), __kmalloc(), __kmalloc_node() and move them to slab_common.c. In the meantime, rename kmalloc_large_node_notrace() to __kmalloc_large_node() and make it static as it's now only called in slab_common.c. [ feng.tang@intel.com: adjust kfence skip list to include __kmem_cache_free so that kfence kunit tests do not fail ] Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-08-24mm/sl[au]b: introduce common alloc/free functions without tracepointHyeonggon Yoo1-7/+29
To unify kmalloc functions in later patch, introduce common alloc/free functions that does not have tracepoint. Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-08-24mm/slab: kmalloc: pass requests larger than order-1 page to page allocatorHyeonggon Yoo1-18/+42
There is not much benefit for serving large objects in kmalloc(). Let's pass large requests to page allocator like SLUB for better maintenance of common code. Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-08-24mm/sl[au]b: factor out __do_kmalloc_node()Hyeonggon Yoo1-29/+1
__kmalloc(), __kmalloc_node(), __kmalloc_node_track_caller() mostly do same job. Factor out common code into __do_kmalloc_node(). Note that this patch also fixes missing kasan_kmalloc() in SLUB's __kmalloc_node_track_caller(). Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-08-24mm/slab_common: cleanup kmalloc_track_caller()Hyeonggon Yoo1-6/+0
Make kmalloc_track_caller() wrapper of kmalloc_node_track_caller(). Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-08-24mm/slab_common: remove CONFIG_NUMA ifdefs for common kmalloc functionsHyeonggon Yoo1-2/+0
Now that slab_alloc_node() is available for SLAB when CONFIG_NUMA=n, remove CONFIG_NUMA ifdefs for common kmalloc functions. Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-08-24mm/slab: cleanup slab_alloc() and slab_alloc_node()Hyeonggon Yoo1-36/+13
Make slab_alloc_node() available even when CONFIG_NUMA=n and make slab_alloc() wrapper of slab_alloc_node(). This is necessary for further cleanup. Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-08-24mm/slab: move NUMA-related code to __do_cache_alloc()Hyeonggon Yoo1-37/+31
To implement slab_alloc_node() independent of NUMA configuration, move NUMA fallback/alternate allocation code into __do_cache_alloc(). One functional change here is not to check availability of node when allocating from local node. Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-08-05Merge tag 'mm-stable-2022-08-03' of ↵Linus Torvalds1-10/+0
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "Most of the MM queue. A few things are still pending. Liam's maple tree rework didn't make it. This has resulted in a few other minor patch series being held over for next time. Multi-gen LRU still isn't merged as we were waiting for mapletree to stabilize. The current plan is to merge MGLRU into -mm soon and to later reintroduce mapletree, with a view to hopefully getting both into 6.1-rc1. Summary: - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe Lin, Yang Shi, Anshuman Khandual and Mike Rapoport - Some kmemleak fixes from Patrick Wang and Waiman Long - DAMON updates from SeongJae Park - memcg debug/visibility work from Roman Gushchin - vmalloc speedup from Uladzislau Rezki - more folio conversion work from Matthew Wilcox - enhancements for coherent device memory mapping from Alex Sierra - addition of shared pages tracking and CoW support for fsdax, from Shiyang Ruan - hugetlb optimizations from Mike Kravetz - Mel Gorman has contributed some pagealloc changes to improve latency and realtime behaviour. - mprotect soft-dirty checking has been improved by Peter Xu - Many other singleton patches all over the place" [ XFS merge from hell as per Darrick Wong in https://lore.kernel.org/all/YshKnxb4VwXycPO8@magnolia/ ] * tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (282 commits) tools/testing/selftests/vm/hmm-tests.c: fix build mm: Kconfig: fix typo mm: memory-failure: convert to pr_fmt() mm: use is_zone_movable_page() helper hugetlbfs: fix inaccurate comment in hugetlbfs_statfs() hugetlbfs: cleanup some comments in inode.c hugetlbfs: remove unneeded header file hugetlbfs: remove unneeded hugetlbfs_ops forward declaration hugetlbfs: use helper macro SZ_1{K,M} mm: cleanup is_highmem() mm/hmm: add a test for cross device private faults selftests: add soft-dirty into run_vmtests.sh selftests: soft-dirty: add test for mprotect mm/mprotect: fix soft-dirty check in can_change_pte_writable() mm: memcontrol: fix potential oom_lock recursion deadlock mm/gup.c: fix formatting in check_and_migrate_movable_page() xfs: fail dax mount if reflink is enabled on a partition mm/memcontrol.c: remove the redundant updating of stats_flush_threshold userfaultfd: don't fail on unrecognized features hugetlb_cgroup: fix wrong hugetlb cgroup numa stat ...
2022-07-20mm/sl[au]b: use own bulk free function when bulk alloc failedHyeonggon Yoo1-1/+1
There is no benefit to call generic bulk free function when kmem_cache_alloc_bulk() failed. Use own kmem_cache_free_bulk() instead of generic function. Note that if kmem_cache_alloc_bulk() fails to allocate first object in SLUB, size is zero. So allow passing size == 0 to kmem_cache_free_bulk() like SLAB's. Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-07-04mm: slab: optimize memcg_slab_free_hook()Muchun Song1-2/+2
Most callers of memcg_slab_free_hook() already know the slab, which could be passed to memcg_slab_free_hook() directly to reduce the overhead of an another call of virt_to_slab(). For bulk freeing of objects, the call of slab_objcgs() in the loop in memcg_slab_free_hook() is redundant as well. Rework memcg_slab_free_hook() and build_detached_freelist() to reduce those unnecessary overhead and make memcg_slab_free_hook() can handle bulk freeing in slab_free(). Move the calling site of memcg_slab_free_hook() from do_slab_free() to slab_free() for slub to make the code clearer since the logic is weird (e.g. the caller need to judge whether it needs to call memcg_slab_free_hook()). It is easy to make mistakes like missing calling of memcg_slab_free_hook() like fixes of: commit d1b2cf6cb84a ("mm: memcg/slab: uncharge during kmem_cache_free_bulk()") commit ae085d7f9365 ("mm: kfence: fix missing objcg housekeeping for SLAB") This optimization is mainly for bulk objects freeing. The following numbers is shown for 16-object freeing. before after kmem_cache_free_bulk: ~430 ns ~400 ns The overhead is reduced by about 7% for 16-object freeing. Signed-off-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Link: https://lore.kernel.org/r/20220429123044.37885-1-songmuchun@bytedance.com Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-07-04mm/tracing: add 'accounted' entry into output of allocation tracepointsVasily Averin1-5/+5
Slab caches marked with SLAB_ACCOUNT force accounting for every allocation from this cache even if __GFP_ACCOUNT flag is not passed. Unfortunately, at the moment this flag is not visible in ftrace output, and this makes it difficult to analyze the accounted allocations. This patch adds boolean "accounted" entry into trace output, and set it to 'true' for calls used __GFP_ACCOUNT flag and for allocations from caches marked with SLAB_ACCOUNT. Set it to 'false' if accounting is disabled in configs. Signed-off-by: Vasily Averin <vvs@openvz.org> Acked-by: Shakeel Butt <shakeelb@google.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Link: https://lore.kernel.org/r/c418ed25-65fe-f623-fbf8-1676528859ed@openvz.org Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-07-04mm, slab: fix bad alignmentsJiapeng Chong1-2/+2
As reported by coccicheck: ./mm/slab.c:3253:2-59: code aligned with following code on line 3255. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Acked-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Acked-by: David Rientjes <rientjes@google.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-06-16mm/slab: delete cache_alloc_debugcheck_before()Daniel Vetter1-10/+0
It only does a might_sleep_if(GFP_RECLAIM) check, which is already covered by the might_alloc() in slab_pre_alloc_hook(). And all callers of cache_alloc_debugcheck_before() call that beforehand already. Link: https://lkml.kernel.org/r/20220605152539.3196045-2-daniel.vetter@ffwll.ch Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Roman Gushchin <roman.gushchin@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-26Merge tag 'mm-stable-2022-05-25' of ↵Linus Torvalds1-4/+3
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "Almost all of MM here. A few things are still getting finished off, reviewed, etc. - Yang Shi has improved the behaviour of khugepaged collapsing of readonly file-backed transparent hugepages. - Johannes Weiner has arranged for zswap memory use to be tracked and managed on a per-cgroup basis. - Munchun Song adds a /proc knob ("hugetlb_optimize_vmemmap") for runtime enablement of the recent huge page vmemmap optimization feature. - Baolin Wang contributes a series to fix some issues around hugetlb pagetable invalidation. - Zhenwei Pi has fixed some interactions between hwpoisoned pages and virtualization. - Tong Tiangen has enabled the use of the presently x86-only page_table_check debugging feature on arm64 and riscv. - David Vernet has done some fixup work on the memcg selftests. - Peter Xu has taught userfaultfd to handle write protection faults against shmem- and hugetlbfs-backed files. - More DAMON development from SeongJae Park - adding online tuning of the feature and support for monitoring of fixed virtual address ranges. Also easier discovery of which monitoring operations are available. - Nadav Amit has done some optimization of TLB flushing during mprotect(). - Neil Brown continues to labor away at improving our swap-over-NFS support. - David Hildenbrand has some fixes to anon page COWing versus get_user_pages(). - Peng Liu fixed some errors in the core hugetlb code. - Joao Martins has reduced the amount of memory consumed by device-dax's compound devmaps. - Some cleanups of the arch-specific pagemap code from Anshuman Khandual. - Muchun Song has found and fixed some errors in the TLB flushing of transparent hugepages. - Roman Gushchin has done more work on the memcg selftests. ... and, of course, many smaller fixes and cleanups. Notably, the customary million cleanup serieses from Miaohe Lin" * tag 'mm-stable-2022-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (381 commits) mm: kfence: use PAGE_ALIGNED helper selftests: vm: add the "settings" file with timeout variable selftests: vm: add "test_hmm.sh" to TEST_FILES selftests: vm: check numa_available() before operating "merge_across_nodes" in ksm_tests selftests: vm: add migration to the .gitignore selftests/vm/pkeys: fix typo in comment ksm: fix typo in comment selftests: vm: add process_mrelease tests Revert "mm/vmscan: never demote for memcg reclaim" mm/kfence: print disabling or re-enabling message include/trace/events/percpu.h: cleanup for "percpu: improve percpu_alloc_percpu event trace" include/trace/events/mmflags.h: cleanup for "tracing: incorrect gfp_t conversion" mm: fix a potential infinite loop in start_isolate_page_range() MAINTAINERS: add Muchun as co-maintainer for HugeTLB zram: fix Kconfig dependency warning mm/shmem: fix shmem folio swapoff hang cgroup: fix an error handling path in alloc_pagecache_max_30M() mm: damon: use HPAGE_PMD_SIZE tracing: incorrect isolate_mote_t cast in mm_vmscan_lru_isolate nodemask.h: fix compilation error with GCC12 ...
2022-05-25Merge tag 'slab-for-5.19' of ↵Linus Torvalds1-21/+8
git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab Pull slab updates from Vlastimil Babka: - Conversion of slub_debug stack traces to stackdepot, allowing more useful debugfs-based inspection for e.g. memory leak debugging. Allocation and free debugfs info now includes full traces and is sorted by the unique trace frequency. The stackdepot conversion was already attempted last year but reverted by ae14c63a9f20. The memory overhead (while not actually enabled on boot) has been meanwhile solved by making the large stackdepot allocation dynamic. The xfstest issues haven't been reproduced on current kernel locally nor in -next, so the slab cache layout changes that originally made that bug manifest were probably not the root cause. - Refactoring of dma-kmalloc caches creation. - Trivial cleanups such as removal of unused parameters, fixes and clarifications of comments. - Hyeonggon Yoo joins as a reviewer. * tag 'slab-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: MAINTAINERS: add myself as reviewer for slab mm/slub: remove unused kmem_cache_order_objects max mm: slab: fix comment for __assume_kmalloc_alignment mm: slab: fix comment for ARCH_KMALLOC_MINALIGN mm/slub: remove unneeded return value of slab_pad_check mm/slab_common: move dma-kmalloc caches creation into new_kmalloc_cache() mm/slub: remove meaningless node check in ___slab_alloc() mm/slub: remove duplicate flag in allocate_slab() mm/slub: remove unused parameter in setup_object*() mm/slab.c: fix comments slab, documentation: add description of debugfs files for SLUB caches mm/slub: sort debugfs output by frequency of stack traces mm/slub: distinguish and print stack traces in debugfs files mm/slub: use stackdepot to save stack trace in objects mm/slub: move struct track init out of set_track() lib/stackdepot: allow requesting early initialization dynamically mm/slub, kunit: Make slub_kunit unaffected by user specified flags mm/slab: remove some unused functions
2022-05-13mm: make minimum slab alignment a runtime propertyPeter Collingbourne1-4/+3
When CONFIG_KASAN_HW_TAGS is enabled we currently increase the minimum slab alignment to 16. This happens even if MTE is not supported in hardware or disabled via kasan=off, which creates an unnecessary memory overhead in those cases. Eliminate this overhead by making the minimum slab alignment a runtime property and only aligning to 16 if KASAN is enabled at runtime. On a DragonBoard 845c (non-MTE hardware) with a kernel built with CONFIG_KASAN_HW_TAGS, waiting for quiescence after a full Android boot I see the following Slab measurements in /proc/meminfo (median of 3 reboots): Before: 169020 kB After: 167304 kB [akpm@linux-foundation.org: make slab alignment type `unsigned int' to avoid casting] Link: https://linux-review.googlesource.com/id/I752e725179b43b144153f4b6f584ceb646473ead Link: https://lkml.kernel.org/r/20220427195820.1716975-2-pcc@google.com Signed-off-by: Peter Collingbourne <pcc@google.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Acked-by: David Rientjes <rientjes@google.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Pekka Enberg <penberg@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-15mm, kfence: support kmem_dump_obj() for KFENCE objectsMarco Elver1-1/+1
Calling kmem_obj_info() via kmem_dump_obj() on KFENCE objects has been producing garbage data due to the object not actually being maintained by SLAB or SLUB. Fix this by implementing __kfence_obj_info() that copies relevant information to struct kmem_obj_info when the object was allocated by KFENCE; this is called by a common kmem_obj_info(), which also calls the slab/slub/slob specific variant now called __kmem_obj_info(). For completeness, kmem_dump_obj() now displays if the object was allocated by KFENCE. Link: https://lore.kernel.org/all/20220323090520.GG16885@xsang-OptiPlex-9020/ Link: https://lkml.kernel.org/r/20220406131558.3558585-1-elver@google.com Fixes: b89fb5ef0ce6 ("mm, kfence: insert KFENCE hooks for SLUB") Fixes: d3fb45f370d9 ("mm, kfence: insert KFENCE hooks for SLAB") Signed-off-by: Marco Elver <elver@google.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reported-by: kernel test robot <oliver.sang@intel.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> [slab] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-07mm/slab.c: fix commentsYixuan Cao1-6/+6
While reading the source code, I noticed some language errors in the comments, so I fixed them. Signed-off-by: Yixuan Cao <caoyixuan2019@email.szu.edu.cn> Acked-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Link: https://lore.kernel.org/r/20220407080958.3667-1-caoyixuan2019@email.szu.edu.cn
2022-04-05mm/slab: remove some unused functionsMiaohe Lin1-15/+2
alternate_node_alloc and ____cache_alloc_node are always called when CONFIG_NUMA. So we can remove the unused !CONFIG_NUMA variant. Also forward declaration for alternate_node_alloc is unnecessary. Remove it too. [ vbabka@suse.cz: move ____cache_alloc_node() declaration closer to its callers ] Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Link: https://lore.kernel.org/r/20220322091421.25285-1-linmiaohe@huawei.com
2022-03-27mm: kfence: fix missing objcg housekeeping for SLABMuchun Song1-0/+1
The objcg is not cleared and put for kfence object when it is freed, which could lead to memory leak for struct obj_cgroup and wrong statistics of NR_SLAB_RECLAIMABLE_B or NR_SLAB_UNRECLAIMABLE_B. Since the last freed object's objcg is not cleared, mem_cgroup_from_obj() could return the wrong memcg when this kfence object, which is not charged to any objcgs, is reallocated to other users. A real word issue [1] is caused by this bug. Link: https://lore.kernel.org/all/000000000000cabcb505dae9e577@google.com/ [1] Reported-by: syzbot+f8c45ccc7d5d45fc5965@syzkaller.appspotmail.com Fixes: d3fb45f370d9 ("mm, kfence: insert KFENCE hooks for SLAB") Signed-off-by: Muchun Song <songmuchun@bytedance.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Marco Elver <elver@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22mm: introduce kmem_cache_alloc_lruMuchun Song1-12/+27
We currently allocate scope for every memcg to be able to tracked on every superblock instantiated in the system, regardless of whether that superblock is even accessible to that memcg. These huge memcg counts come from container hosts where memcgs are confined to just a small subset of the total number of superblocks that instantiated at any given point in time. For these systems with huge container counts, list_lru does not need the capability of tracking every memcg on every superblock. What it comes down to is that adding the memcg to the list_lru at the first insert. So introduce kmem_cache_alloc_lru to allocate objects and its list_lru. In the later patch, we will convert all inode and dentry allocation from kmem_cache_alloc to kmem_cache_alloc_lru. Link: https://lkml.kernel.org/r/20220228122126.37293-3-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Cc: Alex Shi <alexs@kernel.org> Cc: Anna Schumaker <Anna.Schumaker@Netapp.com> Cc: Chao Yu <chao@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Fam Zheng <fam.zheng@bytedance.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kari Argillander <kari.argillander@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeelb@google.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Xiongchun Duan <duanxiongchun@bytedance.com> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-06mm/kasan: Convert to struct folio and struct slabMatthew Wilcox (Oracle)1-1/+1
KASAN accesses some slab related struct page fields so we need to convert it to struct slab. Some places are a bit simplified thanks to kasan_addr_to_slab() encapsulating the PageSlab flag check through virt_to_slab(). When resolving object address to either a real slab or a large kmalloc, use struct folio as the intermediate type for testing the slab flag to avoid unnecessary implicit compound_head(). [ vbabka@suse.cz: use struct folio, adjust to differences in previous patches ] Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Reviewed-by: Roman Gushchin <guro@fb.com> Tested-by: Hyeongogn Yoo <42.hyeyoo@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: <kasan-dev@googlegroups.com>
2022-01-06mm: Convert struct page to struct slab in functions used by other subsystemsVlastimil Babka1-5/+5
KASAN, KFENCE and memcg interact with SLAB or SLUB internals through functions nearest_obj(), obj_to_index() and objs_per_slab() that use struct page as parameter. This patch converts it to struct slab including all callers, through a coccinelle semantic patch. // Options: --include-headers --no-includes --smpl-spacing include/linux/slab_def.h include/linux/slub_def.h mm/slab.h mm/kasan/*.c mm/kfence/kfence_test.c mm/memcontrol.c mm/slab.c mm/slub.c // Note: needs coccinelle 1.1.1 to avoid breaking whitespace @@ @@ -objs_per_slab_page( +objs_per_slab( ... ) { ... } @@ @@ -objs_per_slab_page( +objs_per_slab( ... ) @@ identifier fn =~ "obj_to_index|objs_per_slab"; @@ fn(..., - const struct page *page + const struct slab *slab ,...) { <... ( - page_address(page) + slab_address(slab) | - page + slab ) ...> } @@ identifier fn =~ "nearest_obj"; @@ fn(..., - struct page *page + const struct slab *slab ,...) { <... ( - page_address(page) + slab_address(slab) | - page + slab ) ...> } @@ identifier fn =~ "nearest_obj|obj_to_index|objs_per_slab"; expression E; @@ fn(..., ( - slab_page(E) + E | - virt_to_page(E) + virt_to_slab(E) | - virt_to_head_page(E) + virt_to_slab(E) | - page + page_slab(page) ) ,...) Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Reviewed-by: Roman Gushchin <guro@fb.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Julia Lawall <julia.lawall@inria.fr> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Marco Elver <elver@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: <kasan-dev@googlegroups.com> Cc: <cgroups@vger.kernel.org>
2022-01-06mm/slab: Finish struct page to struct slab conversionVlastimil Babka1-10/+10
Change cache_free_alien() to use slab_nid(virt_to_slab()). Otherwise just update of comments and some remaining variable names. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reviewed-by: Roman Gushchin <guro@fb.com>
2022-01-06mm/slab: Convert most struct page to struct slab by spatchVlastimil Babka1-180/+180
The majority of conversion from struct page to struct slab in SLAB internals can be delegated to a coccinelle semantic patch. This includes renaming of variables with 'page' in name to 'slab', and similar. Big thanks to Julia Lawall and Luis Chamberlain for help with coccinelle. // Options: --include-headers --no-includes --smpl-spacing mm/slab.c // Note: needs coccinelle 1.1.1 to avoid breaking whitespace, and ocaml for the // embedded script // build list of functions for applying the next rule @initialize:ocaml@ @@ let ok_function p = not (List.mem (List.hd p).current_element ["kmem_getpages";"kmem_freepages"]) // convert the type in selected functions @@ position p : script:ocaml() { ok_function p }; @@ - struct page@p + struct slab @@ @@ -PageSlabPfmemalloc(page) +slab_test_pfmemalloc(slab) @@ @@ -ClearPageSlabPfmemalloc(page) +slab_clear_pfmemalloc(slab) @@ @@ obj_to_index( ..., - page + slab_page(slab) ,...) // for all functions, change any "struct slab *page" parameter to "struct slab // *slab" in the signature, and generally all occurences of "page" to "slab" in // the body - with some special cases. @@ identifier fn; expression E; @@ fn(..., - struct slab *page + struct slab *slab ,...) { <... ( - int page_node; + int slab_node; | - page_node + slab_node | - page_slab(page) + slab | - page_address(page) + slab_address(slab) | - page_size(page) + slab_size(slab) | - page_to_nid(page) + slab_nid(slab) | - virt_to_head_page(E) + virt_to_slab(E) | - page + slab ) ...> } // rename a function parameter @@ identifier fn; expression E; @@ fn(..., - int page_node + int slab_node ,...) { <... - page_node + slab_node ...> } // functions converted by previous rules that were temporarily called using // slab_page(E) so we want to remove the wrapper now that they accept struct // slab ptr directly @@ identifier fn =~ "index_to_obj"; expression E; @@ fn(..., - slab_page(E) + E ,...) // functions that were returning struct page ptr and now will return struct // slab ptr, including slab_page() wrapper removal @@ identifier fn =~ "cache_grow_begin|get_valid_first_slab|get_first_slab"; expression E; @@ fn(...) { <... - slab_page(E) + E ...> } // rename any former struct page * declarations @@ @@ struct slab * -page +slab ; // all functions (with exceptions) with a local "struct slab *page" variable // that will be renamed to "struct slab *slab" @@ identifier fn !~ "kmem_getpages|kmem_freepages"; expression E; @@ fn(...) { <... ( - page_slab(page) + slab | - page_to_nid(page) + slab_nid(slab) | - kasan_poison_slab(page) + kasan_poison_slab(slab_page(slab)) | - page_address(page) + slab_address(slab) | - page_size(page) + slab_size(slab) | - page->pages + slab->slabs | - page = virt_to_head_page(E) + slab = virt_to_slab(E) | - virt_to_head_page(E) + virt_to_slab(E) | - page + slab ) ...> } Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Roman Gushchin <guro@fb.com> Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Julia Lawall <julia.lawall@inria.fr> Cc: Luis Chamberlain <mcgrof@kernel.org>
2022-01-06mm/slab: Convert kmem_getpages() and kmem_freepages() to struct slabVlastimil Babka1-25/+28
These functions sit at the boundary to page allocator. Also use folio internally to avoid extra compound_head() when dealing with page flags. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Roman Gushchin <guro@fb.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
2022-01-06mm: Convert check_heap_object() to use struct slabMatthew Wilcox (Oracle)1-7/+7
Ensure that we're not seeing a tail page inside __check_heap_object() by converting to a slab instead of a page. Take the opportunity to mark the slab as const since we're not modifying it. Also move the declaration of __check_heap_object() to mm/slab.h so it's not available to the wider kernel. [ vbabka@suse.cz: in check_heap_object() only convert to struct slab for actual PageSlab pages; use folio as intermediate step instead of page ] Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Roman Gushchin <guro@fb.com>
2022-01-06mm: Use struct slab in kmem_obj_info()Matthew Wilcox (Oracle)1-6/+6
All three implementations of slab support kmem_obj_info() which reports details of an object allocated from the slab allocator. By using the slab type instead of the page type, we make it obvious that this can only be called for slabs. [ vbabka@suse.cz: also convert the related kmem_valid_obj() to folios ] Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Roman Gushchin <guro@fb.com>
2022-01-06mm: Convert [un]account_slab_page() to struct slabMatthew Wilcox (Oracle)1-2/+2
Convert the parameter of these functions to struct slab instead of struct page and drop _page from the names. For now their callers just convert page to slab. [ vbabka@suse.cz: replace existing functions instead of calling them ] Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Roman Gushchin <guro@fb.com>
2022-01-06mm/slab: Dissolve slab_map_pages() in its callerVlastimil Babka1-13/+2
The function no longer does what its name and comment suggests, and just sets two struct page fields, which can be done directly in its sole caller. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Roman Gushchin <guro@fb.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
2021-11-20mm: emit the "free" trace report before freeing memory in kmem_cache_free()Yunfeng Ye1-2/+1
After the memory is freed, it can be immediately allocated by other CPUs, before the "free" trace report has been emitted. This causes inaccurate traces. For example, if the following sequence of events occurs: CPU 0 CPU 1 (1) alloc xxxxxx (2) free xxxxxx (3) alloc xxxxxx (4) free xxxxxx Then they will be inaccurately reported via tracing, so that they appear to have happened in this order: CPU 0 CPU 1 (1) alloc xxxxxx (2) alloc xxxxxx (3) free xxxxxx (4) free xxxxxx This makes it look like CPU 1 somehow managed to allocate memory that CPU 0 still had allocated for itself. In order to avoid this, emit the "free xxxxxx" tracing report just before the actual call to free the memory, instead of just after it. Link: https://lkml.kernel.org/r/374eb75d-7404-8721-4e1e-65b0e5b17279@huawei.com Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06mm: remove HARDENED_USERCOPY_FALLBACKStephen Kitt1-13/+0
This has served its purpose and is no longer used. All usercopy violations appear to have been handled by now, any remaining instances (or new bugs) will cause copies to be rejected. This isn't a direct revert of commit 2d891fbc3bb6 ("usercopy: Allow strict enforcement of whitelists"); since usercopy_fallback is effectively 0, the fallback handling is removed too. This also removes the usercopy_fallback module parameter on slab_common. Link: https://github.com/KSPP/linux/issues/153 Link: https://lkml.kernel.org/r/20210921061149.1091163-1-steve@sk2.org Signed-off-by: Stephen Kitt <steve@sk2.org> Suggested-by: Kees Cook <keescook@chromium.org> Acked-by: Kees Cook <keescook@chromium.org> Reviewed-by: Joel Stanley <joel@jms.id.au> [defconfig change] Acked-by: David Rientjes <rientjes@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: James Morris <jmorris@namei.org> Cc: "Serge E . Hallyn" <serge@hallyn.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06mm/slab.c: remove useless lines in enable_cpucache()Shi Lei1-3/+0
These lines are useless, so remove them. Link: https://lkml.kernel.org/r/20210930034845.2539-1-shi_lei@massclouds.com Fixes: 10befea91b61 ("mm: memcg/slab: use a single set of kmem_caches for all allocations") Signed-off-by: Shi Lei <shi_lei@massclouds.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: David Rientjes <rientjes@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-10-18mm/migrate: add CPU hotplug to demotion #ifdefDave Hansen1-2/+2
Once upon a time, the node demotion updates were driven solely by memory hotplug events. But now, there are handlers for both CPU and memory hotplug. However, the #ifdef around the code checks only memory hotplug. A system that has HOTPLUG_CPU=y but MEMORY_HOTPLUG=n would miss CPU hotplug events. Update the #ifdef around the common code. Add memory and CPU-specific #ifdefs for their handlers. These memory/CPU #ifdefs avoid unused function warnings when their Kconfig option is off. [arnd@arndb.de: rework hotplug_memory_notifier() stub] Link: https://lkml.kernel.org/r/20211013144029.2154629-1-arnd@kernel.org Link: https://lkml.kernel.org/r/20210924161255.E5FE8F7E@davehans-spike.ostc.intel.com Fixes: 884a6e5d1f93 ("mm/migrate: update node demotion order on hotplug events") Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Wei Xu <weixugc@google.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-07mm: fix typos in commentsIngo Molnar1-3/+3
Fix ~94 single-word typos in locking code comments, plus a few very obvious grammar mistakes. Link: https://lkml.kernel.org/r/20210322212624.GA1963421@gmail.com Link: https://lore.kernel.org/r/20210322205203.GB1959563@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Cc: Bhaskar Chowdhury <unixbhaskar@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>