From e17f1dfba37b84b574ae91e809ae3804fe5b29b9 Mon Sep 17 00:00:00 2001 From: Vlastimil Babka Date: Thu, 6 Aug 2020 23:18:35 -0700 Subject: mm, slub: extend slub_debug syntax for multiple blocks Patch series "slub_debug fixes and improvements". The slub_debug kernel boot parameter can either apply a single set of options to all caches or a list of caches. There is a use case where debugging is applied for all caches and then disabled at runtime for specific caches, for performance and memory consumption reasons [1]. As runtime changes are dangerous, extend the boot parameter syntax so that multiple blocks of either global or slab-specific options can be specified, with blocks delimited by ';'. This will also support the use case of [1] without runtime changes. For details see the updated Documentation/vm/slub.rst [1] https://lore.kernel.org/r/1383cd32-1ddc-4dac-b5f8-9c42282fa81c@codeaurora.org [weiyongjun1@huawei.com: make parse_slub_debug_flags() static] Link: http://lkml.kernel.org/r/20200702150522.4940-1-weiyongjun1@huawei.com Signed-off-by: Vlastimil Babka Signed-off-by: Andrew Morton Reviewed-by: Kees Cook Cc: Vlastimil Babka Cc: Christoph Lameter Cc: Jann Horn Cc: Roman Gushchin Cc: Vijayanand Jitta Cc: Pekka Enberg Cc: David Rientjes Cc: Joonsoo Kim Link: http://lkml.kernel.org/r/20200610163135.17364-2-vbabka@suse.cz Signed-off-by: Linus Torvalds --- Documentation/vm/slub.rst | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) (limited to 'Documentation/vm') diff --git a/Documentation/vm/slub.rst b/Documentation/vm/slub.rst index 4eee598555c9..cfccb258cf42 100644 --- a/Documentation/vm/slub.rst +++ b/Documentation/vm/slub.rst @@ -41,6 +41,11 @@ slub_debug=,,,... Enable options only for select slabs (no spaces after a comma) +Multiple blocks of options for all slabs or selected slabs can be given, with +blocks of options delimited by ';'. The last of "all slabs" blocks is applied +to all slabs except those that match one of the "select slabs" block. Options +of the first "select slabs" blocks that matches the slab's name are applied. + Possible debug options are:: F Sanity checks on (enables SLAB_DEBUG_CONSISTENCY_CHECKS @@ -83,6 +88,19 @@ switch off debugging for such caches by default, use:: slub_debug=O +You can apply different options to different list of slab names, using blocks +of options. This will enable red zoning for dentry and user tracking for +kmalloc. All other slabs will not get any debugging enabled:: + + slub_debug=Z,dentry;U,kmalloc-* + +You can also enable options (e.g. sanity checks and poisoning) for all caches +except some that are deemed too performance critical and don't need to be +debugged by specifying global debug options followed by a list of slab names +with "-" as options:: + + slub_debug=FZ;-,zs_handle,zspage + In case you forgot to enable debugging on the kernel command line: It is possible to enable debugging manually when the kernel is up. Look at the contents of:: -- cgit v1.2.3 From ad38b5b1131e2a0e5c46724847da2e1eba31fb68 Mon Sep 17 00:00:00 2001 From: Vlastimil Babka Date: Thu, 6 Aug 2020 23:18:38 -0700 Subject: mm, slub: make some slub_debug related attributes read-only SLUB_DEBUG creates several files under /sys/kernel/slab// that can be read to check if the respective debugging options are enabled for given cache. The options can be also toggled at runtime by writing into the files. Some of those, namely red_zone, poison, and store_user can be toggled only when no objects yet exist in the cache. 
Vijayanand reports [1] that there is a problem with freelist randomization if changing the debugging option's state results in a different number of objects per page, and the random sequence cache thus needs to be recomputed. However, another problem is that the check for "no objects yet exist in the cache" is racy, as noted by Jann [2], and fixing that would add overhead or otherwise complicate the allocation/freeing paths. Thus it would be much simpler just to remove the runtime toggling support. The documentation describes it as "In case you forgot to enable debugging on the kernel command line", but the necessity of having no objects limits its usefulness for many caches anyway. Vijayanand describes a use case [3] where debugging is enabled for all but the zram caches for memory overhead reasons, and using the runtime toggles was the only way to achieve such a configuration. After the previous patch it is now possible to do that directly from the kernel boot option, so we can remove the dangerous runtime toggles by making the /sys attribute files read-only. While at it, also improve the documentation of the debugging /sys files. [1] https://lkml.kernel.org/r/1580379523-32272-1-git-send-email-vjitta@codeaurora.org [2] https://lore.kernel.org/r/CAG48ez31PP--h6_FzVyfJ4H86QYczAFPdxtJHUEEan+7VJETAQ@mail.gmail.com [3] https://lore.kernel.org/r/1383cd32-1ddc-4dac-b5f8-9c42282fa81c@codeaurora.org Reported-by: Vijayanand Jitta Reported-by: Jann Horn Signed-off-by: Vlastimil Babka Signed-off-by: Andrew Morton Reviewed-by: Kees Cook Acked-by: Roman Gushchin Cc: Christoph Lameter Cc: David Rientjes Cc: Joonsoo Kim Cc: Pekka Enberg Link: http://lkml.kernel.org/r/20200610163135.17364-3-vbabka@suse.cz Signed-off-by: Linus Torvalds
--- Documentation/vm/slub.rst | 28 +++++++++++++++++----------- mm/slub.c | 46 +++------------------------------------------- 2 files changed, 20 insertions(+), 54 deletions(-) (limited to 'Documentation/vm')
diff --git a/Documentation/vm/slub.rst b/Documentation/vm/slub.rst index cfccb258cf42..36241dfba024 100644 --- a/Documentation/vm/slub.rst +++ b/Documentation/vm/slub.rst @@ -101,20 +101,26 @@ with "-" as options:: slub_debug=FZ;-,zs_handle,zspage -In case you forgot to enable debugging on the kernel command line: It is -possible to enable debugging manually when the kernel is up. Look at the -contents of:: +The state of each debug option for a slab can be found in the respective files +under:: /sys/kernel/slab// -Look at the writable files. Writing 1 to them will enable the -corresponding debug option. All options can be set on a slab that does -not contain objects. If the slab already contains objects then sanity checks -and tracing may only be enabled. The other options may cause the realignment -of objects. - -Careful with tracing: It may spew out lots of information and never stop if -used on the wrong slab. +If the file contains 1, the option is enabled, 0 means disabled. The debug +options from the ``slub_debug`` parameter translate to the following files:: + + F sanity_checks + Z red_zone + P poison + U store_user + T trace + A failslab + +The sanity_checks, trace and failslab files are writable, so writing 1 or 0 +will enable or disable the option at runtime. The writes to trace and failslab +may return -EINVAL if the cache is subject to slab merging. Careful with +tracing: It may spew out lots of information and never stop if used on the +wrong slab.
Slab merging ============ diff --git a/mm/slub.c b/mm/slub.c index 829985d7c7c5..fd0b196fdaaf 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -5335,61 +5335,21 @@ static ssize_t red_zone_show(struct kmem_cache *s, char *buf) return sprintf(buf, "%d\n", !!(s->flags & SLAB_RED_ZONE)); } -static ssize_t red_zone_store(struct kmem_cache *s, - const char *buf, size_t length) -{ - if (any_slab_objects(s)) - return -EBUSY; - - s->flags &= ~SLAB_RED_ZONE; - if (buf[0] == '1') { - s->flags |= SLAB_RED_ZONE; - } - calculate_sizes(s, -1); - return length; -} -SLAB_ATTR(red_zone); +SLAB_ATTR_RO(red_zone); static ssize_t poison_show(struct kmem_cache *s, char *buf) { return sprintf(buf, "%d\n", !!(s->flags & SLAB_POISON)); } -static ssize_t poison_store(struct kmem_cache *s, - const char *buf, size_t length) -{ - if (any_slab_objects(s)) - return -EBUSY; - - s->flags &= ~SLAB_POISON; - if (buf[0] == '1') { - s->flags |= SLAB_POISON; - } - calculate_sizes(s, -1); - return length; -} -SLAB_ATTR(poison); +SLAB_ATTR_RO(poison); static ssize_t store_user_show(struct kmem_cache *s, char *buf) { return sprintf(buf, "%d\n", !!(s->flags & SLAB_STORE_USER)); } -static ssize_t store_user_store(struct kmem_cache *s, - const char *buf, size_t length) -{ - if (any_slab_objects(s)) - return -EBUSY; - - s->flags &= ~SLAB_STORE_USER; - if (buf[0] == '1') { - s->flags &= ~__CMPXCHG_DOUBLE; - s->flags |= SLAB_STORE_USER; - } - calculate_sizes(s, -1); - return length; -} -SLAB_ATTR(store_user); +SLAB_ATTR_RO(store_user); static ssize_t validate_show(struct kmem_cache *s, char *buf) { -- cgit v1.2.3 From 060807f841ac94d3826ce6fa3b4f3831cd0c015b Mon Sep 17 00:00:00 2001 From: Vlastimil Babka Date: Thu, 6 Aug 2020 23:18:45 -0700 Subject: mm, slub: make remaining slub_debug related attributes read-only SLUB_DEBUG creates several files under /sys/kernel/slab// that can be read to check if the respective debugging options are enabled for given cache. Some options, namely sanity_checks, trace, and failslab can be also enabled and disabled at runtime by writing into the files. The runtime toggling is racy. Some options disable __CMPXCHG_DOUBLE when enabled, which means that in case of concurrent allocations, some can still use __CMPXCHG_DOUBLE and some not, leading to potential corruption. The s->flags field is also not updated or checked atomically. The simplest solution is to remove the runtime toggling. The extended slub_debug boot parameter syntax introduced by earlier patch should allow to fine-tune the debugging configuration during boot with same granularity. Signed-off-by: Vlastimil Babka Signed-off-by: Andrew Morton Reviewed-by: Kees Cook Acked-by: Roman Gushchin Cc: Christoph Lameter Cc: Jann Horn Cc: Vijayanand Jitta Cc: David Rientjes Cc: Joonsoo Kim Cc: Pekka Enberg Link: http://lkml.kernel.org/r/20200610163135.17364-5-vbabka@suse.cz Signed-off-by: Linus Torvalds --- Documentation/vm/slub.rst | 7 ++---- mm/slub.c | 62 +++-------------------------------------------- 2 files changed, 5 insertions(+), 64 deletions(-) (limited to 'Documentation/vm') diff --git a/Documentation/vm/slub.rst b/Documentation/vm/slub.rst index 36241dfba024..289d231cee97 100644 --- a/Documentation/vm/slub.rst +++ b/Documentation/vm/slub.rst @@ -116,11 +116,8 @@ options from the ``slub_debug`` parameter translate to the following files:: T trace A failslab -The sanity_checks, trace and failslab files are writable, so writing 1 or 0 -will enable or disable the option at runtime. 
The writes to trace and failslab -may return -EINVAL if the cache is subject to slab merging. Careful with -tracing: It may spew out lots of information and never stop if used on the -wrong slab. +Careful with tracing: It may spew out lots of information and never stop if +used on the wrong slab. Slab merging ============ diff --git a/mm/slub.c b/mm/slub.c index f03b1073bdcf..5848b7c3575a 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -5040,20 +5040,6 @@ static ssize_t show_slab_objects(struct kmem_cache *s, return x + sprintf(buf + x, "\n"); } -#ifdef CONFIG_SLUB_DEBUG -static int any_slab_objects(struct kmem_cache *s) -{ - int node; - struct kmem_cache_node *n; - - for_each_kmem_cache_node(s, node, n) - if (atomic_long_read(&n->total_objects)) - return 1; - - return 0; -} -#endif - #define to_slab_attr(n) container_of(n, struct slab_attribute, attr) #define to_slab(n) container_of(n, struct kmem_cache, kobj) @@ -5275,43 +5261,13 @@ static ssize_t sanity_checks_show(struct kmem_cache *s, char *buf) { return sprintf(buf, "%d\n", !!(s->flags & SLAB_CONSISTENCY_CHECKS)); } - -static ssize_t sanity_checks_store(struct kmem_cache *s, - const char *buf, size_t length) -{ - s->flags &= ~SLAB_CONSISTENCY_CHECKS; - if (buf[0] == '1') { - s->flags &= ~__CMPXCHG_DOUBLE; - s->flags |= SLAB_CONSISTENCY_CHECKS; - } - return length; -} -SLAB_ATTR(sanity_checks); +SLAB_ATTR_RO(sanity_checks); static ssize_t trace_show(struct kmem_cache *s, char *buf) { return sprintf(buf, "%d\n", !!(s->flags & SLAB_TRACE)); } - -static ssize_t trace_store(struct kmem_cache *s, const char *buf, - size_t length) -{ - /* - * Tracing a merged cache is going to give confusing results - * as well as cause other issues like converting a mergeable - * cache into an umergeable one. - */ - if (s->refcount > 1) - return -EINVAL; - - s->flags &= ~SLAB_TRACE; - if (buf[0] == '1') { - s->flags &= ~__CMPXCHG_DOUBLE; - s->flags |= SLAB_TRACE; - } - return length; -} -SLAB_ATTR(trace); +SLAB_ATTR_RO(trace); static ssize_t red_zone_show(struct kmem_cache *s, char *buf) { @@ -5375,19 +5331,7 @@ static ssize_t failslab_show(struct kmem_cache *s, char *buf) { return sprintf(buf, "%d\n", !!(s->flags & SLAB_FAILSLAB)); } - -static ssize_t failslab_store(struct kmem_cache *s, const char *buf, - size_t length) -{ - if (s->refcount > 1) - return -EINVAL; - - s->flags &= ~SLAB_FAILSLAB; - if (buf[0] == '1') - s->flags |= SLAB_FAILSLAB; - return length; -} -SLAB_ATTR(failslab); +SLAB_ATTR_RO(failslab); #endif static ssize_t shrink_show(struct kmem_cache *s, char *buf) -- cgit v1.2.3 From b1d00007f2126e045c465daf754df43ba74ddcdd Mon Sep 17 00:00:00 2001 From: Anshuman Khandual Date: Thu, 6 Aug 2020 23:19:28 -0700 Subject: Documentation/mm: add descriptions for arch page table helpers This adds a specific description file for all arch page table helpers which is in sync with the semantics being tested via CONFIG_DEBUG_VM_PGTABLE. All future changes either to these descriptions here or the debug test should always remain in sync. 
[anshuman.khandual@arm.com: fold in Mike's patch for the rst document, fix typos in the rst document] Link: http://lkml.kernel.org/r/1594610587-4172-5-git-send-email-anshuman.khandual@arm.com Suggested-by: Mike Rapoport Signed-off-by: Anshuman Khandual Signed-off-by: Andrew Morton Acked-by: Mike Rapoport Cc: Jonathan Corbet Cc: Mike Rapoport Cc: Vineet Gupta Cc: Catalin Marinas Cc: Will Deacon Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: Heiko Carstens Cc: Vasily Gorbik Cc: Christian Borntraeger Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Borislav Petkov Cc: "H. Peter Anvin" Cc: Kirill A. Shutemov Cc: Paul Walmsley Cc: Palmer Dabbelt Cc: Zi Yan Link: http://lkml.kernel.org/r/1593996516-7186-5-git-send-email-anshuman.khandual@arm.com Signed-off-by: Linus Torvalds --- Documentation/vm/arch_pgtable_helpers.rst | 258 ++++++++++++++++++++++++++++++ mm/debug_vm_pgtable.c | 6 + 2 files changed, 264 insertions(+) create mode 100644 Documentation/vm/arch_pgtable_helpers.rst (limited to 'Documentation/vm') diff --git a/Documentation/vm/arch_pgtable_helpers.rst b/Documentation/vm/arch_pgtable_helpers.rst new file mode 100644 index 000000000000..f3591ee3aaa8 --- /dev/null +++ b/Documentation/vm/arch_pgtable_helpers.rst @@ -0,0 +1,258 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. _arch_page_table_helpers: + +=============================== +Architecture Page Table Helpers +=============================== + +Generic MM expects architectures (with MMU) to provide helpers to create, access +and modify page table entries at various level for different memory functions. +These page table helpers need to conform to a common semantics across platforms. +Following tables describe the expected semantics which can also be tested during +boot via CONFIG_DEBUG_VM_PGTABLE option. All future changes in here or the debug +test need to be in sync. 
+ +====================== +PTE Page Table Helpers +====================== + ++---------------------------+--------------------------------------------------+ +| pte_same | Tests whether both PTE entries are the same | ++---------------------------+--------------------------------------------------+ +| pte_bad | Tests a non-table mapped PTE | ++---------------------------+--------------------------------------------------+ +| pte_present | Tests a valid mapped PTE | ++---------------------------+--------------------------------------------------+ +| pte_young | Tests a young PTE | ++---------------------------+--------------------------------------------------+ +| pte_dirty | Tests a dirty PTE | ++---------------------------+--------------------------------------------------+ +| pte_write | Tests a writable PTE | ++---------------------------+--------------------------------------------------+ +| pte_special | Tests a special PTE | ++---------------------------+--------------------------------------------------+ +| pte_protnone | Tests a PROT_NONE PTE | ++---------------------------+--------------------------------------------------+ +| pte_devmap | Tests a ZONE_DEVICE mapped PTE | ++---------------------------+--------------------------------------------------+ +| pte_soft_dirty | Tests a soft dirty PTE | ++---------------------------+--------------------------------------------------+ +| pte_swp_soft_dirty | Tests a soft dirty swapped PTE | ++---------------------------+--------------------------------------------------+ +| pte_mkyoung | Creates a young PTE | ++---------------------------+--------------------------------------------------+ +| pte_mkold | Creates an old PTE | ++---------------------------+--------------------------------------------------+ +| pte_mkdirty | Creates a dirty PTE | ++---------------------------+--------------------------------------------------+ +| pte_mkclean | Creates a clean PTE | ++---------------------------+--------------------------------------------------+ +| pte_mkwrite | Creates a writable PTE | ++---------------------------+--------------------------------------------------+ +| pte_mkwrprotect | Creates a write protected PTE | ++---------------------------+--------------------------------------------------+ +| pte_mkspecial | Creates a special PTE | ++---------------------------+--------------------------------------------------+ +| pte_mkdevmap | Creates a ZONE_DEVICE mapped PTE | ++---------------------------+--------------------------------------------------+ +| pte_mksoft_dirty | Creates a soft dirty PTE | ++---------------------------+--------------------------------------------------+ +| pte_clear_soft_dirty | Clears a soft dirty PTE | ++---------------------------+--------------------------------------------------+ +| pte_swp_mksoft_dirty | Creates a soft dirty swapped PTE | ++---------------------------+--------------------------------------------------+ +| pte_swp_clear_soft_dirty | Clears a soft dirty swapped PTE | ++---------------------------+--------------------------------------------------+ +| pte_mknotpresent | Invalidates a mapped PTE | ++---------------------------+--------------------------------------------------+ +| ptep_get_and_clear | Clears a PTE | ++---------------------------+--------------------------------------------------+ +| ptep_get_and_clear_full | Clears a PTE | ++---------------------------+--------------------------------------------------+ +| ptep_test_and_clear_young | Clears young from a PTE | 
++---------------------------+--------------------------------------------------+ +| ptep_set_wrprotect | Converts into a write protected PTE | ++---------------------------+--------------------------------------------------+ +| ptep_set_access_flags | Converts into a more permissive PTE | ++---------------------------+--------------------------------------------------+ + +====================== +PMD Page Table Helpers +====================== + ++---------------------------+--------------------------------------------------+ +| pmd_same | Tests whether both PMD entries are the same | ++---------------------------+--------------------------------------------------+ +| pmd_bad | Tests a non-table mapped PMD | ++---------------------------+--------------------------------------------------+ +| pmd_leaf | Tests a leaf mapped PMD | ++---------------------------+--------------------------------------------------+ +| pmd_huge | Tests a HugeTLB mapped PMD | ++---------------------------+--------------------------------------------------+ +| pmd_trans_huge | Tests a Transparent Huge Page (THP) at PMD | ++---------------------------+--------------------------------------------------+ +| pmd_present | Tests a valid mapped PMD | ++---------------------------+--------------------------------------------------+ +| pmd_young | Tests a young PMD | ++---------------------------+--------------------------------------------------+ +| pmd_dirty | Tests a dirty PMD | ++---------------------------+--------------------------------------------------+ +| pmd_write | Tests a writable PMD | ++---------------------------+--------------------------------------------------+ +| pmd_special | Tests a special PMD | ++---------------------------+--------------------------------------------------+ +| pmd_protnone | Tests a PROT_NONE PMD | ++---------------------------+--------------------------------------------------+ +| pmd_devmap | Tests a ZONE_DEVICE mapped PMD | ++---------------------------+--------------------------------------------------+ +| pmd_soft_dirty | Tests a soft dirty PMD | ++---------------------------+--------------------------------------------------+ +| pmd_swp_soft_dirty | Tests a soft dirty swapped PMD | ++---------------------------+--------------------------------------------------+ +| pmd_mkyoung | Creates a young PMD | ++---------------------------+--------------------------------------------------+ +| pmd_mkold | Creates an old PMD | ++---------------------------+--------------------------------------------------+ +| pmd_mkdirty | Creates a dirty PMD | ++---------------------------+--------------------------------------------------+ +| pmd_mkclean | Creates a clean PMD | ++---------------------------+--------------------------------------------------+ +| pmd_mkwrite | Creates a writable PMD | ++---------------------------+--------------------------------------------------+ +| pmd_mkwrprotect | Creates a write protected PMD | ++---------------------------+--------------------------------------------------+ +| pmd_mkspecial | Creates a special PMD | ++---------------------------+--------------------------------------------------+ +| pmd_mkdevmap | Creates a ZONE_DEVICE mapped PMD | ++---------------------------+--------------------------------------------------+ +| pmd_mksoft_dirty | Creates a soft dirty PMD | ++---------------------------+--------------------------------------------------+ +| pmd_clear_soft_dirty | Clears a soft dirty PMD | 
++---------------------------+--------------------------------------------------+ +| pmd_swp_mksoft_dirty | Creates a soft dirty swapped PMD | ++---------------------------+--------------------------------------------------+ +| pmd_swp_clear_soft_dirty | Clears a soft dirty swapped PMD | ++---------------------------+--------------------------------------------------+ +| pmd_mkinvalid | Invalidates a mapped PMD [1] | ++---------------------------+--------------------------------------------------+ +| pmd_set_huge | Creates a PMD huge mapping | ++---------------------------+--------------------------------------------------+ +| pmd_clear_huge | Clears a PMD huge mapping | ++---------------------------+--------------------------------------------------+ +| pmdp_get_and_clear | Clears a PMD | ++---------------------------+--------------------------------------------------+ +| pmdp_get_and_clear_full | Clears a PMD | ++---------------------------+--------------------------------------------------+ +| pmdp_test_and_clear_young | Clears young from a PMD | ++---------------------------+--------------------------------------------------+ +| pmdp_set_wrprotect | Converts into a write protected PMD | ++---------------------------+--------------------------------------------------+ +| pmdp_set_access_flags | Converts into a more permissive PMD | ++---------------------------+--------------------------------------------------+ + +====================== +PUD Page Table Helpers +====================== + ++---------------------------+--------------------------------------------------+ +| pud_same | Tests whether both PUD entries are the same | ++---------------------------+--------------------------------------------------+ +| pud_bad | Tests a non-table mapped PUD | ++---------------------------+--------------------------------------------------+ +| pud_leaf | Tests a leaf mapped PUD | ++---------------------------+--------------------------------------------------+ +| pud_huge | Tests a HugeTLB mapped PUD | ++---------------------------+--------------------------------------------------+ +| pud_trans_huge | Tests a Transparent Huge Page (THP) at PUD | ++---------------------------+--------------------------------------------------+ +| pud_present | Tests a valid mapped PUD | ++---------------------------+--------------------------------------------------+ +| pud_young | Tests a young PUD | ++---------------------------+--------------------------------------------------+ +| pud_dirty | Tests a dirty PUD | ++---------------------------+--------------------------------------------------+ +| pud_write | Tests a writable PUD | ++---------------------------+--------------------------------------------------+ +| pud_devmap | Tests a ZONE_DEVICE mapped PUD | ++---------------------------+--------------------------------------------------+ +| pud_mkyoung | Creates a young PUD | ++---------------------------+--------------------------------------------------+ +| pud_mkold | Creates an old PUD | ++---------------------------+--------------------------------------------------+ +| pud_mkdirty | Creates a dirty PUD | ++---------------------------+--------------------------------------------------+ +| pud_mkclean | Creates a clean PUD | ++---------------------------+--------------------------------------------------+ +| pud_mkwrite | Creates a writable PUD | ++---------------------------+--------------------------------------------------+ +| pud_mkwrprotect | Creates a write protected PUD | 
++---------------------------+--------------------------------------------------+ +| pud_mkdevmap | Creates a ZONE_DEVICE mapped PUD | ++---------------------------+--------------------------------------------------+ +| pud_mkinvalid | Invalidates a mapped PUD [1] | ++---------------------------+--------------------------------------------------+ +| pud_set_huge | Creates a PUD huge mapping | ++---------------------------+--------------------------------------------------+ +| pud_clear_huge | Clears a PUD huge mapping | ++---------------------------+--------------------------------------------------+ +| pudp_get_and_clear | Clears a PUD | ++---------------------------+--------------------------------------------------+ +| pudp_get_and_clear_full | Clears a PUD | ++---------------------------+--------------------------------------------------+ +| pudp_test_and_clear_young | Clears young from a PUD | ++---------------------------+--------------------------------------------------+ +| pudp_set_wrprotect | Converts into a write protected PUD | ++---------------------------+--------------------------------------------------+ +| pudp_set_access_flags | Converts into a more permissive PUD | ++---------------------------+--------------------------------------------------+ + +========================== +HugeTLB Page Table Helpers +========================== + ++---------------------------+--------------------------------------------------+ +| pte_huge | Tests a HugeTLB | ++---------------------------+--------------------------------------------------+ +| pte_mkhuge | Creates a HugeTLB | ++---------------------------+--------------------------------------------------+ +| huge_pte_dirty | Tests a dirty HugeTLB | ++---------------------------+--------------------------------------------------+ +| huge_pte_write | Tests a writable HugeTLB | ++---------------------------+--------------------------------------------------+ +| huge_pte_mkdirty | Creates a dirty HugeTLB | ++---------------------------+--------------------------------------------------+ +| huge_pte_mkwrite | Creates a writable HugeTLB | ++---------------------------+--------------------------------------------------+ +| huge_pte_mkwrprotect | Creates a write protected HugeTLB | ++---------------------------+--------------------------------------------------+ +| huge_ptep_get_and_clear | Clears a HugeTLB | ++---------------------------+--------------------------------------------------+ +| huge_ptep_set_wrprotect | Converts into a write protected HugeTLB | ++---------------------------+--------------------------------------------------+ +| huge_ptep_set_access_flags | Converts into a more permissive HugeTLB | ++---------------------------+--------------------------------------------------+ + +======================== +SWAP Page Table Helpers +======================== + ++---------------------------+--------------------------------------------------+ +| __pte_to_swp_entry | Creates a swapped entry (arch) from a mapped PTE | ++---------------------------+--------------------------------------------------+ +| __swp_to_pte_entry | Creates a mapped PTE from a swapped entry (arch) | ++---------------------------+--------------------------------------------------+ +| __pmd_to_swp_entry | Creates a swapped entry (arch) from a mapped PMD | ++---------------------------+--------------------------------------------------+ +| __swp_to_pmd_entry | Creates a mapped PMD from a swapped entry (arch) | 
++---------------------------+--------------------------------------------------+ +| is_migration_entry        | Tests a migration (read or write) swapped entry  | ++---------------------------+--------------------------------------------------+ +| is_write_migration_entry  | Tests a write migration swapped entry            | ++---------------------------+--------------------------------------------------+ +| make_migration_entry_read | Converts into read migration swapped entry       | ++---------------------------+--------------------------------------------------+ +| make_migration_entry      | Creates a migration swapped entry (read or write)| ++---------------------------+--------------------------------------------------+ + +[1] https://lore.kernel.org/linux-mm/20181017020930.GN30832@redhat.com/
diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c index 8d389d071968..086309fb9b6f 100644 --- a/mm/debug_vm_pgtable.c +++ b/mm/debug_vm_pgtable.c @@ -31,6 +31,12 @@ #include #include +/* + * Please refer Documentation/vm/arch_pgtable_helpers.rst for the semantics + * expectations that are being validated here. All future changes in here + * or the documentation need to be in sync. + */ + #define VMFLAGS (VM_READ|VM_WRITE|VM_EXEC) /*
-- cgit v1.2.3
From 56993b4e147e9f2ba91ac15ef9ae5ee0626a6850 Mon Sep 17 00:00:00 2001 From: Anshuman Khandual Date: Thu, 6 Aug 2020 23:23:24 -0700 Subject: mm/sparsemem: enable vmem_altmap support in vmemmap_alloc_block_buf() There are many instances where vmemmap allocation is switched between regular memory and device memory based just on whether an altmap is available or not. vmemmap_alloc_block_buf() is used on various platforms to allocate vmemmap mappings. Let's also enable it to handle altmap-based device memory allocation along with the existing regular memory allocations. This will help avoid the altmap-based allocation switch in many places. To summarize, there are two different ways to call vmemmap_alloc_block_buf(). vmemmap_alloc_block_buf(size, node, NULL) /* Allocate from system RAM */ vmemmap_alloc_block_buf(size, node, altmap) /* Allocate from altmap */ This converts altmap_alloc_block_buf() into a static function, drops its entry from the header, and updates Documentation/vm/memory-model.rst. Suggested-by: Robin Murphy Signed-off-by: Anshuman Khandual Signed-off-by: Andrew Morton Tested-by: Jia He Reviewed-by: Catalin Marinas Cc: Jonathan Corbet Cc: Will Deacon Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: Dave Hansen Cc: Andy Lutomirski Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Borislav Petkov Cc: "H. Peter Anvin" Cc: Dan Williams Cc: David Hildenbrand Cc: Fenghua Yu Cc: Hsin-Yi Wang Cc: "Kirill A.
Shutemov" Cc: Mark Rutland Cc: "Matthew Wilcox (Oracle)" Cc: Michal Hocko Cc: Mike Rapoport Cc: Palmer Dabbelt Cc: Paul Walmsley Cc: Pavel Tatashin Cc: Steve Capper Cc: Tony Luck Cc: Yu Zhao Link: http://lkml.kernel.org/r/1594004178-8861-3-git-send-email-anshuman.khandual@arm.com Signed-off-by: Linus Torvalds --- Documentation/vm/memory-model.rst | 2 +- arch/arm64/mm/mmu.c | 2 +- arch/powerpc/mm/init_64.c | 4 ++-- arch/x86/mm/init_64.c | 5 +---- include/linux/mm.h | 4 ++-- mm/sparse-vmemmap.c | 28 +++++++++++++--------------- 6 files changed, 20 insertions(+), 25 deletions(-) (limited to 'Documentation/vm') diff --git a/Documentation/vm/memory-model.rst b/Documentation/vm/memory-model.rst index cc65bc85d260..2b898a27b346 100644 --- a/Documentation/vm/memory-model.rst +++ b/Documentation/vm/memory-model.rst @@ -178,7 +178,7 @@ for persistent memory devices in pre-allocated storage on those devices. This storage is represented with :c:type:`struct vmem_altmap` that is eventually passed to vmemmap_populate() through a long chain of function calls. The vmemmap_populate() implementation may use the -`vmem_altmap` along with :c:func:`altmap_alloc_block_buf` helper to +`vmem_altmap` along with :c:func:`vmemmap_alloc_block_buf` helper to allocate memory map on the persistent memory device. ZONE_DEVICE diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c index a32ddd021fe9..f3d9ff323c8f 100644 --- a/arch/arm64/mm/mmu.c +++ b/arch/arm64/mm/mmu.c @@ -1102,7 +1102,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node, if (pmd_none(READ_ONCE(*pmdp))) { void *p = NULL; - p = vmemmap_alloc_block_buf(PMD_SIZE, node); + p = vmemmap_alloc_block_buf(PMD_SIZE, node, NULL); if (!p) return -ENOMEM; diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c index bc73abf0bc25..3fd504d72c5e 100644 --- a/arch/powerpc/mm/init_64.c +++ b/arch/powerpc/mm/init_64.c @@ -225,12 +225,12 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node, * fall back to system memory if the altmap allocation fail. 
*/ if (altmap && !altmap_cross_boundary(altmap, start, page_size)) { - p = altmap_alloc_block_buf(page_size, altmap); + p = vmemmap_alloc_block_buf(page_size, node, altmap); if (!p) pr_debug("altmap block allocation failed, falling back to system memory"); } if (!p) - p = vmemmap_alloc_block_buf(page_size, node); + p = vmemmap_alloc_block_buf(page_size, node, NULL); if (!p) return -ENOMEM; diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c index 1acc5627b21c..53e1d4f4ed9d 100644 --- a/arch/x86/mm/init_64.c +++ b/arch/x86/mm/init_64.c @@ -1515,10 +1515,7 @@ static int __meminit vmemmap_populate_hugepages(unsigned long start, if (pmd_none(*pmd)) { void *p; - if (altmap) - p = altmap_alloc_block_buf(PMD_SIZE, altmap); - else - p = vmemmap_alloc_block_buf(PMD_SIZE, node); + p = vmemmap_alloc_block_buf(PMD_SIZE, node, altmap); if (p) { pte_t entry; diff --git a/include/linux/mm.h b/include/linux/mm.h index a7ff98738126..98e35df8d88e 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2982,8 +2982,8 @@ pte_t *vmemmap_pte_populate(pmd_t *pmd, unsigned long addr, int node, struct vmem_altmap *altmap); void *vmemmap_alloc_block(unsigned long size, int node); struct vmem_altmap; -void *vmemmap_alloc_block_buf(unsigned long size, int node); -void *altmap_alloc_block_buf(unsigned long size, struct vmem_altmap *altmap); +void *vmemmap_alloc_block_buf(unsigned long size, int node, + struct vmem_altmap *altmap); void vmemmap_verify(pte_t *, int, unsigned long, unsigned long); int vmemmap_populate_basepages(unsigned long start, unsigned long end, int node, struct vmem_altmap *altmap); diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c index ceed10dec31e..41eeac67723b 100644 --- a/mm/sparse-vmemmap.c +++ b/mm/sparse-vmemmap.c @@ -69,11 +69,19 @@ void * __meminit vmemmap_alloc_block(unsigned long size, int node) __pa(MAX_DMA_ADDRESS)); } +static void * __meminit altmap_alloc_block_buf(unsigned long size, + struct vmem_altmap *altmap); + /* need to make sure size is all the same during early stage */ -void * __meminit vmemmap_alloc_block_buf(unsigned long size, int node) +void * __meminit vmemmap_alloc_block_buf(unsigned long size, int node, + struct vmem_altmap *altmap) { - void *ptr = sparse_buffer_alloc(size); + void *ptr; + + if (altmap) + return altmap_alloc_block_buf(size, altmap); + ptr = sparse_buffer_alloc(size); if (!ptr) ptr = vmemmap_alloc_block(size, node); return ptr; @@ -94,15 +102,8 @@ static unsigned long __meminit vmem_altmap_nr_free(struct vmem_altmap *altmap) return 0; } -/** - * altmap_alloc_block_buf - allocate pages from the device page map - * @altmap: device page map - * @size: size (in bytes) of the allocation - * - * Allocations are aligned to the size of the request. 
- */ -void * __meminit altmap_alloc_block_buf(unsigned long size, - struct vmem_altmap *altmap) +static void * __meminit altmap_alloc_block_buf(unsigned long size, + struct vmem_altmap *altmap) { unsigned long pfn, nr_pfns, nr_align; @@ -147,10 +148,7 @@ pte_t * __meminit vmemmap_pte_populate(pmd_t *pmd, unsigned long addr, int node, pte_t entry; void *p; - if (altmap) - p = altmap_alloc_block_buf(PAGE_SIZE, altmap); - else - p = vmemmap_alloc_block_buf(PAGE_SIZE, node); + p = vmemmap_alloc_block_buf(PAGE_SIZE, node, altmap); if (!p) return NULL; entry = pfn_pte(__pa(p) >> PAGE_SHIFT, PAGE_KERNEL);
-- cgit v1.2.3
From c89ab04febf97d2db8ca4ef8e2866fadc474351b Mon Sep 17 00:00:00 2001 From: Mike Rapoport Date: Thu, 6 Aug 2020 23:24:02 -0700 Subject: mm/sparse: cleanup the code surrounding memory_present() After removal of CONFIG_HAVE_MEMBLOCK_NODE_MAP we have two equivalent functions that call memory_present() for each region in memblock.memory: sparse_memory_present_with_active_regions() and memblocks_present(). Moreover, all architectures have a call to either of these functions preceding the call to sparse_init(), and in most cases they are called one after the other. Mark the regions from memblock.memory as present during sparse_init() by making sparse_init() call memblocks_present(), make the memblocks_present() and memory_present() functions static, and remove the redundant sparse_memory_present_with_active_regions() function. Also remove the no-longer-required HAVE_MEMORY_PRESENT configuration option. Signed-off-by: Mike Rapoport Signed-off-by: Andrew Morton Link: http://lkml.kernel.org/r/20200712083130.22919-1-rppt@kernel.org Signed-off-by: Linus Torvalds
--- Documentation/vm/memory-model.rst | 7 ++----- arch/arm/mm/init.c | 9 ++------- arch/arm64/mm/init.c | 6 ++---- arch/ia64/mm/discontig.c | 1 - arch/microblaze/mm/init.c | 3 --- arch/mips/kernel/setup.c | 8 -------- arch/mips/loongson64/numa.c | 1 - arch/mips/sgi-ip27/ip27-memory.c | 2 -- arch/parisc/mm/init.c | 5 ----- arch/powerpc/mm/mem.c | 2 -- arch/powerpc/mm/numa.c | 1 - arch/riscv/mm/init.c | 1 - arch/s390/mm/init.c | 1 - arch/sh/mm/init.c | 6 ------ arch/sh/mm/numa.c | 3 --- arch/sparc/mm/init_64.c | 1 - arch/x86/mm/init_32.c | 2 -- arch/x86/mm/init_64.c | 1 - include/linux/mm.h | 4 ---- include/linux/mmzone.h | 14 -------------- mm/Kconfig | 6 +----- mm/page_alloc.c | 16 ---------------- mm/sparse.c | 20 ++++++++++++-------- 23 files changed, 19 insertions(+), 101 deletions(-) (limited to 'Documentation/vm')
diff --git a/Documentation/vm/memory-model.rst b/Documentation/vm/memory-model.rst index 2b898a27b346..769449734573 100644 --- a/Documentation/vm/memory-model.rst +++ b/Documentation/vm/memory-model.rst @@ -141,11 +141,8 @@ sections: `mem_section` objects and the number of rows is calculated to fit all the memory sections. -The architecture setup code should call :c:func:`memory_present` for -each active memory range or use :c:func:`memblocks_present` or -:c:func:`sparse_memory_present_with_active_regions` wrappers to -initialize the memory sections. Next, the actual memory maps should be -set up using :c:func:`sparse_init`. +The architecture setup code should call sparse_init() to +initialize the memory sections and the memory maps.
With SPARSEMEM there are two possible ways to convert a PFN to the corresponding `struct page` - a "classic sparse" and "sparse diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c index 01e18e43b174..000c1b48e973 100644 --- a/arch/arm/mm/init.c +++ b/arch/arm/mm/init.c @@ -243,13 +243,8 @@ void __init bootmem_init(void) (phys_addr_t)max_low_pfn << PAGE_SHIFT); /* - * Sparsemem tries to allocate bootmem in memory_present(), - * so must be done after the fixed reservations - */ - memblocks_present(); - - /* - * sparse_init() needs the bootmem allocator up and running. + * sparse_init() tries to allocate memory from memblock, so must be + * done after the fixed reservations */ sparse_init(); diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index f8c19c6c8e71..481d22c32a2e 100644 --- a/arch/arm64/mm/init.c +++ b/arch/arm64/mm/init.c @@ -430,11 +430,9 @@ void __init bootmem_init(void) #endif /* - * Sparsemem tries to allocate bootmem in memory_present(), so must be - * done after the fixed reservations. + * sparse_init() tries to allocate memory from memblock, so must be + * done after the fixed reservations */ - memblocks_present(); - sparse_init(); zone_sizes_init(min, max); diff --git a/arch/ia64/mm/discontig.c b/arch/ia64/mm/discontig.c index 2ba2127335a7..dbe829fc5298 100644 --- a/arch/ia64/mm/discontig.c +++ b/arch/ia64/mm/discontig.c @@ -600,7 +600,6 @@ void __init paging_init(void) max_dma = virt_to_phys((void *) MAX_DMA_ADDRESS) >> PAGE_SHIFT; - sparse_memory_present_with_active_regions(MAX_NUMNODES); sparse_init(); #ifdef CONFIG_VIRTUAL_MEM_MAP diff --git a/arch/microblaze/mm/init.c b/arch/microblaze/mm/init.c index 521b59ba716c..0880a003573d 100644 --- a/arch/microblaze/mm/init.c +++ b/arch/microblaze/mm/init.c @@ -172,9 +172,6 @@ void __init setup_memory(void) &memblock.memory, 0); } - /* XXX need to clip this if using highmem? */ - sparse_memory_present_with_active_regions(0); - paging_init(); } diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c index 588b21245e00..bf5f5acab0a8 100644 --- a/arch/mips/kernel/setup.c +++ b/arch/mips/kernel/setup.c @@ -371,14 +371,6 @@ static void __init bootmem_init(void) #endif } - - /* - * In any case the added to the memblock memory regions - * (highmem/lowmem, available/reserved, etc) are considered - * as present, so inform sparsemem about them. - */ - memblocks_present(); - /* * Reserve initrd memory if needed. 
*/ diff --git a/arch/mips/loongson64/numa.c b/arch/mips/loongson64/numa.c index 901f5be5ee76..ea8bb1bc667e 100644 --- a/arch/mips/loongson64/numa.c +++ b/arch/mips/loongson64/numa.c @@ -220,7 +220,6 @@ static __init void prom_meminit(void) cpumask_clear(&__node_cpumask[node]); } } - memblocks_present(); max_low_pfn = PHYS_PFN(memblock_end_of_DRAM()); for (cpu = 0; cpu < loongson_sysconf.nr_cpus; cpu++) { diff --git a/arch/mips/sgi-ip27/ip27-memory.c b/arch/mips/sgi-ip27/ip27-memory.c index 1213215ea965..d411e0a90a5b 100644 --- a/arch/mips/sgi-ip27/ip27-memory.c +++ b/arch/mips/sgi-ip27/ip27-memory.c @@ -402,8 +402,6 @@ void __init prom_meminit(void) } __node_data[node] = &null_node; } - - memblocks_present(); } void __init prom_free_prom_memory(void) diff --git a/arch/parisc/mm/init.c b/arch/parisc/mm/init.c index 39ea464c8bd9..4381b65ae1e0 100644 --- a/arch/parisc/mm/init.c +++ b/arch/parisc/mm/init.c @@ -689,11 +689,6 @@ void __init paging_init(void) flush_cache_all_local(); /* start with known state */ flush_tlb_all_local(NULL); - /* - * Mark all memblocks as present for sparsemem using - * memory_present() and then initialize sparsemem. - */ - memblocks_present(); sparse_init(); parisc_bootmem_free(); } diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c index ab12916ec1a7..ec68c9eeac0e 100644 --- a/arch/powerpc/mm/mem.c +++ b/arch/powerpc/mm/mem.c @@ -183,8 +183,6 @@ void __init mem_topology_setup(void) void __init initmem_init(void) { - /* XXX need to clip this if using highmem? */ - sparse_memory_present_with_active_regions(0); sparse_init(); } diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c index 9fcf2d195830..03a81d65095b 100644 --- a/arch/powerpc/mm/numa.c +++ b/arch/powerpc/mm/numa.c @@ -949,7 +949,6 @@ void __init initmem_init(void) get_pfn_range_for_nid(nid, &start_pfn, &end_pfn); setup_node_data(nid, start_pfn, end_pfn); - sparse_memory_present_with_active_regions(nid); } sparse_init(); diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c index 416e520d07d3..f6e6286b3d15 100644 --- a/arch/riscv/mm/init.c +++ b/arch/riscv/mm/init.c @@ -544,7 +544,6 @@ void mark_rodata_ro(void) void __init paging_init(void) { setup_vm_final(); - memblocks_present(); sparse_init(); setup_zero_page(); zone_sizes_init(); diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c index 6dc7c3b60ef6..0d282081dc1f 100644 --- a/arch/s390/mm/init.c +++ b/arch/s390/mm/init.c @@ -115,7 +115,6 @@ void __init paging_init(void) __load_psw_mask(psw.mask); kasan_free_early_identity(); - sparse_memory_present_with_active_regions(MAX_NUMNODES); sparse_init(); zone_dma_bits = 31; memset(max_zone_pfns, 0, sizeof(max_zone_pfns)); diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c index a86ce13f392c..613de8096335 100644 --- a/arch/sh/mm/init.c +++ b/arch/sh/mm/init.c @@ -241,12 +241,6 @@ static void __init do_init_bootmem(void) plat_mem_setup(); - for_each_memblock(memory, reg) { - int nid = memblock_get_region_node(reg); - - memory_present(nid, memblock_region_memory_base_pfn(reg), - memblock_region_memory_end_pfn(reg)); - } sparse_init(); } diff --git a/arch/sh/mm/numa.c b/arch/sh/mm/numa.c index f7e4439deb17..50f0dc1744d0 100644 --- a/arch/sh/mm/numa.c +++ b/arch/sh/mm/numa.c @@ -53,7 +53,4 @@ void __init setup_bootmem_node(int nid, unsigned long start, unsigned long end) /* It's up */ node_set_online(nid); - - /* Kick sparsemem */ - sparse_memory_present_with_active_regions(nid); } diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c index 02e6e5e0f106..fad6d3129904 100644 --- 
a/arch/sparc/mm/init_64.c +++ b/arch/sparc/mm/init_64.c @@ -1610,7 +1610,6 @@ static unsigned long __init bootmem_init(unsigned long phys_base) /* XXX cpu notifier XXX */ - sparse_memory_present_with_active_regions(MAX_NUMNODES); sparse_init(); return end_pfn; diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c index 8b4afad84f4a..4cb958419fb0 100644 --- a/arch/x86/mm/init_32.c +++ b/arch/x86/mm/init_32.c @@ -678,7 +678,6 @@ void __init initmem_init(void) #endif memblock_set_node(0, PHYS_ADDR_MAX, &memblock.memory, 0); - sparse_memory_present_with_active_regions(0); #ifdef CONFIG_FLATMEM max_mapnr = IS_ENABLED(CONFIG_HIGHMEM) ? highend_pfn : max_low_pfn; @@ -718,7 +717,6 @@ void __init paging_init(void) * NOTE: at this point the bootmem allocator is fully available. */ olpc_dt_build_devicetree(); - sparse_memory_present_with_active_regions(MAX_NUMNODES); sparse_init(); zone_sizes_init(); } diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c index 53e1d4f4ed9d..1c1209da55fe 100644 --- a/arch/x86/mm/init_64.c +++ b/arch/x86/mm/init_64.c @@ -817,7 +817,6 @@ void __init initmem_init(void) void __init paging_init(void) { - sparse_memory_present_with_active_regions(MAX_NUMNODES); sparse_init(); /* diff --git a/include/linux/mm.h b/include/linux/mm.h index cf7e4605ff3f..392016ce5878 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2382,9 +2382,6 @@ static inline unsigned long get_num_physpages(void) * for_each_valid_physical_page_range() * memblock_add_node(base, size, nid) * free_area_init(max_zone_pfns); - * - * sparse_memory_present_with_active_regions() calls memory_present() for - * each range when SPARSEMEM is enabled. */ void free_area_init(unsigned long *max_zone_pfn); unsigned long node_map_pfn_alignment(void); @@ -2395,7 +2392,6 @@ extern unsigned long absent_pages_in_range(unsigned long start_pfn, extern void get_pfn_range_for_nid(unsigned int nid, unsigned long *start_pfn, unsigned long *end_pfn); extern unsigned long find_min_pfn_with_active_regions(void); -extern void sparse_memory_present_with_active_regions(int nid); #ifndef CONFIG_NEED_MULTIPLE_NODES static inline int early_pfn_to_nid(unsigned long pfn) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index a3bd54139a30..2eef8afd3a0f 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -839,18 +839,6 @@ static inline struct pglist_data *lruvec_pgdat(struct lruvec *lruvec) extern unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list lru, int zone_idx); -#ifdef CONFIG_HAVE_MEMORY_PRESENT -void memory_present(int nid, unsigned long start, unsigned long end); -#else -static inline void memory_present(int nid, unsigned long start, unsigned long end) {} -#endif - -#if defined(CONFIG_SPARSEMEM) -void memblocks_present(void); -#else -static inline void memblocks_present(void) {} -#endif - #ifdef CONFIG_HAVE_MEMORYLESS_NODES int local_memory_node(int node_id); #else @@ -1407,8 +1395,6 @@ struct mminit_pfnnid_cache { #define early_pfn_valid(pfn) (1) #endif -void memory_present(int nid, unsigned long start, unsigned long end); - /* * If it is possible to have holes within a MAX_ORDER_NR_PAGES, then we * need to check pfn validity within that MAX_ORDER_NR_PAGES block. 
diff --git a/mm/Kconfig b/mm/Kconfig index d41f3fa7e923..6c974888f86f 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -88,13 +88,9 @@ config NEED_MULTIPLE_NODES def_bool y depends on DISCONTIGMEM || NUMA -config HAVE_MEMORY_PRESENT - def_bool y - depends on ARCH_HAVE_MEMORY_PRESENT || SPARSEMEM - # # SPARSEMEM_EXTREME (which is the default) does some bootmem -# allocations when memory_present() is called. If this cannot +# allocations when sparse_init() is called. If this cannot # be done on your architecture, select this option. However, # statically allocating the mem_section[] array can potentially # consume vast quantities of .bss, so be careful. diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 8d5d8526c2f3..f49de9e97bf2 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -6324,22 +6324,6 @@ void __meminit init_currently_empty_zone(struct zone *zone, zone->initialized = 1; } -/** - * sparse_memory_present_with_active_regions - Call memory_present for each active range - * @nid: The node to call memory_present for. If MAX_NUMNODES, all nodes will be used. - * - * If an architecture guarantees that all ranges registered contain no holes and may - * be freed, this function may be used instead of calling memory_present() manually. - */ -void __init sparse_memory_present_with_active_regions(int nid) -{ - unsigned long start_pfn, end_pfn; - int i, this_nid; - - for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, &this_nid) - memory_present(this_nid, start_pfn, end_pfn); -} - /** * get_pfn_range_for_nid - Return the start and end page frames for a node * @nid: The nid to return the range for. If MAX_NUMNODES, the min and max PFN are returned. diff --git a/mm/sparse.c b/mm/sparse.c index 1b5e0385f419..fcc3d176f1ea 100644 --- a/mm/sparse.c +++ b/mm/sparse.c @@ -249,7 +249,7 @@ void __init subsection_map_init(unsigned long pfn, unsigned long nr_pages) #endif /* Record a memory area against a node. */ -void __init memory_present(int nid, unsigned long start, unsigned long end) +static void __init memory_present(int nid, unsigned long start, unsigned long end) { unsigned long pfn; @@ -285,11 +285,11 @@ void __init memory_present(int nid, unsigned long start, unsigned long end) } /* - * Mark all memblocks as present using memory_present(). This is a - * convenience function that is useful for a number of arches - * to mark all of the systems memory as present during initialization. + * Mark all memblocks as present using memory_present(). + * This is a convenience function that is useful to mark all of the systems + * memory as present during initialization. */ -void __init memblocks_present(void) +static void __init memblocks_present(void) { struct memblock_region *reg; @@ -574,9 +574,13 @@ failed: */ void __init sparse_init(void) { - unsigned long pnum_begin = first_present_section_nr(); - int nid_begin = sparse_early_nid(__nr_to_section(pnum_begin)); - unsigned long pnum_end, map_count = 1; + unsigned long pnum_end, pnum_begin, map_count = 1; + int nid_begin; + + memblocks_present(); + + pnum_begin = first_present_section_nr(); + nid_begin = sparse_early_nid(__nr_to_section(pnum_begin)); /* Setup pageblock_order for HUGETLB_PAGE_SIZE_VARIABLE */ set_pageblock_order(); -- cgit v1.2.3