From 527f6750d92beb9c787d8aba48477b1e834d64e5 Mon Sep 17 00:00:00 2001 From: Marco Elver Date: Tue, 13 Oct 2020 16:47:51 -0700 Subject: kasan: remove mentions of unsupported Clang versions Since the kernel now requires at least Clang 10.0.1, remove any mention of old Clang versions and simplify the documentation. Signed-off-by: Marco Elver Signed-off-by: Nick Desaulniers Signed-off-by: Andrew Morton Reviewed-by: Andrey Konovalov Reviewed-by: Kees Cook Reviewed-by: Nathan Chancellor Cc: Fangrui Song Cc: Miguel Ojeda Cc: Sedat Dilek Cc: Alexei Starovoitov Cc: Daniel Borkmann Cc: Masahiro Yamada Cc: Vincenzo Frascino Cc: Will Deacon Link: https://lkml.kernel.org/r/20200902225911.209899-7-ndesaulniers@google.com Signed-off-by: Linus Torvalds --- Documentation/dev-tools/kasan.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/dev-tools/kasan.rst b/Documentation/dev-tools/kasan.rst index 38fd5681fade..4abc84b1798c 100644 --- a/Documentation/dev-tools/kasan.rst +++ b/Documentation/dev-tools/kasan.rst @@ -13,10 +13,10 @@ KASAN uses compile-time instrumentation to insert validity checks before every memory access, and therefore requires a compiler version that supports that. Generic KASAN is supported in both GCC and Clang. With GCC it requires version -8.3.0 or later. With Clang it requires version 7.0.0 or later, but detection of +8.3.0 or later. Any supported Clang version is compatible, but detection of out-of-bounds accesses for global variables is only supported since Clang 11. -Tag-based KASAN is only supported in Clang and requires version 7.0.0 or later. +Tag-based KASAN is only supported in Clang. Currently generic KASAN is supported for the x86_64, arm64, xtensa, s390 and riscv architectures, and tag-based KASAN is supported only for arm64. -- cgit v1.2.3 From eb38f37c3cee08a0197bdc7bbb9b4e02e40e2300 Mon Sep 17 00:00:00 2001 From: Lukas Bulwahn Date: Tue, 13 Oct 2020 16:48:05 -0700 Subject: kbuild: doc: describe proper script invocation During an investigation to fix up the execute bits of scripts in the kernel repository, Andrew Morton and Kees Cook pointed out that the execute bit should not matter, and that build scripts cannot rely on that. Kees could not point to any documentation, though. Masahiro Yamada explained the convention of setting execute bits to make it easier for manual script invocation. Provide some basic documentation how the build shall invoke scripts, such that the execute bits do not matter, and acknowledge that execute bits are useful nonetheless. This serves as reference for further clean-up patches in the future. 
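As an extra illustration of the convention described above (the helper script name below, scripts/gen_foo.py, is hypothetical and used only for this sketch), a rule names the interpreter explicitly instead of relying on the script's execute bit::

    #Makefile
    cmd_gen_foo = $(PYTHON3) $(srctree)/scripts/gen_foo.py $< > $@

Written this way, the rule works whether or not the script is marked executable.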
Suggested-by: Andrew Morton Suggested-by: Kees Cook Signed-off-by: Lukas Bulwahn Signed-off-by: Andrew Morton Cc: Masahiro Yamada Cc: Michal Marek Cc: Jonathan Corbet Cc: Ujjwal Kumar Cc: Lukas Bulwahn Link: https://lore.kernel.org/lkml/20200830174409.c24c3f67addcce0cea9a9d4c@linux-foundation.org/ Link: https://lore.kernel.org/lkml/202008271102.FEB906C88@keescook/ Link: https://lore.kernel.org/linux-kbuild/CAK7LNAQdrvMkDA6ApDJCGr+5db8SiPo=G+p8EiOvnnGvEN80gA@mail.gmail.com/ Link: https://lkml.kernel.org/r/20201001075723.24246-1-lukas.bulwahn@gmail.com Signed-off-by: Linus Torvalds --- Documentation/kbuild/makefiles.rst | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) (limited to 'Documentation') diff --git a/Documentation/kbuild/makefiles.rst b/Documentation/kbuild/makefiles.rst index 58d513a0fa95..0d5dd5413af0 100644 --- a/Documentation/kbuild/makefiles.rst +++ b/Documentation/kbuild/makefiles.rst @@ -21,6 +21,7 @@ This document describes the Linux kernel Makefiles. --- 3.10 Special Rules --- 3.11 $(CC) support functions --- 3.12 $(LD) support functions + --- 3.13 Script Invocation === 4 Host Program support --- 4.1 Simple Host Program @@ -605,6 +606,25 @@ more details, with real examples. #Makefile LDFLAGS_vmlinux += $(call ld-option, -X) +3.13 Script invocation +---------------------- + + Make rules may invoke scripts to build the kernel. The rules shall + always provide the appropriate interpreter to execute the script. They + shall not rely on the execute bits being set, and shall not invoke the + script directly. For the convenience of manual script invocation, such + as invoking ./scripts/checkpatch.pl, it is recommended to set execute + bits on the scripts nonetheless. + + Kbuild provides variables $(CONFIG_SHELL), $(AWK), $(PERL), + $(PYTHON) and $(PYTHON3) to refer to interpreters for the respective + scripts. + + Example:: + + #Makefile + cmd_depmod = $(CONFIG_SHELL) $(srctree)/scripts/depmod.sh $(DEPMOD) \ + $(KERNELRELEASE) 4 Host Program support ====================== -- cgit v1.2.3 From 1abbef4f51724fb11f09adf0e75275f7cb422a8a Mon Sep 17 00:00:00 2001 From: Hui Su Date: Tue, 13 Oct 2020 16:48:53 -0700 Subject: mm,kmemleak-test.c: move kmemleak-test.c to samples dir kmemleak-test.c is just a kmemleak test module, which also cannot be used as a built-in kernel module. Thus, I think it should not be in the mm dir; move kmemleak-test.c to samples/kmemleak/kmemleak-test.c. Fix the spelling of built-in while at it. Signed-off-by: Hui Su Signed-off-by: Andrew Morton Cc: Catalin Marinas Cc: Jonathan Corbet Cc: Mauro Carvalho Chehab Cc: David S.
Miller Cc: Rob Herring Cc: Masahiro Yamada Cc: Sam Ravnborg Cc: Josh Poimboeuf Cc: Steven Rostedt (VMware) Cc: Miguel Ojeda Cc: Divya Indi Cc: Tomas Winkler Cc: David Howells Link: https://lkml.kernel.org/r/20200925183729.GA172837@rlk Signed-off-by: Linus Torvalds --- Documentation/dev-tools/kmemleak.rst | 2 +- MAINTAINERS | 2 +- mm/Makefile | 1 - mm/kmemleak-test.c | 99 ------------------------------------ samples/Makefile | 1 + samples/kmemleak/Makefile | 3 ++ samples/kmemleak/kmemleak-test.c | 99 ++++++++++++++++++++++++++++++++++++ 7 files changed, 105 insertions(+), 102 deletions(-) delete mode 100644 mm/kmemleak-test.c create mode 100644 samples/kmemleak/Makefile create mode 100644 samples/kmemleak/kmemleak-test.c (limited to 'Documentation') diff --git a/Documentation/dev-tools/kmemleak.rst b/Documentation/dev-tools/kmemleak.rst index a41a2d238af2..1c935f41cd3a 100644 --- a/Documentation/dev-tools/kmemleak.rst +++ b/Documentation/dev-tools/kmemleak.rst @@ -229,7 +229,7 @@ Testing with kmemleak-test To check if you have all set up to use kmemleak, you can use the kmemleak-test module, a module that deliberately leaks memory. Set CONFIG_DEBUG_KMEMLEAK_TEST -as module (it can't be used as bult-in) and boot the kernel with kmemleak +as module (it can't be used as built-in) and boot the kernel with kmemleak enabled. Load the module and perform a scan with:: # modprobe kmemleak-test diff --git a/MAINTAINERS b/MAINTAINERS index 71291bb26a07..42f1230d1add 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -9727,8 +9727,8 @@ M: Catalin Marinas S: Maintained F: Documentation/dev-tools/kmemleak.rst F: include/linux/kmemleak.h -F: mm/kmemleak-test.c F: mm/kmemleak.c +F: samples/kmemleak/kmemleak-test.c KMOD KERNEL MODULE LOADER - USERMODE HELPER M: Luis Chamberlain diff --git a/mm/Makefile b/mm/Makefile index d5649f1c12c0..d73aed0fc99c 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -94,7 +94,6 @@ obj-$(CONFIG_GUP_BENCHMARK) += gup_benchmark.o obj-$(CONFIG_MEMORY_FAILURE) += memory-failure.o obj-$(CONFIG_HWPOISON_INJECT) += hwpoison-inject.o obj-$(CONFIG_DEBUG_KMEMLEAK) += kmemleak.o -obj-$(CONFIG_DEBUG_KMEMLEAK_TEST) += kmemleak-test.o obj-$(CONFIG_DEBUG_RODATA_TEST) += rodata_test.o obj-$(CONFIG_DEBUG_VM_PGTABLE) += debug_vm_pgtable.o obj-$(CONFIG_PAGE_OWNER) += page_owner.o diff --git a/mm/kmemleak-test.c b/mm/kmemleak-test.c deleted file mode 100644 index e19279ff6aa3..000000000000 --- a/mm/kmemleak-test.c +++ /dev/null @@ -1,99 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0-only -/* - * mm/kmemleak-test.c - * - * Copyright (C) 2008 ARM Limited - * Written by Catalin Marinas - */ - -#define pr_fmt(fmt) "kmemleak: " fmt - -#include -#include -#include -#include -#include -#include -#include -#include - -#include - -struct test_node { - long header[25]; - struct list_head list; - long footer[25]; -}; - -static LIST_HEAD(test_list); -static DEFINE_PER_CPU(void *, kmemleak_test_pointer); - -/* - * Some very simple testing. This function needs to be extended for - * proper testing. 
- */ -static int __init kmemleak_test_init(void) -{ - struct test_node *elem; - int i; - - pr_info("Kmemleak testing\n"); - - /* make some orphan objects */ - pr_info("kmalloc(32) = %p\n", kmalloc(32, GFP_KERNEL)); - pr_info("kmalloc(32) = %p\n", kmalloc(32, GFP_KERNEL)); - pr_info("kmalloc(1024) = %p\n", kmalloc(1024, GFP_KERNEL)); - pr_info("kmalloc(1024) = %p\n", kmalloc(1024, GFP_KERNEL)); - pr_info("kmalloc(2048) = %p\n", kmalloc(2048, GFP_KERNEL)); - pr_info("kmalloc(2048) = %p\n", kmalloc(2048, GFP_KERNEL)); - pr_info("kmalloc(4096) = %p\n", kmalloc(4096, GFP_KERNEL)); - pr_info("kmalloc(4096) = %p\n", kmalloc(4096, GFP_KERNEL)); -#ifndef CONFIG_MODULES - pr_info("kmem_cache_alloc(files_cachep) = %p\n", - kmem_cache_alloc(files_cachep, GFP_KERNEL)); - pr_info("kmem_cache_alloc(files_cachep) = %p\n", - kmem_cache_alloc(files_cachep, GFP_KERNEL)); -#endif - pr_info("vmalloc(64) = %p\n", vmalloc(64)); - pr_info("vmalloc(64) = %p\n", vmalloc(64)); - pr_info("vmalloc(64) = %p\n", vmalloc(64)); - pr_info("vmalloc(64) = %p\n", vmalloc(64)); - pr_info("vmalloc(64) = %p\n", vmalloc(64)); - - /* - * Add elements to a list. They should only appear as orphan - * after the module is removed. - */ - for (i = 0; i < 10; i++) { - elem = kzalloc(sizeof(*elem), GFP_KERNEL); - pr_info("kzalloc(sizeof(*elem)) = %p\n", elem); - if (!elem) - return -ENOMEM; - INIT_LIST_HEAD(&elem->list); - list_add_tail(&elem->list, &test_list); - } - - for_each_possible_cpu(i) { - per_cpu(kmemleak_test_pointer, i) = kmalloc(129, GFP_KERNEL); - pr_info("kmalloc(129) = %p\n", - per_cpu(kmemleak_test_pointer, i)); - } - - return 0; -} -module_init(kmemleak_test_init); - -static void __exit kmemleak_test_exit(void) -{ - struct test_node *elem, *tmp; - - /* - * Remove the list elements without actually freeing the - * memory. - */ - list_for_each_entry_safe(elem, tmp, &test_list, list) - list_del(&elem->list); -} -module_exit(kmemleak_test_exit); - -MODULE_LICENSE("GPL"); diff --git a/samples/Makefile b/samples/Makefile index 754553597581..c3392a595e4b 100644 --- a/samples/Makefile +++ b/samples/Makefile @@ -28,3 +28,4 @@ subdir-$(CONFIG_SAMPLE_VFS) += vfs obj-$(CONFIG_SAMPLE_INTEL_MEI) += mei/ subdir-$(CONFIG_SAMPLE_WATCHDOG) += watchdog subdir-$(CONFIG_SAMPLE_WATCH_QUEUE) += watch_queue +obj-$(CONFIG_DEBUG_KMEMLEAK_TEST) += kmemleak/ diff --git a/samples/kmemleak/Makefile b/samples/kmemleak/Makefile new file mode 100644 index 000000000000..16b6132c540c --- /dev/null +++ b/samples/kmemleak/Makefile @@ -0,0 +1,3 @@ +# SPDX-License-Identifier: GPL-2.0-only + +obj-$(CONFIG_DEBUG_KMEMLEAK_TEST) += kmemleak-test.o diff --git a/samples/kmemleak/kmemleak-test.c b/samples/kmemleak/kmemleak-test.c new file mode 100644 index 000000000000..7b476eb8285f --- /dev/null +++ b/samples/kmemleak/kmemleak-test.c @@ -0,0 +1,99 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * samples/kmemleak/kmemleak-test.c + * + * Copyright (C) 2008 ARM Limited + * Written by Catalin Marinas + */ + +#define pr_fmt(fmt) "kmemleak: " fmt + +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +struct test_node { + long header[25]; + struct list_head list; + long footer[25]; +}; + +static LIST_HEAD(test_list); +static DEFINE_PER_CPU(void *, kmemleak_test_pointer); + +/* + * Some very simple testing. This function needs to be extended for + * proper testing. 
+ */ +static int __init kmemleak_test_init(void) +{ + struct test_node *elem; + int i; + + pr_info("Kmemleak testing\n"); + + /* make some orphan objects */ + pr_info("kmalloc(32) = %p\n", kmalloc(32, GFP_KERNEL)); + pr_info("kmalloc(32) = %p\n", kmalloc(32, GFP_KERNEL)); + pr_info("kmalloc(1024) = %p\n", kmalloc(1024, GFP_KERNEL)); + pr_info("kmalloc(1024) = %p\n", kmalloc(1024, GFP_KERNEL)); + pr_info("kmalloc(2048) = %p\n", kmalloc(2048, GFP_KERNEL)); + pr_info("kmalloc(2048) = %p\n", kmalloc(2048, GFP_KERNEL)); + pr_info("kmalloc(4096) = %p\n", kmalloc(4096, GFP_KERNEL)); + pr_info("kmalloc(4096) = %p\n", kmalloc(4096, GFP_KERNEL)); +#ifndef CONFIG_MODULES + pr_info("kmem_cache_alloc(files_cachep) = %p\n", + kmem_cache_alloc(files_cachep, GFP_KERNEL)); + pr_info("kmem_cache_alloc(files_cachep) = %p\n", + kmem_cache_alloc(files_cachep, GFP_KERNEL)); +#endif + pr_info("vmalloc(64) = %p\n", vmalloc(64)); + pr_info("vmalloc(64) = %p\n", vmalloc(64)); + pr_info("vmalloc(64) = %p\n", vmalloc(64)); + pr_info("vmalloc(64) = %p\n", vmalloc(64)); + pr_info("vmalloc(64) = %p\n", vmalloc(64)); + + /* + * Add elements to a list. They should only appear as orphan + * after the module is removed. + */ + for (i = 0; i < 10; i++) { + elem = kzalloc(sizeof(*elem), GFP_KERNEL); + pr_info("kzalloc(sizeof(*elem)) = %p\n", elem); + if (!elem) + return -ENOMEM; + INIT_LIST_HEAD(&elem->list); + list_add_tail(&elem->list, &test_list); + } + + for_each_possible_cpu(i) { + per_cpu(kmemleak_test_pointer, i) = kmalloc(129, GFP_KERNEL); + pr_info("kmalloc(129) = %p\n", + per_cpu(kmemleak_test_pointer, i)); + } + + return 0; +} +module_init(kmemleak_test_init); + +static void __exit kmemleak_test_exit(void) +{ + struct test_node *elem, *tmp; + + /* + * Remove the list elements without actually freeing the + * memory. + */ + list_for_each_entry_safe(elem, tmp, &test_list, list) + list_del(&elem->list); +} +module_exit(kmemleak_test_exit); + +MODULE_LICENSE("GPL"); -- cgit v1.2.3 From 3b0d31011d39759e3ba7214f75f77bb31983b5a4 Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Tue, 13 Oct 2020 16:49:02 -0700 Subject: x86/numa: add 'nohmat' option MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Disable parsing of the HMAT for debug, to workaround broken platform instances, or cases where it is otherwise not wanted. [rdunlap@infradead.org: fix build when CONFIG_ACPI is not set] Link: https://lkml.kernel.org/r/70e5ee34-9809-a997-7b49-499e4be61307@infradead.org Signed-off-by: Dan Williams Signed-off-by: Randy Dunlap Signed-off-by: Andrew Morton Cc: Dave Hansen Cc: Andy Lutomirski Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Borislav Petkov Cc: "H. Peter Anvin" Cc: Ard Biesheuvel Cc: Ard Biesheuvel Cc: Benjamin Herrenschmidt Cc: Ben Skeggs Cc: Brice Goglin Cc: Catalin Marinas Cc: Daniel Vetter Cc: Dave Jiang Cc: David Airlie Cc: David Hildenbrand Cc: Greg Kroah-Hartman Cc: Ira Weiny Cc: Jason Gunthorpe Cc: Jeff Moyer Cc: Jia He Cc: Joao Martins Cc: Jonathan Cameron Cc: Michael Ellerman Cc: Mike Rapoport Cc: Paul Mackerras Cc: Pavel Tatashin Cc: "Rafael J. 
Wysocki" Cc: Tom Lendacky Cc: Vishal Verma Cc: Wei Yang Cc: Will Deacon Cc: Bjorn Helgaas Cc: Boris Ostrovsky Cc: Hulk Robot Cc: Jason Yan Cc: "Jérôme Glisse" Cc: Juergen Gross Cc: kernel test robot Cc: Randy Dunlap Cc: Stefano Stabellini Cc: Vivek Goyal Link: https://lkml.kernel.org/r/159643095540.4062302.732962081968036212.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Linus Torvalds --- Documentation/x86/x86_64/boot-options.rst | 4 ++++ arch/x86/mm/numa.c | 2 ++ drivers/acpi/numa/hmat.c | 8 +++++++- include/acpi/acpi_numa.h | 8 ++++++++ include/linux/acpi.h | 2 ++ 5 files changed, 23 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/x86/x86_64/boot-options.rst b/Documentation/x86/x86_64/boot-options.rst index 2b98efb5ba7f..324cefff92e7 100644 --- a/Documentation/x86/x86_64/boot-options.rst +++ b/Documentation/x86/x86_64/boot-options.rst @@ -173,6 +173,10 @@ NUMA numa=noacpi Don't parse the SRAT table for NUMA setup + numa=nohmat + Don't parse the HMAT table for NUMA setup, or soft-reserved memory + partitioning. + numa=fake=[MG] If given as a memory unit, fills all system RAM with nodes of size interleaved over physical nodes. diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c index 87c52822cc44..f3805bbaa784 100644 --- a/arch/x86/mm/numa.c +++ b/arch/x86/mm/numa.c @@ -41,6 +41,8 @@ static __init int numa_setup(char *opt) return numa_emu_cmdline(opt + 5); if (!strncmp(opt, "noacpi", 6)) disable_srat(); + if (!strncmp(opt, "nohmat", 6)) + disable_hmat(); return 0; } early_param("numa", numa_setup); diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c index 2c32cfb72370..a12e36a12618 100644 --- a/drivers/acpi/numa/hmat.c +++ b/drivers/acpi/numa/hmat.c @@ -26,6 +26,12 @@ #include static u8 hmat_revision; +static int hmat_disable __initdata; + +void __init disable_hmat(void) +{ + hmat_disable = 1; +} static LIST_HEAD(targets); static LIST_HEAD(initiators); @@ -814,7 +820,7 @@ static __init int hmat_init(void) enum acpi_hmat_type i; acpi_status status; - if (srat_disabled()) + if (srat_disabled() || hmat_disable) return 0; status = acpi_get_table(ACPI_SIG_SRAT, 0, &tbl); diff --git a/include/acpi/acpi_numa.h b/include/acpi/acpi_numa.h index 8784183b2204..0e9302285f14 100644 --- a/include/acpi/acpi_numa.h +++ b/include/acpi/acpi_numa.h @@ -27,4 +27,12 @@ static inline void disable_srat(void) { } #endif /* CONFIG_ACPI_NUMA */ + +#ifdef CONFIG_ACPI_HMAT +extern void disable_hmat(void); +#else /* CONFIG_ACPI_HMAT */ +static inline void disable_hmat(void) +{ +} +#endif /* CONFIG_ACPI_HMAT */ #endif /* __ACP_NUMA_H */ diff --git a/include/linux/acpi.h b/include/linux/acpi.h index 64ae25c59d55..cfa8c0015863 100644 --- a/include/linux/acpi.h +++ b/include/linux/acpi.h @@ -709,6 +709,8 @@ static inline u64 acpi_arch_get_root_pointer(void) #define ACPI_HANDLE_FWNODE(fwnode) (NULL) #define ACPI_DEVICE_CLASS(_cls, _msk) .cls = (0), .cls_msk = (0), +#include + struct fwnode_handle; static inline bool acpi_dev_found(const char *hid) -- cgit v1.2.3 From 5f9a4f4a709608fc15197368464a6c8ed4e3630a Mon Sep 17 00:00:00 2001 From: Muchun Song Date: Tue, 13 Oct 2020 16:52:59 -0700 Subject: mm: memcontrol: add the missing numa_stat interface for cgroup v2 In the cgroup v1, we have a numa_stat interface. This is useful for providing visibility into the numa locality information within an memcg since the pages are allowed to be allocated from any physical node. 
One of the use cases is evaluating application performance by combining this information with the application's CPU allocation. But cgroup v2 does not have such an interface, so this patch adds the missing information. Suggested-by: Shakeel Butt Signed-off-by: Muchun Song Signed-off-by: Andrew Morton Reviewed-by: Shakeel Butt Cc: Zefan Li Cc: Johannes Weiner Cc: Jonathan Corbet Cc: Michal Hocko Cc: Vladimir Davydov Cc: Roman Gushchin Cc: Randy Dunlap Link: https://lkml.kernel.org/r/20200916100030.71698-2-songmuchun@bytedance.com Signed-off-by: Linus Torvalds --- Documentation/admin-guide/cgroup-v2.rst | 69 +++++++++---- mm/memcontrol.c | 170 +++++++++++++++++++++----------- 2 files changed, 159 insertions(+), 80 deletions(-) (limited to 'Documentation') diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst index baa07b30845e..608d7c279396 100644 --- a/Documentation/admin-guide/cgroup-v2.rst +++ b/Documentation/admin-guide/cgroup-v2.rst @@ -1259,6 +1259,10 @@ PAGE_SIZE multiple when read back. can show up in the middle. Don't rely on items remaining in a fixed position; use the keys to look up specific values! + If the entry has no per-node counter (or does not show in + memory.numa_stat), we use 'npn' (non-per-node) as the tag + to indicate that it will not show in memory.numa_stat. + anon Amount of memory used in anonymous mappings such as brk(), sbrk(), and mmap(MAP_ANONYMOUS) @@ -1270,15 +1274,11 @@ PAGE_SIZE multiple when read back. kernel_stack Amount of memory allocated to kernel stacks. - slab - Amount of memory used for storing in-kernel data - structures. - - percpu + percpu(npn) Amount of memory used for storing per-cpu kernel data structures. - sock + sock(npn) Amount of memory used in network transmission buffers shmem @@ -1318,11 +1318,9 @@ PAGE_SIZE multiple when read back. Part of "slab" that cannot be reclaimed on memory pressure. - pgfault - Total number of page faults incurred - - pgmajfault - Number of major page faults incurred + slab(npn) + Amount of memory used for storing in-kernel data + structures. workingset_refault_anon Number of refaults of previously evicted anonymous pages. @@ -1348,37 +1346,68 @@ PAGE_SIZE multiple when read back. workingset_nodereclaim Number of times a shadow node has been reclaimed - pgrefill + pgrefill(npn) Amount of scanned pages (in an active LRU list) - pgscan + pgscan(npn) Amount of scanned pages (in an inactive LRU list) - pgsteal + pgsteal(npn) Amount of reclaimed pages - pgactivate + pgactivate(npn) Amount of pages moved to the active LRU list - pgdeactivate + pgdeactivate(npn) Amount of pages moved to the inactive LRU list - pglazyfree + pglazyfree(npn) Amount of pages postponed to be freed under memory pressure - pglazyfreed + pglazyfreed(npn) Amount of reclaimed lazyfree pages - thp_fault_alloc + thp_fault_alloc(npn) Number of transparent hugepages which were allocated to satisfy a page fault. This counter is not present when CONFIG_TRANSPARENT_HUGEPAGE is not set. - thp_collapse_alloc + thp_collapse_alloc(npn) Number of transparent hugepages which were allocated to allow collapsing an existing range of pages. This counter is not present when CONFIG_TRANSPARENT_HUGEPAGE is not set. + memory.numa_stat + A read-only nested-keyed file which exists on non-root cgroups.
+ + This breaks down the cgroup's memory footprint into different + types of memory, type-specific details, and other information + per node on the state of the memory management system. + + This is useful for providing visibility into the NUMA locality + information within an memcg since the pages are allowed to be + allocated from any physical node. One of the use case is evaluating + application performance by combining this information with the + application's CPU allocation. + + All memory amounts are in bytes. + + The output format of memory.numa_stat is:: + + type N0= N1= ... + + The entries are ordered to be human readable, and new entries + can show up in the middle. Don't rely on items remaining in a + fixed position; use the keys to look up specific values! + + The entries can refer to the memory.stat. + memory.swap.current A read-only single value file which exists on non-root cgroups. diff --git a/mm/memcontrol.c b/mm/memcontrol.c index a0bfc92804b7..e9fa32a943c5 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1448,6 +1448,70 @@ static bool mem_cgroup_wait_acct_move(struct mem_cgroup *memcg) return false; } +struct memory_stat { + const char *name; + unsigned int ratio; + unsigned int idx; +}; + +static struct memory_stat memory_stats[] = { + { "anon", PAGE_SIZE, NR_ANON_MAPPED }, + { "file", PAGE_SIZE, NR_FILE_PAGES }, + { "kernel_stack", 1024, NR_KERNEL_STACK_KB }, + { "percpu", 1, MEMCG_PERCPU_B }, + { "sock", PAGE_SIZE, MEMCG_SOCK }, + { "shmem", PAGE_SIZE, NR_SHMEM }, + { "file_mapped", PAGE_SIZE, NR_FILE_MAPPED }, + { "file_dirty", PAGE_SIZE, NR_FILE_DIRTY }, + { "file_writeback", PAGE_SIZE, NR_WRITEBACK }, +#ifdef CONFIG_TRANSPARENT_HUGEPAGE + /* + * The ratio will be initialized in memory_stats_init(). Because + * on some architectures, the macro of HPAGE_PMD_SIZE is not + * constant(e.g. powerpc). + */ + { "anon_thp", 0, NR_ANON_THPS }, +#endif + { "inactive_anon", PAGE_SIZE, NR_INACTIVE_ANON }, + { "active_anon", PAGE_SIZE, NR_ACTIVE_ANON }, + { "inactive_file", PAGE_SIZE, NR_INACTIVE_FILE }, + { "active_file", PAGE_SIZE, NR_ACTIVE_FILE }, + { "unevictable", PAGE_SIZE, NR_UNEVICTABLE }, + + /* + * Note: The slab_reclaimable and slab_unreclaimable must be + * together and slab_reclaimable must be in front. 
+ */ + { "slab_reclaimable", 1, NR_SLAB_RECLAIMABLE_B }, + { "slab_unreclaimable", 1, NR_SLAB_UNRECLAIMABLE_B }, + + /* The memory events */ + { "workingset_refault_anon", 1, WORKINGSET_REFAULT_ANON }, + { "workingset_refault_file", 1, WORKINGSET_REFAULT_FILE }, + { "workingset_activate_anon", 1, WORKINGSET_ACTIVATE_ANON }, + { "workingset_activate_file", 1, WORKINGSET_ACTIVATE_FILE }, + { "workingset_restore_anon", 1, WORKINGSET_RESTORE_ANON }, + { "workingset_restore_file", 1, WORKINGSET_RESTORE_FILE }, + { "workingset_nodereclaim", 1, WORKINGSET_NODERECLAIM }, +}; + +static int __init memory_stats_init(void) +{ + int i; + + for (i = 0; i < ARRAY_SIZE(memory_stats); i++) { +#ifdef CONFIG_TRANSPARENT_HUGEPAGE + if (memory_stats[i].idx == NR_ANON_THPS) + memory_stats[i].ratio = HPAGE_PMD_SIZE; +#endif + VM_BUG_ON(!memory_stats[i].ratio); + VM_BUG_ON(memory_stats[i].idx >= MEMCG_NR_STAT); + } + + return 0; +} +pure_initcall(memory_stats_init); + static char *memory_stat_format(struct mem_cgroup *memcg) { struct seq_buf s; @@ -1468,52 +1532,19 @@ static char *memory_stat_format(struct mem_cgroup *memcg) * Current memory state: */ - seq_buf_printf(&s, "anon %llu\n", - (u64)memcg_page_state(memcg, NR_ANON_MAPPED) * - PAGE_SIZE); - seq_buf_printf(&s, "file %llu\n", - (u64)memcg_page_state(memcg, NR_FILE_PAGES) * - PAGE_SIZE); - seq_buf_printf(&s, "kernel_stack %llu\n", - (u64)memcg_page_state(memcg, NR_KERNEL_STACK_KB) * - 1024); - seq_buf_printf(&s, "slab %llu\n", - (u64)(memcg_page_state(memcg, NR_SLAB_RECLAIMABLE_B) + - memcg_page_state(memcg, NR_SLAB_UNRECLAIMABLE_B))); - seq_buf_printf(&s, "percpu %llu\n", - (u64)memcg_page_state(memcg, MEMCG_PERCPU_B)); - seq_buf_printf(&s, "sock %llu\n", - (u64)memcg_page_state(memcg, MEMCG_SOCK) * - PAGE_SIZE); - - seq_buf_printf(&s, "shmem %llu\n", - (u64)memcg_page_state(memcg, NR_SHMEM) * - PAGE_SIZE); - seq_buf_printf(&s, "file_mapped %llu\n", - (u64)memcg_page_state(memcg, NR_FILE_MAPPED) * - PAGE_SIZE); - seq_buf_printf(&s, "file_dirty %llu\n", - (u64)memcg_page_state(memcg, NR_FILE_DIRTY) * - PAGE_SIZE); - seq_buf_printf(&s, "file_writeback %llu\n", - (u64)memcg_page_state(memcg, NR_WRITEBACK) * - PAGE_SIZE); - -#ifdef CONFIG_TRANSPARENT_HUGEPAGE - seq_buf_printf(&s, "anon_thp %llu\n", - (u64)memcg_page_state(memcg, NR_ANON_THPS) * - HPAGE_PMD_SIZE); -#endif + for (i = 0; i < ARRAY_SIZE(memory_stats); i++) { + u64 size; - for (i = 0; i < NR_LRU_LISTS; i++) - seq_buf_printf(&s, "%s %llu\n", lru_list_name(i), - (u64)memcg_page_state(memcg, NR_LRU_BASE + i) * - PAGE_SIZE); + size = memcg_page_state(memcg, memory_stats[i].idx); + size *= memory_stats[i].ratio; + seq_buf_printf(&s, "%s %llu\n", memory_stats[i].name, size); - seq_buf_printf(&s, "slab_reclaimable %llu\n", - (u64)memcg_page_state(memcg, NR_SLAB_RECLAIMABLE_B)); - seq_buf_printf(&s, "slab_unreclaimable %llu\n", - (u64)memcg_page_state(memcg, NR_SLAB_UNRECLAIMABLE_B)); + if (unlikely(memory_stats[i].idx == NR_SLAB_UNRECLAIMABLE_B)) { + size = memcg_page_state(memcg, NR_SLAB_RECLAIMABLE_B) + + memcg_page_state(memcg, NR_SLAB_UNRECLAIMABLE_B); + seq_buf_printf(&s, "slab %llu\n", size); + } + } /* Accumulated memory events */ @@ -1521,22 +1552,6 @@ static char *memory_stat_format(struct mem_cgroup *memcg) memcg_events(memcg, PGFAULT)); seq_buf_printf(&s, "%s %lu\n", vm_event_name(PGMAJFAULT), memcg_events(memcg, PGMAJFAULT)); - - seq_buf_printf(&s, "workingset_refault_anon %lu\n", - memcg_page_state(memcg, WORKINGSET_REFAULT_ANON)); - seq_buf_printf(&s, "workingset_refault_file %lu\n", - 
memcg_page_state(memcg, WORKINGSET_REFAULT_FILE)); - seq_buf_printf(&s, "workingset_activate_anon %lu\n", - memcg_page_state(memcg, WORKINGSET_ACTIVATE_ANON)); - seq_buf_printf(&s, "workingset_activate_file %lu\n", - memcg_page_state(memcg, WORKINGSET_ACTIVATE_FILE)); - seq_buf_printf(&s, "workingset_restore_anon %lu\n", - memcg_page_state(memcg, WORKINGSET_RESTORE_ANON)); - seq_buf_printf(&s, "workingset_restore_file %lu\n", - memcg_page_state(memcg, WORKINGSET_RESTORE_FILE)); - seq_buf_printf(&s, "workingset_nodereclaim %lu\n", - memcg_page_state(memcg, WORKINGSET_NODERECLAIM)); - seq_buf_printf(&s, "%s %lu\n", vm_event_name(PGREFILL), memcg_events(memcg, PGREFILL)); seq_buf_printf(&s, "pgscan %lu\n", @@ -6374,6 +6389,35 @@ static int memory_stat_show(struct seq_file *m, void *v) return 0; } +#ifdef CONFIG_NUMA +static int memory_numa_stat_show(struct seq_file *m, void *v) +{ + int i; + struct mem_cgroup *memcg = mem_cgroup_from_seq(m); + + for (i = 0; i < ARRAY_SIZE(memory_stats); i++) { + int nid; + + if (memory_stats[i].idx >= NR_VM_NODE_STAT_ITEMS) + continue; + + seq_printf(m, "%s", memory_stats[i].name); + for_each_node_state(nid, N_MEMORY) { + u64 size; + struct lruvec *lruvec; + + lruvec = mem_cgroup_lruvec(memcg, NODE_DATA(nid)); + size = lruvec_page_state(lruvec, memory_stats[i].idx); + size *= memory_stats[i].ratio; + seq_printf(m, " N%d=%llu", nid, size); + } + seq_putc(m, '\n'); + } + + return 0; +} +#endif + static int memory_oom_group_show(struct seq_file *m, void *v) { struct mem_cgroup *memcg = mem_cgroup_from_seq(m); @@ -6451,6 +6495,12 @@ static struct cftype memory_files[] = { .name = "stat", .seq_show = memory_stat_show, }, +#ifdef CONFIG_NUMA + { + .name = "numa_stat", + .seq_show = memory_numa_stat_show, + }, +#endif { .name = "oom.group", .flags = CFTYPE_NOT_ON_ROOT | CFTYPE_NS_DELEGATABLE, -- cgit v1.2.3 From 25356cfad69cd67cfaf0e703df6f888e9727b947 Mon Sep 17 00:00:00 2001 From: Alexander Gordeev Date: Tue, 13 Oct 2020 16:54:54 -0700 Subject: docs/vm: fix 'mm_count' vs 'mm_users' counter confusion In the context of the anonymous address space lifespan description the 'mm_users' reference counter is confused with 'mm_count'. I.e a "zombie" mm gets released when "mm_count" becomes zero, not "mm_users". Signed-off-by: Alexander Gordeev Signed-off-by: Andrew Morton Cc: Jonathan Corbet Link: https://lkml.kernel.org/r/1597040695-32633-1-git-send-email-agordeev@linux.ibm.com Signed-off-by: Linus Torvalds --- Documentation/vm/active_mm.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/vm/active_mm.rst b/Documentation/vm/active_mm.rst index c84471b180f8..6f8269c284ed 100644 --- a/Documentation/vm/active_mm.rst +++ b/Documentation/vm/active_mm.rst @@ -64,7 +64,7 @@ Active MM actually get cases where you have a address space that is _only_ used by lazy users. That is often a short-lived state, because once that thread gets scheduled away in favour of a real thread, the "zombie" mm gets - released because "mm_users" becomes zero. + released because "mm_count" becomes zero. Also, a new rule is that _nobody_ ever has "init_mm" as a real MM any more. "init_mm" should be considered just a "lazy context when no other -- cgit v1.2.3 From 9ab5be976898860f70f67257be725b891ded10ea Mon Sep 17 00:00:00 2001 From: Patricia Alfonso Date: Tue, 13 Oct 2020 16:55:09 -0700 Subject: KASAN: Testing Documentation Include documentation on how to test KASAN using CONFIG_TEST_KASAN_KUNIT and CONFIG_TEST_KASAN_MODULE. 
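As a rough sketch of how these tests can be exercised (the config symbols follow the documentation added below; exact steps depend on the architecture and toolchain), a minimal .kunitconfig fragment might be::

    CONFIG_KUNIT=y
    CONFIG_KASAN=y
    CONFIG_KASAN_KUNIT_TEST=y

after which kunit_tool can be run from the source tree with ./tools/testing/kunit/kunit.py run; alternatively, with CONFIG_KASAN_KUNIT_TEST=m, the tests are loaded with modprobe test_kasan.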
Signed-off-by: Patricia Alfonso Signed-off-by: David Gow Signed-off-by: Andrew Morton Tested-by: Andrey Konovalov Reviewed-by: Andrey Konovalov Reviewed-by: Dmitry Vyukov Acked-by: Brendan Higgins Cc: Andrey Ryabinin Cc: Ingo Molnar Cc: Juri Lelli Cc: Peter Zijlstra Cc: Shuah Khan Cc: Vincent Guittot Link: https://lkml.kernel.org/r/20200915035828.570483-5-davidgow@google.com Link: https://lkml.kernel.org/r/20200910070331.3358048-5-davidgow@google.com Signed-off-by: Linus Torvalds --- Documentation/dev-tools/kasan.rst | 70 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 70 insertions(+) (limited to 'Documentation') diff --git a/Documentation/dev-tools/kasan.rst b/Documentation/dev-tools/kasan.rst index 4abc84b1798c..c09c9ca2ff1c 100644 --- a/Documentation/dev-tools/kasan.rst +++ b/Documentation/dev-tools/kasan.rst @@ -281,3 +281,73 @@ unmapped. This will require changes in arch-specific code. This allows ``VMAP_STACK`` support on x86, and can simplify support of architectures that do not have a fixed module region. + +CONFIG_KASAN_KUNIT_TEST & CONFIG_TEST_KASAN_MODULE +-------------------------------------------------- + +``CONFIG_KASAN_KUNIT_TEST`` utilizes the KUnit Test Framework for testing. +This means each test focuses on a small unit of functionality and +there are a few ways these tests can be run. + +Each test will print the KASAN report if an error is detected and then +print the number of the test and the status of the test: + +pass:: + + ok 28 - kmalloc_double_kzfree +or, if kmalloc failed:: + + # kmalloc_large_oob_right: ASSERTION FAILED at lib/test_kasan.c:163 + Expected ptr is not null, but is + not ok 4 - kmalloc_large_oob_right +or, if a KASAN report was expected, but not found:: + + # kmalloc_double_kzfree: EXPECTATION FAILED at lib/test_kasan.c:629 + Expected kasan_data->report_expected == kasan_data->report_found, but + kasan_data->report_expected == 1 + kasan_data->report_found == 0 + not ok 28 - kmalloc_double_kzfree + +All test statuses are tracked as they run and an overall status will +be printed at the end:: + + ok 1 - kasan + +or:: + + not ok 1 - kasan + +(1) Loadable Module +~~~~~~~~~~~~~~~~~~~~ + +With ``CONFIG_KUNIT`` enabled, ``CONFIG_KASAN_KUNIT_TEST`` can be built as +a loadable module and run on any architecture that supports KASAN +using something like insmod or modprobe. The module is called ``test_kasan``. + +(2) Built-In +~~~~~~~~~~~~~ + +With ``CONFIG_KUNIT`` built-in, ``CONFIG_KASAN_KUNIT_TEST`` can be built-in +on any architecture that supports KASAN. These and any other KUnit +tests enabled will run and print the results at boot as a late-init +call. + +(3) Using kunit_tool +~~~~~~~~~~~~~~~~~~~~~ + +With ``CONFIG_KUNIT`` and ``CONFIG_KASAN_KUNIT_TEST`` built-in, we can also +use kunit_tool to see the results of these along with other KUnit +tests in a more readable way. This will not print the KASAN reports +of tests that passed. Use `KUnit documentation <https://www.kernel.org/doc/html/latest/dev-tools/kunit/index.html>`_ for more up-to-date +information on kunit_tool. + +.. _KUnit: https://www.kernel.org/doc/html/latest/dev-tools/kunit/index.html + +``CONFIG_TEST_KASAN_MODULE`` is a set of KASAN tests that could not be +converted to KUnit. These tests can be run only as a module with +``CONFIG_TEST_KASAN_MODULE`` built as a loadable module and +``CONFIG_KASAN`` built-in. The type of error expected and the +function being run is printed before the expression expected to give +an error.
Then the error is printed, if found, and that test should be interpreted to pass only if the error was the one expected by the test. -- cgit v1.2.3 From 540809be526724cfa8703d08b029d323fc123a41 Mon Sep 17 00:00:00 2001 From: Baoquan He Date: Tue, 13 Oct 2020 16:56:17 -0700 Subject: doc/vm: fix typo in the hugetlb admin documentation Change 'pecify' to 'Specify'. Signed-off-by: Baoquan He Signed-off-by: Andrew Morton Reviewed-by: Mike Kravetz Reviewed-by: David Hildenbrand Cc: Anshuman Khandual Link: https://lkml.kernel.org/r/20200723032248.24772-4-bhe@redhat.com Signed-off-by: Linus Torvalds --- Documentation/admin-guide/mm/hugetlbpage.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/admin-guide/mm/hugetlbpage.rst b/Documentation/admin-guide/mm/hugetlbpage.rst index 015a5f7d7854..f7b1c7462991 100644 --- a/Documentation/admin-guide/mm/hugetlbpage.rst +++ b/Documentation/admin-guide/mm/hugetlbpage.rst @@ -131,7 +131,7 @@ hugepages parameter is preceded by an invalid hugepagesz parameter, it will be ignored. default_hugepagesz - pecify the default huge page size. This parameter can + Specify the default huge page size. This parameter can only be specified once on the command line. default_hugepagesz can optionally be followed by the hugepages parameter to preallocate a specific number of huge pages of default size. The number of default -- cgit v1.2.3
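For reference, an illustrative kernel command line combining the hugetlb parameters described in the patch above (sizes and counts are only an example, and 1G pages require platform support) could be::

    default_hugepagesz=1G hugepagesz=1G hugepages=2 hugepagesz=2M hugepages=512

This preallocates two 1G pages and 512 2M pages, and makes 1G the default huge page size.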