From ad98b6023786647660f438ea6f8fe4e3ce923a2e Mon Sep 17 00:00:00 2001 From: Mike Rapoport Date: Tue, 11 Sep 2018 19:24:11 +0300 Subject: docs/boot-time-mm: fix kernel-doc directive for including all but DOC: There were several rounds of the patches that enabled "functions" directive with no parameters in kerneldoc.py to allow including all the kernel-doc comments except the DOC: sections. Yet, the boot-time-mm.rst sneaked in with the older version of that directive and was not updated. Update it now. Signed-off-by: Mike Rapoport Tested-by: Randy Dunlap Signed-off-by: Jonathan Corbet --- Documentation/core-api/boot-time-mm.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'Documentation/core-api') diff --git a/Documentation/core-api/boot-time-mm.rst b/Documentation/core-api/boot-time-mm.rst index 03cb1643f46f..6e12e89a03e0 100644 --- a/Documentation/core-api/boot-time-mm.rst +++ b/Documentation/core-api/boot-time-mm.rst @@ -76,7 +76,7 @@ These interfaces available only with bootmem, i.e when ``CONFIG_NO_BOOTMEM=n`` .. kernel-doc:: include/linux/bootmem.h .. kernel-doc:: mm/bootmem.c - :nodocs: + :functions: Memblock specific API --------------------- @@ -89,4 +89,4 @@ really happens under the hood. .. kernel-doc:: include/linux/memblock.h .. 
kernel-doc:: mm/memblock.c - :nodocs: + :functions: -- cgit v1.2.3 From 8ff7e072880ee48592ce0e4ce7488097205e544e Mon Sep 17 00:00:00 2001 From: Mike Rapoport Date: Fri, 14 Sep 2018 12:27:56 +0300 Subject: docs: core-api/gfp_mask-from-fs-io: add a label for cross-referencing Signed-off-by: Mike Rapoport Signed-off-by: Jonathan Corbet --- Documentation/core-api/gfp_mask-from-fs-io.rst | 2 ++ 1 file changed, 2 insertions(+) (limited to 'Documentation/core-api') diff --git a/Documentation/core-api/gfp_mask-from-fs-io.rst b/Documentation/core-api/gfp_mask-from-fs-io.rst index e0df8f416582..e7c32a8de126 100644 --- a/Documentation/core-api/gfp_mask-from-fs-io.rst +++ b/Documentation/core-api/gfp_mask-from-fs-io.rst @@ -1,3 +1,5 @@ +.. _gfp_mask_from_fs_io: + ================================= GFP masks used from FS/IO context ================================= -- cgit v1.2.3 From 09700f8a503ac8e76733387e2bab1a199b44236a Mon Sep 17 00:00:00 2001 From: Mike Rapoport Date: Fri, 14 Sep 2018 12:27:57 +0300 Subject: docs: core-api/mm-api: add a lable for GFP flags section Signed-off-by: Mike Rapoport Signed-off-by: Jonathan Corbet --- Documentation/core-api/mm-api.rst | 2 ++ 1 file changed, 2 insertions(+) (limited to 'Documentation/core-api') diff --git a/Documentation/core-api/mm-api.rst b/Documentation/core-api/mm-api.rst index 46ae3537fb12..5ce1ec1dd066 100644 --- a/Documentation/core-api/mm-api.rst +++ b/Documentation/core-api/mm-api.rst @@ -14,6 +14,8 @@ User Space Memory Access .. kernel-doc:: mm/util.c :functions: get_user_pages_fast +.. 
_mm-api-gfp-flags: + Memory Allocation Controls ========================== -- cgit v1.2.3 From 52272c923af09bdeaf94392d9856f07c30b032e5 Mon Sep 17 00:00:00 2001 From: Mike Rapoport Date: Fri, 14 Sep 2018 12:27:58 +0300 Subject: docs: core-api: add memory allocation guide Signed-off-by: Mike Rapoport Acked-by: Michal Hocko Acked-by: Randy Dunlap Signed-off-by: Jonathan Corbet --- Documentation/core-api/index.rst | 1 + Documentation/core-api/memory-allocation.rst | 122 +++++++++++++++++++++++++++ 2 files changed, 123 insertions(+) create mode 100644 Documentation/core-api/memory-allocation.rst (limited to 'Documentation/core-api') diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst index 26b735cefb93..165d76886d73 100644 --- a/Documentation/core-api/index.rst +++ b/Documentation/core-api/index.rst @@ -27,6 +27,7 @@ Core utilities errseq printk-formats circular-buffers + memory-allocation mm-api gfp_mask-from-fs-io timekeeping diff --git a/Documentation/core-api/memory-allocation.rst b/Documentation/core-api/memory-allocation.rst new file mode 100644 index 000000000000..f8bb9aa120c4 --- /dev/null +++ b/Documentation/core-api/memory-allocation.rst @@ -0,0 +1,122 @@ +======================= +Memory Allocation Guide +======================= + +Linux provides a variety of APIs for memory allocation. You can +allocate small chunks using `kmalloc` or `kmem_cache_alloc` families, +large virtually contiguous areas using `vmalloc` and its derivatives, +or you can directly request pages from the page allocator with +`alloc_pages`. It is also possible to use more specialized allocators, +for instance `cma_alloc` or `zs_malloc`. + +Most of the memory allocation APIs use GFP flags to express how that +memory should be allocated. The GFP acronym stands for "get free +pages", the underlying memory allocation function. + +Diversity of the allocation APIs combined with the numerous GFP flags +makes the question "How should I allocate memory?" 
not that easy to +answer, although very likely you should use + +:: + + kzalloc(<size>, GFP_KERNEL); + +Of course there are cases when other allocation APIs and different GFP +flags must be used. + +Get Free Page flags +=================== + +The GFP flags control the allocator's behavior. They tell what memory +zones can be used, how hard the allocator should try to find free +memory, whether the memory can be accessed by userspace, etc. The +:ref:`Documentation/core-api/mm-api.rst <mm-api-gfp-flags>` provides +reference documentation for the GFP flags and their combinations and +here we briefly outline their recommended usage: + + * Most of the time ``GFP_KERNEL`` is what you need. Memory for the + kernel data structures, DMAable memory, inode cache, all these and + many other allocation types can use ``GFP_KERNEL``. Note that + using ``GFP_KERNEL`` implies ``__GFP_RECLAIM``, which means that + direct reclaim may be triggered under memory pressure; the calling + context must be allowed to sleep. + * If the allocation is performed from an atomic context, e.g. an interrupt + handler, use ``GFP_NOWAIT``. This flag prevents direct reclaim and + IO or filesystem operations. Consequently, under memory pressure + ``GFP_NOWAIT`` allocation is likely to fail. Allocations which + have a reasonable fallback should be using ``__GFP_NOWARN``. + * If you think that accessing memory reserves is justified and the kernel + will be stressed unless allocation succeeds, you may use ``GFP_ATOMIC``. + * Untrusted allocations triggered from userspace should be subject + to kmem accounting and must have the ``__GFP_ACCOUNT`` bit set. There + is the handy ``GFP_KERNEL_ACCOUNT`` shortcut for ``GFP_KERNEL`` + allocations that should be accounted. + * Userspace allocations should use one of the ``GFP_USER``, + ``GFP_HIGHUSER`` or ``GFP_HIGHUSER_MOVABLE`` flags. The longer + the flag name, the less restrictive it is. 
+ + ``GFP_HIGHUSER_MOVABLE`` does not require that allocated memory + will be directly accessible by the kernel and implies that the + data is movable. + + ``GFP_HIGHUSER`` means that the allocated memory is not movable, + but it is not required to be directly accessible by the kernel. An + example may be a hardware allocation that maps data directly into + userspace but has no addressing limitations. + + ``GFP_USER`` means that the allocated memory is not movable and it + must be directly accessible by the kernel. + +You may notice that quite a few allocations in the existing code +specify ``GFP_NOIO`` or ``GFP_NOFS``. Historically, they were used to +prevent recursion deadlocks caused by direct memory reclaim calling +back into the FS or IO paths and blocking on already held +resources. Since 4.12 the preferred way to address this issue is to +use the new scope APIs described in +:ref:`Documentation/core-api/gfp_mask-from-fs-io.rst <gfp_mask_from_fs_io>`. + +Other legacy GFP flags are ``GFP_DMA`` and ``GFP_DMA32``. They are +used to ensure that the allocated memory is accessible by hardware +with limited addressing capabilities. So unless you are writing a +driver for a device with such restrictions, avoid using these flags. +Even for hardware with such restrictions, it is preferable to use the +`dma_alloc*` APIs. + +Selecting memory allocator +========================== + +The most straightforward way to allocate memory is to use a function +from the :c:func:`kmalloc` family. And, to be on the safe side, it's +best to use routines that set memory to zero, like +:c:func:`kzalloc`. If you need to allocate memory for an array, there +are :c:func:`kmalloc_array` and :c:func:`kcalloc` helpers. + +The maximal size of a chunk that can be allocated with `kmalloc` is +limited. The actual limit depends on the hardware and the kernel +configuration, but it is a good practice to use `kmalloc` for objects +smaller than the page size. 
+ +For large allocations you can use :c:func:`vmalloc` and +:c:func:`vzalloc`, or directly request pages from the page +allocator. The memory allocated by `vmalloc` and related functions is +not physically contiguous. + +If you are not sure whether the allocation size is too large for +`kmalloc`, it is possible to use :c:func:`kvmalloc` and its +derivatives. It will try to allocate memory with `kmalloc` and if the +allocation fails it will be retried with `vmalloc`. There are +restrictions on which GFP flags can be used with `kvmalloc`; please +see :c:func:`kvmalloc_node` reference documentation. Note that +`kvmalloc` may return memory that is not physically contiguous. + +If you need to allocate many identical objects you can use the slab +cache allocator. The cache should be set up with +:c:func:`kmem_cache_create` before it can be used. Afterwards +:c:func:`kmem_cache_alloc` and its convenience wrappers can allocate +memory from that cache. + +When the allocated memory is no longer needed it must be freed. You +can use :c:func:`kvfree` for the memory allocated with `kmalloc`, +`vmalloc` and `kvmalloc`. The slab caches should be freed with +:c:func:`kmem_cache_free`. And don't forget to destroy the cache with +:c:func:`kmem_cache_destroy`. -- cgit v1.2.3 From 98cee6742c80e35274bab06c5fa1141fe0abd910 Mon Sep 17 00:00:00 2001 From: Mike Rapoport Date: Fri, 5 Oct 2018 01:11:01 +0300 Subject: docs/vm: split memory hotplug notifier description to Documentation/core-api The memory hotplug notifier description is about kernel internals rather than admin/user visible API. Place it appropriately. 
Signed-off-by: Mike Rapoport Signed-off-by: Jonathan Corbet --- Documentation/admin-guide/mm/memory-hotplug.rst | 83 --------------------- Documentation/core-api/index.rst | 2 + Documentation/core-api/memory-hotplug-notifier.rst | 84 ++++++++++++++++++++++ 3 files changed, 86 insertions(+), 83 deletions(-) create mode 100644 Documentation/core-api/memory-hotplug-notifier.rst (limited to 'Documentation/core-api') diff --git a/Documentation/admin-guide/mm/memory-hotplug.rst b/Documentation/admin-guide/mm/memory-hotplug.rst index a33090c179d6..0b9c83effaa4 100644 --- a/Documentation/admin-guide/mm/memory-hotplug.rst +++ b/Documentation/admin-guide/mm/memory-hotplug.rst @@ -31,7 +31,6 @@ be changed often. 6.1 Memory offline and ZONE_MOVABLE 6.2. How to offline memory 7. Physical memory remove - 8. Memory hotplug event notifier 9. Future Work List @@ -414,88 +413,6 @@ Need more implementation yet.... - Notification completion of remove works by OS to firmware. - Guard from remove if not yet. -Memory hotplug event notifier -============================= - -Hotplugging events are sent to a notification queue. - -There are six types of notification defined in ``include/linux/memory.h``: - -MEM_GOING_ONLINE - Generated before new memory becomes available in order to be able to - prepare subsystems to handle memory. The page allocator is still unable - to allocate from the new memory. - -MEM_CANCEL_ONLINE - Generated if MEMORY_GOING_ONLINE fails. - -MEM_ONLINE - Generated when memory has successfully brought online. The callback may - allocate pages from the new memory. - -MEM_GOING_OFFLINE - Generated to begin the process of offlining memory. Allocations are no - longer possible from the memory but some of the memory to be offlined - is still in use. The callback can be used to free memory known to a - subsystem from the indicated memory block. - -MEM_CANCEL_OFFLINE - Generated if MEMORY_GOING_OFFLINE fails. 
Memory is available again from - the memory block that we attempted to offline. - -MEM_OFFLINE - Generated after offlining memory is complete. - -A callback routine can be registered by calling:: - - hotplug_memory_notifier(callback_func, priority) - -Callback functions with higher values of priority are called before callback -functions with lower values. - -A callback function must have the following prototype:: - - int callback_func( - struct notifier_block *self, unsigned long action, void *arg); - -The first argument of the callback function (self) is a pointer to the block -of the notifier chain that points to the callback function itself. -The second argument (action) is one of the event types described above. -The third argument (arg) passes a pointer of struct memory_notify:: - - struct memory_notify { - unsigned long start_pfn; - unsigned long nr_pages; - int status_change_nid_normal; - int status_change_nid_high; - int status_change_nid; - } - -- start_pfn is start_pfn of online/offline memory. -- nr_pages is # of pages of online/offline memory. -- status_change_nid_normal is set node id when N_NORMAL_MEMORY of nodemask - is (will be) set/clear, if this is -1, then nodemask status is not changed. -- status_change_nid_high is set node id when N_HIGH_MEMORY of nodemask - is (will be) set/clear, if this is -1, then nodemask status is not changed. -- status_change_nid is set node id when N_MEMORY of nodemask is (will be) - set/clear. It means a new(memoryless) node gets new memory by online and a - node loses all memory. If this is -1, then nodemask status is not changed. - - If status_changed_nid* >= 0, callback should create/discard structures for the - node if necessary. - -The callback routine shall return one of the values -NOTIFY_DONE, NOTIFY_OK, NOTIFY_BAD, NOTIFY_STOP -defined in ``include/linux/notifier.h`` - -NOTIFY_DONE and NOTIFY_OK have no effect on the further processing. 
- -NOTIFY_BAD is used as response to the MEM_GOING_ONLINE, MEM_GOING_OFFLINE, -MEM_ONLINE, or MEM_OFFLINE action to cancel hotplugging. It stops -further processing of the notification queue. - -NOTIFY_STOP stops further processing of the notification queue. - Future Work =========== diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst index 165d76886d73..4f8a4266eb7c 100644 --- a/Documentation/core-api/index.rst +++ b/Documentation/core-api/index.rst @@ -32,6 +32,8 @@ Core utilities gfp_mask-from-fs-io timekeeping boot-time-mm + memory-hotplug-notifier + Interfaces for kernel debugging =============================== diff --git a/Documentation/core-api/memory-hotplug-notifier.rst b/Documentation/core-api/memory-hotplug-notifier.rst new file mode 100644 index 000000000000..35347cc3a43a --- /dev/null +++ b/Documentation/core-api/memory-hotplug-notifier.rst @@ -0,0 +1,84 @@ +.. _memory_hotplug_notifier: + +============================= +Memory hotplug event notifier +============================= + +Hotplugging events are sent to a notification queue. + +There are six types of notification defined in ``include/linux/memory.h``: + +MEM_GOING_ONLINE + Generated before new memory becomes available in order to be able to + prepare subsystems to handle memory. The page allocator is still unable + to allocate from the new memory. + +MEM_CANCEL_ONLINE + Generated if MEM_GOING_ONLINE fails. + +MEM_ONLINE + Generated when memory has successfully brought online. The callback may + allocate pages from the new memory. + +MEM_GOING_OFFLINE + Generated to begin the process of offlining memory. Allocations are no + longer possible from the memory but some of the memory to be offlined + is still in use. The callback can be used to free memory known to a + subsystem from the indicated memory block. + +MEM_CANCEL_OFFLINE + Generated if MEM_GOING_OFFLINE fails. Memory is available again from + the memory block that we attempted to offline. 
+ +MEM_OFFLINE + Generated after offlining memory is complete. + +A callback routine can be registered by calling:: + + hotplug_memory_notifier(callback_func, priority) + +Callback functions with higher values of priority are called before callback +functions with lower values. + +A callback function must have the following prototype:: + + int callback_func( + struct notifier_block *self, unsigned long action, void *arg); + +The first argument of the callback function (self) is a pointer to the block +of the notifier chain that points to the callback function itself. +The second argument (action) is one of the event types described above. +The third argument (arg) passes a pointer of struct memory_notify:: + + struct memory_notify { + unsigned long start_pfn; + unsigned long nr_pages; + int status_change_nid_normal; + int status_change_nid_high; + int status_change_nid; + } + +- start_pfn is start_pfn of online/offline memory. +- nr_pages is # of pages of online/offline memory. +- status_change_nid_normal is set node id when N_NORMAL_MEMORY of nodemask + is (will be) set/clear, if this is -1, then nodemask status is not changed. +- status_change_nid_high is set node id when N_HIGH_MEMORY of nodemask + is (will be) set/clear, if this is -1, then nodemask status is not changed. +- status_change_nid is set node id when N_MEMORY of nodemask is (will be) + set/clear. It means a new(memoryless) node gets new memory by online and a + node loses all memory. If this is -1, then nodemask status is not changed. + + If status_changed_nid* >= 0, callback should create/discard structures for the + node if necessary. + +The callback routine shall return one of the values +NOTIFY_DONE, NOTIFY_OK, NOTIFY_BAD, NOTIFY_STOP +defined in ``include/linux/notifier.h`` + +NOTIFY_DONE and NOTIFY_OK have no effect on the further processing. + +NOTIFY_BAD is used as response to the MEM_GOING_ONLINE, MEM_GOING_OFFLINE, +MEM_ONLINE, or MEM_OFFLINE action to cancel hotplugging. 
It stops +further processing of the notification queue. + +NOTIFY_STOP stops further processing of the notification queue. -- cgit v1.2.3 From 52d7e21fd5677829353f7490723adf5f61999d84 Mon Sep 17 00:00:00 2001 From: Mike Rapoport Date: Thu, 11 Oct 2018 07:58:16 +0300 Subject: docs/core-api: rename memory-hotplug-notifier to memory-hotplug to allow additions of new documentation about memory hotplug under the same roof. Signed-off-by: Mike Rapoport Reviewed-by: David Hildenbrand Signed-off-by: Jonathan Corbet --- Documentation/core-api/index.rst | 2 +- Documentation/core-api/memory-hotplug-notifier.rst | 84 --------------------- Documentation/core-api/memory-hotplug.rst | 87 ++++++++++++++++++++++ 3 files changed, 88 insertions(+), 85 deletions(-) delete mode 100644 Documentation/core-api/memory-hotplug-notifier.rst create mode 100644 Documentation/core-api/memory-hotplug.rst (limited to 'Documentation/core-api') diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst index 4f8a4266eb7c..29c790f571a5 100644 --- a/Documentation/core-api/index.rst +++ b/Documentation/core-api/index.rst @@ -32,7 +32,7 @@ Core utilities gfp_mask-from-fs-io timekeeping boot-time-mm - memory-hotplug-notifier + memory-hotplug Interfaces for kernel debugging diff --git a/Documentation/core-api/memory-hotplug-notifier.rst b/Documentation/core-api/memory-hotplug-notifier.rst deleted file mode 100644 index 35347cc3a43a..000000000000 --- a/Documentation/core-api/memory-hotplug-notifier.rst +++ /dev/null @@ -1,84 +0,0 @@ -.. _memory_hotplug_notifier: - -============================= -Memory hotplug event notifier -============================= - -Hotplugging events are sent to a notification queue. - -There are six types of notification defined in ``include/linux/memory.h``: - -MEM_GOING_ONLINE - Generated before new memory becomes available in order to be able to - prepare subsystems to handle memory. The page allocator is still unable - to allocate from the new memory. 
- -MEM_CANCEL_ONLINE - Generated if MEM_GOING_ONLINE fails. - -MEM_ONLINE - Generated when memory has successfully brought online. The callback may - allocate pages from the new memory. - -MEM_GOING_OFFLINE - Generated to begin the process of offlining memory. Allocations are no - longer possible from the memory but some of the memory to be offlined - is still in use. The callback can be used to free memory known to a - subsystem from the indicated memory block. - -MEM_CANCEL_OFFLINE - Generated if MEM_GOING_OFFLINE fails. Memory is available again from - the memory block that we attempted to offline. - -MEM_OFFLINE - Generated after offlining memory is complete. - -A callback routine can be registered by calling:: - - hotplug_memory_notifier(callback_func, priority) - -Callback functions with higher values of priority are called before callback -functions with lower values. - -A callback function must have the following prototype:: - - int callback_func( - struct notifier_block *self, unsigned long action, void *arg); - -The first argument of the callback function (self) is a pointer to the block -of the notifier chain that points to the callback function itself. -The second argument (action) is one of the event types described above. -The third argument (arg) passes a pointer of struct memory_notify:: - - struct memory_notify { - unsigned long start_pfn; - unsigned long nr_pages; - int status_change_nid_normal; - int status_change_nid_high; - int status_change_nid; - } - -- start_pfn is start_pfn of online/offline memory. -- nr_pages is # of pages of online/offline memory. -- status_change_nid_normal is set node id when N_NORMAL_MEMORY of nodemask - is (will be) set/clear, if this is -1, then nodemask status is not changed. -- status_change_nid_high is set node id when N_HIGH_MEMORY of nodemask - is (will be) set/clear, if this is -1, then nodemask status is not changed. -- status_change_nid is set node id when N_MEMORY of nodemask is (will be) - set/clear. 
It means a new(memoryless) node gets new memory by online and a - node loses all memory. If this is -1, then nodemask status is not changed. - - If status_changed_nid* >= 0, callback should create/discard structures for the - node if necessary. - -The callback routine shall return one of the values -NOTIFY_DONE, NOTIFY_OK, NOTIFY_BAD, NOTIFY_STOP -defined in ``include/linux/notifier.h`` - -NOTIFY_DONE and NOTIFY_OK have no effect on the further processing. - -NOTIFY_BAD is used as response to the MEM_GOING_ONLINE, MEM_GOING_OFFLINE, -MEM_ONLINE, or MEM_OFFLINE action to cancel hotplugging. It stops -further processing of the notification queue. - -NOTIFY_STOP stops further processing of the notification queue. diff --git a/Documentation/core-api/memory-hotplug.rst b/Documentation/core-api/memory-hotplug.rst new file mode 100644 index 000000000000..a99f2f264725 --- /dev/null +++ b/Documentation/core-api/memory-hotplug.rst @@ -0,0 +1,87 @@ +.. _memory_hotplug: + +============== +Memory hotplug +============== + +Memory hotplug event notifier +============================= + +Hotplugging events are sent to a notification queue. + +There are six types of notification defined in ``include/linux/memory.h``: + +MEM_GOING_ONLINE + Generated before new memory becomes available in order to be able to + prepare subsystems to handle memory. The page allocator is still unable + to allocate from the new memory. + +MEM_CANCEL_ONLINE + Generated if MEM_GOING_ONLINE fails. + +MEM_ONLINE + Generated when memory has successfully brought online. The callback may + allocate pages from the new memory. + +MEM_GOING_OFFLINE + Generated to begin the process of offlining memory. Allocations are no + longer possible from the memory but some of the memory to be offlined + is still in use. The callback can be used to free memory known to a + subsystem from the indicated memory block. + +MEM_CANCEL_OFFLINE + Generated if MEM_GOING_OFFLINE fails. 
Memory is available again from + the memory block that we attempted to offline. + +MEM_OFFLINE + Generated after offlining memory is complete. + +A callback routine can be registered by calling:: + + hotplug_memory_notifier(callback_func, priority) + +Callback functions with higher values of priority are called before callback +functions with lower values. + +A callback function must have the following prototype:: + + int callback_func( + struct notifier_block *self, unsigned long action, void *arg); + +The first argument of the callback function (self) is a pointer to the block +of the notifier chain that points to the callback function itself. +The second argument (action) is one of the event types described above. +The third argument (arg) passes a pointer of struct memory_notify:: + + struct memory_notify { + unsigned long start_pfn; + unsigned long nr_pages; + int status_change_nid_normal; + int status_change_nid_high; + int status_change_nid; + } + +- start_pfn is start_pfn of online/offline memory. +- nr_pages is # of pages of online/offline memory. +- status_change_nid_normal is set node id when N_NORMAL_MEMORY of nodemask + is (will be) set/clear, if this is -1, then nodemask status is not changed. +- status_change_nid_high is set node id when N_HIGH_MEMORY of nodemask + is (will be) set/clear, if this is -1, then nodemask status is not changed. +- status_change_nid is set node id when N_MEMORY of nodemask is (will be) + set/clear. It means a new(memoryless) node gets new memory by online and a + node loses all memory. If this is -1, then nodemask status is not changed. + + If status_changed_nid* >= 0, callback should create/discard structures for the + node if necessary. + +The callback routine shall return one of the values +NOTIFY_DONE, NOTIFY_OK, NOTIFY_BAD, NOTIFY_STOP +defined in ``include/linux/notifier.h`` + +NOTIFY_DONE and NOTIFY_OK have no effect on the further processing. 
+ +NOTIFY_BAD is used as response to the MEM_GOING_ONLINE, MEM_GOING_OFFLINE, +MEM_ONLINE, or MEM_OFFLINE action to cancel hotplugging. It stops +further processing of the notification queue. + +NOTIFY_STOP stops further processing of the notification queue. -- cgit v1.2.3 From 3a7452c5a72bd8098f6d4b37341e25a8725d790b Mon Sep 17 00:00:00 2001 From: David Hildenbrand Date: Thu, 11 Oct 2018 07:58:17 +0300 Subject: docs/core-api: memory-hotplug: add some details about locking internals Let's document the magic a bit, especially why device_hotplug_lock is required when adding/removing memory and how it all plays together with requests to online/offline memory from user space. [ rppt: moved the text to Documentation/core-api/memory-hotplug.rst ] Link: http://lkml.kernel.org/r/20180925091457.28651-7-david@redhat.com Signed-off-by: David Hildenbrand Reviewed-by: Pavel Tatashin Reviewed-by: Rashmica Gupta Cc: Jonathan Corbet Cc: Michal Hocko Cc: Balbir Singh Cc: Benjamin Herrenschmidt Cc: Boris Ostrovsky Cc: Dan Williams Cc: Greg Kroah-Hartman Cc: Haiyang Zhang Cc: Heiko Carstens Cc: John Allen Cc: Joonsoo Kim Cc: Juergen Gross Cc: Kate Stewart Cc: "K. Y. Srinivasan" Cc: Len Brown Cc: Martin Schwidefsky Cc: Mathieu Malaterre Cc: Michael Ellerman Cc: Michael Neuling Cc: Nathan Fontenot Cc: Oscar Salvador Cc: Paul Mackerras Cc: Philippe Ombredanne Cc: Rafael J. Wysocki Cc: "Rafael J. 
Wysocki" Cc: Stephen Hemminger Cc: Thomas Gleixner Cc: Vlastimil Babka Cc: YASUAKI ISHIMATSU Signed-off-by: Andrew Morton Signed-off-by: Mike Rapoport Signed-off-by: Jonathan Corbet --- Documentation/core-api/memory-hotplug.rst | 38 +++++++++++++++++++++++++++++++ 1 file changed, 38 insertions(+) (limited to 'Documentation/core-api') diff --git a/Documentation/core-api/memory-hotplug.rst b/Documentation/core-api/memory-hotplug.rst index a99f2f264725..de7467e48067 100644 --- a/Documentation/core-api/memory-hotplug.rst +++ b/Documentation/core-api/memory-hotplug.rst @@ -85,3 +85,41 @@ MEM_ONLINE, or MEM_OFFLINE action to cancel hotplugging. It stops further processing of the notification queue. NOTIFY_STOP stops further processing of the notification queue. + +Locking Internals +================= + +When adding/removing memory that uses memory block devices (i.e. ordinary RAM), +the device_hotplug_lock should be held to: + +- synchronize against online/offline requests (e.g. via sysfs). This way, memory + block devices can only be accessed (.online/.state attributes) by user + space once memory has been fully added. And when removing memory, we + know nobody is in critical sections. +- synchronize against CPU hotplug and similar (e.g. relevant for ACPI and PPC) + +Especially, there is a possible lock inversion that is avoided using +device_hotplug_lock when adding memory and user space tries to online that +memory faster than expected: + +- device_online() will first take the device_lock(), followed by + mem_hotplug_lock +- add_memory_resource() will first take the mem_hotplug_lock, followed by + the device_lock() (while creating the devices, during bus_add_device()). + +As the device is visible to user space before taking the device_lock(), this +can result in a lock inversion. + +onlining/offlining of memory should be done via device_online()/ +device_offline() - to make sure it is properly synchronized to actions +via sysfs. 
Holding device_hotplug_lock is advised (e.g. to protect online_type). + +When adding/removing/onlining/offlining memory or adding/removing +heterogeneous/device memory, we should always hold the mem_hotplug_lock in +write mode to serialise memory hotplug (e.g. access to global/zone +variables). + +In addition, mem_hotplug_lock (in contrast to device_hotplug_lock) in read +mode allows for a quite efficient get_online_mems/put_online_mems +implementation, so code accessing memory can protect itself from that memory +vanishing. -- cgit v1.2.3 From 94ac8f2074b22465f75e93ecbb98060d7960f4b6 Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Mon, 8 Oct 2018 13:08:48 +0200 Subject: doc: printk-formats: Remove bogus kobject references for device nodes When converting from text to rst, the kobjects section and its sole subsection about device tree nodes were coalesced into a single section, yielding an inconsistent result. Remove all references to kobjects, as 1. Device tree object pointers are not compatible with kobject pointers (the former may embed the latter, though), and 2. there are no printk formats defined for kobject types. Update the vsprintf() source code comments to match the above. Fixes: b3ed23213eab1e08 ("doc: convert printk-formats.txt to rst") Signed-off-by: Geert Uytterhoeven Signed-off-by: Jonathan Corbet --- Documentation/core-api/printk-formats.rst | 6 +++--- lib/vsprintf.c | 20 +++++++++----------- 2 files changed, 12 insertions(+), 14 deletions(-) (limited to 'Documentation/core-api') diff --git a/Documentation/core-api/printk-formats.rst b/Documentation/core-api/printk-formats.rst index 25dc591cb110..86023c33906f 100644 --- a/Documentation/core-api/printk-formats.rst +++ b/Documentation/core-api/printk-formats.rst @@ -376,15 +376,15 @@ correctness of the format string and va_list arguments. Passed by reference. -kobjects --------- +Device tree nodes +----------------- :: %pOF[fnpPcCF] -For printing kobject based structs (device nodes). 
Default behaviour is +For printing device tree node structures. Default behaviour is equivalent to %pOFf. - f - device node full_name diff --git a/lib/vsprintf.c b/lib/vsprintf.c index d5b3a3f95c01..c8005105e2d6 100644 --- a/lib/vsprintf.c +++ b/lib/vsprintf.c @@ -1833,17 +1833,15 @@ static char *ptr_to_id(char *buf, char *end, void *ptr, struct printf_spec spec) * p page flags (see struct page) given as pointer to unsigned long * g gfp flags (GFP_* and __GFP_*) given as pointer to gfp_t * v vma flags (VM_*) given as pointer to unsigned long - * - 'O' For a kobject based struct. Must be one of the following: - * - 'OF[fnpPcCF]' For a device tree object - * Without any optional arguments prints the full_name - * f device node full_name - * n device node name - * p device node phandle - * P device node path spec (name + @unit) - * F device node flags - * c major compatible string - * C full compatible string - * + * - 'OF[fnpPcCF]' For a device tree object + * Without any optional arguments prints the full_name + * f device node full_name + * n device node name + * p device node phandle + * P device node path spec (name + @unit) + * F device node flags + * c major compatible string + * C full compatible string * - 'x' For printing the address. Equivalent to "%lx". * * ** When making changes please also update: -- cgit v1.2.3