summaryrefslogtreecommitdiffstats
path: root/Documentation/vm
AgeCommit message (Collapse)AuthorFilesLines
2009-09-22ksm: add some documentationHugh Dickins2-0/+91
Add Documentation/vm/ksm.txt: how to use the Kernel Samepage Merging feature Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Acked-by: Izik Eidus <ieidus@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-22hugetlb: clean up and update huge pages documentationLee Schermerhorn1-46/+87
Attempt to clarify huge page administration and usage, and updates the doucmentation to mention the balancing of huge pages across nodes when allocating and freeing. Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Nishanth Aravamudan <nacc@us.ibm.com> Cc: David Rientjes <rientjes@google.com> Cc: Adam Litke <agl@us.ibm.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Eric Whitney <eric.whitney@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-07-10slub: add option to disable higher order debugging slabsDavid Rientjes1-0/+10
When debugging is enabled, slub requires that additional metadata be stored in slabs for certain options: SLAB_RED_ZONE, SLAB_POISON, and SLAB_STORE_USER. Consequently, it may require that the minimum possible slab order needed to allocate a single object be greater when using these options. The most notable example is for objects that are PAGE_SIZE bytes in size. Higher minimum slab orders may cause page allocation failures when oom or under heavy fragmentation. This patch adds a new slub_debug option, which disables debugging by default for caches that would have resulted in higher minimum orders: slub_debug=O When this option is used on systems with 4K pages, kmalloc-4096, for example, will not have debugging enabled by default even if CONFIG_SLUB_DEBUG_ON is defined because it would have resulted in a order-1 minimum slab order. Reported-by: Larry Finger <Larry.Finger@lwfinger.net> Tested-by: Larry Finger <Larry.Finger@lwfinger.net> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2009-06-17Documentation/vm/Makefile: don't try to build slqbinfoAndrew Morton1-1/+1
For it is only in linux-next at this stage. Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Matt Mackall <mpm@selenic.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-16pagemap: add page-types toolWu Fengguang2-1/+699
Add page-types, a handy tool for querying page flags. It will expand some of the overloaded flags: PG_slob_free = PG_private PG_slub_frozen = PG_active PG_slub_debug = PG_error PG_readahead = PG_reclaim and mask out obscure flags except in -raw mode: PG_reserved PG_mlocked PG_mappedtodisk PG_private PG_private_2 PG_owner_priv_1 PG_arch_1 PG_uncached PG_compound* for non hugeTLB pages [akpm@linux-foundation.org: fix warning] Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Matt Mackall <mpm@selenic.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-16pagemap: document 9 more exported page flagsWu Fengguang1-0/+62
Also add short descriptions for all of the 20 exported page flags. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Matt Mackall <mpm@selenic.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-16pagemap: document clarificationsWu Fengguang1-3/+3
Some bit ranges were inclusive and some not. Fix them to be consistently inclusive. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Matt Mackall <mpm@selenic.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-16page allocator: use allocation flags as an index to the zone watermarkMel Gorman1-9/+9
ALLOC_WMARK_MIN, ALLOC_WMARK_LOW and ALLOC_WMARK_HIGH determin whether pages_min, pages_low or pages_high is used as the zone watermark when allocating the pages. Two branches in the allocator hotpath determine which watermark to use. This patch uses the flags as an array index into a watermark array that is indexed with WMARK_* defines accessed via helpers. All call sites that use zone->pages_* are updated to use the helpers for accessing the values and the array offsets for setting. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-13mm: add documentation describing what tsk->active_mm means vs tsk->mmMichael Ellerman2-0/+85
I'm sure everyone knows this, but I didn't, so I googled it, and found a nice explanation from Linus. Might be worth sticking in Documentation. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-13mm: reformat the Unevictable-LRU documentationDavid Howells1-469/+572
Do a bit of reformatting on the Unevictable-LRU documentation. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Rik van Riel <riel@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-09tracing: consolidate documentsLi Zefan1-126/+0
Move kmemtrace.txt, tracepoints.txt, ftrace.txt and mmiotrace.txt to the new trace/ directory. I didnt find any references to those documents in both source files and documents, so no extra work needs to be done. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Pekka Paalanen <pq@iki.fi> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> LKML-Reference: <49DD6E2B.6090200@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-05Merge branch 'tracing-for-linus' of ↵Linus Torvalds1-0/+126
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (413 commits) tracing, net: fix net tree and tracing tree merge interaction tracing, powerpc: fix powerpc tree and tracing tree interaction ring-buffer: do not remove reader page from list on ring buffer free function-graph: allow unregistering twice trace: make argument 'mem' of trace_seq_putmem() const tracing: add missing 'extern' keywords to trace_output.h tracing: provide trace_seq_reserve() blktrace: print out BLK_TN_MESSAGE properly blktrace: extract duplidate code blktrace: fix memory leak when freeing struct blk_io_trace blktrace: fix blk_probes_ref chaos blktrace: make classic output more classic blktrace: fix off-by-one bug blktrace: fix the original blktrace blktrace: fix a race when creating blk_tree_root in debugfs blktrace: fix timestamp in binary output tracing, Text Edit Lock: cleanup tracing: filter fix for TRACE_EVENT_FORMAT events ftrace: Using FTRACE_WARN_ON() to check "freed record" in ftrace_release() x86: kretprobe-booster interrupt emulation code fix ... Fix up trivial conflicts in arch/parisc/include/asm/ftrace.h include/linux/memory.h kernel/extable.c kernel/module.c
2009-03-30trivial: fix where cgroup documentation is not correctly referred toThadeu Lima de Souza Cascardo2-2/+4
cgroup documentation was moved to Documentation/cgroups/. There are some places that still refer to Documentation/controllers/, Documentation/cgroups.txt and Documentation/cpusets.txt. Fix those. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com> Reviewed-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Paul Menage <menage@google.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-01-11Merge commit 'v2.6.29-rc1' into tracing/urgentIngo Molnar1-45/+18
2009-01-06mm: remove try_to_munlock from vmscanHugh Dickins1-45/+18
An unfortunate feature of the Unevictable LRU work was that reclaiming an anonymous page involved an extra scan through the anon_vma: to check that the page is evictable before allocating swap, because the swap could not be freed reliably soon afterwards. Now try_to_free_swap() has replaced remove_exclusive_swap_page(), that's not an issue any more: remove try_to_munlock() call from shrink_page_list(), leaving it to try_to_munmap() to discover if the page is one to be culled to the unevictable list - in which case then try_to_free_swap(). Update unevictable-lru.txt to remove comments on the try_to_munlock() in shrink_page_list(), and shorten some lines over 80 columns. Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Robin Holt <holt@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-29kmemtrace: remove config option for enabling tracing at bootPekka Enberg1-1/+1
Users can pass kmemtrace.enabled=yes as a kernel parameter to enable kmemtrace at boot so remove the useless CONFIG_KMEMTRACE_DEFAULT_ENABLED config option. Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2008-12-29kmemtrace: Fix typos in documentation.Eduard - Gabriel Munteanu1-1/+1
Corrected the ABI description and the kmemtrace usage guide. Thanks to Randy Dunlap for noticing these errors. Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2008-12-29kmemtrace: Additional documentation.Eduard - Gabriel Munteanu1-0/+126
Documented kmemtrace's ABI, purpose and design. Also includes a short usage guide, FAQ, as well as a link to the userspace application's Git repository, which is currently hosted at repo.or.cz. Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2008-10-30.gitignore updatesAlexey Dobriyan1-0/+1
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20doc: unevictable LRU and mlocked pages documentationLee Schermerhorn1-0/+615
Documentation for unevictable lru list and its usage. Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Signed-off-by: Rik van Riel <riel@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-15Documentation/vm/page_migration: update reference to numa_maps + fix ↵Michael Kerrisk1-4/+5
download URI With man-pages-3.07, the numa_maps documentation home is now proc(5), so the reference in Documentation/vm/page_migration needs updating. (Cliff/Lee are removing numa_maps.5 from the numactl package.) Also, the download location for the numactl package changed a while back. This patch fixes both things, as well as a typo (provided-->provides). Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Cliff Wickman <cpw@sgi.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Cc: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-12docsrc: build Documentation/ sourcesRandy Dunlap1-0/+8
Currently source files in the Documentation/ sub-dir can easily bit-rot since they are not generally buildable, either because they are hidden in text files or because there are no Makefile rules for them. This needs to be fixed so that the source files remain usable and good examples of code instead of bad examples. Add the ability to build source files that are in the Documentation/ dir. Add to Kconfig as "BUILD_DOCSRC" config symbol. Use "CONFIG_BUILD_DOCSRC=1 make ..." to build objects from the Documentation/ sources. Or enable BUILD_DOCSRC in the *config system. However, this symbol depends on HEADERS_CHECK since the header files need to be installed (for userspace builds). Built (using cross-tools) for x86-64, i386, alpha, ia64, sparc32, sparc64, powerpc, sh, m68k, & mips. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Reviewed-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26Documentation cleanup: trivial misspelling, punctuation, and grammar ↵Matt LaPlante2-3/+3
corrections. Cc: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24hugetlb: new sysfs interfaceNishanth Aravamudan1-0/+23
Provide new hugepages user APIs that are more suited to multiple hstates in sysfs. There is a new directory, /sys/kernel/hugepages. Underneath that directory there will be a directory per-supported hugepage size, e.g.: /sys/kernel/hugepages/hugepages-64kB /sys/kernel/hugepages/hugepages-16384kB /sys/kernel/hugepages/hugepages-16777216kB corresponding to 64k, 16m and 16g respectively. Within each hugepages-size directory there are a number of files, corresponding to the tracked counters in the hstate, e.g.: /sys/kernel/hugepages/hugepages-64/nr_hugepages /sys/kernel/hugepages/hugepages-64/nr_overcommit_hugepages /sys/kernel/hugepages/hugepages-64/free_hugepages /sys/kernel/hugepages/hugepages-64/resv_hugepages /sys/kernel/hugepages/hugepages-64/surplus_hugepages Of these files, the first two are read-write and the latter three are read-only. The size of the hugepage being manipulated is trivially deducible from the enclosing directory and is always expressed in kB (to match meminfo). [dave@linux.vnet.ibm.com: fix build] [nacc@us.ibm.com: hugetlb: hang off of /sys/kernel/mm rather than /sys/kernel] [nacc@us.ibm.com: hugetlb: remove CONFIG_SYSFS dependency] Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com> Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-04Christoph has movedChristoph Lameter2-3/+3
Remove all clameter@sgi.com addresses from the kernel tree since they will become invalid on June 27th. Change my maintainer email address for the slab allocators to cl@linux-foundation.org (which will be the new email address for the future). Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Christoph Lameter <cl@linux-foundation.org> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-06-06pagemap: add documentation for pagemapThomas Tuttle1-0/+77
Just a quick explanation of the pagemap interface from a userspace point of view, and an example of how to use it (in English, not code). Signed-off-by: Thomas Tuttle <ttuttle@google.com> Cc: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-02slabinfo: Support printout of the number of fallbacksChristoph Lameter1-4/+6
Add functionality to slabinfo to print out the number of fallbacks that have occurred for each slab cache when the -D option is specified. Also widen the allocation / free field since the numbers became too big after a week. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2008-04-28Merge branch 'for-linus' of ↵Linus Torvalds1-16/+11
git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6: slub: pack objects denser slub: Calculate min_objects based on number of processors. slub: Drop DEFAULT_MAX_ORDER / DEFAULT_MIN_OBJECTS slub: Simplify any_slab_object checks slub: Make the order configurable for each slab cache slub: Drop fallback to page allocator method slub: Fallback to minimal order during slab page allocation slub: Update statistics handling for variable order slabs slub: Add kmem_cache_order_objects struct slub: for_each_object must be passed the number of objects in a slab slub: Store max number of objects in the page struct. slub: Dump list of objects not freed on kmem_cache_close() slub: free_list() cleanup slub: improve kmem_cache_destroy() error message slob: fix bug - when slob allocates "struct kmem_cache", it does not force alignment.
2008-04-28mempolicy: use MPOL_F_LOCAL to Indicate Preferred Local PolicyLee Schermerhorn1-6/+5
Now that we're using "preferred local" policy for system default, we need to make this as fast as possible. Because of the variable size of the mempolicy structure [based on size of nodemasks], the preferred_node may be in a different cacheline from the mode. This can result in accessing an extra cacheline in the normal case of system default policy. Suspect this is the cause of an observed 2-3% slowdown in page fault testing relative to kernel without this patch series. To alleviate this, use an internal mode flag, MPOL_F_LOCAL in the mempolicy flags member which is guaranteed [?] to be in the same cacheline as the mode itself. Verified that reworked mempolicy now performs slightly better on 25-rc8-mm1 for both anon and shmem segments with system default and vma [preferred local] policy. Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Christoph Lameter <clameter@sgi.com> Cc: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28mempolicy: use MPOL_PREFERRED for system-wide default policyLee Schermerhorn1-36/+18
Currently, when one specifies MPOL_DEFAULT via a NUMA memory policy API [set_mempolicy(), mbind() and internal versions], the kernel simply installs a NULL struct mempolicy pointer in the appropriate context: task policy, vma policy, or shared policy. This causes any use of that policy to "fall back" to the next most specific policy scope. The only use of MPOL_DEFAULT to mean "local allocation" is in the system default policy. This requires extra checks/cases for MPOL_DEFAULT in many mempolicy.c functions. There is another, "preferred" way to specify local allocation via the APIs. That is using the MPOL_PREFERRED policy mode with an empty nodemask. Internally, the empty nodemask gets converted to a preferred_node id of '-1'. All internal usage of MPOL_PREFERRED will convert the '-1' to the id of the node local to the cpu where the allocation occurs. System default policy, except during boot, is hard-coded to "local allocation". By using the MPOL_PREFERRED mode with a negative value of preferred node for system default policy, MPOL_DEFAULT will never occur in the 'policy' member of a struct mempolicy. Thus, we can remove all checks for MPOL_DEFAULT when converting policy to a node id/zonelist in the allocation paths. In slab_node() return local node id when policy pointer is NULL. No need to set a pol value to take the switch default. Replace switch default with BUG()--i.e., shouldn't happen. With this patch MPOL_DEFAULT is only used in the APIs, including internal calls to do_set_mempolicy() and in the display of policy in /proc/<pid>/numa_maps. It always means "fall back" to the the next most specific policy scope. This simplifies the description of memory policies quite a bit, with no visible change in behavior. get_mempolicy() continues to return MPOL_DEFAULT and an empty nodemask when the requested policy [task or vma/shared] is NULL. These are the values one would supply via set_mempolicy() or mbind() to achieve that condition--default behavior. This patch updates Documentation to reflect this change. Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Christoph Lameter <clameter@sgi.com> Cc: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28mempolicy: rework mempolicy Reference Counting [yet again]Lee Schermerhorn1-0/+68
After further discussion with Christoph Lameter, it has become clear that my earlier attempts to clean up the mempolicy reference counting were a bit of overkill in some areas, resulting in superflous ref/unref in what are usually fast paths. In other areas, further inspection reveals that I botched the unref for interleave policies. A separate patch, suitable for upstream/stable trees, fixes up the known errors in the previous attempt to fix reference counting. This patch reworks the memory policy referencing counting and, one hopes, simplifies the code. Maybe I'll get it right this time. See the update to the numa_memory_policy.txt document for a discussion of memory policy reference counting that motivates this patch. Summary: Lookup of mempolicy, based on (vma, address) need only add a reference for shared policy, and we need only unref the policy when finished for shared policies. So, this patch backs out all of the unneeded extra reference counting added by my previous attempt. It then unrefs only shared policies when we're finished with them, using the mpol_cond_put() [conditional put] helper function introduced by this patch. Note that shmem_swapin() calls read_swap_cache_async() with a dummy vma containing just the policy. read_swap_cache_async() can call alloc_page_vma() multiple times, so we can't let alloc_page_vma() unref the shared policy in this case. To avoid this, we make a copy of any non-null shared policy and remove the MPOL_F_SHARED flag from the copy. This copy occurs before reading a page [or multiple pages] from swap, so the overhead should not be an issue here. I introduced a new static inline function "mpol_cond_copy()" to copy the shared policy to an on-stack policy and remove the flags that would require a conditional free. The current implementation of mpol_cond_copy() assumes that the struct mempolicy contains no pointers to dynamically allocated structures that must be duplicated or reference counted during copy. Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Christoph Lameter <clameter@sgi.com> Cc: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28mempolicy: rename struct mempolicy 'policy' member to 'mode'Lee Schermerhorn1-4/+0
The terms 'policy' and 'mode' are both used in various places to describe the semantics of the value stored in the 'policy' member of struct mempolicy. Furthermore, the term 'policy' is used to refer to that member, to the entire struct mempolicy and to the more abstract concept of the tuple consisting of a "mode" and an optional node or set of nodes. Recently, we have added "mode flags" that are passed in the upper bits of the 'mode' [or sometimes, 'policy'] member of the numa APIs. I'd like to resolve this confusion, which perhaps only exists in my mind, by renaming the 'policy' member to 'mode' throughout, and fixing up the Documentation. Man pages will be updated separately. Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Christoph Lameter <clameter@sgi.com> Cc: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28mempolicy: disallow static or relative flags for local preferred modeDavid Rientjes1-2/+14
MPOL_F_STATIC_NODES and MPOL_F_RELATIVE_NODES don't mean anything for MPOL_PREFERRED policies that were created with an empty nodemask (for purely local allocations). They'll never be invalidated because the allowed mems of a task changes or need to be rebound relative to a cpuset's placement. Also fixes a bug identified by Lee Schermerhorn that disallowed empty nodemasks to be passed to MPOL_PREFERRED to specify local allocations. [A different, somewhat incomplete, patch already existed in 25-rc5-mm1.] Cc: Paul Jackson <pj@sgi.com> Cc: Christoph Lameter <clameter@sgi.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Cc: Andi Kleen <ak@suse.de> Cc: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28mempolicy: update NUMA memory policy documentationDavid Rientjes1-31/+100
Updates Documentation/vm/numa_memory_policy.txt and Documentation/filesystems/tmpfs.txt to describe optional mempolicy mode flags. Cc: Christoph Lameter <clameter@sgi.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Cc: Andi Kleen <ak@suse.de> Cc: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28mm: filter based on a nodemask as well as a gfp_maskMel Gorman1-8/+3
The MPOL_BIND policy creates a zonelist that is used for allocations controlled by that mempolicy. As the per-node zonelist is already being filtered based on a zone id, this patch adds a version of __alloc_pages() that takes a nodemask for further filtering. This eliminates the need for MPOL_BIND to create a custom zonelist. A positive benefit of this is that allocations using MPOL_BIND now use the local node's distance-ordered zonelist instead of a custom node-id-ordered zonelist. I.e., pages will be allocated from the closest allowed node with available memory. [Lee.Schermerhorn@hp.com: Mempolicy: update stale documentation and comments] [Lee.Schermerhorn@hp.com: Mempolicy: make dequeue_huge_page_vma() obey MPOL_BIND nodemask] [Lee.Schermerhorn@hp.com: Mempolicy: make dequeue_huge_page_vma() obey MPOL_BIND nodemask rework] Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Hugh Dickins <hugh@veritas.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-27slub: Update statistics handling for variable order slabsChristoph Lameter1-16/+11
Change the statistics to consider that slabs of the same slabcache can have different number of objects in them since they may be of different order. Provide a new sysfs field total_objects which shows the total objects that the allocated slabs of a slabcache could hold. Add a max field that holds the largest slab order that was ever used for a slab cache. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2008-04-15Documentation: correct overcommit caveat in hugetlbpage.txtNishanth Aravamudan1-4/+3
As shown by Gurudas Pai recently, we can put hugepages into the surplus state (by echo 0 > /proc/sys/vm/nr_hugepages), even when /proc/sys/vm/nr_overcommit_hugepages is 0. This is actually correct, to allow the original goal (shrink the static pool to 0) to succeed (we are converting hugepages to surplus because they are in use). However, the documentation does not accurately reflect this case. Update it. Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com> Acked-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-06slub: fix typo in Documentation/vm/slub.txtItaru Kitayama1-2/+2
slub_debug=,dentry is correct, not dentry_cache. Signed-off-by: Itaru Kitayama <i-kitayama@ap.jp.nec.com> Signed-off-by: Christoph Lameter <clameter@sgi.com>
2008-02-21slabinfo: fall back from /sys/kernel/slab to /sys/slabChristoph Lameter1-1/+1
I keep running upstream and mm kernels and the location of the slab directory is different since upstream still uses /sys/slab. This patch makes slabinfo check /sys/slab if /sys/kernel/slab is not there. Makes slabinfo work on any kernel. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-02-07SLUB: Support for performance statisticsChristoph Lameter1-11/+138
The statistics provided here allow the monitoring of allocator behavior but at the cost of some (minimal) loss of performance. Counters are placed in SLUB's per cpu data structure. The per cpu structure may be extended by the statistics to grow larger than one cacheline which will increase the cache footprint of SLUB. There is a compile option to enable/disable the inclusion of the runtime statistics and its off by default. The slabinfo tool is enhanced to support these statistics via two options: -D Switches the line of information displayed for a slab from size mode to activity mode. -A Sorts the slabs displayed by activity. This allows the display of the slabs most important to the performance of a certain load. -r Report option will report detailed statistics on Example (tbench load): slabinfo -AD ->Shows the most active slabs Name Objects Alloc Free %Fast skbuff_fclone_cache 33 111953835 111953835 99 99 :0000192 2666 5283688 5281047 99 99 :0001024 849 5247230 5246389 83 83 vm_area_struct 1349 119642 118355 91 22 :0004096 15 66753 66751 98 98 :0000064 2067 25297 23383 98 78 dentry 10259 28635 18464 91 45 :0000080 11004 18950 8089 98 98 :0000096 1703 12358 10784 99 98 :0000128 762 10582 9875 94 18 :0000512 184 9807 9647 95 81 :0002048 479 9669 9195 83 65 anon_vma 777 9461 9002 99 71 kmalloc-8 6492 9981 5624 99 97 :0000768 258 7174 6931 58 15 So the skbuff_fclone_cache is of highest importance for the tbench load. Pretty high load on the 192 sized slab. Look for the aliases slabinfo -a | grep 000192 :0000192 <- xfs_btree_cur filp kmalloc-192 uid_cache tw_sock_TCP request_sock_TCPv6 tw_sock_TCPv6 skbuff_head_cache xfs_ili Likely skbuff_head_cache. Looking into the statistics of the skbuff_fclone_cache is possible through slabinfo skbuff_fclone_cache ->-r option implied if cache name is mentioned .... Usual output ... Slab Perf Counter Alloc Free %Al %Fr -------------------------------------------------- Fastpath 111953360 111946981 99 99 Slowpath 1044 7423 0 0 Page Alloc 272 264 0 0 Add partial 25 325 0 0 Remove partial 86 264 0 0 RemoteObj/SlabFrozen 350 4832 0 0 Total 111954404 111954404 Flushes 49 Refill 0 Deactivate Full=325(92%) Empty=0(0%) ToHead=24(6%) ToTail=1(0%) Looks good because the fastpath is overwhelmingly taken. skbuff_head_cache: Slab Perf Counter Alloc Free %Al %Fr -------------------------------------------------- Fastpath 5297262 5259882 99 99 Slowpath 4477 39586 0 0 Page Alloc 937 824 0 0 Add partial 0 2515 0 0 Remove partial 1691 824 0 0 RemoteObj/SlabFrozen 2621 9684 0 0 Total 5301739 5299468 Deactivate Full=2620(100%) Empty=0(0%) ToHead=0(0%) ToTail=0(0%) Descriptions of the output: Total: The total number of allocation and frees that occurred for a slab Fastpath: The number of allocations/frees that used the fastpath. Slowpath: Other allocations Page Alloc: Number of calls to the page allocator as a result of slowpath processing Add Partial: Number of slabs added to the partial list through free or alloc (occurs during cpuslab flushes) Remove Partial: Number of slabs removed from the partial list as a result of allocations retrieving a partial slab or by a free freeing the last object of a slab. RemoteObj/Froz: How many times were remotely freed object encountered when a slab was about to be deactivated. Frozen: How many times was free able to skip list processing because the slab was in use as the cpuslab of another processor. Flushes: Number of times the cpuslab was flushed on request (kmem_cache_shrink, may result from races in __slab_alloc) Refill: Number of times we were able to refill the cpuslab from remotely freed objects for the same slab. Deactivate: Statistics how slabs were deactivated. Shows how they were put onto the partial list. In general fastpath is very good. Slowpath without partial list processing is also desirable. Any touching of partial list uses node specific locks which may potentially cause list lock contention. Signed-off-by: Christoph Lameter <clameter@sgi.com>
2008-01-24kset: move /sys/slab to /sys/kernel/slabGreg Kroah-Hartman2-2/+2
/sys/kernel is where these things should go. Also updated the documentation and tool that used this directory. Cc: Kay Sievers <kay.sievers@vrfy.org> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-12-17Documentation: update hugetlb informationNishanth Aravamudan1-6/+29
The hugetlb documentation has gotten a bit out of sync with the current code. Updated the sysctl file to refer to Documentation/vm/hugetlbpage.txt. Update that file to contain the current state of affairs (with the newer named sysctl in place). Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com> Acked-by: Adam Litke <agl@us.ibm.com> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17Documentation/vm/slabinfo.c: clean up this codeWANG Cong1-12/+15
This patch does the following cleanups for Documentation/vm/slabinfo.c: - Fix two memory leaks; - Constify some char pointers; - Use snprintf instead of sprintf in case of buffer overflow; - Fix some indentations; - Other little improvements. Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17doc: move vm/00-INDEX to Documentation/vmDavid Rientjes1-0/+20
Looks like the 00-INDEX file lost its parent directory in -rc6-mm1. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Jesper Juhl <jesper.juhl@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16Mem Policy: add MPOL_F_MEMS_ALLOWED get_mempolicy() flagLee Schermerhorn1-17/+16
Allow an application to query the memories allowed by its context. Updated numa_memory_policy.txt to mention that applications can use this to obtain allowed memories for constructing valid policies. TODO: update out-of-tree libnuma wrapper[s], or maybe add a new wrapper--e.g., numa_get_mems_allowed() ? Also, update numa syscall man pages. Tested with memtoy V>=0.13. Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Acked-by: Christoph Lameter <clameter@sgi.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-08-22Document Linux Memory PolicyLee Schermerhorn1-0/+332
I couldn't find any memory policy documentation in the Documentation directory, so here is my attempt to document it. There's lots more that could be written about the internal design--including data structures, functions, etc. However, if you agree that this is better that the nothing that exists now, perhaps it could be merged. This will provide a baseline for updates to document the many policy patches that are currently being worked. Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Christoph Lameter <clameter@sgi.com> Cc: Andi Kleen <ak@suse.de> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Acked-by: Rob Landley <rob@landley.net> Acked-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-08-09SLUB: Fix format specifier in Documentation/vm/slabinfo.cJesper Juhl1-1/+1
There's a little problem in Documentation/vm/slabinfo.c The code is using "%d" in a printf() call to print an 'unsigned long'. This patch corrects it to use "%lu" instead. Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Signed-off-by: Christoph Lameter <clameter@sgi.com>
2007-07-17SLUB: change error reporting format to follow lockdep looselyChristoph Lameter1-48/+89
Changes the error reporting format to loosely follow lockdep. If data corruption is detected then we generate the following lines: ============================================ BUG <slab-cache>: <problem> -------------------------------------------- INFO: <more information> [possibly multiple times] <object dump> FIX <slab-cache>: <remedial action> This also adds some more intelligence to the data corruption detection. Its now capable of figuring out the start and end. Add a comment on how to configure SLUB so that a production system may continue to operate even though occasional slab corruption occur through a misbehaving kernel component. See "Emergency operations" in Documentation/vm/slub.txt. [akpm@linux-foundation.org: build fix] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16hugetlbfs: use lib/parser, fix docsRandy Dunlap1-5/+5
Use lib/parser.c to parse hugetlbfs mount options. Correct docs in hugetlbpage.txt. old size of hugetlbfs_fill_super: 675 bytes new size of hugetlbfs_fill_super: 686 bytes (hugetlbfs_parse_options() is inlined) Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Adam Litke <agl@us.ibm.com> Acked-by: William Lee Irwin III <wli@holomorphy.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16SLUB: support slub_debug on by defaultChristoph Lameter1-0/+2
Add a new configuration variable CONFIG_SLUB_DEBUG_ON If set then the kernel will be booted by default with slab debugging switched on. Similar to CONFIG_SLAB_DEBUG. By default slab debugging is available but must be enabled by specifying "slub_debug" as a kernel parameter. Also add support to switch off slab debugging for a kernel that was built with CONFIG_SLUB_DEBUG_ON. This works by specifying slub_debug=- as a kernel parameter. Dave Jones wanted this feature. http://marc.info/?l=linux-kernel&m=118072189913045&w=2 [akpm@linux-foundation.org: clean up switch statement] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>