<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/mm/mempolicy.c, branch v2.6.31-rc2</title>
<subtitle>Linux Kernel (branches are rebased on master from time to time)</subtitle>
<id>https://sre.ring0.de/linux/atom?h=v2.6.31-rc2</id>
<link rel='self' href='https://sre.ring0.de/linux/atom?h=v2.6.31-rc2'/>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/'/>
<updated>2009-06-17T02:47:32Z</updated>
<entry>
<title>page allocator: do not check NUMA node ID when the caller knows the node is valid</title>
<updated>2009-06-17T02:47:32Z</updated>
<author>
<name>Mel Gorman</name>
<email>mel@csn.ul.ie</email>
</author>
<published>2009-06-16T22:31:54Z</published>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/commit/?id=6484eb3e2a81807722c5f28efef94d8338b7b996'/>
<id>urn:sha1:6484eb3e2a81807722c5f28efef94d8338b7b996</id>
<content type='text'>
Callers of alloc_pages_node() can optionally specify -1 as a node to mean
"allocate from the current node".  However, a number of the callers in
fast paths know for a fact their node is valid.  To avoid a comparison and
branch, this patch adds alloc_pages_exact_node() that only checks the nid
with VM_BUG_ON().  Callers that know their node is valid are then
converted.

Signed-off-by: Mel Gorman &lt;mel@csn.ul.ie&gt;
Reviewed-by: Christoph Lameter &lt;cl@linux-foundation.org&gt;
Reviewed-by: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Reviewed-by: Pekka Enberg &lt;penberg@cs.helsinki.fi&gt;
Acked-by: Paul Mundt &lt;lethal@linux-sh.org&gt;	[for the SLOB NUMA bits]
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Nick Piggin &lt;nickpiggin@yahoo.com.au&gt;
Cc: Dave Hansen &lt;dave@linux.vnet.ibm.com&gt;
Cc: Lee Schermerhorn &lt;Lee.Schermerhorn@hp.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>cpuset,mm: update tasks' mems_allowed in time</title>
<updated>2009-06-17T02:47:31Z</updated>
<author>
<name>Miao Xie</name>
<email>miaox@cn.fujitsu.com</email>
</author>
<published>2009-06-16T22:31:49Z</published>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/commit/?id=58568d2a8215cb6f55caf2332017d7bdff954e1c'/>
<id>urn:sha1:58568d2a8215cb6f55caf2332017d7bdff954e1c</id>
<content type='text'>
Fix allocating page cache/slab object on the unallowed node when memory
spread is set by updating tasks' mems_allowed after its cpuset's mems is
changed.

In order to update tasks' mems_allowed in time, we must modify the code of
memory policy.  Because the memory policy is applied in the process's
context originally.  After applying this patch, one task directly
manipulates anothers mems_allowed, and we use alloc_lock in the
task_struct to protect mems_allowed and memory policy of the task.

But in the fast path, we didn't use lock to protect them, because adding a
lock may lead to performance regression.  But if we don't add a lock,the
task might see no nodes when changing cpuset's mems_allowed to some
non-overlapping set.  In order to avoid it, we set all new allowed nodes,
then clear newly disallowed ones.

[lee.schermerhorn@hp.com:
  The rework of mpol_new() to extract the adjusting of the node mask to
  apply cpuset and mpol flags "context" breaks set_mempolicy() and mbind()
  with MPOL_PREFERRED and a NULL nodemask--i.e., explicit local
  allocation.  Fix this by adding the check for MPOL_PREFERRED and empty
  node mask to mpol_new_mpolicy().

  Remove the now unneeded 'nodes = NULL' from mpol_new().

  Note that mpol_new_mempolicy() is always called with a non-NULL
  'nodes' parameter now that it has been removed from mpol_new().
  Therefore, we don't need to test nodes for NULL before testing it for
  'empty'.  However, just to be extra paranoid, add a VM_BUG_ON() to
  verify this assumption.]
[lee.schermerhorn@hp.com:

  I don't think the function name 'mpol_new_mempolicy' is descriptive
  enough to differentiate it from mpol_new().

  This function applies cpuset set context, usually constraining nodes
  to those allowed by the cpuset.  However, when the 'RELATIVE_NODES flag
  is set, it also translates the nodes.  So I settled on
  'mpol_set_nodemask()', because the comment block for mpol_new() mentions
  that we need to call this function to "set nodes".

  Some additional minor line length, whitespace and typo cleanup.]
Signed-off-by: Miao Xie &lt;miaox@cn.fujitsu.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Christoph Lameter &lt;cl@linux-foundation.org&gt;
Cc: Paul Menage &lt;menage@google.com&gt;
Cc: Nick Piggin &lt;nickpiggin@yahoo.com.au&gt;
Cc: Yasunori Goto &lt;y-goto@jp.fujitsu.com&gt;
Cc: Pekka Enberg &lt;penberg@cs.helsinki.fi&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Signed-off-by: Lee Schermerhorn &lt;lee.schermerhorn@hp.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>[CVE-2009-0029] System call wrappers part 28</title>
<updated>2009-01-14T13:15:30Z</updated>
<author>
<name>Heiko Carstens</name>
<email>heiko.carstens@de.ibm.com</email>
</author>
<published>2009-01-14T13:14:30Z</published>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/commit/?id=938bb9f5e840eddbf54e4f62f6c5ba9b3ae12c9d'/>
<id>urn:sha1:938bb9f5e840eddbf54e4f62f6c5ba9b3ae12c9d</id>
<content type='text'>
Signed-off-by: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'master' into next</title>
<updated>2008-11-14T00:29:12Z</updated>
<author>
<name>James Morris</name>
<email>jmorris@namei.org</email>
</author>
<published>2008-11-14T00:29:12Z</published>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/commit/?id=2b828925652340277a889cbc11b2d0637f7cdaf7'/>
<id>urn:sha1:2b828925652340277a889cbc11b2d0637f7cdaf7</id>
<content type='text'>
Conflicts:
	security/keys/internal.h
	security/keys/process_keys.c
	security/keys/request_key.c

Fixed conflicts above by using the non 'tsk' versions.

Signed-off-by: James Morris &lt;jmorris@namei.org&gt;
</content>
</entry>
<entry>
<title>CRED: Use RCU to access another task's creds and to release a task's own creds</title>
<updated>2008-11-13T23:39:19Z</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2008-11-13T23:39:19Z</published>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/commit/?id=c69e8d9c01db2adc503464993c358901c9af9de4'/>
<id>urn:sha1:c69e8d9c01db2adc503464993c358901c9af9de4</id>
<content type='text'>
Use RCU to access another task's creds and to release a task's own creds.
This means that it will be possible for the credentials of a task to be
replaced without another task (a) requiring a full lock to read them, and (b)
seeing deallocated memory.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Acked-by: James Morris &lt;jmorris@namei.org&gt;
Acked-by: Serge Hallyn &lt;serue@us.ibm.com&gt;
Signed-off-by: James Morris &lt;jmorris@namei.org&gt;
</content>
</entry>
<entry>
<title>CRED: Separate task security context from task_struct</title>
<updated>2008-11-13T23:39:16Z</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2008-11-13T23:39:16Z</published>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/commit/?id=b6dff3ec5e116e3af6f537d4caedcad6b9e5082a'/>
<id>urn:sha1:b6dff3ec5e116e3af6f537d4caedcad6b9e5082a</id>
<content type='text'>
Separate the task security context from task_struct.  At this point, the
security data is temporarily embedded in the task_struct with two pointers
pointing to it.

Note that the Alpha arch is altered as it refers to (E)UID and (E)GID in
entry.S via asm-offsets.

With comment fixes Signed-off-by: Marc Dionne &lt;marc.c.dionne@gmail.com&gt;

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Acked-by: James Morris &lt;jmorris@namei.org&gt;
Acked-by: Serge Hallyn &lt;serue@us.ibm.com&gt;
Signed-off-by: James Morris &lt;jmorris@namei.org&gt;
</content>
</entry>
<entry>
<title>CRED: Wrap task credential accesses in the core kernel</title>
<updated>2008-11-13T23:39:12Z</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2008-11-13T23:39:12Z</published>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/commit/?id=76aac0e9a17742e60d408be1a706e9aaad370891'/>
<id>urn:sha1:76aac0e9a17742e60d408be1a706e9aaad370891</id>
<content type='text'>
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current-&gt;(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task-&gt;e?[ug]id to task_e?[ug]id().  In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Reviewed-by: James Morris &lt;jmorris@namei.org&gt;
Acked-by: Serge Hallyn &lt;serue@us.ibm.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: linux-audit@redhat.com
Cc: containers@lists.linux-foundation.org
Cc: linux-mm@kvack.org
Signed-off-by: James Morris &lt;jmorris@namei.org&gt;
</content>
</entry>
<entry>
<title>mm: move migrate_prep out from under mmap_sem</title>
<updated>2008-11-06T23:41:18Z</updated>
<author>
<name>Christoph Lameter</name>
<email>cl@linux-foundation.org</email>
</author>
<published>2008-11-06T20:53:30Z</published>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/commit/?id=0aedadf91a70a11c4a3e7c7d99b21e5528af8d5d'/>
<id>urn:sha1:0aedadf91a70a11c4a3e7c7d99b21e5528af8d5d</id>
<content type='text'>
Move the migrate_prep outside the mmap_sem for the following system calls

1. sys_move_pages
2. sys_migrate_pages
3. sys_mbind()

It really does not matter when we flush the lru.  The system is free to
add pages onto the lru even during migration which will make the page
migration either skip the page (mbind, migrate_pages) or return a busy
state (move_pages).

Fixes this lockdep warning (and potential deadlock):

Some VM place has
      mmap_sem -&gt; kevent_wq via lru_add_drain_all()

net/core/dev.c::dev_ioctl()  has
     rtnl_lock  -&gt;  mmap_sem        (*) the ioctl has copy_from_user() and it can do page fault.

linkwatch_event has
     kevent_wq -&gt; rtnl_lock

Signed-off-by: Christoph Lameter &lt;cl@linux-foundation.org&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Reported-by: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Cc: Nick Piggin &lt;nickpiggin@yahoo.com.au&gt;
Cc: Hugh Dickins &lt;hugh@veritas.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Lee Schermerhorn &lt;lee.schermerhorn@hp.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Unevictable LRU Infrastructure</title>
<updated>2008-10-20T15:50:26Z</updated>
<author>
<name>Lee Schermerhorn</name>
<email>Lee.Schermerhorn@hp.com</email>
</author>
<published>2008-10-19T03:26:39Z</published>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/commit/?id=894bc310419ac95f4fa4142dc364401a7e607f65'/>
<id>urn:sha1:894bc310419ac95f4fa4142dc364401a7e607f65</id>
<content type='text'>
When the system contains lots of mlocked or otherwise unevictable pages,
the pageout code (kswapd) can spend lots of time scanning over these
pages.  Worse still, the presence of lots of unevictable pages can confuse
kswapd into thinking that more aggressive pageout modes are required,
resulting in all kinds of bad behaviour.

Infrastructure to manage pages excluded from reclaim--i.e., hidden from
vmscan.  Based on a patch by Larry Woodman of Red Hat.  Reworked to
maintain "unevictable" pages on a separate per-zone LRU list, to "hide"
them from vmscan.

Kosaki Motohiro added the support for the memory controller unevictable
lru list.

Pages on the unevictable list have both PG_unevictable and PG_lru set.
Thus, PG_unevictable is analogous to and mutually exclusive with
PG_active--it specifies which LRU list the page is on.

The unevictable infrastructure is enabled by a new mm Kconfig option
[CONFIG_]UNEVICTABLE_LRU.

A new function 'page_evictable(page, vma)' in vmscan.c tests whether or
not a page may be evictable.  Subsequent patches will add the various
!evictable tests.  We'll want to keep these tests light-weight for use in
shrink_active_list() and, possibly, the fault path.

To avoid races between tasks putting pages [back] onto an LRU list and
tasks that might be moving the page from non-evictable to evictable state,
the new function 'putback_lru_page()' -- inverse to 'isolate_lru_page()'
-- tests the "evictability" of a page after placing it on the LRU, before
dropping the reference.  If the page has become unevictable,
putback_lru_page() will redo the 'putback', thus moving the page to the
unevictable list.  This way, we avoid "stranding" evictable pages on the
unevictable list.

[akpm@linux-foundation.org: fix fallout from out-of-order merge]
[riel@redhat.com: fix UNEVICTABLE_LRU and !PROC_PAGE_MONITOR build]
[nishimura@mxp.nes.nec.co.jp: remove redundant mapping check]
[kosaki.motohiro@jp.fujitsu.com: unevictable-lru-infrastructure: putback_lru_page()/unevictable page handling rework]
[kosaki.motohiro@jp.fujitsu.com: kill unnecessary lock_page() in vmscan.c]
[kosaki.motohiro@jp.fujitsu.com: revert migration change of unevictable lru infrastructure]
[kosaki.motohiro@jp.fujitsu.com: revert to unevictable-lru-infrastructure-kconfig-fix.patch]
[kosaki.motohiro@jp.fujitsu.com: restore patch failure of vmstat-unevictable-and-mlocked-pages-vm-events.patch]
Signed-off-by: Lee Schermerhorn &lt;lee.schermerhorn@hp.com&gt;
Signed-off-by: Rik van Riel &lt;riel@redhat.com&gt;
Signed-off-by: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Debugged-by: Benjamin Kidwell &lt;benjkidwell@yahoo.com&gt;
Signed-off-by: Daisuke Nishimura &lt;nishimura@mxp.nes.nec.co.jp&gt;
Signed-off-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>vmscan: move isolate_lru_page() to vmscan.c</title>
<updated>2008-10-20T15:50:25Z</updated>
<author>
<name>Nick Piggin</name>
<email>npiggin@suse.de</email>
</author>
<published>2008-10-19T03:26:09Z</published>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/commit/?id=62695a84eb8f2e718bf4dfb21700afaa7a08e0ea'/>
<id>urn:sha1:62695a84eb8f2e718bf4dfb21700afaa7a08e0ea</id>
<content type='text'>
On large memory systems, the VM can spend way too much time scanning
through pages that it cannot (or should not) evict from memory.  Not only
does it use up CPU time, but it also provokes lock contention and can
leave large systems under memory presure in a catatonic state.

This patch series improves VM scalability by:

1) putting filesystem backed, swap backed and unevictable pages
   onto their own LRUs, so the system only scans the pages that it
   can/should evict from memory

2) switching to two handed clock replacement for the anonymous LRUs,
   so the number of pages that need to be scanned when the system
   starts swapping is bound to a reasonable number

3) keeping unevictable pages off the LRU completely, so the
   VM does not waste CPU time scanning them. ramfs, ramdisk,
   SHM_LOCKED shared memory segments and mlock()ed VMA pages
   are keept on the unevictable list.

This patch:

isolate_lru_page logically belongs to be in vmscan.c than migrate.c.

It is tough, because we don't need that function without memory migration
so there is a valid argument to have it in migrate.c.  However a
subsequent patch needs to make use of it in the core mm, so we can happily
move it to vmscan.c.

Also, make the function a little more generic by not requiring that it
adds an isolated page to a given list.  Callers can do that.

	Note that we now have '__isolate_lru_page()', that does
	something quite different, visible outside of vmscan.c
	for use with memory controller.  Methinks we need to
	rationalize these names/purposes.	--lts

[akpm@linux-foundation.org: fix mm/memory_hotplug.c build]
Signed-off-by: Nick Piggin &lt;npiggin@suse.de&gt;
Signed-off-by: Rik van Riel &lt;riel@redhat.com&gt;
Signed-off-by: Lee Schermerhorn &lt;Lee.Schermerhorn@hp.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
