From 90bd6fd31c8097ee4ddcb74b7e08363134863de5 Mon Sep 17 00:00:00 2001 From: Petr Holasek Date: Fri, 22 Feb 2013 16:35:00 -0800 Subject: ksm: allow trees per NUMA node Here's a KSM series, based on mmotm 2013-01-23-17-04: starting with Petr's v7 "KSM: numa awareness sysfs knob"; then fixing the two issues we had with that, fully enabling KSM page migration on the way. (A different kind of KSM/NUMA issue which I've certainly not begun to address here: when KSM pages are unmerged, there's usually no sense in preferring to allocate the new pages local to the caller's node.) This patch: Introduces new sysfs boolean knob /sys/kernel/mm/ksm/merge_across_nodes which control merging pages across different numa nodes. When it is set to zero only pages from the same node are merged, otherwise pages from all nodes can be merged together (default behavior). Typical use-case could be a lot of KVM guests on NUMA machine and cpus from more distant nodes would have significant increase of access latency to the merged ksm page. Sysfs knob was choosen for higher variability when some users still prefers higher amount of saved physical memory regardless of access latency. Every numa node has its own stable & unstable trees because of faster searching and inserting. Changing of merge_across_nodes value is possible only when there are not any ksm shared pages in system. I've tested this patch on numa machines with 2, 4 and 8 nodes and measured speed of memory access inside of KVM guests with memory pinned to one of nodes with this benchmark: http://pholasek.fedorapeople.org/alloc_pg.c Population standard deviations of access times in percentage of average were following: merge_across_nodes=1 2 nodes 1.4% 4 nodes 1.6% 8 nodes 1.7% merge_across_nodes=0 2 nodes 1% 4 nodes 0.32% 8 nodes 0.018% RFC: https://lkml.org/lkml/2011/11/30/91 v1: https://lkml.org/lkml/2012/1/23/46 v2: https://lkml.org/lkml/2012/6/29/105 v3: https://lkml.org/lkml/2012/9/14/550 v4: https://lkml.org/lkml/2012/9/23/137 v5: https://lkml.org/lkml/2012/12/10/540 v6: https://lkml.org/lkml/2012/12/23/154 v7: https://lkml.org/lkml/2012/12/27/225 Hugh notes that this patch brings two problems, whose solution needs further support in mm/ksm.c, which follows in subsequent patches: 1) switching merge_across_nodes after running KSM is liable to oops on stale nodes still left over from the previous stable tree; 2) memory hotremove may migrate KSM pages, but there is no provision here for !merge_across_nodes to migrate nodes to the proper tree. Signed-off-by: Petr Holasek Signed-off-by: Hugh Dickins Acked-by: Rik van Riel Cc: Andrea Arcangeli Cc: Izik Eidus Cc: Gerald Schaefer Cc: KOSAKI Motohiro Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/vm/ksm.txt | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'Documentation') diff --git a/Documentation/vm/ksm.txt b/Documentation/vm/ksm.txt index b392e496f816..25cc89ba811b 100644 --- a/Documentation/vm/ksm.txt +++ b/Documentation/vm/ksm.txt @@ -58,6 +58,13 @@ sleep_millisecs - how many milliseconds ksmd should sleep before next scan e.g. "echo 20 > /sys/kernel/mm/ksm/sleep_millisecs" Default: 20 (chosen for demonstration purposes) +merge_across_nodes - specifies if pages from different numa nodes can be merged. + When set to 0, ksm merges only pages which physically + reside in the memory area of same NUMA node. It brings + lower latency to access to shared page. Value can be + changed only when there is no ksm shared pages in system. + Default: 1 + run - set 0 to stop ksmd from running but keep merged pages, set 1 to run ksmd e.g. "echo 1 > /sys/kernel/mm/ksm/run", set 2 to stop ksmd and unmerge all pages currently merged, -- cgit v1.2.3