From 82ff165cd35110d4e380b55927bbd74dcb564998 Mon Sep 17 00:00:00 2001 From: Bhupesh Sharma Date: Thu, 23 Jul 2020 21:15:21 -0700 Subject: mm/memcontrol: fix OOPS inside mem_cgroup_get_nr_swap_pages() Prabhakar reported an OOPS inside mem_cgroup_get_nr_swap_pages() function in a corner case seen on some arm64 boards when kdump kernel runs with "cgroup_disable=memory" passed to the kdump kernel via bootargs. The root-cause behind the same is that currently mem_cgroup_swap_init() function is implemented as a subsys_initcall() call instead of a core_initcall(), this means 'cgroup_memory_noswap' still remains set to the default value (false) even when memcg is disabled via "cgroup_disable=memory" boot parameter. This may result in premature OOPS inside mem_cgroup_get_nr_swap_pages() function in corner cases: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000188 Mem abort info: ESR = 0x96000006 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000006 CM = 0, WnR = 0 [0000000000000188] user address but active_mm is swapper Internal error: Oops: 96000006 [#1] SMP Modules linked in: <..snip..> Call trace: mem_cgroup_get_nr_swap_pages+0x9c/0xf4 shrink_lruvec+0x404/0x4f8 shrink_node+0x1a8/0x688 do_try_to_free_pages+0xe8/0x448 try_to_free_pages+0x110/0x230 __alloc_pages_slowpath.constprop.106+0x2b8/0xb48 __alloc_pages_nodemask+0x2ac/0x2f8 alloc_page_interleave+0x20/0x90 alloc_pages_current+0xdc/0xf8 atomic_pool_expand+0x60/0x210 __dma_atomic_pool_init+0x50/0xa4 dma_atomic_pool_init+0xac/0x158 do_one_initcall+0x50/0x218 kernel_init_freeable+0x22c/0x2d0 kernel_init+0x18/0x110 ret_from_fork+0x10/0x18 Code: aa1403e3 91106000 97f82a27 14000011 (f940c663) ---[ end trace 9795948475817de4 ]--- Kernel panic - not syncing: Fatal exception Rebooting in 10 seconds.. Fixes: eccb52e78809 ("mm: memcontrol: prepare swap controller setup for integration") Reported-by: Prabhakar Kushwaha Signed-off-by: Bhupesh Sharma Signed-off-by: Andrew Morton Acked-by: Michal Hocko Cc: Johannes Weiner Cc: Vladimir Davydov Cc: James Morse Cc: Mark Rutland Cc: Will Deacon Cc: Catalin Marinas Link: http://lkml.kernel.org/r/1593641660-13254-2-git-send-email-bhsharma@redhat.com Signed-off-by: Linus Torvalds --- mm/memcontrol.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) (limited to 'mm/memcontrol.c') diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 19622328e4b5..c75c4face02e 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -7186,6 +7186,13 @@ static struct cftype memsw_files[] = { { }, /* terminate */ }; +/* + * If mem_cgroup_swap_init() is implemented as a subsys_initcall() + * instead of a core_initcall(), this could mean cgroup_memory_noswap still + * remains set to false even when memcg is disabled via "cgroup_disable=memory" + * boot parameter. This may result in premature OOPS inside + * mem_cgroup_get_nr_swap_pages() function in corner cases. + */ static int __init mem_cgroup_swap_init(void) { /* No memory control -> no swap control */ @@ -7200,6 +7207,6 @@ static int __init mem_cgroup_swap_init(void) return 0; } -subsys_initcall(mem_cgroup_swap_init); +core_initcall(mem_cgroup_swap_init); #endif /* CONFIG_MEMCG_SWAP */ -- cgit v1.2.3 From 8d22a9351035ef2ff12ef163a1091b8b8cf1e49c Mon Sep 17 00:00:00 2001 From: Hugh Dickins Date: Thu, 23 Jul 2020 21:15:24 -0700 Subject: mm/memcg: fix refcount error while moving and swapping It was hard to keep a test running, moving tasks between memcgs with move_charge_at_immigrate, while swapping: mem_cgroup_id_get_many()'s refcount is discovered to be 0 (supposedly impossible), so it is then forced to REFCOUNT_SATURATED, and after thousands of warnings in quick succession, the test is at last put out of misery by being OOM killed. This is because of the way moved_swap accounting was saved up until the task move gets completed in __mem_cgroup_clear_mc(), deferred from when mem_cgroup_move_swap_account() actually exchanged old and new ids. Concurrent activity can free up swap quicker than the task is scanned, bringing id refcount down 0 (which should only be possible when offlining). Just skip that optimization: do that part of the accounting immediately. Fixes: 615d66c37c75 ("mm: memcontrol: fix memcg id ref counter on swap charge move") Signed-off-by: Hugh Dickins Signed-off-by: Andrew Morton Reviewed-by: Alex Shi Cc: Johannes Weiner Cc: Alex Shi Cc: Shakeel Butt Cc: Michal Hocko Cc: Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2007071431050.4726@eggly.anvils Signed-off-by: Linus Torvalds --- mm/memcontrol.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'mm/memcontrol.c') diff --git a/mm/memcontrol.c b/mm/memcontrol.c index c75c4face02e..13f559af1ab6 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -5669,7 +5669,6 @@ static void __mem_cgroup_clear_mc(void) if (!mem_cgroup_is_root(mc.to)) page_counter_uncharge(&mc.to->memory, mc.moved_swap); - mem_cgroup_id_get_many(mc.to, mc.moved_swap); css_put_many(&mc.to->css, mc.moved_swap); mc.moved_swap = 0; @@ -5860,7 +5859,8 @@ put: /* get_mctgt_type() gets the page */ ent = target.ent; if (!mem_cgroup_move_swap_account(ent, mc.from, mc.to)) { mc.precharge--; - /* we fixup refcnts and charges later. */ + mem_cgroup_id_get_many(mc.to, 1); + /* we fixup other refcnts and charges later. */ mc.moved_swap++; } break; -- cgit v1.2.3