author     Hugh Dickins <hughd@google.com>                 2022-02-14 18:20:24 -0800
committer  Matthew Wilcox (Oracle) <willy@infradead.org>   2022-02-17 11:56:13 -0500
commit     ebcbc6ea7d8a604ad8504dae70a6ac1b1e64a0b7
tree       014fb16d25e6e420bbaef89f800f58fd24983747 /mm/internal.h
parent     f71077a4d84bbe8c7b91b7db7c4ef815755ac5e3
mm/munlock: delete page_mlock() and all its works
We have recommended that some applications mlock their userspace, but that
turns out to be counter-productive: when many processes mlock the same
file, contention on rmap's i_mmap_rwsem can become intolerable at exit: it
is needed for write, to remove any vma mapping that file from rmap's tree;
but hogged for read by those with mlocks calling page_mlock() (formerly
known as try_to_munlock()) on *each* page mapped from the file (the
purpose being to find out whether another process has the page mlocked,
and therefore it should not be munlocked yet).
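The shape of that per-page walk, as an illustrative sketch only (page_mlock_sketch
is a made-up name, and the real page_mlock() goes through rmap_walk() rather than
walking the interval tree directly):

/*
 * Sketch of what page_mlock() did for a file page: take i_mmap_rwsem
 * for read, and check every vma mapping the file to see whether any
 * of them still has the page mlocked.
 */
static void page_mlock_sketch(struct page *page)
{
        struct address_space *mapping = page_mapping(page);
        pgoff_t pgoff = page_to_pgoff(page);
        struct vm_area_struct *vma;

        i_mmap_lock_read(mapping);      /* the contended rwsem */
        vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) {
                if (vma->vm_flags & VM_LOCKED) {
                        /* still mlocked elsewhere: keep it unevictable */
                        mlock_vma_page(page);
                        break;
                }
        }
        i_mmap_unlock_read(mapping);
}

One such walk per mapped page, from every exiting process, is where the read-side
hogging of i_mmap_rwsem comes from.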
Several optimizations have been made in the past: one is to skip
page_mlock() when the mapcount shows that nothing else has this page
mapped; but that doesn't help at all when others do have it mapped.
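That fast path was roughly of this shape (an illustrative sketch with an invented
name, munlock_page_sketch, not the removed code verbatim):

static void munlock_page_sketch(struct page *page)
{
        if (!TestClearPageMlocked(page))
                return;
        /*
         * Only pay for the rmap walk when the mapcount says some other
         * mapping might still hold an mlock on this page.
         */
        if (page_mapcount(page) > 1)
                page_mlock(page);
}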
This time around, I initially intended to add a preliminary search
of the rmap tree for overlapping VM_LOCKED ranges; but that gets
messy with locking order, when in doubt whether a page is actually
present; and risks adding even more contention on the i_mmap_rwsem.
A solution would be much easier, if only there were space in struct page
for an mlock_count... but actually, most of the time, there is space for
it - an mlocked page spends most of its life on an unevictable LRU, but
since 3.18 removed the scan_unevictable_pages sysctl, that "LRU" has
been redundant. Let's try to reuse its page->lru.
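What that reuse looks like, roughly as a later patch in this series shapes
struct page (shown here only as a sketch):

union {
        struct list_head lru;
        /* Or, while parked on the unused unevictable "LRU" */
        struct {
                /* Always even, to negate PageTail */
                void *__filler;
                /* Count of mlocks on this page */
                unsigned int mlock_count;
        };
};

While a page stays off the ordinary LRUs as unevictable, its list_head slot can
carry (roughly) the number of VM_LOCKED vmas still holding it mlocked.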
But leave that until a later patch: in this patch, clear the ground by
removing page_mlock(), and all the infrastructure that has gathered
around it - which mostly hinders understanding, and will make reviewing
new additions harder. Don't mind those old comments about THPs; they
date from before 4.5's refcounting rework, so splitting is not a risk here.
Just keep a minimal version of munlock_vma_page(), as reminder of what it
should attend to (in particular, the odd way PGSTRANDED is counted out of
PGMUNLOCKED), and likewise a stub for munlock_vma_pages_range(). Move
unchanged __mlock_posix_error_return() out of the way, down to above its
caller: this series then makes no further change after mlock_fixup().
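The minimal munlock_vma_page() kept here is roughly of this shape (a sketch
reconstructed from the description above, not quoted from the patch):

void munlock_vma_page(struct page *page)
{
        /* Serialize with page migration */
        BUG_ON(!PageLocked(page));
        VM_BUG_ON_PAGE(PageTail(page), page);

        if (TestClearPageMlocked(page)) {
                int nr_pages = thp_nr_pages(page);

                mod_zone_page_state(page_zone(page), NR_MLOCK, -nr_pages);
                if (!isolate_lru_page(page)) {
                        /* off the unevictable list: counts as munlocked */
                        putback_lru_page(page);
                        count_vm_events(UNEVICTABLE_PGMUNLOCKED, nr_pages);
                } else if (PageUnevictable(page)) {
                        /* could not isolate: stranded, not munlocked */
                        count_vm_events(UNEVICTABLE_PGSTRANDED, nr_pages);
                }
        }
}

Note how a page that cannot be isolated, yet is still PageUnevictable, is counted
as PGSTRANDED rather than PGMUNLOCKED.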
After this and each following commit, the kernel builds, boots and runs;
but with deficiencies which may show up in testing of mlock and munlock.
The system calls succeed or fail as before, and mlock remains effective
in preventing page reclaim; but meminfo's Unevictable and Mlocked amounts
may be shown too low after mlock, grow, then stay too high after munlock:
with previously mlocked pages remaining unevictable for too long, until
finally unmapped and freed and counts corrected. Normal service will be
resumed in "mm/munlock: mlock_pte_range() when mlocking or munlocking".
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Diffstat (limited to 'mm/internal.h')
 mm/internal.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/internal.h b/mm/internal.h
index d80300392a19..e48c486d5ddf 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -409,7 +409,7 @@ static inline void munlock_vma_pages_all(struct vm_area_struct *vma)
  * must be called with vma's mmap_lock held for read or write, and page locked.
  */
 extern void mlock_vma_page(struct page *page);
-extern unsigned int munlock_vma_page(struct page *page);
+extern void munlock_vma_page(struct page *page);
 
 extern int mlock_future_check(struct mm_struct *mm, unsigned long flags,
 			      unsigned long len);