summaryrefslogtreecommitdiffstats
path: root/include
diff options
context:
space:
mode:
authorJann Horn <jannh@google.com>2020-10-15 20:12:54 -0700
committerLinus Torvalds <torvalds@linux-foundation.org>2020-10-16 11:11:21 -0700
commita07279c9a8cd7dbd321640ff7210591599ee00a4 (patch)
treec1f27d5713449e3ec7762dcaddae067a19f2c145 /include
parent429a22e776a2b9f85a2b9c53d8e647598b553dd1 (diff)
downloadlinux-a07279c9a8cd7dbd321640ff7210591599ee00a4.tar.bz2
binfmt_elf, binfmt_elf_fdpic: use a VMA list snapshot
In both binfmt_elf and binfmt_elf_fdpic, use a new helper dump_vma_snapshot() to take a snapshot of the VMA list (including the gate VMA, if we have one) while protected by the mmap_lock, and then use that snapshot instead of walking the VMA list without locking. An alternative approach would be to keep the mmap_lock held across the entire core dumping operation; however, keeping the mmap_lock locked while we may be blocked for an unbounded amount of time (e.g. because we're dumping to a FUSE filesystem or so) isn't really optimal; the mmap_lock blocks things like the ->release handler of userfaultfd, and we don't really want critical system daemons to grind to a halt just because someone "gifted" them SCM_RIGHTS to an eternally-locked userfaultfd, or something like that. Since both the normal ELF code and the FDPIC ELF code need this functionality (and if any other binfmt wants to add coredump support in the future, they'd probably need it, too), implement this with a common helper in fs/coredump.c. A downside of this approach is that we now need a bigger amount of kernel memory per userspace VMA in the normal ELF case, and that we need O(n) kernel memory in the FDPIC ELF case at all; but 40 bytes per VMA shouldn't be terribly bad. There currently is a data race between stack expansion and anything that reads ->vm_start or ->vm_end under the mmap_lock held in read mode; to mitigate that for core dumping, take the mmap_lock in write mode when taking a snapshot of the VMA hierarchy. (If we only took the mmap_lock in read mode, we could end up with a corrupted core dump if someone does get_user_pages_remote() concurrently. Not really a major problem, but taking the mmap_lock either way works here, so we might as well avoid the issue.) (This doesn't do anything about the existing data races with stack expansion in other mm code.) Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Hugh Dickins <hughd@google.com> Link: http://lkml.kernel.org/r/20200827114932.3572699-6-jannh@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Diffstat (limited to 'include')
-rw-r--r--include/linux/coredump.h10
1 files changed, 9 insertions, 1 deletions
diff --git a/include/linux/coredump.h b/include/linux/coredump.h
index bfecb8d79a7f..e58e8c207782 100644
--- a/include/linux/coredump.h
+++ b/include/linux/coredump.h
@@ -7,6 +7,12 @@
#include <linux/fs.h>
#include <asm/siginfo.h>
+struct core_vma_metadata {
+ unsigned long start, end;
+ unsigned long flags;
+ unsigned long dump_size;
+};
+
/*
* These are the only things you should do on a core-file: use only these
* functions to write out all the necessary info.
@@ -16,9 +22,11 @@ extern int dump_skip(struct coredump_params *cprm, size_t nr);
extern int dump_emit(struct coredump_params *cprm, const void *addr, int nr);
extern int dump_align(struct coredump_params *cprm, int align);
extern void dump_truncate(struct coredump_params *cprm);
-unsigned long vma_dump_size(struct vm_area_struct *vma, unsigned long mm_flags);
int dump_user_range(struct coredump_params *cprm, unsigned long start,
unsigned long len);
+int dump_vma_snapshot(struct coredump_params *cprm, int *vma_count,
+ struct core_vma_metadata **vma_meta,
+ size_t *vma_data_size_ptr);
#ifdef CONFIG_COREDUMP
extern void do_coredump(const kernel_siginfo_t *siginfo);
#else