author     Eric W. Biederman <ebiederm@xmission.com>   2020-02-19 18:22:26 -0600
committer  Eric W. Biederman <ebiederm@xmission.com>   2020-02-24 10:14:44 -0600
commit     7bc3e6e55acf065500a24621f3b313e7e5998acf (patch)
tree       97ea4c6eec0838079455dd56565226863c157ad2 /kernel/exit.c
parent     71448011ea2a1cd36d8f5cbdab0ed716c454d565 (diff)
download   linux-7bc3e6e55acf065500a24621f3b313e7e5998acf.tar.bz2
proc: Use a list of inodes to flush from proc
Rework the flushing of proc to use a list of directory inodes that
need to be flushed.
The list is kept on struct pid, not on struct task_struct, because
there is a fixed connection between proc inodes and pids, whereas the
pid of a task_struct changes at least in the case of de_thread.
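Concretely, the new linkage amounts to something like the sketch
below. Only kernel/exit.c is covered by this diffstat, so the
proc_inode field name sibling_inodes is an assumption used for
illustration; pid->inodes and wait_pidfd are the names used elsewhere
in this message.

    /* Sketch of the new linkage; unrelated members elided. */
    struct pid {
            refcount_t count;
            /* ... */
            wait_queue_head_t wait_pidfd;  /* its lock doubles as the list lock */
            /* proc directory inodes to d_invalidate() when the pid dies */
            struct hlist_head inodes;
    };

    struct proc_inode {
            struct pid *pid;
            /* links this inode into pid->inodes (field name is illustrative) */
            struct hlist_node sibling_inodes;
            /* ... */
            struct inode vfs_inode;
    };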
This removes the dependency on proc_mnt, which allows different
mounts of proc to have different mount options even in the same pid
namespace, and it allows proc_mnt itself to be removed, which will
trivially let the first mount of proc honor its mount options.
This flushing remains an optimization. The functions
pid_delete_dentry and pid_revalidate ensure that ordinary dcache
management will not attempt to use dentries past the point their
respective task has died. Once the dentries are unused, the shrinker
will eventually be able to remove them.
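For context, those two dentry_operations hooks already make stale
entries harmless; roughly (recalled from fs/proc/base.c rather than
taken from this diff, so treat the exact bodies as illustrative):

    /* Never keep an unused proc pid dentry around in the dcache. */
    static int pid_delete_dentry(const struct dentry *dentry)
    {
            return 1;
    }

    /* A pid dentry is only valid while its task can still be found. */
    static int pid_revalidate(struct dentry *dentry, unsigned int flags)
    {
            struct inode *inode;
            struct task_struct *task;

            if (flags & LOOKUP_RCU)
                    return -ECHILD;

            inode = d_inode(dentry);
            task = get_proc_task(inode);
            if (task) {
                    pid_update_inode(task, inode);
                    put_task_struct(task);
                    return 1;
            }
            return 0;
    }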
There is a case in de_thread where proc_flush_pid can be called early
for a given pid, which winds up being safe (if suboptimal) as this is
just an optimization.
Only pid directories are put on the list, as the other per-pid files
are children of those directories and d_invalidate on the directory
will get them as well.
So that the pid can be used during flushing, its reference count is
taken in release_task and dropped in proc_flush_pid. Further, the call
to proc_flush_pid is moved to after the tasklist_lock is released in
release_task, so that it is certain the pid has already been unhashed
when the flushing takes place. This removes a small race where a
dentry could be recreated.
As struct pid is supposed to be small and I need a per-pid lock,
I reuse the only lock that currently exists in struct pid:
the wait_pidfd.lock.
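Taken together, the flush side then looks roughly like the sketch
below. The put_pid() pairing is described above; the name of the
list-walking helper is an assumption, since fs/proc is not part of
this diffstat.

    /* Invalidate the proc directory dentries recorded for this pid and
     * drop the reference taken in release_task(). */
    void proc_flush_pid(struct pid *pid)
    {
            /* Helper name assumed; it would d_invalidate() each inode on
             * pid->inodes under the reused wait_pidfd.lock. */
            proc_invalidate_siblings_dcache(&pid->inodes, &pid->wait_pidfd.lock);
            put_pid(pid);
    }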
The net result is that this adds all of this functionality
with just a little extra list management overhead and
a single extra pointer in struct pid.
v2: Initialize pid->inodes. I somehow failed to get that
initialization into the initial version of the patch. A boot
failure was reported by "kernel test robot <lkp@intel.com>", and a
failure to initialize pid->inodes matches all of the reported
symptoms.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Diffstat (limited to 'kernel/exit.c')
-rw-r--r--   kernel/exit.c   4
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/kernel/exit.c b/kernel/exit.c
index 2833ffb0c211..502b4995b688 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -191,6 +191,7 @@ void put_task_struct_rcu_user(struct task_struct *task)
 void release_task(struct task_struct *p)
 {
         struct task_struct *leader;
+        struct pid *thread_pid;
         int zap_leader;
 repeat:
         /* don't need to get the RCU readlock here - the process is dead and
@@ -199,11 +200,11 @@ repeat:
         atomic_dec(&__task_cred(p)->user->processes);
         rcu_read_unlock();
 
-        proc_flush_task(p);
         cgroup_release(p);
 
         write_lock_irq(&tasklist_lock);
         ptrace_release_task(p);
+        thread_pid = get_pid(p->thread_pid);
         __exit_signal(p);
 
         /*
@@ -226,6 +227,7 @@ repeat:
         }
 
         write_unlock_irq(&tasklist_lock);
+        proc_flush_pid(thread_pid);
         release_thread(p);
         put_task_struct_rcu_user(p);
 