Merge tag 'fs.fixes.v5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux

Pull mount_setattr fix from Christian Brauner:
 "The recent cleanup in e257039f0fc7 ("mount_setattr(): clean the
  control flow and calling conventions") switched the mount attribute
  codepaths from do-while to for loops as they are more idiomatic when
  walking mounts.

  However, we did originally choose do-while constructs because if we
  request a mount or mount tree to be made read-only we need to hold
  writers in the following way: The mount attribute code will grab
  lock_mount_hash() and then call mnt_hold_writers() which will
  _unconditionally_ set MNT_WRITE_HOLD on the mount.

  Any callers that need write access have to call mnt_want_write(). They
  will immediately see that MNT_WRITE_HOLD is set on the mount and the
  caller will then either spin (on non-preempt-rt) or wait on
  lock_mount_hash() (on preempt-rt).

  The fact that MNT_WRITE_HOLD is set unconditionally means that once
  mnt_hold_writers() returns we need to _always_ pair it with
  mnt_unhold_writers() in both the failure and success paths.

  The do-while constructs did take care of this. But Al's change to a
  for loop in the failure path stops on the first mount we failed to
  change mount attributes _without_ going into the loop to call
  mnt_unhold_writers().

  This in turn means that once we failed to make a mount read-only via
  mount_setattr() - i.e. there are already writers on that mount - we
  will block any writers indefinitely.

  Fix this by ensuring that the for loop always unsets MNT_WRITE_HOLD
  including the first mount we failed to change to read-only. Also
  sprinkle a few comments into the cleanup code to remind people about
  what is happening including myself. After all, I didn't catch it
  during review.

  This is only relevant on mainline and was reported by syzbot. Details
  about the syzbot reports are all in the commit message"

* tag 'fs.fixes.v5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  fs: unset MNT_WRITE_HOLD on failure
author: Linus Torvalds <torvalds@linux-foundation.org> 2022-04-22 13:17:19 -0700
committer: Linus Torvalds <torvalds@linux-foundation.org> 2022-04-22 13:17:19 -0700
commit: 279b83c6731c73a2197a1724d67312ba415e0607 (patch)
tree: 6684b00e07a661266deb6924fea8274faa05b24d /fs
parent: 2d230968ad0d15250af54c6ac70c5ae95db63c78 (diff)
parent: 0014edaedfd804dbf35b009808789325ca615716 (diff)
download: linux-279b83c6731c73a2197a1724d67312ba415e0607.tar.bz2
1 files changed, 13 insertions, 1 deletions
diff --git a/fs/namespace.c b/fs/namespace.c
index a0a36bfa3aa0..afe2b64b14f1 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -4058,10 +4058,22 @@ static int mount_setattr_prepare(struct mount_kattr *kattr, struct mount *mnt)
 	if (err) {
 		struct mount *p;
 
-		for (p = mnt; p != m; p = next_mnt(p, mnt)) {
+		/*
+		 * If we had to call mnt_hold_writers() MNT_WRITE_HOLD will
+		 * be set in @mnt_flags. The loop unsets MNT_WRITE_HOLD for all
+		 * mounts and needs to take care to include the first mount.
+		 */
+		for (p = mnt; p; p = next_mnt(p, mnt)) {
 			/* If we had to hold writers unblock them. */
 			if (p->mnt.mnt_flags & MNT_WRITE_HOLD)
 				mnt_unhold_writers(p);
+
+			/*
+			 * We're done once the first mount we changed got
+			 * MNT_WRITE_HOLD unset.
+			 */
+			if (p == m)
+				break;
 		}
 	}
 	return err;
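
The difference between the two cleanup loops can be reproduced outside the kernel. What follows is a minimal userspace sketch, not kernel code: struct toy_mount, TOY_WRITE_HOLD, toy_next() and the other toy_/cleanup_/simulate names are hypothetical stand-ins invented for illustration, and a flat linked list replaces the real mount tree walked by next_mnt(). It assumes, as the pull message describes, that the hold flag is already set on the mount where preparation failed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's MNT_WRITE_HOLD bit. */
#define TOY_WRITE_HOLD 0x1u

/* Toy mount: a flat singly linked list replaces the real mount tree. */
struct toy_mount {
	unsigned int flags;
	struct toy_mount *next;
};

/* Stand-in for next_mnt(): advance in the list, NULL at the end. */
static struct toy_mount *toy_next(struct toy_mount *p)
{
	return p->next;
}

/* Pre-fix shape: the loop condition stops before @m is processed. */
static void cleanup_exclusive(struct toy_mount *mnt, struct toy_mount *m)
{
	struct toy_mount *p;

	assert(mnt && m);
	for (p = mnt; p != m; p = toy_next(p))
		p->flags &= ~TOY_WRITE_HOLD;
}

/* Post-fix shape: @m is processed first, then the loop exits. */
static void cleanup_inclusive(struct toy_mount *mnt, struct toy_mount *m)
{
	struct toy_mount *p;

	assert(mnt && m);
	for (p = mnt; p; p = toy_next(p)) {
		p->flags &= ~TOY_WRITE_HOLD;
		if (p == m)
			break;
	}
}

/* Count the mounts that still have the hold flag set. */
static int count_held(const struct toy_mount *mnt)
{
	int held = 0;

	for (; mnt; mnt = mnt->next)
		if (mnt->flags & TOY_WRITE_HOLD)
			held++;
	return held;
}

/*
 * Three mounts a -> b -> c. Preparation held writers on a, then failed
 * on b after the flag was set there; c was never touched. Returns how
 * many mounts are still held once the chosen cleanup loop has run.
 */
static int simulate(bool inclusive)
{
	struct toy_mount c = { 0, NULL };
	struct toy_mount b = { TOY_WRITE_HOLD, &c };
	struct toy_mount a = { TOY_WRITE_HOLD, &b };

	if (inclusive)
		cleanup_inclusive(&a, &b);
	else
		cleanup_exclusive(&a, &b);
	return count_held(&a);
}
```

In this model simulate(false) leaves one mount held (the one that failed), which corresponds to writers being blocked indefinitely, while simulate(true) leaves none.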