summaryrefslogtreecommitdiffstats
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2022-04-12Merge tag 'nfsd-5.18-1' of ↵Linus Torvalds2-22/+27
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: - Fix a write performance regression - Fix crashes during request deferral on RDMA transports * tag 'nfsd-5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: SUNRPC: Fix the svc_deferred_event trace class SUNRPC: Fix NFSD's request deferral on RDMA transports nfsd: Clean up nfsd_file_put() nfsd: Fix a write performance regression SUNRPC: Return true/false (not 1/0) from bool functions
2022-04-12stat: fix inconsistency between struct stat and struct compat_statMikulas Patocka1-9/+10
struct stat (defined in arch/x86/include/uapi/asm/stat.h) has 32-bit st_dev and st_rdev; struct compat_stat (defined in arch/x86/include/asm/compat.h) has 16-bit st_dev and st_rdev followed by a 16-bit padding. This patch fixes struct compat_stat to match struct stat. [ Historical note: the old x86 'struct stat' did have that 16-bit field that the compat layer had kept around, but it was changes back in 2003 by "struct stat - support larger dev_t": https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/?id=e95b2065677fe32512a597a79db94b77b90c968d and back in those days, the x86_64 port was still new, and separate from the i386 code, and had already picked up the old version with a 16-bit st_dev field ] Note that we can't change compat_dev_t because it is used by compat_loop_info. Also, if the st_dev and st_rdev values are 32-bit, we don't have to use old_valid_dev to test if the value fits into them. This fixes -EOVERFLOW on filesystems that are on NVMe because NVMe uses the major number 259. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: Andreas Schwab <schwab@linux-m68k.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-10Merge tag 'driver-core-5.18-rc2' of ↵Linus Torvalds1-13/+0
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here are two small driver core changes for 5.18-rc2. They are the final bits in the removal of the default_attrs field in struct kobj_type. I had to wait until after 5.18-rc1 for all of the changes to do this came in through different development trees, and then one new user snuck in. So this series has two changes: - removal of the default_attrs field in the powerpc/pseries/vas code. The change has been acked by the PPC maintainers to come through this tree - removal of default_attrs from struct kobj_type now that all in-kernel users are removed. This cleans up the kobject code a little bit and removes some duplicated functionality that confused people (now there is only one way to do default groups) Both of these have been in linux-next for all of this week with no reported problems" * tag 'driver-core-5.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: kobject: kobj_type: remove default_attrs powerpc/pseries/vas: use default_groups in kobj_type
2022-04-08Merge tag 'io_uring-5.18-2022-04-08' of git://git.kernel.dk/linux-blockLinus Torvalds2-420/+198
Pull io_uring fixes from Jens Axboe: "A bit bigger than usual post merge window, largely due to a revert and a fix of at what point files are assigned for requests. The latter fixing a linked request use case where a dependent link can rely on what file is assigned consistently. Summary: - 32-bit compat fix for IORING_REGISTER_IOWQ_AFF (Eugene) - File assignment fixes (me) - Revert of the NAPI poll addition from this merge window. The author isn't available right now to engage on this, so let's revert it and we can retry for the 5.19 release (me, Jakub) - Fix a timeout removal race (me) - File update and SCM fixes (Pavel)" * tag 'io_uring-5.18-2022-04-08' of git://git.kernel.dk/linux-block: io_uring: fix race between timeout flush and removal io_uring: use nospec annotation for more indexes io_uring: zero tag on rsrc removal io_uring: don't touch scm_fp_list after queueing skb io_uring: nospec index for tags on files update io_uring: implement compat handling for IORING_REGISTER_IOWQ_AFF Revert "io_uring: Add support for napi_busy_poll" io_uring: drop the old style inflight file tracking io_uring: defer file assignment io_uring: propagate issue_flags state down to file assignment io_uring: move read/write file prep state into actual opcode handler io_uring: defer splice/tee file validity check until command issue io_uring: don't check req->file in io_fsync_prep()
2022-04-08io_uring: fix race between timeout flush and removalJens Axboe1-4/+3
io_flush_timeouts() assumes the timeout isn't in progress of triggering or being removed/canceled, so it unconditionally removes it from the timeout list and attempts to cancel it. Leave it on the list and let the normal timeout cancelation take care of it. Cc: stable@vger.kernel.org # 5.5+ Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-08Merge tag 'nfs-for-5.18-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds9-24/+22
Pull NFS client fixes from Trond Myklebust: "Stable fixes: - SUNRPC: Ensure we flush any closed sockets before xs_xprt_free() Bugfixes: - Fix an Oopsable condition due to SLAB_ACCOUNT setting in the NFSv4.2 xattr code. - Fix for open() using an file open mode of '3' in NFSv4 - Replace readdir's use of xxhash() with hash_64() - Several patches to handle malloc() failure in SUNRPC" * tag 'nfs-for-5.18-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: SUNRPC: Move the call to xprt_send_pagedata() out of xprt_sock_sendmsg() SUNRPC: svc_tcp_sendmsg() should handle errors from xdr_alloc_bvec() SUNRPC: Handle allocation failure in rpc_new_task() NFS: Ensure rpc_run_task() cannot fail in nfs_async_rename() NFSv4/pnfs: Handle RPC allocation errors in nfs4_proc_layoutget SUNRPC: Handle low memory situations in call_status() SUNRPC: Handle ENOMEM in call_transmit_status() NFSv4.2: Fix missing removal of SLAB_ACCOUNT on kmem_cache allocation SUNRPC: Ensure we flush any closed sockets before xs_xprt_free() NFS: Replace readdir's use of xxhash() with hash_64() SUNRPC: handle malloc failure in ->request_prepare NFSv4: fix open failure with O_ACCMODE flag Revert "NFSv4: Handle the special Linux file open access mode"
2022-04-07Merge tag '5.18-rc1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds4-12/+17
Pull cifs client fixes from Steve French: - reconnect fixes: one for DFS and one to avoid a reconnect race - small change to deal with upcoming behavior change of list iterators * tag '5.18-rc1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: update internal module number cifs: force new session setup and tcon for dfs cifs: remove check of list iterator against head past the loop body cifs: fix potential race with cifsd thread
2022-04-07NFS: Ensure rpc_run_task() cannot fail in nfs_async_rename()Trond Myklebust1-0/+1
Ensure the call to rpc_run_task() cannot fail by preallocating the rpc_task. Fixes: 910ad38697d9 ("NFS: Fix memory allocation in rpc_alloc_task()") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-04-07NFSv4/pnfs: Handle RPC allocation errors in nfs4_proc_layoutgetTrond Myklebust1-0/+2
If rpc_run_task() fails due to an allocation error, then bail out early. Fixes: 910ad38697d9 ("NFS: Fix memory allocation in rpc_alloc_task()") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-04-07NFSv4.2: Fix missing removal of SLAB_ACCOUNT on kmem_cache allocationMuchun Song1-1/+1
The commit 5c60e89e71f8 ("NFSv4.2: Fix up an invalid combination of memory allocation flags") has stripped GFP_KERNEL_ACCOUNT down to GFP_KERNEL, however, it forgot to remove SLAB_ACCOUNT from kmem_cache allocation. It means that memory is still limited by kmemcg. This patch also fix a NULL pointer reference issue [1] reported by NeilBrown. Link: https://lore.kernel.org/all/164870069595.25542.17292003658915487357@noble.neil.brown.name/ [1] Fixes: 5c60e89e71f8 ("NFSv4.2: Fix up an invalid combination of memory allocation flags") Fixes: 5abc1e37afa0 ("mm: list_lru: allocate list_lru_one only when needed") Reported-by: NeilBrown <neilb@suse.de> Signed-off-by: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-04-07SUNRPC: Ensure we flush any closed sockets before xs_xprt_free()Trond Myklebust1-0/+1
We must ensure that all sockets are closed before we call xprt_free() and release the reference to the net namespace. The problem is that calling fput() will defer closing the socket until delayed_fput() gets called. Let's fix the situation by allowing rpciod and the transport teardown code (which runs on the system wq) to call __fput_sync(), and directly close the socket. Reported-by: Felix Fu <foyjog@gmail.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Fixes: a73881c96d73 ("SUNRPC: Fix an Oops in udp_poll()") Cc: stable@vger.kernel.org # 5.1.x: 3be232f11a3c: SUNRPC: Prevent immediate close+reconnect Cc: stable@vger.kernel.org # 5.1.x: 89f42494f92f: SUNRPC: Don't call connect() more than once on a TCP socket Cc: stable@vger.kernel.org # 5.1.x Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-04-07NFS: Replace readdir's use of xxhash() with hash_64()Trond Myklebust2-10/+3
Both xxhash() and hash_64() appear to give similarly low collision rates with a standard linearly increasing readdir offset. They both give similarly higher collision rates when applied to ext4's offsets. So switch to using the standard hash_64(). Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-04-07io_uring: use nospec annotation for more indexesPavel Begunkov1-6/+5
There are still several places that using pre array_index_nospec() indexes, fix them up. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/b01ef5ee83f72ed35ad525912370b729f5d145f4.1649336342.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-07io_uring: zero tag on rsrc removalPavel Begunkov1-1/+3
Automatically default rsrc tag in io_queue_rsrc_removal(), it's safer than leaving it there and relying on the rest of the code to behave and not use it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1cf262a50df17478ea25b22494dcc19f3a80301f.1649336342.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-07io_uring: don't touch scm_fp_list after queueing skbPavel Begunkov1-2/+6
It's safer to not touch scm_fp_list after we queued an skb to which it was assigned, there might be races lurking if we screw subtle sync guarantees on the io_uring side. Fixes: 6b06314c47e14 ("io_uring: add file set registration") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-07io_uring: nospec index for tags on files updatePavel Begunkov1-1/+1
Don't forget to array_index_nospec() for indexes before updating rsrc tags in __io_sqe_files_update(), just use already safe and precalculated index @i. Fixes: c3bdad0271834 ("io_uring: add generic rsrc update with tags") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-07io_uring: implement compat handling for IORING_REGISTER_IOWQ_AFFEugene Syromiatnikov1-1/+9
Similarly to the way it is done im mbind syscall. Cc: stable@vger.kernel.org # 5.14 Fixes: fe76421d1da1dcdb ("io_uring: allow user configurable IO thread CPU affinity") Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-07Revert "io_uring: Add support for napi_busy_poll"Jens Axboe1-231/+1
This reverts commit adc8682ec69012b68d5ab7123e246d2ad9a6f94b. There's some discussion on the API not being as good as it can be. Rather than ship something and be stuck with it forever, let's revert the NAPI support for now and work on getting something sorted out for the next kernel release instead. Link: https://lore.kernel.org/io-uring/b7bbc124-8502-0ee9-d4c8-7c41b4487264@kernel.dk/ Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-07io_uring: drop the old style inflight file trackingJens Axboe1-58/+27
io_uring tracks requests that are referencing an io_uring descriptor to be able to cancel without worrying about loops in the references. Since we now assign the file at execution time, the easier approach is to drop a potentially problematic reference before we punt the request. This eliminates the need to special case these types of files beyond just marking them as such, and simplifies cancelation quite a bit. This also fixes a recent issue where an async punted tee operation would with the io_uring descriptor as the output file would crash when attempting to get a reference to the file from the io-wq worker. We could have worked around that, but this is the much cleaner fix. Fixes: 6bf9c47a3989 ("io_uring: defer file assignment") Reported-by: syzbot+c4b9303500a21750b250@syzkaller.appspotmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-07io_uring: defer file assignmentJens Axboe2-10/+30
If an application uses direct open or accept, it knows in advance what direct descriptor value it will get as it picks it itself. This allows combined requests such as: sqe = io_uring_get_sqe(ring); io_uring_prep_openat_direct(sqe, ..., file_slot); sqe->flags |= IOSQE_IO_LINK | IOSQE_CQE_SKIP_SUCCESS; sqe = io_uring_get_sqe(ring); io_uring_prep_read(sqe,file_slot, buf, buf_size, 0); sqe->flags |= IOSQE_FIXED_FILE; io_uring_submit(ring); where we prepare both a file open and read, and only get a completion event for the read when both have completed successfully. Currently links are fully prepared before the head is issued, but that fails if the dependent link needs a file assigned that isn't valid until the head has completed. Conversely, if the same chain is performed but the fixed file slot is already valid, then we would be unexpectedly returning data from the old file slot rather than the newly opened one. Make sure we're consistent here. Allow deferral of file setup, which makes this documented case work. Cc: stable@vger.kernel.org # v5.15+ Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-07io_uring: propagate issue_flags state down to file assignmentJens Axboe1-35/+47
We'll need this in a future patch, when we could be assigning the file after the prep stage. While at it, get rid of the io_file_get() helper, it just makes the code harder to read. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-05Merge tag 'for-5.18-rc1-tag' of ↵Linus Torvalds6-55/+81
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - prevent deleting subvolume with active swapfile - fix qgroup reserve limit calculation overflow - remove device count in superblock and its item in one transaction so they cant't get out of sync - skip defragmenting an isolated sector, this could cause some extra IO - unify handling of mtime/permissions in hole punch with fallocate - zoned mode fixes: - remove assert checking for only single mode, we have the DUP mode implemented - fix potential lockdep warning while traversing devices when checking for zone activation * tag 'for-5.18-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: prevent subvol with swapfile from being deleted btrfs: do not warn for free space inode in cow_file_range btrfs: avoid defragging extents whose next extents are not targets btrfs: fix fallocate to use file_modified to update permissions consistently btrfs: remove device item and update super block in the same transaction btrfs: fix qgroup reserve overflow the qgroup limit btrfs: zoned: remove left over ASSERT checking for single profile btrfs: zoned: traverse devices under chunk_mutex in btrfs_can_activate_zone
2022-04-05kobject: kobj_type: remove default_attrsGreg Kroah-Hartman1-13/+0
Now that all in-kernel users of default_attrs for the kobj_type are gone and converted to properly use the default_groups pointer instead, it can be safely removed. There is one standard way to create sysfs files in a kobj_type, and not two like before, causing confusion as to which should be used. Cc: "Rafael J. Wysocki" <rafael@kernel.org> Link: https://lore.kernel.org/r/20220106133151.607703-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-04cifs: update internal module numberSteve French1-1/+1
To 2.36 Signed-off-by: Steve French <stfrench@microsoft.com>
2022-04-04cifs: force new session setup and tcon for dfsPaulo Alcantara1-5/+8
Do not reuse existing sessions and tcons in DFS failover as it might connect to different servers and shares. Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Cc: stable@vger.kernel.org Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-04-04io_uring: move read/write file prep state into actual opcode handlerJens Axboe1-48/+53
In preparation for not necessarily having a file assigned at prep time, defer any initialization associated with the file to when the opcode handler is run. Cc: stable@vger.kernel.org # v5.15+ Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-04io_uring: defer splice/tee file validity check until command issueJens Axboe1-28/+21
In preparation for not using the file at prep time, defer checking if this file refers to a valid io_uring instance until issue time. This also means we can get rid of the cleanup flag for splice and tee. Cc: stable@vger.kernel.org # v5.15+ Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-04cifs: remove check of list iterator against head past the loop bodyJakob Koschel1-4/+6
When list_for_each_entry() completes the iteration over the whole list without breaking the loop, the iterator value will be a bogus pointer computed based on the head element. While it is safe to use the pointer to determine if it was computed based on the head element, either with list_entry_is_head() or &pos->member == head, using the iterator variable after the loop should be avoided. In preparation to limit the scope of a list iterator to the list traversal loop, use a dedicated pointer to point to the found element [1]. Link: https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=EHreAsk5SqXPwr9Y7k9sA6cWXJ6w@mail.gmail.com/ [1] Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Jakob Koschel <jakobkoschel@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-04-04cifs: fix potential race with cifsd threadPaulo Alcantara2-2/+2
To avoid racing with demultiplex thread while it is handling data on socket, use cifs_signal_cifsd_for_reconnect() helper for marking current server to reconnect and let the demultiplex thread handle the rest. Fixes: dca65818c80c ("cifs: use a different reconnect helper for non-cifsd threads") Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de> Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-04-03io_uring: don't check req->file in io_fsync_prep()Jens Axboe1-3/+0
This is a leftover from the really old days where we weren't able to track and error early if we need a file and it wasn't assigned. Kill the check. Cc: stable@vger.kernel.org # v5.15+ Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-03Merge tag 'trace-v5.18-2' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull more tracing updates from Steven Rostedt: - Rename the staging files to give them some meaning. Just stage1,stag2,etc, does not show what they are for - Check for NULL from allocation in bootconfig - Hold event mutex for dyn_event call in user events - Mark user events to broken (to work on the API) - Remove eBPF updates from user events - Remove user events from uapi header to keep it from being installed. - Move ftrace_graph_is_dead() into inline as it is called from hot paths and also convert it into a static branch. * tag 'trace-v5.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Move user_events.h temporarily out of include/uapi ftrace: Make ftrace_graph_is_dead() a static branch tracing: Set user_events to BROKEN tracing/user_events: Remove eBPF interfaces tracing/user_events: Hold event_mutex during dyn_event_add proc: bootconfig: Add null pointer check tracing: Rename the staging files for trace_events
2022-04-02proc: bootconfig: Add null pointer checkLv Ruyi1-0/+2
kzalloc is a memory allocation function which can return NULL when some internal memory errors happen. It is safer to add null pointer check. Link: https://lkml.kernel.org/r/20220329104004.2376879-1-lv.ruyi@zte.com.cn Cc: stable@vger.kernel.org Fixes: c1a3c36017d4 ("proc: bootconfig: Add /proc/bootconfig to show boot config list") Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Lv Ruyi <lv.ruyi@zte.com.cn> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-04-01Merge branch 'work.misc' of ↵Linus Torvalds6-22/+16
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs updates from Al Viro: "Assorted bits and pieces" * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: aio: drop needless assignment in aio_read() clean overflow checks in count_mounts() a bit seq_file: fix NULL pointer arithmetic warning uml/x86: use x86 load_unaligned_zeropad() asm/user.h: killed unused macros constify struct path argument of finish_automount()/do_add_mount() fs: Remove FIXME comment in generic_write_checks()
2022-04-01Merge tag 'vfs-5.18-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds1-1/+1
Pull vfs fix from Darrick Wong: "The erofs developers felt that FIEMAP should handle ranged requests starting at s_maxbytes by returning EFBIG instead of passing the filesystem implementation a nonsense 0-byte request. Not sure why they keep tagging this 'iomap', but the VFS shouldn't be asking for information about ranges of a file that the filesystem already declared that it does not support. - Fix a potential infinite loop in FIEMAP by fixing an off by one error when comparing the requested range against s_maxbytes" * tag 'vfs-5.18-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: fs: fix an infinite loop in iomap_fiemap
2022-04-01Merge tag 'xfs-5.18-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds18-246/+347
Pull xfs fixes from Darrick Wong: "This fixes multiple problems in the reserve pool sizing functions: an incorrect free space calculation, a pointless infinite loop, and even more braindamage that could result in the pool being overfilled. The pile of patches from Dave fix myriad races and UAF bugs in the log recovery code that much to our mutual surprise nobody's tripped over. Dave also fixed a performance optimization that had turned into a regression. Dave Chinner is taking over as XFS maintainer starting Sunday and lasting until 5.19-rc1 is tagged so that I can focus on starting a massive design review for the (feature complete after five years) online repair feature. From then on, he and I will be moving XFS to a co-maintainership model by trading duties every other release. NOTE: I hope very strongly that the other pieces of the (X)FS ecosystem (fstests and xfsprogs) will make similar changes to spread their maintenance load. Summary: - Fix an incorrect free space calculation in xfs_reserve_blocks that could lead to a request for free blocks that will never succeed. - Fix a hang in xfs_reserve_blocks caused by an infinite loop and the incorrect free space calculation. - Fix yet a third problem in xfs_reserve_blocks where multiple racing threads can overfill the reserve pool. - Fix an accounting error that lead to us reporting reserved space as "available". - Fix a race condition during abnormal fs shutdown that could cause UAF problems when memory reclaim and log shutdown try to clean up inodes. - Fix a bug where log shutdown can race with unmount to tear down the log, thereby causing UAF errors. - Disentangle log and filesystem shutdown to reduce confusion. - Fix some confusion in xfs_trans_commit such that a race between transaction commit and filesystem shutdown can cause unlogged dirty inode metadata to be committed, thereby corrupting the filesystem. - Remove a performance optimization in the log as it was discovered that certain storage hardware handle async log flushes so poorly as to cause serious performance regressions. Recent restructuring of other parts of the logging code mean that no performance benefit is seen on hardware that handle it well" * tag 'xfs-5.18-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: drop async cache flushes from CIL commits. xfs: shutdown during log recovery needs to mark the log shutdown xfs: xfs_trans_commit() path must check for log shutdown xfs: xfs_do_force_shutdown needs to block racing shutdowns xfs: log shutdown triggers should only shut down the log xfs: run callbacks before waking waiters in xlog_state_shutdown_callbacks xfs: shutdown in intent recovery has non-intent items in the AIL xfs: aborting inodes on shutdown may need buffer lock xfs: don't report reserved bnobt space as available xfs: fix overfilling of reserve pool xfs: always succeed at setting the reserve pool size xfs: remove infinite loop when reserving free block pool xfs: don't include bnobt blocks when reserving free block pool xfs: document the XFS_ALLOC_AGFL_RESERVE constant
2022-04-01Merge tag 'for-5.18/io_uring-2022-04-01' of git://git.kernel.dk/linux-blockLinus Torvalds1-23/+110
Pull io_uring fixes from Jens Axboe: "A little bit all over the map, some regression fixes for this merge window, and some general fixes that are stable bound. In detail: - Fix an SQPOLL memory ordering issue (Almog) - Accept fixes (Dylan) - Poll fixes (me) - Fixes for provided buffers and recycling (me) - Tweak to IORING_OP_MSG_RING command added in this merge window (me) - Memory leak fix (Pavel) - Misc fixes and tweaks (Pavel, me)" * tag 'for-5.18/io_uring-2022-04-01' of git://git.kernel.dk/linux-block: io_uring: defer msg-ring file validity check until command issue io_uring: fail links if msg-ring doesn't succeeed io_uring: fix memory leak of uid in files registration io_uring: fix put_kbuf without proper locking io_uring: fix invalid flags for io_put_kbuf() io_uring: improve req fields comments io_uring: enable EPOLLEXCLUSIVE for accept poll io_uring: improve task work cache utilization io_uring: fix async accept on O_NONBLOCK sockets io_uring: remove IORING_CQE_F_MSG io_uring: add flag for disabling provided buffer recycling io_uring: ensure recv and recvmsg handle MSG_WAITALL correctly io_uring: don't recycle provided buffer if punted to async worker io_uring: fix assuming triggered poll waitqueue is the single poll io_uring: bump poll refs to full 31-bits io_uring: remove poll entry from list when canceling all io_uring: fix memory ordering when SQPOLL thread goes to sleep io_uring: ensure that fsnotify is always called io_uring: recycle provided before arming poll
2022-04-01Merge tag '5.18-rc-ksmbd-server-fixes' of git://git.samba.org/ksmbdLinus Torvalds3-15/+14
Pull ksmbd updates from Steve French: - three cleanup fixes - shorten module load warning - two documentation fixes * tag '5.18-rc-ksmbd-server-fixes' of git://git.samba.org/ksmbd: ksmbd: replace usage of found with dedicated list iterator variable ksmbd: Remove a redundant zeroing of memory MAINTAINERS: ksmbd: switch Sergey to reviewer ksmbd: shorten experimental warning on loading the module ksmbd: use netif_is_bridge_port Documentation: ksmbd: update Feature Status table
2022-04-01Merge tag '5.18-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds18-1377/+924
Pull more cifs updates from Steve French: - three fixes for big endian issues in how Persistent and Volatile file ids were stored - Various misc. fixes: including some for oops, 2 for ioctls, 1 for writeback - cleanup of how tcon (tree connection) status is tracked - Four changesets to move various duplicated protocol definitions (defined both in cifs.ko and ksmbd) into smbfs_common/smb2pdu.h - important performance improvement to use cached handles in some key compounding code paths (reduces numbers of opens/closes sent in some workloads) - fix to allow alternate DFS target to be used to retry on a failed i/o * tag '5.18-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6: cifs: fix NULL ptr dereference in smb2_ioctl_query_info() cifs: prevent bad output lengths in smb2_ioctl_query_info() smb3: fix ksmbd bigendian bug in oplock break, and move its struct to smbfs_common smb3: cleanup and clarify status of tree connections smb3: move defines for query info and query fsinfo to smbfs_common smb3: move defines for ioctl protocol header and SMB2 sizes to smbfs_common [smb3] move more common protocol header definitions to smbfs_common cifs: fix incorrect use of list iterator after the loop ksmbd: store fids as opaque u64 integers cifs: fix bad fids sent over wire cifs: change smb2_query_info_compound to use a cached fid, if available cifs: convert the path to utf16 in smb2_query_info_compound cifs: writeback fix cifs: do not skip link targets when an I/O fails
2022-04-01Merge tag 'exfat-for-5.18-rc1' of ↵Linus Torvalds4-30/+47
git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat Pull exfat updates from Namjae Jeon: - Add keep_last_dots mount option to allow access to paths with trailing dots - Avoid repetitive volume dirty bit set/clear to improve storage life time * tag 'exfat-for-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat: exfat: do not clear VolumeDirty in writeback exfat: allow access to paths with trailing dots
2022-04-01Merge tag 'folio-5.18d' of git://git.infradead.org/users/willy/pagecacheLinus Torvalds19-29/+27
Pull more filesystem folio updates from Matthew Wilcox: "A mixture of odd changes that didn't quite make it into the original pull and fixes for things that did. Also the readpages changes had to wait for the NFS tree to be pulled first. - Remove ->readpages infrastructure - Remove AOP_FLAG_CONT_EXPAND - Move read_descriptor_t to networking code - Pass the iocb to generic_perform_write - Minor updates to iomap, btrfs, ext4, f2fs, ntfs" * tag 'folio-5.18d' of git://git.infradead.org/users/willy/pagecache: btrfs: Remove a use of PAGE_SIZE in btrfs_invalidate_folio() ntfs: Correct mark_ntfs_record_dirty() folio conversion f2fs: Get the superblock from the mapping instead of the page f2fs: Correct f2fs_dirty_data_folio() conversion ext4: Correct ext4_journalled_dirty_folio() conversion filemap: Remove AOP_FLAG_CONT_EXPAND fs: Pass an iocb to generic_perform_write() fs, net: Move read_descriptor_t to net.h fs: Remove read_actor_t iomap: Simplify is_partially_uptodate a little readahead: Update comments mm: remove the skip_page argument to read_pages mm: remove the pages argument to read_pages fs: Remove ->readpages address space operation readahead: Remove read_cache_pages()
2022-04-01nilfs2: get rid of nilfs_mapping_init()Ryusuke Konishi2-10/+0
After applying the lockdep warning fixes, nilfs_mapping_init() is no longer used, so delete it. Link: https://lkml.kernel.org/r/1647867427-30498-4-git-send-email-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hao Sun <sunhao.th@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-01nilfs2: fix lockdep warnings during disk space reclamationRyusuke Konishi5-21/+92
During disk space reclamation, nilfs2 still emits the following lockdep warning due to page/folio operations on shadowed page caches that nilfs2 uses to get a snapshot of DAT file in memory: WARNING: CPU: 0 PID: 2643 at include/linux/backing-dev.h:272 __folio_mark_dirty+0x645/0x670 ... RIP: 0010:__folio_mark_dirty+0x645/0x670 ... Call Trace: filemap_dirty_folio+0x74/0xd0 __set_page_dirty_nobuffers+0x85/0xb0 nilfs_copy_dirty_pages+0x288/0x510 [nilfs2] nilfs_mdt_save_to_shadow_map+0x50/0xe0 [nilfs2] nilfs_clean_segments+0xee/0x5d0 [nilfs2] nilfs_ioctl_clean_segments.isra.19+0xb08/0xf40 [nilfs2] nilfs_ioctl+0xc52/0xfb0 [nilfs2] __x64_sys_ioctl+0x11d/0x170 This fixes the remaining warning by using inode objects to hold those page caches. Link: https://lkml.kernel.org/r/1647867427-30498-3-git-send-email-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: David Hildenbrand <david@redhat.com> Cc: Hao Sun <sunhao.th@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-01nilfs2: fix lockdep warnings in page operations for btree nodesRyusuke Konishi10-50/+154
Patch series "nilfs2 lockdep warning fixes". The first two are to resolve the lockdep warning issue, and the last one is the accompanying cleanup and low priority. Based on your comment, this series solves the issue by separating inode object as needed. Since I was worried about the impact of the object composition changes, I tested the series carefully not to cause regressions especially for delicate functions such like disk space reclamation and snapshots. This patch (of 3): If CONFIG_LOCKDEP is enabled, nilfs2 hits lockdep warnings at inode_to_wb() during page/folio operations for btree nodes: WARNING: CPU: 0 PID: 6575 at include/linux/backing-dev.h:269 inode_to_wb include/linux/backing-dev.h:269 [inline] WARNING: CPU: 0 PID: 6575 at include/linux/backing-dev.h:269 folio_account_dirtied mm/page-writeback.c:2460 [inline] WARNING: CPU: 0 PID: 6575 at include/linux/backing-dev.h:269 __folio_mark_dirty+0xa7c/0xe30 mm/page-writeback.c:2509 Modules linked in: ... RIP: 0010:inode_to_wb include/linux/backing-dev.h:269 [inline] RIP: 0010:folio_account_dirtied mm/page-writeback.c:2460 [inline] RIP: 0010:__folio_mark_dirty+0xa7c/0xe30 mm/page-writeback.c:2509 ... Call Trace: __set_page_dirty include/linux/pagemap.h:834 [inline] mark_buffer_dirty+0x4e6/0x650 fs/buffer.c:1145 nilfs_btree_propagate_p fs/nilfs2/btree.c:1889 [inline] nilfs_btree_propagate+0x4ae/0xea0 fs/nilfs2/btree.c:2085 nilfs_bmap_propagate+0x73/0x170 fs/nilfs2/bmap.c:337 nilfs_collect_dat_data+0x45/0xd0 fs/nilfs2/segment.c:625 nilfs_segctor_apply_buffers+0x14a/0x470 fs/nilfs2/segment.c:1009 nilfs_segctor_scan_file+0x47a/0x700 fs/nilfs2/segment.c:1048 nilfs_segctor_collect_blocks fs/nilfs2/segment.c:1224 [inline] nilfs_segctor_collect fs/nilfs2/segment.c:1494 [inline] nilfs_segctor_do_construct+0x14f3/0x6c60 fs/nilfs2/segment.c:2036 nilfs_segctor_construct+0x7a7/0xb30 fs/nilfs2/segment.c:2372 nilfs_segctor_thread_construct fs/nilfs2/segment.c:2480 [inline] nilfs_segctor_thread+0x3c3/0xf90 fs/nilfs2/segment.c:2563 kthread+0x405/0x4f0 kernel/kthread.c:327 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 This is because nilfs2 uses two page caches for each inode and inode->i_mapping never points to one of them, the btree node cache. This causes inode_to_wb(inode) to refer to a different page cache than the caller page/folio operations such like __folio_start_writeback(), __folio_end_writeback(), or __folio_mark_dirty() acquired the lock. This patch resolves the issue by allocating and using an additional inode to hold the page cache of btree nodes. The inode is attached one-to-one to the traditional nilfs2 inode if it requires a block mapping with b-tree. This setup change is in memory only and does not affect the disk format. Link: https://lkml.kernel.org/r/1647867427-30498-1-git-send-email-konishi.ryusuke@gmail.com Link: https://lkml.kernel.org/r/1647867427-30498-2-git-send-email-konishi.ryusuke@gmail.com Link: https://lore.kernel.org/r/YXrYvIo8YRnAOJCj@casper.infradead.org Link: https://lore.kernel.org/r/9a20b33d-b38f-b4a2-4742-c1eb5b8e4d6c@redhat.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+0d5b462a6f07447991b3@syzkaller.appspotmail.com Reported-by: syzbot+34ef28bb2aeb28724aa0@syzkaller.appspotmail.com Reported-by: Hao Sun <sunhao.th@gmail.com> Reported-by: David Hildenbrand <david@redhat.com> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-01ocfs2: fix crash when mount with quota enabledJoseph Qi2-13/+12
There is a reported crash when mounting ocfs2 with quota enabled. RIP: 0010:ocfs2_qinfo_lock_res_init+0x44/0x50 [ocfs2] Call Trace: ocfs2_local_read_info+0xb9/0x6f0 [ocfs2] dquot_load_quota_sb+0x216/0x470 dquot_load_quota_inode+0x85/0x100 ocfs2_enable_quotas+0xa0/0x1c0 [ocfs2] ocfs2_fill_super.cold+0xc8/0x1bf [ocfs2] mount_bdev+0x185/0x1b0 legacy_get_tree+0x27/0x40 vfs_get_tree+0x25/0xb0 path_mount+0x465/0xac0 __x64_sys_mount+0x103/0x140 It is caused by when initializing dqi_gqlock, the corresponding dqi_type and dqi_sb are not properly initialized. This issue is introduced by commit 6c85c2c72819, which wants to avoid accessing uninitialized variables in error cases. So make global quota info properly initialized. Link: https://lkml.kernel.org/r/20220323023644.40084-1-joseph.qi@linux.alibaba.com Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1007141 Fixes: 6c85c2c72819 ("ocfs2: quota_local: fix possible uninitialized-variable access in ocfs2_local_read_info()") Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reported-by: Dayvison <sathlerds@gmail.com> Tested-by: Valentin Vidic <vvidic@valentin-vidic.from.hr> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-01btrfs: Remove a use of PAGE_SIZE in btrfs_invalidate_folio()Matthew Wilcox (Oracle)1-1/+1
While btrfs doesn't use large folios yet, this should have been changed as part of the conversion from invalidatepage to invalidate_folio. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Al Viro <viro@zeniv.linux.org.uk>
2022-04-01ntfs: Correct mark_ntfs_record_dirty() folio conversionMatthew Wilcox (Oracle)1-1/+1
We've already done the work of block_dirty_folio() here, leaving only the work that needs to be done by filemap_dirty_folio(). This was a misconversion where I misread __set_page_dirty_nobuffers() as __set_page_dirty_buffers(). Fixes: e621900ad28b ("fs: Convert __set_page_dirty_buffers to block_dirty_folio") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Al Viro <viro@zeniv.linux.org.uk>
2022-04-01f2fs: Get the superblock from the mapping instead of the pageMatthew Wilcox (Oracle)2-3/+3
It's slightly more efficient to go directly from the mapping to the superblock than to go from the page. Now that these routines have the mapping passed to them, there's no reason not to use it. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Al Viro <viro@zeniv.linux.org.uk>
2022-04-01f2fs: Correct f2fs_dirty_data_folio() conversionMatthew Wilcox (Oracle)1-1/+1
I got the return value wrong. Very little checks the return value from set_page_dirty(), so nobody noticed during testing. Fixes: 4f5e34f71318 ("f2fs: Convert f2fs_set_data_page_dirty to f2fs_dirty_data_folio") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Al Viro <viro@zeniv.linux.org.uk>
2022-04-01ext4: Correct ext4_journalled_dirty_folio() conversionMatthew Wilcox (Oracle)1-1/+1
This should use the new folio_buffers() instead of page_has_buffers(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Al Viro <viro@zeniv.linux.org.uk>
2022-04-01filemap: Remove AOP_FLAG_CONT_EXPANDMatthew Wilcox (Oracle)1-2/+1
This flag is no longer used, so remove it. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Al Viro <viro@zeniv.linux.org.uk>