summaryrefslogtreecommitdiffstats
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2022-04-14Merge tag 'for-5.18-rc2-tag' of ↵Linus Torvalds8-23/+49
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "A few more code and warning fixes. There's one feature ioctl removal patch slated for 5.18 that did not make it to the main pull request. It's just a one-liner and the ioctl has a v2 that's in use for a long time, no point to postpone it to 5.19. Late update: - remove balance v1 ioctl, superseded by v2 in 2012 Fixes: - add back cgroup attribution for compressed writes - add super block write start/end annotations to asynchronous balance - fix root reference count on an error handling path - in zoned mode, activate zone at the chunk allocation time to avoid ENOSPC due to timing issues - fix delayed allocation accounting for direct IO Warning fixes: - simplify assertion condition in zoned check - remove an unused variable" * tag 'for-5.18-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix btrfs_submit_compressed_write cgroup attribution btrfs: fix root ref counts in error handling in btrfs_get_root_ref btrfs: zoned: activate block group only for extent allocation btrfs: return allocated block group from do_chunk_alloc() btrfs: mark resumed async balance as writing btrfs: remove support of balance v1 ioctl btrfs: release correct delalloc amount in direct IO write path btrfs: remove unused variable in btrfs_{start,write}_dirty_block_groups() btrfs: zoned: remove redundant condition in btrfs_run_delalloc_range
2022-04-14Merge tag 'fscache-fixes-20220413' of ↵Linus Torvalds8-20/+36
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull fscache fixes from David Howells: "Here's a collection of fscache and cachefiles fixes and misc small cleanups. The two main fixes are: - Add a missing unmark of the inode in-use mark in an error path. - Fix a KASAN slab-out-of-bounds error when setting the xattr on a cachefiles volume due to the wrong length being given to memcpy(). In addition, there's the removal of an unused parameter, removal of an unused Kconfig option, conditionalising a bit of procfs-related stuff and some doc fixes" * tag 'fscache-fixes-20220413' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: fscache: remove FSCACHE_OLD_API Kconfig option fscache: Use wrapper fscache_set_cache_state() directly when relinquishing fscache: Move fscache_cookies_seq_ops specific code under CONFIG_PROC_FS fscache: Remove the cookie parameter from fscache_clear_page_bits() docs: filesystems: caching/backend-api.rst: fix an object withdrawn API docs: filesystems: caching/backend-api.rst: correct two relinquish APIs use cachefiles: Fix KASAN slab-out-of-bounds in cachefiles_set_volume_xattr cachefiles: unmark inode in use in error path
2022-04-12Merge tag 'nfsd-5.18-1' of ↵Linus Torvalds2-22/+27
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: - Fix a write performance regression - Fix crashes during request deferral on RDMA transports * tag 'nfsd-5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: SUNRPC: Fix the svc_deferred_event trace class SUNRPC: Fix NFSD's request deferral on RDMA transports nfsd: Clean up nfsd_file_put() nfsd: Fix a write performance regression SUNRPC: Return true/false (not 1/0) from bool functions
2022-04-12stat: fix inconsistency between struct stat and struct compat_statMikulas Patocka1-9/+10
struct stat (defined in arch/x86/include/uapi/asm/stat.h) has 32-bit st_dev and st_rdev; struct compat_stat (defined in arch/x86/include/asm/compat.h) has 16-bit st_dev and st_rdev followed by a 16-bit padding. This patch fixes struct compat_stat to match struct stat. [ Historical note: the old x86 'struct stat' did have that 16-bit field that the compat layer had kept around, but it was changes back in 2003 by "struct stat - support larger dev_t": https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/?id=e95b2065677fe32512a597a79db94b77b90c968d and back in those days, the x86_64 port was still new, and separate from the i386 code, and had already picked up the old version with a 16-bit st_dev field ] Note that we can't change compat_dev_t because it is used by compat_loop_info. Also, if the st_dev and st_rdev values are 32-bit, we don't have to use old_valid_dev to test if the value fits into them. This fixes -EOVERFLOW on filesystems that are on NVMe because NVMe uses the major number 259. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: Andreas Schwab <schwab@linux-m68k.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-10Merge tag 'driver-core-5.18-rc2' of ↵Linus Torvalds1-13/+0
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here are two small driver core changes for 5.18-rc2. They are the final bits in the removal of the default_attrs field in struct kobj_type. I had to wait until after 5.18-rc1 for all of the changes to do this came in through different development trees, and then one new user snuck in. So this series has two changes: - removal of the default_attrs field in the powerpc/pseries/vas code. The change has been acked by the PPC maintainers to come through this tree - removal of default_attrs from struct kobj_type now that all in-kernel users are removed. This cleans up the kobject code a little bit and removes some duplicated functionality that confused people (now there is only one way to do default groups) Both of these have been in linux-next for all of this week with no reported problems" * tag 'driver-core-5.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: kobject: kobj_type: remove default_attrs powerpc/pseries/vas: use default_groups in kobj_type
2022-04-08Merge tag 'io_uring-5.18-2022-04-08' of git://git.kernel.dk/linux-blockLinus Torvalds2-420/+198
Pull io_uring fixes from Jens Axboe: "A bit bigger than usual post merge window, largely due to a revert and a fix of at what point files are assigned for requests. The latter fixing a linked request use case where a dependent link can rely on what file is assigned consistently. Summary: - 32-bit compat fix for IORING_REGISTER_IOWQ_AFF (Eugene) - File assignment fixes (me) - Revert of the NAPI poll addition from this merge window. The author isn't available right now to engage on this, so let's revert it and we can retry for the 5.19 release (me, Jakub) - Fix a timeout removal race (me) - File update and SCM fixes (Pavel)" * tag 'io_uring-5.18-2022-04-08' of git://git.kernel.dk/linux-block: io_uring: fix race between timeout flush and removal io_uring: use nospec annotation for more indexes io_uring: zero tag on rsrc removal io_uring: don't touch scm_fp_list after queueing skb io_uring: nospec index for tags on files update io_uring: implement compat handling for IORING_REGISTER_IOWQ_AFF Revert "io_uring: Add support for napi_busy_poll" io_uring: drop the old style inflight file tracking io_uring: defer file assignment io_uring: propagate issue_flags state down to file assignment io_uring: move read/write file prep state into actual opcode handler io_uring: defer splice/tee file validity check until command issue io_uring: don't check req->file in io_fsync_prep()
2022-04-08fscache: remove FSCACHE_OLD_API Kconfig optionYue Hu1-3/+0
Commit 01491a756578 ("fscache, cachefiles: Disable configuration") added the FSCACHE_OLD_API configuration when rewritten. Now, it's not used any more. Remove it. Signed-off-by: Yue Hu <huyue2@coolpad.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-cachefs@redhat.com Link: https://listman.redhat.com/archives/linux-cachefs/2022-March/006647.html # v1
2022-04-08fscache: Use wrapper fscache_set_cache_state() directly when relinquishingYue Hu1-1/+1
We already have the wrapper function to set cache state. Signed-off-by: Yue Hu <huyue2@coolpad.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeffle Xu <jefflexu@linux.alibaba.com> cc: linux-cachefs@redhat.com Link: https://listman.redhat.com/archives/linux-cachefs/2022-April/006648.html # v1
2022-04-08fscache: Move fscache_cookies_seq_ops specific code under CONFIG_PROC_FSYue Hu2-1/+7
fscache_cookies_seq_ops is only used in proc.c that is compiled under enabled CONFIG_PROC_FS, so move related code under this config. The same case exsits in internal.h. Also, make fscache_lru_cookie_timeout static due to no user outside of cookie.c. Signed-off-by: Yue Hu <huyue2@coolpad.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-cachefs@redhat.com Link: https://listman.redhat.com/archives/linux-cachefs/2022-April/006649.html # v1
2022-04-08fscache: Remove the cookie parameter from fscache_clear_page_bits()Yue Hu2-5/+3
The cookie is not used at all, remove it and update the usage in io.c and afs/write.c (which is the only user outside of fscache currently) at the same time. [DH: Amended the documentation also] Signed-off-by: Yue Hu <huyue2@coolpad.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-cachefs@redhat.com Link: https://listman.redhat.com/archives/linux-cachefs/2022-April/006659.html
2022-04-08cachefiles: Fix KASAN slab-out-of-bounds in cachefiles_set_volume_xattrDave Wysochanski1-1/+1
Use the actual length of volume coherency data when setting the xattr to avoid the following KASAN report. BUG: KASAN: slab-out-of-bounds in cachefiles_set_volume_xattr+0xa0/0x350 [cachefiles] Write of size 4 at addr ffff888101e02af4 by task kworker/6:0/1347 CPU: 6 PID: 1347 Comm: kworker/6:0 Kdump: loaded Not tainted 5.18.0-rc1-nfs-fscache-netfs+ #13 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-4.fc34 04/01/2014 Workqueue: events fscache_create_volume_work [fscache] Call Trace: <TASK> dump_stack_lvl+0x45/0x5a print_report.cold+0x5e/0x5db ? __lock_text_start+0x8/0x8 ? cachefiles_set_volume_xattr+0xa0/0x350 [cachefiles] kasan_report+0xab/0x120 ? cachefiles_set_volume_xattr+0xa0/0x350 [cachefiles] kasan_check_range+0xf5/0x1d0 memcpy+0x39/0x60 cachefiles_set_volume_xattr+0xa0/0x350 [cachefiles] cachefiles_acquire_volume+0x2be/0x500 [cachefiles] ? __cachefiles_free_volume+0x90/0x90 [cachefiles] fscache_create_volume_work+0x68/0x160 [fscache] process_one_work+0x3b7/0x6a0 worker_thread+0x2c4/0x650 ? process_one_work+0x6a0/0x6a0 kthread+0x16c/0x1a0 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x22/0x30 </TASK> Allocated by task 1347: kasan_save_stack+0x1e/0x40 __kasan_kmalloc+0x81/0xa0 cachefiles_set_volume_xattr+0x76/0x350 [cachefiles] cachefiles_acquire_volume+0x2be/0x500 [cachefiles] fscache_create_volume_work+0x68/0x160 [fscache] process_one_work+0x3b7/0x6a0 worker_thread+0x2c4/0x650 kthread+0x16c/0x1a0 ret_from_fork+0x22/0x30 The buggy address belongs to the object at ffff888101e02af0 which belongs to the cache kmalloc-8 of size 8 The buggy address is located 4 bytes inside of 8-byte region [ffff888101e02af0, ffff888101e02af8) The buggy address belongs to the physical page: page:00000000a2292d70 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x101e02 flags: 0x17ffffc0000200(slab|node=0|zone=2|lastcpupid=0x1fffff) raw: 0017ffffc0000200 0000000000000000 dead000000000001 ffff888100042280 raw: 0000000000000000 0000000080660066 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888101e02980: fc 00 fc fc fc fc 00 fc fc fc fc 00 fc fc fc fc ffff888101e02a00: 00 fc fc fc fc 00 fc fc fc fc 00 fc fc fc fc 00 >ffff888101e02a80: fc fc fc fc 00 fc fc fc fc 00 fc fc fc fc 04 fc ^ ffff888101e02b00: fc fc fc 00 fc fc fc fc 00 fc fc fc fc 00 fc fc ffff888101e02b80: fc fc 00 fc fc fc fc 00 fc fc fc fc 00 fc fc fc ================================================================== Fixes: 413a4a6b0b55 "cachefiles: Fix volume coherency attribute" Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/20220405134649.6579-1-dwysocha@redhat.com/ # v1 Link: https://lore.kernel.org/r/20220405142810.8208-1-dwysocha@redhat.com/ # Incorrect v2
2022-04-08cachefiles: unmark inode in use in error pathJeffle Xu1-9/+24
Unmark inode in use if error encountered. If the in-use flag leakage occurs in cachefiles_open_file(), Cachefiles will complain "Inode already in use" when later another cookie with the same index key is looked up. If the in-use flag leakage occurs in cachefiles_create_tmpfile(), though the "Inode already in use" warning won't be triggered, fix the leakage anyway. Reported-by: Gao Xiang <hsiangkao@linux.alibaba.com> Fixes: 1f08c925e7a3 ("cachefiles: Implement backing file wrangling") Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-cachefs@redhat.com Link: https://listman.redhat.com/archives/linux-cachefs/2022-March/006615.html # v1 Link: https://listman.redhat.com/archives/linux-cachefs/2022-March/006618.html # v2
2022-04-08io_uring: fix race between timeout flush and removalJens Axboe1-4/+3
io_flush_timeouts() assumes the timeout isn't in progress of triggering or being removed/canceled, so it unconditionally removes it from the timeout list and attempts to cancel it. Leave it on the list and let the normal timeout cancelation take care of it. Cc: stable@vger.kernel.org # 5.5+ Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-08Merge tag 'nfs-for-5.18-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds9-24/+22
Pull NFS client fixes from Trond Myklebust: "Stable fixes: - SUNRPC: Ensure we flush any closed sockets before xs_xprt_free() Bugfixes: - Fix an Oopsable condition due to SLAB_ACCOUNT setting in the NFSv4.2 xattr code. - Fix for open() using an file open mode of '3' in NFSv4 - Replace readdir's use of xxhash() with hash_64() - Several patches to handle malloc() failure in SUNRPC" * tag 'nfs-for-5.18-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: SUNRPC: Move the call to xprt_send_pagedata() out of xprt_sock_sendmsg() SUNRPC: svc_tcp_sendmsg() should handle errors from xdr_alloc_bvec() SUNRPC: Handle allocation failure in rpc_new_task() NFS: Ensure rpc_run_task() cannot fail in nfs_async_rename() NFSv4/pnfs: Handle RPC allocation errors in nfs4_proc_layoutget SUNRPC: Handle low memory situations in call_status() SUNRPC: Handle ENOMEM in call_transmit_status() NFSv4.2: Fix missing removal of SLAB_ACCOUNT on kmem_cache allocation SUNRPC: Ensure we flush any closed sockets before xs_xprt_free() NFS: Replace readdir's use of xxhash() with hash_64() SUNRPC: handle malloc failure in ->request_prepare NFSv4: fix open failure with O_ACCMODE flag Revert "NFSv4: Handle the special Linux file open access mode"
2022-04-07Merge tag '5.18-rc1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds4-12/+17
Pull cifs client fixes from Steve French: - reconnect fixes: one for DFS and one to avoid a reconnect race - small change to deal with upcoming behavior change of list iterators * tag '5.18-rc1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: update internal module number cifs: force new session setup and tcon for dfs cifs: remove check of list iterator against head past the loop body cifs: fix potential race with cifsd thread
2022-04-07NFS: Ensure rpc_run_task() cannot fail in nfs_async_rename()Trond Myklebust1-0/+1
Ensure the call to rpc_run_task() cannot fail by preallocating the rpc_task. Fixes: 910ad38697d9 ("NFS: Fix memory allocation in rpc_alloc_task()") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-04-07NFSv4/pnfs: Handle RPC allocation errors in nfs4_proc_layoutgetTrond Myklebust1-0/+2
If rpc_run_task() fails due to an allocation error, then bail out early. Fixes: 910ad38697d9 ("NFS: Fix memory allocation in rpc_alloc_task()") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-04-07NFSv4.2: Fix missing removal of SLAB_ACCOUNT on kmem_cache allocationMuchun Song1-1/+1
The commit 5c60e89e71f8 ("NFSv4.2: Fix up an invalid combination of memory allocation flags") has stripped GFP_KERNEL_ACCOUNT down to GFP_KERNEL, however, it forgot to remove SLAB_ACCOUNT from kmem_cache allocation. It means that memory is still limited by kmemcg. This patch also fix a NULL pointer reference issue [1] reported by NeilBrown. Link: https://lore.kernel.org/all/164870069595.25542.17292003658915487357@noble.neil.brown.name/ [1] Fixes: 5c60e89e71f8 ("NFSv4.2: Fix up an invalid combination of memory allocation flags") Fixes: 5abc1e37afa0 ("mm: list_lru: allocate list_lru_one only when needed") Reported-by: NeilBrown <neilb@suse.de> Signed-off-by: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-04-07SUNRPC: Ensure we flush any closed sockets before xs_xprt_free()Trond Myklebust1-0/+1
We must ensure that all sockets are closed before we call xprt_free() and release the reference to the net namespace. The problem is that calling fput() will defer closing the socket until delayed_fput() gets called. Let's fix the situation by allowing rpciod and the transport teardown code (which runs on the system wq) to call __fput_sync(), and directly close the socket. Reported-by: Felix Fu <foyjog@gmail.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Fixes: a73881c96d73 ("SUNRPC: Fix an Oops in udp_poll()") Cc: stable@vger.kernel.org # 5.1.x: 3be232f11a3c: SUNRPC: Prevent immediate close+reconnect Cc: stable@vger.kernel.org # 5.1.x: 89f42494f92f: SUNRPC: Don't call connect() more than once on a TCP socket Cc: stable@vger.kernel.org # 5.1.x Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-04-07NFS: Replace readdir's use of xxhash() with hash_64()Trond Myklebust2-10/+3
Both xxhash() and hash_64() appear to give similarly low collision rates with a standard linearly increasing readdir offset. They both give similarly higher collision rates when applied to ext4's offsets. So switch to using the standard hash_64(). Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-04-07io_uring: use nospec annotation for more indexesPavel Begunkov1-6/+5
There are still several places that using pre array_index_nospec() indexes, fix them up. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/b01ef5ee83f72ed35ad525912370b729f5d145f4.1649336342.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-07io_uring: zero tag on rsrc removalPavel Begunkov1-1/+3
Automatically default rsrc tag in io_queue_rsrc_removal(), it's safer than leaving it there and relying on the rest of the code to behave and not use it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1cf262a50df17478ea25b22494dcc19f3a80301f.1649336342.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-07io_uring: don't touch scm_fp_list after queueing skbPavel Begunkov1-2/+6
It's safer to not touch scm_fp_list after we queued an skb to which it was assigned, there might be races lurking if we screw subtle sync guarantees on the io_uring side. Fixes: 6b06314c47e14 ("io_uring: add file set registration") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-07io_uring: nospec index for tags on files updatePavel Begunkov1-1/+1
Don't forget to array_index_nospec() for indexes before updating rsrc tags in __io_sqe_files_update(), just use already safe and precalculated index @i. Fixes: c3bdad0271834 ("io_uring: add generic rsrc update with tags") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-07io_uring: implement compat handling for IORING_REGISTER_IOWQ_AFFEugene Syromiatnikov1-1/+9
Similarly to the way it is done im mbind syscall. Cc: stable@vger.kernel.org # 5.14 Fixes: fe76421d1da1dcdb ("io_uring: allow user configurable IO thread CPU affinity") Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-07Revert "io_uring: Add support for napi_busy_poll"Jens Axboe1-231/+1
This reverts commit adc8682ec69012b68d5ab7123e246d2ad9a6f94b. There's some discussion on the API not being as good as it can be. Rather than ship something and be stuck with it forever, let's revert the NAPI support for now and work on getting something sorted out for the next kernel release instead. Link: https://lore.kernel.org/io-uring/b7bbc124-8502-0ee9-d4c8-7c41b4487264@kernel.dk/ Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-07io_uring: drop the old style inflight file trackingJens Axboe1-58/+27
io_uring tracks requests that are referencing an io_uring descriptor to be able to cancel without worrying about loops in the references. Since we now assign the file at execution time, the easier approach is to drop a potentially problematic reference before we punt the request. This eliminates the need to special case these types of files beyond just marking them as such, and simplifies cancelation quite a bit. This also fixes a recent issue where an async punted tee operation would with the io_uring descriptor as the output file would crash when attempting to get a reference to the file from the io-wq worker. We could have worked around that, but this is the much cleaner fix. Fixes: 6bf9c47a3989 ("io_uring: defer file assignment") Reported-by: syzbot+c4b9303500a21750b250@syzkaller.appspotmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-07io_uring: defer file assignmentJens Axboe2-10/+30
If an application uses direct open or accept, it knows in advance what direct descriptor value it will get as it picks it itself. This allows combined requests such as: sqe = io_uring_get_sqe(ring); io_uring_prep_openat_direct(sqe, ..., file_slot); sqe->flags |= IOSQE_IO_LINK | IOSQE_CQE_SKIP_SUCCESS; sqe = io_uring_get_sqe(ring); io_uring_prep_read(sqe,file_slot, buf, buf_size, 0); sqe->flags |= IOSQE_FIXED_FILE; io_uring_submit(ring); where we prepare both a file open and read, and only get a completion event for the read when both have completed successfully. Currently links are fully prepared before the head is issued, but that fails if the dependent link needs a file assigned that isn't valid until the head has completed. Conversely, if the same chain is performed but the fixed file slot is already valid, then we would be unexpectedly returning data from the old file slot rather than the newly opened one. Make sure we're consistent here. Allow deferral of file setup, which makes this documented case work. Cc: stable@vger.kernel.org # v5.15+ Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-07io_uring: propagate issue_flags state down to file assignmentJens Axboe1-35/+47
We'll need this in a future patch, when we could be assigning the file after the prep stage. While at it, get rid of the io_file_get() helper, it just makes the code harder to read. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-06btrfs: fix btrfs_submit_compressed_write cgroup attributionDennis Zhou1-0/+8
This restores the logic from commit 46bcff2bfc5e ("btrfs: fix compressed write bio blkcg attribution") which added cgroup attribution to btrfs writeback. It also adds back the REQ_CGROUP_PUNT flag for these ios. Fixes: 91507240482e ("btrfs: determine stripe boundary at bio allocation time in btrfs_submit_compressed_write") CC: stable@vger.kernel.org # 5.16+ Signed-off-by: Dennis Zhou <dennis@kernel.org> Signed-off-by: David Sterba <dsterba@suse.com>
2022-04-06btrfs: fix root ref counts in error handling in btrfs_get_root_refJia-Ju Bai1-2/+3
In btrfs_get_root_ref(), when btrfs_insert_fs_root() fails, btrfs_put_root() can happen for two reasons: - the root already exists in the tree, in that case it returns the reference obtained in btrfs_lookup_fs_root() - another error so the cleanup is done in the fail label Calling btrfs_put_root() unconditionally would lead to double decrement of the root reference possibly freeing it in the second case. Reported-by: TOTE Robot <oslab@tsinghua.edu.cn> Fixes: bc44d7c4b2b1 ("btrfs: push btrfs_grab_fs_root into btrfs_get_fs_root") CC: stable@vger.kernel.org # 5.10+ Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-04-06btrfs: zoned: activate block group only for extent allocationNaohiro Aota3-9/+21
In btrfs_make_block_group(), we activate the allocated block group, expecting that the block group is soon used for allocation. However, the chunk allocation from flush_space() context broke the assumption. There can be a large time gap between the chunk allocation time and the extent allocation time from the chunk. Activating the empty block groups pre-allocated from flush_space() context can exhaust the active zone counter of a device. Once we use all the active zone counts for empty pre-allocated block groups, we cannot activate new block group for the other things: metadata, tree-log, or data relocation block group. That failure results in a fake -ENOSPC. This patch introduces CHUNK_ALLOC_FORCE_FOR_EXTENT to distinguish the chunk allocation from find_free_extent(). Now, the new block group is activated only in that context. Fixes: eb66a010d518 ("btrfs: zoned: activate new block group") CC: stable@vger.kernel.org # 5.16+ Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Tested-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-04-06btrfs: return allocated block group from do_chunk_alloc()Naohiro Aota1-3/+13
Return the allocated block group from do_chunk_alloc(). This is a preparation patch for the next patch. CC: stable@vger.kernel.org # 5.16+ Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Tested-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-04-06btrfs: mark resumed async balance as writingNaohiro Aota1-0/+2
When btrfs balance is interrupted with umount, the background balance resumes on the next mount. There is a potential deadlock with FS freezing here like as described in commit 26559780b953 ("btrfs: zoned: mark relocation as writing"). Mark the process as sb_writing to avoid it. Reviewed-by: Filipe Manana <fdmanana@suse.com> CC: stable@vger.kernel.org # 4.9+ Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-04-06btrfs: remove support of balance v1 ioctlNikolay Borisov1-2/+0
It was scheduled for removal in kernel v5.18 commit 6c405b24097c ("btrfs: deprecate BTRFS_IOC_BALANCE ioctl") thus its time has come. Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-04-06btrfs: release correct delalloc amount in direct IO write pathNaohiro Aota1-3/+3
Running generic/406 causes the following WARNING in btrfs_destroy_inode() which tells there are outstanding extents left. In btrfs_get_blocks_direct_write(), we reserve a temporary outstanding extents with btrfs_delalloc_reserve_metadata() (or indirectly from btrfs_delalloc_reserve_space(()). We then release the outstanding extents with btrfs_delalloc_release_extents(). However, the "len" can be modified in the COW case, which releases fewer outstanding extents than expected. Fix it by calling btrfs_delalloc_release_extents() for the original length. To reproduce the warning, the filesystem should be 1 GiB. It's triggering a short-write, due to not being able to allocate a large extent and instead allocating a smaller one. WARNING: CPU: 0 PID: 757 at fs/btrfs/inode.c:8848 btrfs_destroy_inode+0x1e6/0x210 [btrfs] Modules linked in: btrfs blake2b_generic xor lzo_compress lzo_decompress raid6_pq zstd zstd_decompress zstd_compress xxhash zram zsmalloc CPU: 0 PID: 757 Comm: umount Not tainted 5.17.0-rc8+ #101 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS d55cb5a 04/01/2014 RIP: 0010:btrfs_destroy_inode+0x1e6/0x210 [btrfs] RSP: 0018:ffffc9000327bda8 EFLAGS: 00010206 RAX: 0000000000000000 RBX: ffff888100548b78 RCX: 0000000000000000 RDX: 0000000000026900 RSI: 0000000000000000 RDI: ffff888100548b78 RBP: ffff888100548940 R08: 0000000000000000 R09: ffff88810b48aba8 R10: 0000000000000001 R11: ffff8881004eb240 R12: ffff88810b48a800 R13: ffff88810b48ec08 R14: ffff88810b48ed00 R15: ffff888100490c68 FS: 00007f8549ea0b80(0000) GS:ffff888237c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f854a09e733 CR3: 000000010a2e9003 CR4: 0000000000370eb0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> destroy_inode+0x33/0x70 dispose_list+0x43/0x60 evict_inodes+0x161/0x1b0 generic_shutdown_super+0x2d/0x110 kill_anon_super+0xf/0x20 btrfs_kill_super+0xd/0x20 [btrfs] deactivate_locked_super+0x27/0x90 cleanup_mnt+0x12c/0x180 task_work_run+0x54/0x80 exit_to_user_mode_prepare+0x152/0x160 syscall_exit_to_user_mode+0x12/0x30 do_syscall_64+0x42/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f854a000fb7 Fixes: f0bfa76a11e9 ("btrfs: fix ENOSPC failure when attempting direct IO write into NOCOW range") CC: stable@vger.kernel.org # 5.17 Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Tested-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-04-06btrfs: remove unused variable in btrfs_{start,write}_dirty_block_groups()Nathan Chancellor1-4/+0
Clang's version of -Wunused-but-set-variable recently gained support for unary operations, which reveals two unused variables: fs/btrfs/block-group.c:2949:6: error: variable 'num_started' set but not used [-Werror,-Wunused-but-set-variable] int num_started = 0; ^ fs/btrfs/block-group.c:3116:6: error: variable 'num_started' set but not used [-Werror,-Wunused-but-set-variable] int num_started = 0; ^ 2 errors generated. These variables appear to be unused from their introduction, so just remove them to silence the warnings. Fixes: c9dc4c657850 ("Btrfs: two stage dirty block group writeout") Fixes: 1bbc621ef284 ("Btrfs: allow block group cache writeout outside critical section in commit") CC: stable@vger.kernel.org # 5.4+ Link: https://github.com/ClangBuiltLinux/linux/issues/1614 Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: David Sterba <dsterba@suse.com>
2022-04-06btrfs: zoned: remove redundant condition in btrfs_run_delalloc_rangeHaowen Bai1-2/+1
The logic !A || A && B is equivalent to !A || B. so we can make code clear. Note: though it's preferred to be in the more human readable form, there have been repeated reports and patches as the expression is detected by tools so apply it to reduce the load. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Haowen Bai <baihaowen@meizu.com> Reviewed-by: David Sterba <dsterba@suse.com> [ add note ] Signed-off-by: David Sterba <dsterba@suse.com>
2022-04-05Merge tag 'for-5.18-rc1-tag' of ↵Linus Torvalds6-55/+81
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - prevent deleting subvolume with active swapfile - fix qgroup reserve limit calculation overflow - remove device count in superblock and its item in one transaction so they cant't get out of sync - skip defragmenting an isolated sector, this could cause some extra IO - unify handling of mtime/permissions in hole punch with fallocate - zoned mode fixes: - remove assert checking for only single mode, we have the DUP mode implemented - fix potential lockdep warning while traversing devices when checking for zone activation * tag 'for-5.18-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: prevent subvol with swapfile from being deleted btrfs: do not warn for free space inode in cow_file_range btrfs: avoid defragging extents whose next extents are not targets btrfs: fix fallocate to use file_modified to update permissions consistently btrfs: remove device item and update super block in the same transaction btrfs: fix qgroup reserve overflow the qgroup limit btrfs: zoned: remove left over ASSERT checking for single profile btrfs: zoned: traverse devices under chunk_mutex in btrfs_can_activate_zone
2022-04-05kobject: kobj_type: remove default_attrsGreg Kroah-Hartman1-13/+0
Now that all in-kernel users of default_attrs for the kobj_type are gone and converted to properly use the default_groups pointer instead, it can be safely removed. There is one standard way to create sysfs files in a kobj_type, and not two like before, causing confusion as to which should be used. Cc: "Rafael J. Wysocki" <rafael@kernel.org> Link: https://lore.kernel.org/r/20220106133151.607703-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-04cifs: update internal module numberSteve French1-1/+1
To 2.36 Signed-off-by: Steve French <stfrench@microsoft.com>
2022-04-04cifs: force new session setup and tcon for dfsPaulo Alcantara1-5/+8
Do not reuse existing sessions and tcons in DFS failover as it might connect to different servers and shares. Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Cc: stable@vger.kernel.org Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-04-04io_uring: move read/write file prep state into actual opcode handlerJens Axboe1-48/+53
In preparation for not necessarily having a file assigned at prep time, defer any initialization associated with the file to when the opcode handler is run. Cc: stable@vger.kernel.org # v5.15+ Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-04io_uring: defer splice/tee file validity check until command issueJens Axboe1-28/+21
In preparation for not using the file at prep time, defer checking if this file refers to a valid io_uring instance until issue time. This also means we can get rid of the cleanup flag for splice and tee. Cc: stable@vger.kernel.org # v5.15+ Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-04cifs: remove check of list iterator against head past the loop bodyJakob Koschel1-4/+6
When list_for_each_entry() completes the iteration over the whole list without breaking the loop, the iterator value will be a bogus pointer computed based on the head element. While it is safe to use the pointer to determine if it was computed based on the head element, either with list_entry_is_head() or &pos->member == head, using the iterator variable after the loop should be avoided. In preparation to limit the scope of a list iterator to the list traversal loop, use a dedicated pointer to point to the found element [1]. Link: https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=EHreAsk5SqXPwr9Y7k9sA6cWXJ6w@mail.gmail.com/ [1] Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Jakob Koschel <jakobkoschel@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-04-04cifs: fix potential race with cifsd threadPaulo Alcantara2-2/+2
To avoid racing with demultiplex thread while it is handling data on socket, use cifs_signal_cifsd_for_reconnect() helper for marking current server to reconnect and let the demultiplex thread handle the rest. Fixes: dca65818c80c ("cifs: use a different reconnect helper for non-cifsd threads") Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de> Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-04-03io_uring: don't check req->file in io_fsync_prep()Jens Axboe1-3/+0
This is a leftover from the really old days where we weren't able to track and error early if we need a file and it wasn't assigned. Kill the check. Cc: stable@vger.kernel.org # v5.15+ Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-03Merge tag 'trace-v5.18-2' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull more tracing updates from Steven Rostedt: - Rename the staging files to give them some meaning. Just stage1,stag2,etc, does not show what they are for - Check for NULL from allocation in bootconfig - Hold event mutex for dyn_event call in user events - Mark user events to broken (to work on the API) - Remove eBPF updates from user events - Remove user events from uapi header to keep it from being installed. - Move ftrace_graph_is_dead() into inline as it is called from hot paths and also convert it into a static branch. * tag 'trace-v5.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Move user_events.h temporarily out of include/uapi ftrace: Make ftrace_graph_is_dead() a static branch tracing: Set user_events to BROKEN tracing/user_events: Remove eBPF interfaces tracing/user_events: Hold event_mutex during dyn_event_add proc: bootconfig: Add null pointer check tracing: Rename the staging files for trace_events
2022-04-02proc: bootconfig: Add null pointer checkLv Ruyi1-0/+2
kzalloc is a memory allocation function which can return NULL when some internal memory errors happen. It is safer to add null pointer check. Link: https://lkml.kernel.org/r/20220329104004.2376879-1-lv.ruyi@zte.com.cn Cc: stable@vger.kernel.org Fixes: c1a3c36017d4 ("proc: bootconfig: Add /proc/bootconfig to show boot config list") Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Lv Ruyi <lv.ruyi@zte.com.cn> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-04-01Merge branch 'work.misc' of ↵Linus Torvalds6-22/+16
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs updates from Al Viro: "Assorted bits and pieces" * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: aio: drop needless assignment in aio_read() clean overflow checks in count_mounts() a bit seq_file: fix NULL pointer arithmetic warning uml/x86: use x86 load_unaligned_zeropad() asm/user.h: killed unused macros constify struct path argument of finish_automount()/do_add_mount() fs: Remove FIXME comment in generic_write_checks()