summaryrefslogtreecommitdiffstats
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2022-07-01Merge tag 'nfs-for-5.19-3' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds2-6/+14
Pull NFS client fixes from Anna Schumaker: - Allocate a fattr for _nfs4_discover_trunking() - Fix module reference count leak in nfs4_run_state_manager() * tag 'nfs-for-5.19-3' of git://git.linux-nfs.org/projects/anna/linux-nfs: NFSv4: Add an fattr allocation to _nfs4_discover_trunking() NFS: restore module put when manager exits.
2022-07-01Merge tag 'ceph-for-5.19-rc5' of https://github.com/ceph/ceph-clientLinus Torvalds1-0/+1
Pull ceph fix from Ilya Dryomov: "A ceph filesystem fix, marked for stable. There appears to be a deeper issue on the MDS side, but for now we are going with this one-liner to avoid busy looping and potential soft lockups" * tag 'ceph-for-5.19-rc5' of https://github.com/ceph/ceph-client: ceph: wait on async create before checking caps for syncfs
2022-07-01Merge tag 'io_uring-5.19-2022-07-01' of git://git.kernel.dk/linux-blockLinus Torvalds1-7/+12
Pull io_uring fixes from Jens Axboe: "Two minor tweaks: - While we still can, adjust the send/recv based flags to be in ->ioprio rather than in ->addr2. This is consistent with eg accept, and also doesn't waste a full 64-bit field for flags (Pavel) - 5.18-stable fix for re-importing provided buffers. Not much real world relevance here as it'll only impact non-pollable files gone async, which is more of a practical test case rather than something that is used in the wild (Dylan)" * tag 'io_uring-5.19-2022-07-01' of git://git.kernel.dk/linux-block: io_uring: fix provided buffer import io_uring: keep sendrecv flags in ioprio
2022-06-30vfs: fix copy_file_range() regression in cross-fs copiesAmir Goldstein4-37/+68
A regression has been reported by Nicolas Boichat, found while using the copy_file_range syscall to copy a tracefs file. Before commit 5dae222a5ff0 ("vfs: allow copy_file_range to copy across devices") the kernel would return -EXDEV to userspace when trying to copy a file across different filesystems. After this commit, the syscall doesn't fail anymore and instead returns zero (zero bytes copied), as this file's content is generated on-the-fly and thus reports a size of zero. Another regression has been reported by He Zhe - the assertion of WARN_ON_ONCE(ret == -EOPNOTSUPP) can be triggered from userspace when copying from a sysfs file whose read operation may return -EOPNOTSUPP. Since we do not have test coverage for copy_file_range() between any two types of filesystems, the best way to avoid these sort of issues in the future is for the kernel to be more picky about filesystems that are allowed to do copy_file_range(). This patch restores some cross-filesystem copy restrictions that existed prior to commit 5dae222a5ff0 ("vfs: allow copy_file_range to copy across devices"), namely, cross-sb copy is not allowed for filesystems that do not implement ->copy_file_range(). Filesystems that do implement ->copy_file_range() have full control of the result - if this method returns an error, the error is returned to the user. Before this change this was only true for fs that did not implement the ->remap_file_range() operation (i.e. nfsv3). Filesystems that do not implement ->copy_file_range() still fall-back to the generic_copy_file_range() implementation when the copy is within the same sb. This helps the kernel can maintain a more consistent story about which filesystems support copy_file_range(). nfsd and ksmbd servers are modified to fall-back to the generic_copy_file_range() implementation in case vfs_copy_file_range() fails with -EOPNOTSUPP or -EXDEV, which preserves behavior of server-side-copy. fall-back to generic_copy_file_range() is not implemented for the smb operation FSCTL_DUPLICATE_EXTENTS_TO_FILE, which is arguably a correct change of behavior. Fixes: 5dae222a5ff0 ("vfs: allow copy_file_range to copy across devices") Link: https://lore.kernel.org/linux-fsdevel/20210212044405.4120619-1-drinkcat@chromium.org/ Link: https://lore.kernel.org/linux-fsdevel/CANMq1KDZuxir2LM5jOTm0xx+BnvW=ZmpsG47CyHFJwnw7zSX6Q@mail.gmail.com/ Link: https://lore.kernel.org/linux-fsdevel/20210126135012.1.If45b7cdc3ff707bc1efa17f5366057d60603c45f@changeid/ Link: https://lore.kernel.org/linux-fsdevel/20210630161320.29006-1-lhenriques@suse.de/ Reported-by: Nicolas Boichat <drinkcat@chromium.org> Reported-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Luis Henriques <lhenriques@suse.de> Fixes: 64bf5ff58dff ("vfs: no fallback for ->copy_file_range") Link: https://lore.kernel.org/linux-fsdevel/20f17f64-88cb-4e80-07c1-85cb96c83619@windriver.com/ Reported-by: He Zhe <zhe.he@windriver.com> Tested-by: Namjae Jeon <linkinjeon@kernel.org> Tested-by: Luis Henriques <lhenriques@suse.de> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-06-30NFSv4: Add an fattr allocation to _nfs4_discover_trunking()Scott Mayhew1-6/+13
This was missed in c3ed222745d9 ("NFSv4: Fix free of uninitialized nfs4_label on referral lookup.") and causes a panic when mounting with '-o trunkdiscovery': PID: 1604 TASK: ffff93dac3520000 CPU: 3 COMMAND: "mount.nfs" #0 [ffffb79140f738f8] machine_kexec at ffffffffaec64bee #1 [ffffb79140f73950] __crash_kexec at ffffffffaeda67fd #2 [ffffb79140f73a18] crash_kexec at ffffffffaeda76ed #3 [ffffb79140f73a30] oops_end at ffffffffaec2658d #4 [ffffb79140f73a50] general_protection at ffffffffaf60111e [exception RIP: nfs_fattr_init+0x5] RIP: ffffffffc0c18265 RSP: ffffb79140f73b08 RFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff93dac304a800 RCX: 0000000000000000 RDX: ffffb79140f73bb0 RSI: ffff93dadc8cbb40 RDI: d03ee11cfaf6bd50 RBP: ffffb79140f73be8 R8: ffffffffc0691560 R9: 0000000000000006 R10: ffff93db3ffd3df8 R11: 0000000000000000 R12: ffff93dac4040000 R13: ffff93dac2848e00 R14: ffffb79140f73b60 R15: ffffb79140f73b30 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #5 [ffffb79140f73b08] _nfs41_proc_get_locations at ffffffffc0c73d53 [nfsv4] #6 [ffffb79140f73bf0] nfs4_proc_get_locations at ffffffffc0c83e90 [nfsv4] #7 [ffffb79140f73c60] nfs4_discover_trunking at ffffffffc0c83fb7 [nfsv4] #8 [ffffb79140f73cd8] nfs_probe_fsinfo at ffffffffc0c0f95f [nfs] #9 [ffffb79140f73da0] nfs_probe_server at ffffffffc0c1026a [nfs] RIP: 00007f6254fce26e RSP: 00007ffc69496ac8 RFLAGS: 00000246 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f6254fce26e RDX: 00005600220a82a0 RSI: 00005600220a64d0 RDI: 00005600220a6520 RBP: 00007ffc69496c50 R8: 00005600220a8710 R9: 003035322e323231 R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffc69496c50 R13: 00005600220a8440 R14: 0000000000000010 R15: 0000560020650ef9 ORIG_RAX: 00000000000000a5 CS: 0033 SS: 002b Fixes: c3ed222745d9 ("NFSv4: Fix free of uninitialized nfs4_label on referral lookup.") Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-06-30NFS: restore module put when manager exits.NeilBrown1-0/+1
Commit f49169c97fce ("NFSD: Remove svc_serv_ops::svo_module") removed calls to module_put_and_kthread_exit() from threads that acted as SUNRPC servers and had a related svc_serv_ops structure. This was correct. It ALSO removed the module_put_and_kthread_exit() call from nfs4_run_state_manager() which is NOT a SUNRPC service. Consequently every time the NFSv4 state manager runs the module count increments and won't be decremented. So the nfsv4 module cannot be unloaded. So restore the module_put_and_kthread_exit() call. Fixes: f49169c97fce ("NFSD: Remove svc_serv_ops::svo_module") Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-06-30io_uring: fix provided buffer importDylan Yudaken1-3/+4
io_import_iovec uses the s pointer, but this was changed immediately after the iovec was re-imported and so it was imported into the wrong place. Change the ordering. Fixes: 2be2eb02e2f5 ("io_uring: ensure reads re-import for selected buffers") Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220630132006.2825668-1-dylany@fb.com [axboe: ensure we don't half-import as well] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-30Merge tag 'fsnotify_for_v5.19-rc5' of ↵Linus Torvalds1-15/+19
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fanotify fix from Jan Kara: "A fix for recently added fanotify API to have stricter checks and refuse some invalid flag combinations to make our life easier in the future" * tag 'fsnotify_for_v5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fanotify: refine the validation checks on non-dir inode mask
2022-06-30io_uring: keep sendrecv flags in ioprioPavel Begunkov1-4/+8
We waste a u64 SQE field for flags even though we don't need as many bits and it can be used for something more useful later. Store io_uring specific send/recv flags in sqe->ioprio instead of ->addr2. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Fixes: 0455d4ccec54 ("io_uring: add POLL_FIRST support for send/sendmsg and recv/recvmsg") [axboe: change comment in io_uring.h as well] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-29Merge tag '5.19-rc4-ksmbd-server-fixes' of git://git.samba.org/ksmbdLinus Torvalds4-28/+24
Pull ksmbd server fixes from Steve French: - seek null check (don't use f_seek op directly and blindly) - offset validation in FSCTL_SET_ZERO_DATA - fallocate fix (relates e.g. to xfstests generic/091 and 263) - two cleanup fixes - fix socket settings on some arch * tag '5.19-rc4-ksmbd-server-fixes' of git://git.samba.org/ksmbd: ksmbd: use vfs_llseek instead of dereferencing NULL ksmbd: check invalid FileOffset and BeyondFinalZero in FSCTL_ZERO_DATA ksmbd: set the range of bytes to zero without extending file size in FSCTL_ZERO_DATA ksmbd: remove duplicate flag set in smb2_write ksmbd: smbd: Remove useless license text when SPDX-License-Identifier is already used ksmbd: use SOCK_NONBLOCK type for kernel_accept()
2022-06-29ceph: wait on async create before checking caps for syncfsJeff Layton1-0/+1
Currently, we'll call ceph_check_caps, but if we're still waiting on the reply, we'll end up spinning around on the same inode in flush_dirty_session_caps. Wait for the async create reply before flushing caps. Cc: stable@vger.kernel.org URL: https://tracker.ceph.com/issues/55823 Fixes: fbed7045f552 ("ceph: wait for async create reply before sending any cap messages") Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-06-28fanotify: refine the validation checks on non-dir inode maskAmir Goldstein1-15/+19
Commit ceaf69f8eadc ("fanotify: do not allow setting dirent events in mask of non-dir") added restrictions about setting dirent events in the mask of a non-dir inode mark, which does not make any sense. For backward compatibility, these restictions were added only to new (v5.17+) APIs. It also does not make any sense to set the flags FAN_EVENT_ON_CHILD or FAN_ONDIR in the mask of a non-dir inode. Add these flags to the dir-only restriction of the new APIs as well. Move the check of the dir-only flags for new APIs into the helper fanotify_events_supported(), which is only called for FAN_MARK_ADD, because there is no need to error on an attempt to remove the dir-only flags from non-dir inode. Fixes: ceaf69f8eadc ("fanotify: do not allow setting dirent events in mask of non-dir") Link: https://lore.kernel.org/linux-fsdevel/20220627113224.kr2725conevh53u4@quack3.lan/ Link: https://lore.kernel.org/r/20220627174719.2838175-1-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2022-06-26Merge tag 'mm-hotfixes-stable-2022-06-26' of ↵Linus Torvalds1-15/+53
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull hotfixes from Andrew Morton: "Minor things, mainly - mailmap updates, MAINTAINERS updates, etc. Fixes for this merge window: - fix for a damon boot hang, from SeongJae - fix for a kfence warning splat, from Jason Donenfeld - fix for zero-pfn pinning, from Alex Williamson - fix for fallocate hole punch clearing, from Mike Kravetz Fixes for previous releases: - fix for a performance regression, from Marcelo - fix for a hwpoisining BUG from zhenwei pi" * tag 'mm-hotfixes-stable-2022-06-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mailmap: add entry for Christian Marangi mm/memory-failure: disable unpoison once hw error happens hugetlbfs: zero partial pages during fallocate hole punch mm: memcontrol: reference to tools/cgroup/memcg_slabinfo.py mm: re-allow pinning of zero pfns mm/kfence: select random number before taking raw lock MAINTAINERS: add maillist information for LoongArch MAINTAINERS: update MM tree references MAINTAINERS: update Abel Vesa's email MAINTAINERS: add MEMORY HOT(UN)PLUG section and add David as reviewer MAINTAINERS: add Miaohe Lin as a memory-failure reviewer mailmap: add alias for jarkko@profian.com mm/damon/reclaim: schedule 'damon_reclaim_timer' only after 'system_wq' is initialized kthread: make it clear that kthread_create_on_node() might be terminated by any fatal signal mm: lru_cache_disable: use synchronize_rcu_expedited mm/page_isolation.c: fix one kernel-doc comment
2022-06-26Merge tag 'for-5.19-rc3-tag' of ↵Linus Torvalds10-27/+145
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - zoned relocation fixes: - fix critical section end for extent writeback, this could lead to out of order write - prevent writing to previous data relocation block group if space gets low - reflink fixes: - fix race between reflinking and ordered extent completion - proper error handling when block reserve migration fails - add missing inode iversion/mtime/ctime updates on each iteration when replacing extents - fix deadlock when running fsync/fiemap/commit at the same time - fix false-positive KCSAN report regarding pid tracking for read locks and data race - minor documentation update and link to new site * tag 'for-5.19-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: Documentation: update btrfs list of features and link to readthedocs.io btrfs: fix deadlock with fsync+fiemap+transaction commit btrfs: don't set lock_owner when locking extent buffer for reading btrfs: zoned: fix critical section of relocation inode writeback btrfs: zoned: prevent allocation from previous data relocation BG btrfs: do not BUG_ON() on failure to migrate space when replacing extents btrfs: add missing inode updates on each iteration when replacing extents btrfs: fix race between reflinking and ordered extent completion
2022-06-26Merge tag 'exfat-for-5.19-rc4' of ↵Linus Torvalds1-1/+3
git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat Pull exfat fix from Namjae Jeon: - Use updated exfat_chain directly instead of snapshot values in rename. * tag 'exfat-for-5.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat: exfat: use updated exfat_chain directly during renaming
2022-06-26Merge tag '5.19-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds8-139/+366
Pull cifs client fixes from Steve French: "Fixes addressing important multichannel, and reconnect issues. Multichannel mounts when the server network interfaces changed, or ip addresses changed, uncovered problems, especially in reconnect, but the patches for this were held up until recently due to some lock conflicts that are now addressed. Included in this set of fixes: - three fixes relating to multichannel reconnect, dynamically adjusting the list of server interfaces to avoid problems during reconnect - a lock conflict fix related to the above - two important fixes for negotiate on secondary channels (null netname can unintentionally cause multichannel to be disabled to some servers) - a reconnect fix (reporting incorrect IP address in some cases)" * tag '5.19-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: update cifs_ses::ip_addr after failover cifs: avoid deadlocks while updating iface cifs: periodically query network interfaces from server cifs: during reconnect, update interface if necessary cifs: change iface_list from array to sorted linked list smb3: use netname when available on secondary channels smb3: fix empty netname context on secondary channels
2022-06-25ksmbd: use vfs_llseek instead of dereferencing NULLJason A. Donenfeld1-2/+2
By not checking whether llseek is NULL, this might jump to NULL. Also, it doesn't check FMODE_LSEEK. Fix this by using vfs_llseek(), which always does the right thing. Fixes: f44158485826 ("cifsd: add file operations") Cc: stable@vger.kernel.org Cc: linux-cifs@vger.kernel.org Cc: Ronnie Sahlberg <lsahlber@redhat.com> Cc: Hyunchul Lee <hyc.lee@gmail.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Reviewed-by: Namjae Jeon <linkinjeon@kernel.org> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-06-25Merge tag 'f2fs-for-5.19-rc4' of ↵Linus Torvalds3-20/+32
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs fixes from Jaegeuk Kim: "Some urgent fixes to avoid generating corrupted inodes caused by compressed and inline_data files. In addition, avoid a wrong error report which prevents a roll-forward recovery" * tag 'f2fs-for-5.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: f2fs: do not count ENOENT for error case f2fs: fix iostat related lock protection f2fs: attach inline_data after setting compression
2022-06-24cifs: update cifs_ses::ip_addr after failoverPaulo Alcantara1-1/+7
cifs_ses::ip_addr wasn't being updated in cifs_session_setup() when reconnecting SMB sessions thus returning wrong value in /proc/fs/cifs/DebugData. Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Cc: stable@kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2022-06-24Merge tag 'io_uring-5.19-2022-06-24' of git://git.kernel.dk/linux-blockLinus Torvalds1-11/+15
Pull io_uring fixes from Jens Axboe: "A few fixes that should go into the 5.19 release. All are fixing issues that either happened in this release, or going to stable. In detail: - A small series of fixlets for the poll handling, all destined for stable (Pavel) - Fix a merge error from myself that caused a potential -EINVAL for the recv/recvmsg flag setting (me) - Fix a kbuf recycling issue for partial IO (me) - Use the original request for the inflight tracking (me) - Fix an issue introduced this merge window with trace points using a custom decoder function, which won't work for perf (Dylan)" * tag 'io_uring-5.19-2022-06-24' of git://git.kernel.dk/linux-block: io_uring: use original request task for inflight tracking io_uring: move io_uring_get_opcode out of TP_printk io_uring: fix double poll leak on repolling io_uring: fix wrong arm_poll error handling io_uring: fail links when poll fails io_uring: fix req->apoll_events io_uring: fix merge error in checking send/recv addr2 flags io_uring: mark reissue requests with REQ_F_PARTIAL_IO
2022-06-24cifs: avoid deadlocks while updating ifaceShyam Prasad N2-15/+33
We use cifs_tcp_ses_lock to protect a lot of things. Not only does it protect the lists of connections, sessions, tree connects, open file lists, etc., we also use it to protect some fields in each of it's entries. In this case, cifs_mark_ses_for_reconnect takes the cifs_tcp_ses_lock to traverse the lists, and then calls cifs_update_iface. However, that can end up calling cifs_put_tcp_session, which picks up the same lock again. Avoid this by taking a ref for the session, drop the lock, and then call update iface. Also, in cifs_update_iface, avoid nested locking of iface_lock and chan_lock, as much as possible. When unavoidable, we need to pick iface_lock first. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-06-23ksmbd: check invalid FileOffset and BeyondFinalZero in FSCTL_ZERO_DATANamjae Jeon1-10/+17
FileOffset should not be greater than BeyondFinalZero in FSCTL_ZERO_DATA. And don't call ksmbd_vfs_zero_data() if length is zero. Cc: stable@vger.kernel.org Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-06-23ksmbd: set the range of bytes to zero without extending file size in ↵Namjae Jeon1-1/+3
FSCTL_ZERO_DATA generic/091, 263 test failed since commit f66f8b94e7f2 ("cifs: when extending a file with falloc we should make files not-sparse"). FSCTL_ZERO_DATA sets the range of bytes to zero without extending file size. The VFS_FALLOCATE_FL_KEEP_SIZE flag should be used even on non-sparse files. Cc: stable@vger.kernel.org Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-06-23ksmbd: remove duplicate flag set in smb2_writeHyunchul Lee1-4/+1
The writethrough flag is set again if is_rdma_channel is false. Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-06-23Merge tag 'trace-v5.19-rc1' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: - Check for NULL in kretprobe_dispatcher() NULL can now be passed in, make sure it can handle it - Clean up unneeded #endif #ifdef of the same preprocessor check in the middle of the block. - Comment clean up - Remove unneeded initialization of the "ret" variable in __trace_uprobe_create() * tag 'trace-v5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing/uprobes: Remove unwanted initialization in __trace_uprobe_create() tracefs: Fix syntax errors in comments tracing: Simplify conditional compilation code in tracing_set_tracer() tracing/kprobes: Check whether get_kretprobe() returns NULL in kretprobe_dispatcher()
2022-06-23io_uring: use original request task for inflight trackingJens Axboe1-1/+1
In prior kernels, we did file assignment always at prep time. This meant that req->task == current. But after deferring that assignment and then pushing the inflight tracking back in, we've got the inflight tracking using current when it should in fact now be using req->task. Fixup that error introduced by adding the inflight tracking back after file assignments got modifed. Fixes: 9cae36a094e7 ("io_uring: reinstate the inflight tracking") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-22cifs: periodically query network interfaces from serverShyam Prasad N4-1/+35
Currently, we only query the server for network interfaces information at the time of mount, and never afterwards. This can be a problem, especially for services like Azure, where the IP address of the channel endpoints can change over time. With this change, we schedule a 600s polling of this info from the server for each tree connect. An alternative for periodic polling was to do this only at the time of reconnect. But this could delay the reconnect time slightly. Also, there are some challenges w.r.t how we have cifs_reconnect implemented today. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-06-22cifs: during reconnect, update interface if necessaryShyam Prasad N3-0/+88
Going forward, the plan is to periodically query the server for it's interfaces (when multichannel is enabled). This change allows checking for inactive interfaces during reconnect, and reconnect to a new interface if necessary. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-06-22cifs: change iface_list from array to sorted linked listShyam Prasad N6-129/+201
A server's published interface list can change over time, and needs to be updated. We've storing iface_list as a simple array, which makes it difficult to manipulate an existing list. With this change, iface_list is modified into a linked list of interfaces, which is kept sorted by speed. Also added a reference counter for an iface entry, so that each channel can maintain a backpointer to the iface and drop it easily when needed. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-06-22smb3: use netname when available on secondary channelsShyam Prasad N1-2/+9
Some servers do not allow null netname contexts, which would cause multichannel to revert to single channel when mounting to some servers (e.g. Azure xSMB). The previous patch fixed that by avoiding incorrectly sending the netname context when there would be a null hostname sent in the netname context, while this patch fixes the null hostname for the secondary channel by using the hostname of the primary channel for the secondary channel. Fixes: 4c14d7043fede ("cifs: populate empty hostnames for extra channels") Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-06-22Merge tag '9p-for-5.19-rc4' of https://github.com/martinetd/linuxLinus Torvalds4-17/+29
Pull 9pfs fixes from Dominique Martinet: "A couple of fid refcount and fscache fixes: - fid refcounting was incorrect in some corner cases and would leak resources, only freed at umount time. The first three commits fix three such cases - 'cache=loose' or fscache was broken when trying to write a partial page to a file with no read permission since the rework a few releases ago. The fix taken here is just to restore old behavior of using the special 'writeback_fid' for such reads, which is open as root/RDWR and such not get complains that we try to read on a WRONLY fid. Long-term it'd be nice to get rid of this and not issue the read at all (skip cache?) in such cases, but that direction hasn't progressed" * tag '9p-for-5.19-rc4' of https://github.com/martinetd/linux: 9p: fix EBADF errors in cached mode 9p: Fix refcounting during full path walks for fid lookups 9p: fix fid refcount leak in v9fs_vfs_get_link 9p: fix fid refcount leak in v9fs_vfs_atomic_open_dotl
2022-06-21io_uring: fix double poll leak on repollingPavel Begunkov1-0/+1
We have re-polling for partial IO, so a request can be polled twice. If it used two poll entries the first time then on the second io_arm_poll_handler() it will find the old apoll entry and NULL kmalloc()'ed second entry, i.e. apoll->double_poll, so leaking it. Fixes: 10c873334feba ("io_uring: allow re-poll if we made progress") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/fee2452494222ecc7f1f88c8fb659baef971414a.1655852245.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-21io_uring: fix wrong arm_poll error handlingPavel Begunkov1-0/+1
Leaving ip.error set when a request was punted to task_work execution is problematic, don't forget to clear it. Fixes: aa43477b04025 ("io_uring: poll rework") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/a6c84ef4182c6962380aebe11b35bdcb25b0ccfb.1655852245.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-21io_uring: fail links when poll failsPavel Begunkov1-0/+2
Don't forget to cancel all linked requests of poll request when __io_arm_poll_handler() failed. Fixes: aa43477b04025 ("io_uring: poll rework") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/a78aad962460f9fdfe4aa4c0b62425c88f9415bc.1655852245.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-21Merge tag 'for-5.19-rc3-tag' of ↵Linus Torvalds2-9/+51
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - print more error messages for invalid mount option values - prevent remount with v1 space cache for subpage filesystem - fix hang during unmount when block group reclaim task is running * tag 'for-5.19-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: add error messages to all unrecognized mount options btrfs: prevent remounting to v1 space cache for subpage mount btrfs: fix hang during unmount when block group reclaim task is running
2022-06-21afs: Fix dynamic root getattrDavid Howells1-1/+2
The recent patch to make afs_getattr consult the server didn't account for the pseudo-inodes employed by the dynamic root-type afs superblock not having a volume or a server to access, and thus an oops occurs if such a directory is stat'd. Fix this by checking to see if the vnode->volume pointer actually points anywhere before following it in afs_getattr(). This can be tested by stat'ing a directory in /afs. It may be sufficient just to do "ls /afs" and the oops looks something like: BUG: kernel NULL pointer dereference, address: 0000000000000020 ... RIP: 0010:afs_getattr+0x8b/0x14b ... Call Trace: <TASK> vfs_statx+0x79/0xf5 vfs_fstatat+0x49/0x62 Fixes: 2aeb8c86d499 ("afs: Fix afs_getattr() to refetch file status if callback break occurred") Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> Tested-by: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/165408450783.1031787.7941404776393751186.stgit@warthog.procyon.org.uk/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-06-21f2fs: do not count ENOENT for error caseJaegeuk Kim1-1/+3
Otherwise, we can get a wrong cp_error mark. Cc: <stable@vger.kernel.org> Fixes: a7b8618aa2f0 ("f2fs: avoid infinite loop to flush node pages") Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-06-21io_uring: fix req->apoll_eventsPavel Begunkov1-4/+8
apoll_events should be set once in the beginning of poll arming just as poll->events and not change after. However, currently io_uring resets it on each __io_poll_execute() for no clear reason. There is also a place in __io_arm_poll_handler() where we add EPOLLONESHOT to downgrade a multishot, but forget to do the same thing with ->apoll_events, which is buggy. Fixes: 81459350d581e ("io_uring: cache req->apoll->events in req->cflags") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Hao Xu <howeyxu@tencent.com> Link: https://lore.kernel.org/r/0aef40399ba75b1a4d2c2e85e6e8fd93c02fc6e4.1655814213.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-21io_uring: fix merge error in checking send/recv addr2 flagsJens Axboe1-4/+0
With the dropping of the IOPOLL checking in the per-opcode handlers, we inadvertently left two checks in the recv/recvmsg and send/sendmsg prep handlers for the same thing, and one of them includes addr2 which holds the flags for these opcodes. Fix it up and kill the redundant checks. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-21btrfs: fix deadlock with fsync+fiemap+transaction commitJosef Bacik1-15/+52
We are hitting the following deadlock in production occasionally Task 1 Task 2 Task 3 Task 4 Task 5 fsync(A) start trans start commit falloc(A) lock 5m-10m start trans wait for commit fiemap(A) lock 0-10m wait for 5m-10m (have 0-5m locked) have btrfs_need_log_full_commit !full_sync wait_ordered_extents finish_ordered_io(A) lock 0-5m DEADLOCK We have an existing dependency of file extent lock -> transaction. However in fsync if we tried to do the fast logging, but then had to fall back to committing the transaction, we will be forced to call btrfs_wait_ordered_range() to make sure all of our extents are updated. This creates a dependency of transaction -> file extent lock, because btrfs_finish_ordered_io() will need to take the file extent lock in order to run the ordered extents. Fix this by stopping the transaction if we have to do the full commit and we attempted to do the fast logging. Then attach to the transaction and commit it if we need to. CC: stable@vger.kernel.org # 5.15+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-06-21btrfs: don't set lock_owner when locking extent buffer for readingZygo Blaxell1-3/+0
In 196d59ab9ccc "btrfs: switch extent buffer tree lock to rw_semaphore" the functions for tree read locking were rewritten, and in the process the read lock functions started setting eb->lock_owner = current->pid. Previously lock_owner was only set in tree write lock functions. Read locks are shared, so they don't have exclusive ownership of the underlying object, so setting lock_owner to any single value for a read lock makes no sense. It's mostly harmless because write locks and read locks are mutually exclusive, and none of the existing code in btrfs (btrfs_init_new_buffer and print_eb_refs_lock) cares what nonsense is written in lock_owner when no writer is holding the lock. KCSAN does care, and will complain about the data race incessantly. Remove the assignments in the read lock functions because they're useless noise. Fixes: 196d59ab9ccc ("btrfs: switch extent buffer tree lock to rw_semaphore") CC: stable@vger.kernel.org # 5.15+ Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org> Signed-off-by: David Sterba <dsterba@suse.com>
2022-06-21btrfs: zoned: fix critical section of relocation inode writebackNaohiro Aota1-1/+2
We use btrfs_zoned_data_reloc_{lock,unlock} to allow only one process to write out to the relocation inode. That critical section must include all the IO submission for the inode. However, flush_write_bio() in extent_writepages() is out of the critical section, causing an IO submission outside of the lock. This leads to an out of the order IO submission and fail the relocation process. Fix it by extending the critical section. Fixes: 35156d852762 ("btrfs: zoned: only allow one process to add pages to a relocation inode") CC: stable@vger.kernel.org # 5.16+ Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-06-21btrfs: zoned: prevent allocation from previous data relocation BGNaohiro Aota5-2/+53
After commit 5f0addf7b890 ("btrfs: zoned: use dedicated lock for data relocation"), we observe IO errors on e.g, btrfs/232 like below. [09.0][T4038707] WARNING: CPU: 3 PID: 4038707 at fs/btrfs/extent-tree.c:2381 btrfs_cross_ref_exist+0xfc/0x120 [btrfs] <snip> [09.9][T4038707] Call Trace: [09.5][T4038707] <TASK> [09.3][T4038707] run_delalloc_nocow+0x7f1/0x11a0 [btrfs] [09.6][T4038707] ? test_range_bit+0x174/0x320 [btrfs] [09.2][T4038707] ? fallback_to_cow+0x980/0x980 [btrfs] [09.3][T4038707] ? find_lock_delalloc_range+0x33e/0x3e0 [btrfs] [09.5][T4038707] btrfs_run_delalloc_range+0x445/0x1320 [btrfs] [09.2][T4038707] ? test_range_bit+0x320/0x320 [btrfs] [09.4][T4038707] ? lock_downgrade+0x6a0/0x6a0 [09.2][T4038707] ? orc_find.part.0+0x1ed/0x300 [09.5][T4038707] ? __module_address.part.0+0x25/0x300 [09.0][T4038707] writepage_delalloc+0x159/0x310 [btrfs] <snip> [09.4][ C3] sd 10:0:1:0: [sde] tag#2620 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s [09.5][ C3] sd 10:0:1:0: [sde] tag#2620 Sense Key : Illegal Request [current] [09.9][ C3] sd 10:0:1:0: [sde] tag#2620 Add. Sense: Unaligned write command [09.5][ C3] sd 10:0:1:0: [sde] tag#2620 CDB: Write(16) 8a 00 00 00 00 00 02 f3 63 87 00 00 00 2c 00 00 [09.4][ C3] critical target error, dev sde, sector 396041272 op 0x1:(WRITE) flags 0x800 phys_seg 3 prio class 0 [09.9][ C3] BTRFS error (device dm-1): bdev /dev/mapper/dml_102_2 errs: wr 1, rd 0, flush 0, corrupt 0, gen 0 The IO errors occur when we allocate a regular extent in previous data relocation block group. On zoned btrfs, we use a dedicated block group to relocate a data extent. Thus, we allocate relocating data extents (pre-alloc) only from the dedicated block group and vice versa. Once the free space in the dedicated block group gets tight, a relocating extent may not fit into the block group. In that case, we need to switch the dedicated block group to the next one. Then, the previous one is now freed up for allocating a regular extent. The BG is already not enough to allocate the relocating extent, but there is still room to allocate a smaller extent. Now the problem happens. By allocating a regular extent while nocow IOs for the relocation is still on-going, we will issue WRITE IOs (for relocation) and ZONE APPEND IOs (for the regular writes) at the same time. That mixed IOs confuses the write pointer and arises the unaligned write errors. This commit introduces a new bit 'zoned_data_reloc_ongoing' to the btrfs_block_group. We set this bit before releasing the dedicated block group, and no extent are allocated from a block group having this bit set. This bit is similar to setting block_group->ro, but is different from it by allowing nocow writes to start. Once all the nocow IO for relocation is done (hooked from btrfs_finish_ordered_io), we reset the bit to release the block group for further allocation. Fixes: c2707a255623 ("btrfs: zoned: add a dedicated data relocation block group") CC: stable@vger.kernel.org # 5.16+ Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-06-21btrfs: do not BUG_ON() on failure to migrate space when replacing extentsFilipe Manana1-2/+4
At btrfs_replace_file_extents(), if we fail to migrate reserved metadata space from the transaction block reserve into the local block reserve, we trigger a BUG_ON(). This is because it should not be possible to have a failure here, as we reserved more space when we started the transaction than the space we want to migrate. However having a BUG_ON() is way too drastic, we can perfectly handle the failure and return the error to the caller. So just do that instead, and add a WARN_ON() to make it easier to notice the failure if it ever happens (which is particularly useful for fstests, and the warning will trigger a failure of a test case). Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-06-21btrfs: add missing inode updates on each iteration when replacing extentsFilipe Manana4-0/+23
When replacing file extents, called during fallocate, hole punching, clone and deduplication, we may not be able to replace/drop all the target file extent items with a single transaction handle. We may get -ENOSPC while doing it, in which case we release the transaction handle, balance the dirty pages of the btree inode, flush delayed items and get a new transaction handle to operate on what's left of the target range. By dropping and replacing file extent items we have effectively modified the inode, so we should bump its iversion and update its mtime/ctime before we update the inode item. This is because if the transaction we used for partially modifying the inode gets committed by someone after we release it and before we finish the rest of the range, a power failure happens, then after mounting the filesystem our inode has an outdated iversion and mtime/ctime, corresponding to the values it had before we changed it. So add the missing iversion and mtime/ctime updates. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-06-21btrfs: fix race between reflinking and ordered extent completionFilipe Manana1-4/+11
While doing a reflink operation, if an ordered extent for a file range that does not overlap with the source and destination ranges of the reflink operation happens, we can end up having a failure in the reflink operation and return -EINVAL to user space. The following sequence of steps explains how this can happen: 1) We have the page at file offset 315392 dirty (under delalloc); 2) A reflink operation for this file starts, using the same file as both source and destination, the source range is [372736, 409600) (length of 36864 bytes) and the destination range is [208896, 245760); 3) At btrfs_remap_file_range_prep(), we flush all delalloc in the source and destination ranges, and wait for any ordered extents in those range to complete; 4) Still at btrfs_remap_file_range_prep(), we then flush all delalloc in the inode, but we neither wait for it to complete nor any ordered extents to complete. This results in starting delalloc for the page at file offset 315392 and creating an ordered extent for that single page range; 5) We then move to btrfs_clone() and enter the loop to find file extent items to copy from the source range to destination range; 6) In the first iteration we end up at last file extent item stored in leaf A: (...) item 131 key (143616 108 315392) itemoff 5101 itemsize 53 extent data disk bytenr 1903988736 nr 73728 extent data offset 12288 nr 61440 ram 73728 This represents the file range [315392, 376832), which overlaps with the source range to clone. @datal is set to 61440, key.offset is 315392 and @next_key_min_offset is therefore set to 376832 (315392 + 61440). @off (372736) is > key.offset (315392), so @new_key.offset is set to the value of @destoff (208896). @new_key.offset == @last_dest_end (208896) so @drop_start is set to 208896 (@new_key.offset). @datal is adjusted to 4096, as @off is > @key.offset. So in this iteration we call btrfs_replace_file_extents() for the range [208896, 212991] (a single page, which is [@drop_start, @new_key.offset + @datal - 1]). @last_dest_end is set to 212992 (@new_key.offset + @datal = 208896 + 4096 = 212992). Before the next iteration of the loop, @key.offset is set to the value 376832, which is @next_key_min_offset; 7) On the second iteration btrfs_search_slot() leaves us again at leaf A, but this time pointing beyond the last slot of leaf A, as that's where a key with offset 376832 should be at if it existed. So end up calling btrfs_next_leaf(); 8) btrfs_next_leaf() releases the path, but before it searches again the tree for the next key/leaf, the ordered extent for the single page range at file offset 315392 completes. That results in trimming the file extent item we processed before, adjusting its key offset from 315392 to 319488, reducing its length from 61440 to 57344 and inserting a new file extent item for that single page range, with a key offset of 315392 and a length of 4096. Leaf A now looks like: (...) item 132 key (143616 108 315392) itemoff 4995 itemsize 53 extent data disk bytenr 1801666560 nr 4096 extent data offset 0 nr 4096 ram 4096 item 133 key (143616 108 319488) itemoff 4942 itemsize 53 extent data disk bytenr 1903988736 nr 73728 extent data offset 16384 nr 57344 ram 73728 9) When btrfs_next_leaf() returns, it gives us a path pointing to leaf A at slot 133, since it's the first key that follows what was the last key we saw (143616 108 315392). In fact it's the same item we processed before, but its key offset was changed, so it counts as a new key; 10) So now we have: @key.offset == 319488 @datal == 57344 @off (372736) is > key.offset (319488), so @new_key.offset is set to 208896 (@destoff value). @new_key.offset (208896) != @last_dest_end (212992), so @drop_start is set to 212992 (@last_dest_end value). @datal is adjusted to 4096 because @off > @key.offset. So in this iteration we call btrfs_replace_file_extents() for the invalid range of [212992, 212991] (which is [@drop_start, @new_key.offset + @datal - 1]). This range is empty, the end offset is smaller than the start offset so btrfs_replace_file_extents() returns -EINVAL, which we end up returning to user space and fail the reflink operation. This all happens because the range of this file extent item was already processed in the previous iteration. This scenario can be triggered very sporadically by fsx from fstests, for example with test case generic/522. So fix this by having btrfs_clone() skip file extent items that cover a file range that we have already processed. CC: stable@vger.kernel.org # 5.10+ Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-06-20smb3: fix empty netname context on secondary channelsSteve French1-6/+8
Some servers do not allow null netname contexts, which would cause multichannel to revert to single channel when mounting to some servers (e.g. Azure xSMB). Fixes: 4c14d7043fede ("cifs: populate empty hostnames for extra channels") Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-06-20io_uring: mark reissue requests with REQ_F_PARTIAL_IOJens Axboe1-2/+2
If we mark for reissue, we assume that the buffer will remain stable. Hence if are using a provided buffer, we need to ensure that we stick with it for the duration of that request. This only affects block devices that use provided buffers, as those are the only ones that get marked with REQ_F_REISSUE. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-19f2fs: fix iostat related lock protectionDaeho Jeong1-13/+18
Made iostat related locks safe to be called from irq context again. Cc: <stable@vger.kernel.org> Fixes: a1e09b03e6f5 ("f2fs: use iomap for direct I/O") Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Tested-by: Eddie Huang <eddie.huang@mediatek.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-06-19f2fs: attach inline_data after setting compressionJaegeuk Kim1-6/+11
This fixes the below corruption. [345393.335389] F2FS-fs (vdb): sanity_check_inode: inode (ino=6d0, mode=33206) should not have inline_data, run fsck to fix Cc: <stable@vger.kernel.org> Fixes: 677a82b44ebf ("f2fs: fix to do sanity check for inline inode") Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>