summaryrefslogtreecommitdiffstats
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2022-11-06Merge tag 'ext4_for_linus_stable' of ↵Linus Torvalds6-7/+21
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "Fix a number of bugs, including some regressions, the most serious of which was one which would cause online resizes to fail with file systems with metadata checksums enabled. Also fix a warning caused by the newly added fortify string checker, plus some bugs that were found using fuzzed file systems" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix fortify warning in fs/ext4/fast_commit.c:1551 ext4: fix wrong return err in ext4_load_and_init_journal() ext4: fix warning in 'ext4_da_release_space' ext4: fix BUG_ON() when directory entry has invalid rec_len ext4: update the backup superblock's at the end of the online resize
2022-11-06Merge tag '6.1-rc4-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds6-62/+105
Pull cifs fixes from Steve French: "One symlink handling fix and two fixes foir multichannel issues with iterating channels, including for oplock breaks when leases are disabled" * tag '6.1-rc4-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: fix use-after-free on the link name cifs: avoid unnecessary iteration of tcp sessions cifs: always iterate smb sessions using primary channel
2022-11-06ext4: fix fortify warning in fs/ext4/fast_commit.c:1551Theodore Ts'o1-2/+3
With the new fortify string system, rework the memcpy to avoid this warning: memcpy: detected field-spanning write (size 60) of single field "&raw_inode->i_generation" at fs/ext4/fast_commit.c:1551 (size 4) Cc: stable@kernel.org Fixes: 54d9469bc515 ("fortify: Add run-time WARN for cross-field memcpy()") Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-11-06ext4: fix wrong return err in ext4_load_and_init_journal()Jason Yan1-1/+1
The return value is wrong in ext4_load_and_init_journal(). The local variable 'err' need to be initialized before goto out. The original code in __ext4_fill_super() is fine because it has two return values 'ret' and 'err' and 'ret' is initialized as -EINVAL. After we factor out ext4_load_and_init_journal(), this code is broken. So fix it by directly returning -EINVAL in the error handler path. Cc: stable@kernel.org Fixes: 9c1dd22d7422 ("ext4: factor out ext4_load_and_init_journal()") Signed-off-by: Jason Yan <yanaijie@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20221025040206.3134773-1-yanaijie@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-11-06ext4: fix warning in 'ext4_da_release_space'Ye Bin1-1/+2
Syzkaller report issue as follows: EXT4-fs (loop0): Free/Dirty block details EXT4-fs (loop0): free_blocks=0 EXT4-fs (loop0): dirty_blocks=0 EXT4-fs (loop0): Block reservation details EXT4-fs (loop0): i_reserved_data_blocks=0 EXT4-fs warning (device loop0): ext4_da_release_space:1527: ext4_da_release_space: ino 18, to_free 1 with only 0 reserved data blocks ------------[ cut here ]------------ WARNING: CPU: 0 PID: 92 at fs/ext4/inode.c:1528 ext4_da_release_space+0x25e/0x370 fs/ext4/inode.c:1524 Modules linked in: CPU: 0 PID: 92 Comm: kworker/u4:4 Not tainted 6.0.0-syzkaller-09423-g493ffd6605b2 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022 Workqueue: writeback wb_workfn (flush-7:0) RIP: 0010:ext4_da_release_space+0x25e/0x370 fs/ext4/inode.c:1528 RSP: 0018:ffffc900015f6c90 EFLAGS: 00010296 RAX: 42215896cd52ea00 RBX: 0000000000000000 RCX: 42215896cd52ea00 RDX: 0000000000000000 RSI: 0000000080000001 RDI: 0000000000000000 RBP: 1ffff1100e907d96 R08: ffffffff816aa79d R09: fffff520002bece5 R10: fffff520002bece5 R11: 1ffff920002bece4 R12: ffff888021fd2000 R13: ffff88807483ecb0 R14: 0000000000000001 R15: ffff88807483e740 FS: 0000000000000000(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005555569ba628 CR3: 000000000c88e000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ext4_es_remove_extent+0x1ab/0x260 fs/ext4/extents_status.c:1461 mpage_release_unused_pages+0x24d/0xef0 fs/ext4/inode.c:1589 ext4_writepages+0x12eb/0x3be0 fs/ext4/inode.c:2852 do_writepages+0x3c3/0x680 mm/page-writeback.c:2469 __writeback_single_inode+0xd1/0x670 fs/fs-writeback.c:1587 writeback_sb_inodes+0xb3b/0x18f0 fs/fs-writeback.c:1870 wb_writeback+0x41f/0x7b0 fs/fs-writeback.c:2044 wb_do_writeback fs/fs-writeback.c:2187 [inline] wb_workfn+0x3cb/0xef0 fs/fs-writeback.c:2227 process_one_work+0x877/0xdb0 kernel/workqueue.c:2289 worker_thread+0xb14/0x1330 kernel/workqueue.c:2436 kthread+0x266/0x300 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306 </TASK> Above issue may happens as follows: ext4_da_write_begin ext4_create_inline_data ext4_clear_inode_flag(inode, EXT4_INODE_EXTENTS); ext4_set_inode_flag(inode, EXT4_INODE_INLINE_DATA); __ext4_ioctl ext4_ext_migrate -> will lead to eh->eh_entries not zero, and set extent flag ext4_da_write_begin ext4_da_convert_inline_data_to_extent ext4_da_write_inline_data_begin ext4_da_map_blocks ext4_insert_delayed_block if (!ext4_es_scan_clu(inode, &ext4_es_is_delonly, lblk)) if (!ext4_es_scan_clu(inode, &ext4_es_is_mapped, lblk)) ext4_clu_mapped(inode, EXT4_B2C(sbi, lblk)); -> will return 1 allocated = true; ext4_es_insert_delayed_block(inode, lblk, allocated); ext4_writepages mpage_map_and_submit_extent(handle, &mpd, &give_up_on_write); -> return -ENOSPC mpage_release_unused_pages(&mpd, give_up_on_write); -> give_up_on_write == 1 ext4_es_remove_extent ext4_da_release_space(inode, reserved); if (unlikely(to_free > ei->i_reserved_data_blocks)) -> to_free == 1 but ei->i_reserved_data_blocks == 0 -> then trigger warning as above To solve above issue, forbid inode do migrate which has inline data. Cc: stable@kernel.org Reported-by: syzbot+c740bb18df70ad00952e@syzkaller.appspotmail.com Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20221018022701.683489-1-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-11-06ext4: fix BUG_ON() when directory entry has invalid rec_lenLuís Henriques1-1/+9
The rec_len field in the directory entry has to be a multiple of 4. A corrupted filesystem image can be used to hit a BUG() in ext4_rec_len_to_disk(), called from make_indexed_dir(). ------------[ cut here ]------------ kernel BUG at fs/ext4/ext4.h:2413! ... RIP: 0010:make_indexed_dir+0x53f/0x5f0 ... Call Trace: <TASK> ? add_dirent_to_buf+0x1b2/0x200 ext4_add_entry+0x36e/0x480 ext4_add_nondir+0x2b/0xc0 ext4_create+0x163/0x200 path_openat+0x635/0xe90 do_filp_open+0xb4/0x160 ? __create_object.isra.0+0x1de/0x3b0 ? _raw_spin_unlock+0x12/0x30 do_sys_openat2+0x91/0x150 __x64_sys_open+0x6c/0xa0 do_syscall_64+0x3c/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 The fix simply adds a call to ext4_check_dir_entry() to validate the directory entry, returning -EFSCORRUPTED if the entry is invalid. CC: stable@kernel.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=216540 Signed-off-by: Luís Henriques <lhenriques@suse.de> Link: https://lore.kernel.org/r/20221012131330.32456-1-lhenriques@suse.de Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-11-04cifs: fix use-after-free on the link nameChenXiaoSong2-6/+25
xfstests generic/011 reported use-after-free bug as follows: BUG: KASAN: use-after-free in __d_alloc+0x269/0x859 Read of size 15 at addr ffff8880078933a0 by task dirstress/952 CPU: 1 PID: 952 Comm: dirstress Not tainted 6.1.0-rc3+ #77 Call Trace: __dump_stack+0x23/0x29 dump_stack_lvl+0x51/0x73 print_address_description+0x67/0x27f print_report+0x3e/0x5c kasan_report+0x7b/0xa8 kasan_check_range+0x1b2/0x1c1 memcpy+0x22/0x5d __d_alloc+0x269/0x859 d_alloc+0x45/0x20c d_alloc_parallel+0xb2/0x8b2 lookup_open+0x3b8/0x9f9 open_last_lookups+0x63d/0xc26 path_openat+0x11a/0x261 do_filp_open+0xcc/0x168 do_sys_openat2+0x13b/0x3f7 do_sys_open+0x10f/0x146 __se_sys_creat+0x27/0x2e __x64_sys_creat+0x55/0x6a do_syscall_64+0x40/0x96 entry_SYSCALL_64_after_hwframe+0x63/0xcd Allocated by task 952: kasan_save_stack+0x1f/0x42 kasan_set_track+0x21/0x2a kasan_save_alloc_info+0x17/0x1d __kasan_kmalloc+0x7e/0x87 __kmalloc_node_track_caller+0x59/0x155 kstrndup+0x60/0xe6 parse_mf_symlink+0x215/0x30b check_mf_symlink+0x260/0x36a cifs_get_inode_info+0x14e1/0x1690 cifs_revalidate_dentry_attr+0x70d/0x964 cifs_revalidate_dentry+0x36/0x62 cifs_d_revalidate+0x162/0x446 lookup_open+0x36f/0x9f9 open_last_lookups+0x63d/0xc26 path_openat+0x11a/0x261 do_filp_open+0xcc/0x168 do_sys_openat2+0x13b/0x3f7 do_sys_open+0x10f/0x146 __se_sys_creat+0x27/0x2e __x64_sys_creat+0x55/0x6a do_syscall_64+0x40/0x96 entry_SYSCALL_64_after_hwframe+0x63/0xcd Freed by task 950: kasan_save_stack+0x1f/0x42 kasan_set_track+0x21/0x2a kasan_save_free_info+0x1c/0x34 ____kasan_slab_free+0x1c1/0x1d5 __kasan_slab_free+0xe/0x13 __kmem_cache_free+0x29a/0x387 kfree+0xd3/0x10e cifs_fattr_to_inode+0xb6a/0xc8c cifs_get_inode_info+0x3cb/0x1690 cifs_revalidate_dentry_attr+0x70d/0x964 cifs_revalidate_dentry+0x36/0x62 cifs_d_revalidate+0x162/0x446 lookup_open+0x36f/0x9f9 open_last_lookups+0x63d/0xc26 path_openat+0x11a/0x261 do_filp_open+0xcc/0x168 do_sys_openat2+0x13b/0x3f7 do_sys_open+0x10f/0x146 __se_sys_creat+0x27/0x2e __x64_sys_creat+0x55/0x6a do_syscall_64+0x40/0x96 entry_SYSCALL_64_after_hwframe+0x63/0xcd When opened a symlink, link name is from 'inode->i_link', but it may be reset to a new value when revalidate the dentry. If some processes get the link name on the race scenario, then UAF will happen on link name. Fix this by implementing 'get_link' interface to duplicate the link name. Fixes: 76894f3e2f71 ("cifs: improve symlink handling for smb2+") Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-11-04cifs: avoid unnecessary iteration of tcp sessionsShyam Prasad N3-51/+55
In a few places, we do unnecessary iterations of tcp sessions, even when the server struct is provided. The change avoids it and uses the server struct provided. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-11-04cifs: always iterate smb sessions using primary channelShyam Prasad N4-5/+25
smb sessions and tcons currently hang off primary channel only. Secondary channels have the lists as empty. Whenever there's a need to iterate sessions or tcons, we should use the list in the corresponding primary channel. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-11-04Merge tag 'xfs-6.1-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds29-386/+670
Pull xfs fixes from Darrick Wong: "Dave and I had thought that this would be a very quiet cycle, but we thought wrong. At first there were the usual trickle of minor bugfixes, but then Zorro pulled -rc1 and noticed complaints about the stronger memcpy checks w.r.t. flex arrays. Analyzing how to fix that revealed a bunch of validation gaps in validating ondisk log items during recovery, and then a customer hit an infinite loop in the refcounting code on a corrupt filesystem. So. This largeish batch of fixes addresses all those problems, I hope. Summary: - Fix a UAF bug during log recovery - Fix memory leaks when mount fails - Detect corrupt bestfree information in a directory block - Fix incorrect return value type for the dax page fault handlers - Fix fortify complaints about memcpy of xfs log item objects - Strengthen inadequate validation of recovered log items - Fix incorrectly declared flex array in EFI log item structs - Log corrupt log items for debugging purposes - Fix infinite loop problems in the refcount code if the refcount btree node block keys are corrupt - Fix infinite loop problems in the refcount code if the refcount btree records suffer MSB bitflips - Add more sanity checking to continued defer ops to prevent overflows from one AG to the next or off EOFS" * tag 'xfs-6.1-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (28 commits) xfs: rename XFS_REFC_COW_START to _COWFLAG xfs: fix uninitialized list head in struct xfs_refcount_recovery xfs: fix agblocks check in the cow leftover recovery function xfs: check record domain when accessing refcount records xfs: remove XFS_FIND_RCEXT_SHARED and _COW xfs: refactor domain and refcount checking xfs: report refcount domain in tracepoints xfs: track cow/shared record domains explicitly in xfs_refcount_irec xfs: refactor refcount record usage in xchk_refcountbt_rec xfs: dump corrupt recovered log intent items to dmesg consistently xfs: move _irec structs to xfs_types.h xfs: actually abort log recovery on corrupt intent-done log items xfs: check deferred refcount op continuation parameters xfs: refactor all the EFI/EFD log item sizeof logic xfs: create a predicate to verify per-AG extents xfs: fix memcpy fortify errors in EFI log format copying xfs: make sure aglen never goes negative in xfs_refcount_adjust_extents xfs: fix memcpy fortify errors in RUI log format copying xfs: fix memcpy fortify errors in CUI log format copying xfs: fix memcpy fortify errors in BUI log format copying ...
2022-11-03Merge tag 'fuse-fixes-6.1-rc4' of ↵Linus Torvalds2-1/+13
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse fixes from Miklos Szeredi: "Fix two rarely triggered but long-standing issues" * tag 'fuse-fixes-6.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: add file_modified() to fallocate fuse: fix readdir cache race
2022-11-03Merge tag 'for-6.1-rc3-tag' of ↵Linus Torvalds5-49/+91
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "A batch of error handling fixes for resource leaks, fixes for nowait mode in combination with direct and buffered IO: - direct IO + dsync + nowait could miss a sync of the file after write, add handling for this combination - buffered IO + nowait should not fail with ENOSPC, only blocking IO could determine that - error handling fixes: - fix inode reserve space leak due to nowait buffered write - check the correct variable after allocation (direct IO submit) - fix inode list leak during backref walking - fix ulist freeing in self tests" * tag 'for-6.1-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix inode reserve space leak due to nowait buffered write btrfs: fix nowait buffered write returning -ENOSPC btrfs: remove pointless and double ulist frees in error paths of qgroup tests btrfs: fix ulist leaks in error paths of qgroup self tests btrfs: fix inode list leak during backref walking at find_parent_nodes() btrfs: fix inode list leak during backref walking at resolve_indirect_refs() btrfs: fix lost file sync on direct IO write with nowait and dsync iocb btrfs: fix a memory allocation failure test in btrfs_submit_direct
2022-11-02Merge tag 'nfs-for-6.1-2' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds18-75/+80
Pull NFS client bugfixes from Anna Schumaker: - Fix some coccicheck warnings - Avoid memcpy() run-time warning - Fix up various state reclaim / RECLAIM_COMPLETE errors - Fix a null pointer dereference in sysfs - Fix LOCK races - Fix gss_unwrap_resp_integ() crasher - Fix zero length clones - Fix memleak when allocate slot fails * tag 'nfs-for-6.1-2' of git://git.linux-nfs.org/projects/anna/linux-nfs: nfs4: Fix kmemleak when allocate slot failed NFSv4.2: Fixup CLONE dest file size for zero-length count SUNRPC: Fix crasher in gss_unwrap_resp_integ() NFSv4: Retry LOCK on OLD_STATEID during delegation return SUNRPC: Fix null-ptr-deref when xps sysfs alloc failed NFSv4.1: We must always send RECLAIM_COMPLETE after a reboot NFSv4.1: Handle RECLAIM_COMPLETE trunking errors NFSv4: Fix a potential state reclaim deadlock NFS: Avoid memcpy() run-time warning for struct sockaddr overflows nfs: Remove redundant null checks before kfree
2022-11-02btrfs: fix inode reserve space leak due to nowait buffered writeFilipe Manana1-1/+3
During a nowait buffered write, if we fail to balance dirty pages we exit btrfs_buffered_write() without releasing the delalloc space reserved for an extent, resulting in leaking space from the inode's block reserve. So fix that by releasing the delalloc space for the extent when balancing dirty pages fails. Reported-by: kernel test robot <yujie.liu@intel.com> Link: https://lore.kernel.org/all/202210111304.d369bc32-yujie.liu@intel.com Fixes: 965f47aeb5de ("btrfs: make btrfs_buffered_write nowait compatible") Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-11-02btrfs: fix nowait buffered write returning -ENOSPCFilipe Manana1-0/+3
If we are doing a buffered write in NOWAIT context and we can't reserve metadata space due to -ENOSPC, then we should return -EAGAIN so that we retry the write in a context allowed to block and do metadata reservation with flushing, which might succeed this time due to the allowed flushing. Returning -ENOSPC while in NOWAIT context simply makes some writes fail with -ENOSPC when they would likely succeed after switching from NOWAIT context to blocking context. That is unexpected behaviour and even fio complains about it with a warning like this: fio: io_u error on file /mnt/sdi/task_0.0.0: No space left on device: write offset=1535705088, buflen=65536 fio: pid=592630, err=28/file:io_u.c:1846, func=io_u error, error=No space left on device The fio's job config is this: [global] bs=64K ioengine=io_uring iodepth=1 size=2236962133 nr_files=1 filesize=2236962133 direct=0 runtime=10 fallocate=posix io_size=2236962133 group_reporting time_based [task_0] rw=randwrite directory=/mnt/sdi numjobs=4 So fix this by returning -EAGAIN if we are in NOWAIT context and the metadata reservation failed with -ENOSPC. Fixes: 304e45acdb8f ("btrfs: plumb NOWAIT through the write path") Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-11-02btrfs: remove pointless and double ulist frees in error paths of qgroup testsFilipe Manana1-12/+4
Several places in the qgroup self tests follow the pattern of freeing the ulist pointer they passed to btrfs_find_all_roots() if the call to that function returned an error. That is pointless because that function always frees the ulist in case it returns an error. Also In some places like at test_multiple_refs(), after a call to btrfs_qgroup_account_extent() we also leave "old_roots" and "new_roots" pointing to ulists that were freed, because btrfs_qgroup_account_extent() has freed those ulists, and if after that the next call to btrfs_find_all_roots() fails, we call ulist_free() on the "old_roots" ulist again, resulting in a double free. So remove those calls to reduce the code size and avoid double ulist free in case of an error. Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-11-02btrfs: fix ulist leaks in error paths of qgroup self testsFilipe Manana1-5/+15
In the test_no_shared_qgroup() and test_multiple_refs() qgroup self tests, if we fail to add the tree ref, remove the extent item or remove the extent ref, we are returning from the test function without freeing the "old_roots" ulist that was allocated by the previous calls to btrfs_find_all_roots(). Fix that by calling ulist_free() before returning. Fixes: 442244c96332 ("btrfs: qgroup: Switch self test to extent-oriented qgroup mechanism.") Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-11-02btrfs: fix inode list leak during backref walking at find_parent_nodes()Filipe Manana1-1/+17
During backref walking, at find_parent_nodes(), if we are dealing with a data extent and we get an error while resolving the indirect backrefs, at resolve_indirect_refs(), or in the while loop that iterates over the refs in the direct refs rbtree, we end up leaking the inode lists attached to the direct refs we have in the direct refs rbtree that were not yet added to the refs ulist passed as argument to find_parent_nodes(). Since they were not yet added to the refs ulist and prelim_release() does not free the lists, on error the caller can only free the lists attached to the refs that were added to the refs ulist, all the remaining refs get their inode lists never freed, therefore leaking their memory. Fix this by having prelim_release() always free any attached inode list to each ref found in the rbtree, and have find_parent_nodes() set the ref's inode list to NULL once it transfers ownership of the inode list to a ref added to the refs ulist passed to find_parent_nodes(). Fixes: 86d5f9944252 ("btrfs: convert prelimary reference tracking to use rbtrees") Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-11-02btrfs: fix inode list leak during backref walking at resolve_indirect_refs()Filipe Manana1-19/+17
During backref walking, at resolve_indirect_refs(), if we get an error we jump to the 'out' label and call ulist_free() on the 'parents' ulist, which frees all the elements in the ulist - however that does not free any inode lists that may be attached to elements, through the 'aux' field of a ulist node, so we end up leaking lists if we have any attached to the unodes. Fix this by calling free_leaf_list() instead of ulist_free() when we exit from resolve_indirect_refs(). The static function free_leaf_list() is moved up for this to be possible and it's slightly simplified by removing unnecessary code. Fixes: 3301958b7c1d ("Btrfs: add inodes before dropping the extent lock in find_all_leafs") Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-11-01Merge tag 'nfsd-6.1-3' of ↵Linus Torvalds1-3/+2
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fix from Chuck Lever: - Fix a loop that occurs when using multiple net namespaces * tag 'nfsd-6.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: nfsd: fix net-namespace logic in __nfsd_file_cache_purge
2022-11-01nfsd: fix net-namespace logic in __nfsd_file_cache_purgeJeff Layton1-3/+2
If the namespace doesn't match the one in "net", then we'll continue, but that doesn't cause another rhashtable_walk_next call, so it will loop infinitely. Fixes: ce502f81ba88 ("NFSD: Convert the filecache to use rhashtable") Reported-by: Petr Vorel <pvorel@suse.cz> Link: https://lore.kernel.org/ltp/Y1%2FP8gDAcWC%2F+VR3@pevik/ Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-10-31Merge tag 'for-6.1-rc3-tag' of ↵Linus Torvalds10-40/+73
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "A few more fixes and regression fixes: - fix a corner case when handling tree-mod-log chagnes in reallocated notes - fix crash on raid0 filesystems created with <5.4 mkfs.btrfs that could lead to division by zero - add missing super block checksum verification after thawing filesystem - handle one more case in send when dealing with orphan files - fix parameter type mismatch for generation when reading dentry - improved error handling in raid56 code - better struct bio packing after recent cleanups" * tag 'for-6.1-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: don't use btrfs_chunk::sub_stripes from disk btrfs: fix type of parameter generation in btrfs_get_dentry btrfs: send: fix send failure of a subcase of orphan inodes btrfs: make thaw time super block check to also verify checksum btrfs: fix tree mod log mishandling of reallocated nodes btrfs: reorder btrfs_bio for better packing btrfs: raid56: avoid double freeing for rbio if full_stripe_write() failed btrfs: raid56: properly handle the error when unable to find the missing stripe
2022-10-31xfs: rename XFS_REFC_COW_START to _COWFLAGDarrick J. Wong3-6/+6
We've been (ab)using XFS_REFC_COW_START as both an integer quantity and a bit flag, even though it's *only* a bit flag. Rename the variable to reflect its nature and update the cast target since we're not supposed to be comparing it to xfs_agblock_t now. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31xfs: fix uninitialized list head in struct xfs_refcount_recoveryDarrick J. Wong1-4/+6
We're supposed to initialize the list head of an object before adding it to another list. Fix that, and stop using the kmem_{alloc,free} calls from the Irix days. Fixes: 174edb0e46e5 ("xfs: store in-progress CoW allocations in the refcount btree") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31xfs: fix agblocks check in the cow leftover recovery functionDarrick J. Wong1-1/+3
As we've seen, refcount records use the upper bit of the rc_startblock field to ensure that all the refcount records are at the right side of the refcount btree. This works because an AG is never allowed to have more than (1U << 31) blocks in it. If we ever encounter a filesystem claiming to have that many blocks, we absolutely do not want reflink touching it at all. However, this test at the start of xfs_refcount_recover_cow_leftovers is slightly incorrect -- it /should/ be checking that agblocks isn't larger than the XFS_MAX_CRC_AG_BLOCKS constant, and it should check that the constant is never large enough to conflict with that CoW flag. Note that the V5 superblock verifier has not historically rejected filesystems where agblocks >= XFS_MAX_CRC_AG_BLOCKS, which is why this ended up in the COW recovery routine. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31xfs: check record domain when accessing refcount recordsDarrick J. Wong2-14/+43
Now that we've separated the startblock and CoW/shared extent domain in the incore refcount record structure, check the domain whenever we retrieve a record to ensure that it's still in the domain that we want. Depending on the circumstances, a change in domain either means we're done processing or that we've found a corruption and need to fail out. The refcount check in xchk_xref_is_cow_staging is redundant since _get_rec has done that for a long time now, so we can get rid of it. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31xfs: remove XFS_FIND_RCEXT_SHARED and _COWDarrick J. Wong1-31/+17
Now that we have an explicit enum for shared and CoW staging extents, we can get rid of the old FIND_RCEXT flags. Omit a couple of conversions that disappear in the next patches. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31xfs: refactor domain and refcount checkingDarrick J. Wong3-10/+17
Create a helper function to ensure that CoW staging extent records have a single refcount and that shared extent records have more than 1 refcount. We'll put this to more use in the next patch. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31xfs: report refcount domain in tracepointsDarrick J. Wong2-9/+43
Now that we've broken out the startblock and shared/cow domain in the incore refcount extent record structure, update the tracepoints to report the domain. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31xfs: track cow/shared record domains explicitly in xfs_refcount_irecDarrick J. Wong5-67/+151
Just prior to committing the reflink code into upstream, the xfs maintainer at the time requested that I find a way to shard the refcount records into two domains -- one for records tracking shared extents, and a second for tracking CoW staging extents. The idea here was to minimize mount time CoW reclamation by pushing all the CoW records to the right edge of the keyspace, and it was accomplished by setting the upper bit in rc_startblock. We don't allow AGs to have more than 2^31 blocks, so the bit was free. Unfortunately, this was a very late addition to the codebase, so most of the refcount record processing code still treats rc_startblock as a u32 and pays no attention to whether or not the upper bit (the cow flag) is set. This is a weakness is theoretically exploitable, since we're not fully validating the incoming metadata records. Fuzzing demonstrates practical exploits of this weakness. If the cow flag of a node block key record is corrupted, a lookup operation can go to the wrong record block and start returning records from the wrong cow/shared domain. This causes the math to go all wrong (since cow domain is still implicit in the upper bit of rc_startblock) and we can crash the kernel by tricking xfs into jumping into a nonexistent AG and tripping over xfs_perag_get(mp, <nonexistent AG>) returning NULL. To fix this, start tracking the domain as an explicit part of struct xfs_refcount_irec, adjust all refcount functions to check the domain of a returned record, and alter the function definitions to accept them where necessary. Found by fuzzing keys[2].cowflag = add in xfs/464. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31xfs: refactor refcount record usage in xchk_refcountbt_recDarrick J. Wong1-30/+24
Consolidate the open-coded xfs_refcount_irec fields into an actual struct and use the existing _btrec_to_irec to decode the ondisk record. This will reduce code churn in the next patch. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31xfs: move _irec structs to xfs_types.hDarrick J. Wong2-20/+20
Structure definitions for incore objects do not belong in the ondisk format header. Move them to the incore types header where they belong. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31xfs: check deferred refcount op continuation parametersDarrick J. Wong1-2/+36
If we're in the middle of a deferred refcount operation and decide to roll the transaction to avoid overflowing the transaction space, we need to check the new agbno/aglen parameters that we're about to record in the new intent. Specifically, we need to check that the new extent is completely within the filesystem, and that continuation does not put us into a different AG. If the keys of a node block are wrong, the lookup to resume an xfs_refcount_adjust_extents operation can put us into the wrong record block. If this happens, we might not find that we run out of aglen at an exact record boundary, which will cause the loop control to do the wrong thing. The previous patch should take care of that problem, but let's add this extra sanity check to stop corruption problems sooner than later. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31xfs: create a predicate to verify per-AG extentsDarrick J. Wong7-26/+24
Create a predicate function to verify that a given agbno/blockcount pair fit entirely within a single allocation group and don't suffer mathematical overflows. Refactor the existng open-coded logic; we're going to add more calls to this function in the next patch. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31xfs: make sure aglen never goes negative in xfs_refcount_adjust_extentsDarrick J. Wong1-3/+17
Prior to calling xfs_refcount_adjust_extents, we trimmed agbno/aglen such that the end of the range would not be in the middle of a refcount record. If this is no longer the case, something is seriously wrong with the btree. Bail out with a corruption error. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31xfs: dump corrupt recovered log intent items to dmesg consistentlyDarrick J. Wong5-20/+43
If log recovery decides that an intent item is corrupt and wants to abort the mount, capture a hexdump of the corrupt log item in the kernel log for further analysis. Some of the log item code already did this, so we're fixing the rest to do it consistently. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31xfs: actually abort log recovery on corrupt intent-done log itemsDarrick J. Wong2-5/+21
If log recovery picks up intent-done log items that are not of the correct size it needs to abort recovery and fail the mount. Debug assertions are not good enough. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31xfs: refactor all the EFI/EFD log item sizeof logicDarrick J. Wong4-57/+88
Refactor all the open-coded sizeof logic for EFI/EFD log item and log format structures into common helper functions whose names reflect the struct names. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31xfs: fix memcpy fortify errors in EFI log format copyingDarrick J. Wong4-22/+36
Starting in 6.1, CONFIG_FORTIFY_SOURCE checks the length parameter of memcpy. Since we're already fixing problems with BUI item copying, we should fix it everything else. An extra difficulty here is that the ef[id]_extents arrays are declared as single-element arrays. This is not the convention for flex arrays in the modern kernel, and it causes all manner of problems with static checking tools, since they often cannot tell the difference between a single element array and a flex array. So for starters, change those array[1] declarations to array[] declarations to signal that they are proper flex arrays and adjust all the "size-1" expressions to fit the new declaration style. Next, refactor the xfs_efi_copy_format function to handle the copying of the head and the flex array members separately. While we're at it, fix a minor validation deficiency in the recovery function. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31xfs: fix memcpy fortify errors in RUI log format copyingDarrick J. Wong2-31/+30
Starting in 6.1, CONFIG_FORTIFY_SOURCE checks the length parameter of memcpy. Since we're already fixing problems with BUI item copying, we should fix it everything else. Refactor the xfs_rui_copy_format function to handle the copying of the head and the flex array members separately. While we're at it, fix a minor validation deficiency in the recovery function. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31xfs: fix memcpy fortify errors in CUI log format copyingDarrick J. Wong2-24/+25
Starting in 6.1, CONFIG_FORTIFY_SOURCE checks the length parameter of memcpy. Since we're already fixing problems with BUI item copying, we should fix it everything else. Refactor the xfs_cui_copy_format function to handle the copying of the head and the flex array members separately. While we're at it, fix a minor validation deficiency in the recovery function. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31xfs: fix memcpy fortify errors in BUI log format copyingDarrick J. Wong2-24/+27
Starting in 6.1, CONFIG_FORTIFY_SOURCE checks the length parameter of memcpy. Unfortunately, it doesn't handle flex arrays correctly: ------------[ cut here ]------------ memcpy: detected field-spanning write (size 48) of single field "dst_bui_fmt" at fs/xfs/xfs_bmap_item.c:628 (size 16) Fix this by refactoring the xfs_bui_copy_format function to handle the copying of the head and the flex array members separately. While we're at it, fix a minor validation deficiency in the recovery function. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31xfs: fix validation in attr log item recoveryDarrick J. Wong1-31/+23
Before we start fixing all the complaints about memcpy'ing log items around, let's fix some inadequate validation in the xattr log item recovery code and get rid of the (now trivial) copy_format function. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31btrfs: fix lost file sync on direct IO write with nowait and dsync iocbFilipe Manana3-10/+31
When doing a direct IO write using a iocb with nowait and dsync set, we end up not syncing the file once the write completes. This is because we tell iomap to not call generic_write_sync(), which would result in calling btrfs_sync_file(), in order to avoid a deadlock since iomap can call it while we are holding the inode's lock and btrfs_sync_file() needs to acquire the inode's lock. The deadlock happens only if the write happens synchronously, when iomap_dio_rw() calls iomap_dio_complete() before it returns. Instead we do the sync ourselves at btrfs_do_write_iter(). For a nowait write however we can end up not doing the sync ourselves at at btrfs_do_write_iter() because the write could have been queued, and therefore we get -EIOCBQUEUED returned from iomap in such case. That makes us skip the sync call at btrfs_do_write_iter(), as we don't do it for any error returned from btrfs_direct_write(). We can't simply do the call even if -EIOCBQUEUED is returned, since that would block the task waiting for IO, both for the data since there are bios still in progress as well as potentially blocking when joining a log transaction and when syncing the log (writing log trees, super blocks, etc). So let iomap do the sync call itself and in order to avoid deadlocks for the case of synchronous writes (without nowait), use __iomap_dio_rw() and have ourselves call iomap_dio_complete() after unlocking the inode. A test case will later be sent for fstests, after this is fixed in Linus' tree. Fixes: 51bd9563b678 ("btrfs: fix deadlock due to page faults during direct IO reads and writes") Reported-by: Марк Коренберг <socketpair@gmail.com> Link: https://lore.kernel.org/linux-btrfs/CAEmTpZGRKbzc16fWPvxbr6AfFsQoLmz-Lcg-7OgJOZDboJ+SGQ@mail.gmail.com/ CC: stable@vger.kernel.org # 6.0+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-31xfs: fix incorrect return type for fsdax fault handlersDarrick J. Wong1-3/+4
The kernel robot complained about this: >> fs/xfs/xfs_file.c:1266:31: sparse: sparse: incorrect type in return expression (different base types) @@ expected int @@ got restricted vm_fault_t @@ fs/xfs/xfs_file.c:1266:31: sparse: expected int fs/xfs/xfs_file.c:1266:31: sparse: got restricted vm_fault_t fs/xfs/xfs_file.c:1314:21: sparse: sparse: incorrect type in assignment (different base types) @@ expected restricted vm_fault_t [usertype] ret @@ got int @@ fs/xfs/xfs_file.c:1314:21: sparse: expected restricted vm_fault_t [usertype] ret fs/xfs/xfs_file.c:1314:21: sparse: got int Fix the incorrect return type for these two functions. While we're at it, make the !fsdax version return VM_FAULT_SIGBUS because a zero return value will cause some callers to try to lock vmf->page, which we never set here. Fixes: ea6c49b784f0 ("xfs: support CoW in fsdax mode") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31btrfs: fix a memory allocation failure test in btrfs_submit_directChristophe JAILLET1-1/+1
After allocation 'dip' is tested instead of 'dip->csums'. Fix it. Fixes: 642c5d34da53 ("btrfs: allocate the btrfs_dio_private as part of the iomap dio bio") CC: stable@vger.kernel.org # 5.19+ Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-30Merge tag '6.1-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds2-3/+11
Pull cifs fixes from Steve French: - use after free fix for reconnect race - two memory leak fixes * tag '6.1-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: fix use-after-free caused by invalid pointer `hostname` cifs: Fix pages leak when writedata alloc failed in cifs_write_from_iter() cifs: Fix pages array leak when writedata alloc failed in cifs_writedata_alloc()
2022-10-29Merge tag 'mm-hotfixes-stable-2022-10-28' of ↵Linus Torvalds5-15/+23
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc hotfixes from Andrew Morton: "Eight fix pre-6.0 bugs and the remainder address issues which were introduced in the 6.1-rc merge cycle, or address issues which aren't considered sufficiently serious to warrant a -stable backport" * tag 'mm-hotfixes-stable-2022-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (23 commits) mm: multi-gen LRU: move lru_gen_add_mm() out of IRQ-off region lib: maple_tree: remove unneeded initialization in mtree_range_walk() mmap: fix remap_file_pages() regression mm/shmem: ensure proper fallback if page faults mm/userfaultfd: replace kmap/kmap_atomic() with kmap_local_page() x86: fortify: kmsan: fix KMSAN fortify builds x86: asm: make sure __put_user_size() evaluates pointer once Kconfig.debug: disable CONFIG_FRAME_WARN for KMSAN by default x86/purgatory: disable KMSAN instrumentation mm: kmsan: export kmsan_copy_page_meta() mm: migrate: fix return value if all subpages of THPs are migrated successfully mm/uffd: fix vma check on userfault for wp mm: prep_compound_tail() clear page->private mm,madvise,hugetlb: fix unexpected data loss with MADV_DONTNEED on hugetlbfs mm/page_isolation: fix clang deadcode warning fs/ext4/super.c: remove unused `deprecated_msg' ipc/msg.c: fix percpu_counter use after free memory tier, sysfs: rename attribute "nodes" to "nodelist" MAINTAINERS: git://github.com -> https://github.com for nilfs2 mm/kmemleak: prevent soft lockup in kmemleak_scan()'s object iteration loops ...
2022-10-28mm: multi-gen LRU: move lru_gen_add_mm() out of IRQ-off regionSebastian Andrzej Siewior1-1/+1
lru_gen_add_mm() has been added within an IRQ-off region in the commit mentioned below. The other invocations of lru_gen_add_mm() are not within an IRQ-off region. The invocation within IRQ-off region is problematic on PREEMPT_RT because the function is using a spin_lock_t which must not be used within IRQ-disabled regions. The other invocations of lru_gen_add_mm() occur while task_struct::alloc_lock is acquired. Move lru_gen_add_mm() after interrupts are enabled and before task_unlock(). Link: https://lkml.kernel.org/r/20221026134830.711887-1-bigeasy@linutronix.de Fixes: bd74fdaea1460 ("mm: multi-gen LRU: support page table walks") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Yu Zhao <yuzhao@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: Kees Cook <keescook@chromium.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-28fs/ext4/super.c: remove unused `deprecated_msg'Andrew Morton1-4/+0
fs/ext4/super.c:1744:19: warning: 'deprecated_msg' defined but not used [-Wunused-const-variable=] Reported-by: kernel test robot <lkp@intel.com> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>