summaryrefslogtreecommitdiffstats
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2021-01-24Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-1/+6
Merge misc fixes from Andrew Morton: "18 patches. Subsystems affected by this patch series: mm (pagealloc, memcg, kasan, memory-failure, and highmem), ubsan, proc, and MAINTAINERS" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: MAINTAINERS: add a couple more files to the Clang/LLVM section proc_sysctl: fix oops caused by incorrect command parameters powerpc/mm/highmem: use __set_pte_at() for kmap_local() mips/mm/highmem: use set_pte() for kmap_local() mm/highmem: prepare for overriding set_pte_at() sparc/mm/highmem: flush cache and TLB mm: fix page reference leak in soft_offline_page() ubsan: disable unsigned-overflow check for i386 kasan, mm: fix resetting page_alloc tags for HW_TAGS kasan, mm: fix conflicts with init_on_alloc/free kasan: fix HW_TAGS boot parameters kasan: fix incorrect arguments passing in kasan_add_zero_shadow kasan: fix unaligned address is unhandled in kasan_remove_zero_shadow mm: fix numa stats for thp migration mm: memcg: fix memcg file_dirty numa stat mm: memcg/slab: optimize objcg stock draining mm: fix initialization of struct page for holes in memory layout x86/setup: don't remove E820_TYPE_RAM for pfn 0
2021-01-24Merge tag 'driver-core-5.11-rc5' of ↵Linus Torvalds1-41/+24
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core fixes from Greg KH: "Here are some small driver core fixes for 5.11-rc5 that resolve some reported problems: - revert of a -rc1 patch that was causing problems with some machines - device link device name collision problem fix (busses only have to name devices unique to their bus, not unique to all busses) - kernfs splice bugfixes to resolve firmware loading problems for Qualcomm systems. - other tiny driver core fixes for minor issues reported. All of these have been in linux-next with no reported problems" * tag 'driver-core-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: driver core: Fix device link device name collision driver core: Extend device_is_dependent() kernfs: wire up ->splice_read and ->splice_write kernfs: implement ->write_iter kernfs: implement ->read_iter Revert "driver core: Reorder devices on successful probe" Driver core: platform: Add extra error check in devm_platform_get_irqs_affinity() drivers core: Free dma_range_map when driver probe failed
2021-01-24proc_sysctl: fix oops caused by incorrect command parametersXiaoming Ni1-1/+6
The process_sysctl_arg() does not check whether val is empty before invoking strlen(val). If the command line parameter () is incorrectly configured and val is empty, oops is triggered. For example: "hung_task_panic=1" is incorrectly written as "hung_task_panic", oops is triggered. The call stack is as follows: Kernel command line: .... hung_task_panic ...... Call trace: __pi_strlen+0x10/0x98 parse_args+0x278/0x344 do_sysctl_args+0x8c/0xfc kernel_init+0x5c/0xf4 ret_from_fork+0x10/0x30 To fix it, check whether "val" is empty when "phram" is a sysctl field. Error codes are returned in the failure branch, and error logs are generated by parse_args(). Link: https://lkml.kernel.org/r/20210118133029.28580-1-nixiaoming@huawei.com Fixes: 3db978d480e2843 ("kernel/sysctl: support setting sysctl parameters from kernel command line") Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Iurii Zaikin <yzaikin@google.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Heiner Kallweit <hkallweit1@gmail.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: <stable@vger.kernel.org> [5.8+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-01-24Merge tag '5.11-rc4-smb3' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds2-4/+4
Pull cifs fixes from Steve French: "An important signal handling patch for stable, and two small cleanup patches" * tag '5.11-rc4-smb3' of git://git.samba.org/sfrench/cifs-2.6: cifs: do not fail __smb_send_rqst if non-fatal signals are pending fs/cifs: Simplify bool comparison. fs/cifs: Assign boolean values to a bool variable
2021-01-23cifs: do not fail __smb_send_rqst if non-fatal signals are pendingRonnie Sahlberg1-2/+2
RHBZ 1848178 The original intent of returning an error in this function in the patch: "CIFS: Mask off signals when sending SMB packets" was to avoid interrupting packet send in the middle of sending the data (and thus breaking an SMB connection), but we also don't want to fail the request for non-fatal signals even before we have had a chance to try to send it (the reported problem could be reproduced e.g. by exiting a child process when the parent process was in the midst of calling futimens to update a file's timestamps). In addition, since the signal may remain pending when we enter the sending loop, we may end up not sending the whole packet before TCP buffers become full. In this case the code returns -EINTR but what we need here is to return -ERESTARTSYS instead to allow system calls to be restarted. Fixes: b30c74c73c78 ("CIFS: Mask off signals when sending SMB packets") Cc: stable@vger.kernel.org # v5.1+ Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-01-22Merge tag 'ceph-for-5.11-rc5' of git://github.com/ceph/ceph-clientLinus Torvalds1-17/+17
Pull ceph fixes from Ilya Dryomov: "A patch to zero out sensitive cryptographic data and two minor cleanups prompted by the fact that a bunch of code was moved in this cycle" * tag 'ceph-for-5.11-rc5' of git://github.com/ceph/ceph-client: libceph: fix "Boolean result is used in bitwise operation" warning libceph, ceph: disambiguate ceph_connection_operations handlers libceph: zero out session key and connection secret
2021-01-21Merge tag 'fs_for_v5.11-rc5' of ↵Linus Torvalds2-14/+17
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fs and udf fixes from Jan Kara: "A lazytime handling fix from Eric Biggers and a fix of UDF session handling for large devices" * tag 'fs_for_v5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: udf: fix the problem that the disc content is not displayed fs: fix lazytime expiration handling in __writeback_single_inode()
2021-01-21kernfs: wire up ->splice_read and ->splice_writeChristoph Hellwig1-0/+2
Wire up the splice_read and splice_write methods to the default helpers using ->read_iter and ->write_iter now that those are implemented for kernfs. This restores support to use splice and sendfile on kernfs files. Fixes: 36e2c7421f02 ("fs: don't allow splice read/write without explicit ops") Reported-by: Siddharth Gupta <sidgup@codeaurora.org> Tested-by: Siddharth Gupta <sidgup@codeaurora.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210120204631.274206-4-hch@lst.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-21kernfs: implement ->write_iterChristoph Hellwig1-18/+10
Switch kernfs to implement the write_iter method instead of plain old write to prepare to supporting splice and sendfile again. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210120204631.274206-3-hch@lst.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-21kernfs: implement ->read_iterChristoph Hellwig1-23/+12
Switch kernfs to implement the read_iter method instead of plain old read to prepare to supporting splice and sendfile again. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210120204631.274206-2-hch@lst.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-20Merge tag 'for-5.11-rc4-tag' of ↵Linus Torvalds6-11/+29
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "A few more one line fixes for various bugs, stable material. - fix send when emitting clone operation from the same file and root - fix double free on error when cleaning backrefs - lockdep fix during relocation - handle potential error during reloc when starting transaction - skip running delayed refs during commit (leftover from code removal in this dev cycle)" * tag 'for-5.11-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: don't clear ret in btrfs_start_dirty_block_groups btrfs: fix lockdep splat in btrfs_recover_relocation btrfs: do not double free backref nodes on error btrfs: don't get an EINTR during drop_snapshot for reloc btrfs: send: fix invalid clone operations when cloning from the same file and root btrfs: no need to run delayed refs after commit_fs_roots during commit
2021-01-20cachefiles: Drop superfluous readpages aops NULL checkTakashi Iwai1-2/+0
After the recent actions to convert readpages aops to readahead, the NULL checks of readpages aops in cachefiles_read_or_alloc_page() may hit falsely. More badly, it's an ASSERT() call, and this panics. Drop the superfluous NULL checks for fixing this regression. [DH: Note that cachefiles never actually used readpages, so this check was never actually necessary] BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=208883 BugLink: https://bugzilla.opensuse.org/show_bug.cgi?id=1175245 Fixes: 9ae326a69004 ("CacheFiles: A cache that backs onto a mounted filesystem") Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-01-19Merge tag 'nfsd-5.11-2' of ↵Linus Torvalds1-1/+6
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: - Avoid exposing parent of root directory in NFSv3 READDIRPLUS results - Fix a tracepoint change that went in the initial 5.11 merge * tag 'nfsd-5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: SUNRPC: Move the svc_xdr_recvfrom tracepoint again nfsd4: readdirplus shouldn't return parent of export
2021-01-18btrfs: don't clear ret in btrfs_start_dirty_block_groupsJosef Bacik1-1/+2
If we fail to update a block group item in the loop we'll break, however we'll do btrfs_run_delayed_refs and lose our error value in ret, and thus not clean up properly. Fix this by only running the delayed refs if there was no failure. CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-01-18btrfs: fix lockdep splat in btrfs_recover_relocationJosef Bacik1-0/+2
While testing the error paths of relocation I hit the following lockdep splat: ====================================================== WARNING: possible circular locking dependency detected 5.10.0-rc6+ #217 Not tainted ------------------------------------------------------ mount/779 is trying to acquire lock: ffffa0e676945418 (&fs_info->balance_mutex){+.+.}-{3:3}, at: btrfs_recover_balance+0x2f0/0x340 but task is already holding lock: ffffa0e60ee31da8 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x27/0x100 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (btrfs-root-00){++++}-{3:3}: down_read_nested+0x43/0x130 __btrfs_tree_read_lock+0x27/0x100 btrfs_read_lock_root_node+0x31/0x40 btrfs_search_slot+0x462/0x8f0 btrfs_update_root+0x55/0x2b0 btrfs_drop_snapshot+0x398/0x750 clean_dirty_subvols+0xdf/0x120 btrfs_recover_relocation+0x534/0x5a0 btrfs_start_pre_rw_mount+0xcb/0x170 open_ctree+0x151f/0x1726 btrfs_mount_root.cold+0x12/0xea legacy_get_tree+0x30/0x50 vfs_get_tree+0x28/0xc0 vfs_kern_mount.part.0+0x71/0xb0 btrfs_mount+0x10d/0x380 legacy_get_tree+0x30/0x50 vfs_get_tree+0x28/0xc0 path_mount+0x433/0xc10 __x64_sys_mount+0xe3/0x120 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #1 (sb_internal#2){.+.+}-{0:0}: start_transaction+0x444/0x700 insert_balance_item.isra.0+0x37/0x320 btrfs_balance+0x354/0xf40 btrfs_ioctl_balance+0x2cf/0x380 __x64_sys_ioctl+0x83/0xb0 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #0 (&fs_info->balance_mutex){+.+.}-{3:3}: __lock_acquire+0x1120/0x1e10 lock_acquire+0x116/0x370 __mutex_lock+0x7e/0x7b0 btrfs_recover_balance+0x2f0/0x340 open_ctree+0x1095/0x1726 btrfs_mount_root.cold+0x12/0xea legacy_get_tree+0x30/0x50 vfs_get_tree+0x28/0xc0 vfs_kern_mount.part.0+0x71/0xb0 btrfs_mount+0x10d/0x380 legacy_get_tree+0x30/0x50 vfs_get_tree+0x28/0xc0 path_mount+0x433/0xc10 __x64_sys_mount+0xe3/0x120 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 other info that might help us debug this: Chain exists of: &fs_info->balance_mutex --> sb_internal#2 --> btrfs-root-00 Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(btrfs-root-00); lock(sb_internal#2); lock(btrfs-root-00); lock(&fs_info->balance_mutex); *** DEADLOCK *** 2 locks held by mount/779: #0: ffffa0e60dc040e0 (&type->s_umount_key#47/1){+.+.}-{3:3}, at: alloc_super+0xb5/0x380 #1: ffffa0e60ee31da8 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x27/0x100 stack backtrace: CPU: 0 PID: 779 Comm: mount Not tainted 5.10.0-rc6+ #217 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014 Call Trace: dump_stack+0x8b/0xb0 check_noncircular+0xcf/0xf0 ? trace_call_bpf+0x139/0x260 __lock_acquire+0x1120/0x1e10 lock_acquire+0x116/0x370 ? btrfs_recover_balance+0x2f0/0x340 __mutex_lock+0x7e/0x7b0 ? btrfs_recover_balance+0x2f0/0x340 ? btrfs_recover_balance+0x2f0/0x340 ? rcu_read_lock_sched_held+0x3f/0x80 ? kmem_cache_alloc_trace+0x2c4/0x2f0 ? btrfs_get_64+0x5e/0x100 btrfs_recover_balance+0x2f0/0x340 open_ctree+0x1095/0x1726 btrfs_mount_root.cold+0x12/0xea ? rcu_read_lock_sched_held+0x3f/0x80 legacy_get_tree+0x30/0x50 vfs_get_tree+0x28/0xc0 vfs_kern_mount.part.0+0x71/0xb0 btrfs_mount+0x10d/0x380 ? __kmalloc_track_caller+0x2f2/0x320 legacy_get_tree+0x30/0x50 vfs_get_tree+0x28/0xc0 ? capable+0x3a/0x60 path_mount+0x433/0xc10 __x64_sys_mount+0xe3/0x120 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 This is straightforward to fix, simply release the path before we setup the balance_ctl. CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-01-18btrfs: do not double free backref nodes on errorJosef Bacik1-1/+1
Zygo reported the following KASAN splat: BUG: KASAN: use-after-free in btrfs_backref_cleanup_node+0x18a/0x420 Read of size 8 at addr ffff888112402950 by task btrfs/28836 CPU: 0 PID: 28836 Comm: btrfs Tainted: G W 5.10.0-e35f27394290-for-next+ #23 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 Call Trace: dump_stack+0xbc/0xf9 ? btrfs_backref_cleanup_node+0x18a/0x420 print_address_description.constprop.8+0x21/0x210 ? record_print_text.cold.34+0x11/0x11 ? btrfs_backref_cleanup_node+0x18a/0x420 ? btrfs_backref_cleanup_node+0x18a/0x420 kasan_report.cold.10+0x20/0x37 ? btrfs_backref_cleanup_node+0x18a/0x420 __asan_load8+0x69/0x90 btrfs_backref_cleanup_node+0x18a/0x420 btrfs_backref_release_cache+0x83/0x1b0 relocate_block_group+0x394/0x780 ? merge_reloc_roots+0x4a0/0x4a0 btrfs_relocate_block_group+0x26e/0x4c0 btrfs_relocate_chunk+0x52/0x120 btrfs_balance+0xe2e/0x1900 ? check_flags.part.50+0x6c/0x1e0 ? btrfs_relocate_chunk+0x120/0x120 ? kmem_cache_alloc_trace+0xa06/0xcb0 ? _copy_from_user+0x83/0xc0 btrfs_ioctl_balance+0x3a7/0x460 btrfs_ioctl+0x24c8/0x4360 ? __kasan_check_read+0x11/0x20 ? check_chain_key+0x1f4/0x2f0 ? __asan_loadN+0xf/0x20 ? btrfs_ioctl_get_supported_features+0x30/0x30 ? kvm_sched_clock_read+0x18/0x30 ? check_chain_key+0x1f4/0x2f0 ? lock_downgrade+0x3f0/0x3f0 ? handle_mm_fault+0xad6/0x2150 ? do_vfs_ioctl+0xfc/0x9d0 ? ioctl_file_clone+0xe0/0xe0 ? check_flags.part.50+0x6c/0x1e0 ? check_flags.part.50+0x6c/0x1e0 ? check_flags+0x26/0x30 ? lock_is_held_type+0xc3/0xf0 ? syscall_enter_from_user_mode+0x1b/0x60 ? do_syscall_64+0x13/0x80 ? rcu_read_lock_sched_held+0xa1/0xd0 ? __kasan_check_read+0x11/0x20 ? __fget_light+0xae/0x110 __x64_sys_ioctl+0xc3/0x100 do_syscall_64+0x37/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f4c4bdfe427 Allocated by task 28836: kasan_save_stack+0x21/0x50 __kasan_kmalloc.constprop.18+0xbe/0xd0 kasan_kmalloc+0x9/0x10 kmem_cache_alloc_trace+0x410/0xcb0 btrfs_backref_alloc_node+0x46/0xf0 btrfs_backref_add_tree_node+0x60d/0x11d0 build_backref_tree+0xc5/0x700 relocate_tree_blocks+0x2be/0xb90 relocate_block_group+0x2eb/0x780 btrfs_relocate_block_group+0x26e/0x4c0 btrfs_relocate_chunk+0x52/0x120 btrfs_balance+0xe2e/0x1900 btrfs_ioctl_balance+0x3a7/0x460 btrfs_ioctl+0x24c8/0x4360 __x64_sys_ioctl+0xc3/0x100 do_syscall_64+0x37/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Freed by task 28836: kasan_save_stack+0x21/0x50 kasan_set_track+0x20/0x30 kasan_set_free_info+0x1f/0x30 __kasan_slab_free+0xf3/0x140 kasan_slab_free+0xe/0x10 kfree+0xde/0x200 btrfs_backref_error_cleanup+0x452/0x530 build_backref_tree+0x1a5/0x700 relocate_tree_blocks+0x2be/0xb90 relocate_block_group+0x2eb/0x780 btrfs_relocate_block_group+0x26e/0x4c0 btrfs_relocate_chunk+0x52/0x120 btrfs_balance+0xe2e/0x1900 btrfs_ioctl_balance+0x3a7/0x460 btrfs_ioctl+0x24c8/0x4360 __x64_sys_ioctl+0xc3/0x100 do_syscall_64+0x37/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xa9 This occurred because we freed our backref node in btrfs_backref_error_cleanup(), but then tried to free it again in btrfs_backref_release_cache(). This is because btrfs_backref_release_cache() will cycle through all of the cache->leaves nodes and free them up. However btrfs_backref_error_cleanup() freed the backref node with btrfs_backref_free_node(), which simply kfree()d the backref node without unlinking it from the cache. Change this to a btrfs_backref_drop_node(), which does the appropriate cleanup and removes the node from the cache->leaves list, so when we go to free the remaining cache we don't trip over items we've already dropped. Fixes: 75bfb9aff45e ("Btrfs: cleanup error handling in build_backref_tree") CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-01-18btrfs: don't get an EINTR during drop_snapshot for relocJosef Bacik1-1/+9
This was partially fixed by f3e3d9cc3525 ("btrfs: avoid possible signal interruption of btrfs_drop_snapshot() on relocation tree"), however it missed a spot when we restart a trans handle because we need to end the transaction. The fix is the same, simply use btrfs_join_transaction() instead of btrfs_start_transaction() when deleting reloc roots. Fixes: f3e3d9cc3525 ("btrfs: avoid possible signal interruption of btrfs_drop_snapshot() on relocation tree") CC: stable@vger.kernel.org # 5.4+ Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-01-18udf: fix the problem that the disc content is not displayedlianzhi chang1-3/+4
When the capacity of the disc is too large (assuming the 4.7G specification), the disc (UDF file system) will be burned multiple times in the windows (Multisession Usage). When the remaining capacity of the CD is less than 300M (estimated value, for reference only), open the CD in the Linux system, the content of the CD is displayed as blank (the kernel will say "No VRS found"). Windows can display the contents of the CD normally. Through analysis, in the "fs/udf/super.c": udf_check_vsd function, the actual value of VSD_MAX_SECTOR_OFFSET may be much larger than 0x800000. According to the current code logic, it is found that the type of sbi->s_session is "__s32", when the remaining capacity of the disc is less than 300M (take a set of test values: sector=3154903040, sbi->s_session=1540464, sb->s_blocksize_bits=11 ), the calculation result of "sbi->s_session << sb->s_blocksize_bits" will overflow. Therefore, it is necessary to convert the type of s_session to "loff_t" (when udf_check_vsd starts, assign a value to _sector, which is also converted in this way), so that the result will not overflow, and then the content of the disc can be displayed normally. Link: https://lore.kernel.org/r/20210114075741.30448-1-changlianzhi@uniontech.com Signed-off-by: lianzhi chang <changlianzhi@uniontech.com> Signed-off-by: Jan Kara <jack@suse.cz>
2021-01-17fs/cifs: Simplify bool comparison.Jiapeng Zhong1-1/+1
Fix the follow warnings: ./fs/cifs/connect.c: WARNING: Comparison of 0/1 to bool variable Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Zhong <abaci-bugfix@linux.alibaba.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-01-17fs/cifs: Assign boolean values to a bool variableJiapeng Zhong1-1/+1
Fix the following coccicheck warnings: ./fs/cifs/connect.c:3386:2-21: WARNING: Assignment of 0/1 to bool variable. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Zhong <abaci-bugfix@linux.alibaba.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-01-17Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds1-2/+5
Pull misc vfs fixes from Al Viro: "Several assorted fixes. I still think that audit ->d_name race is better fixed this way for the benefit of backports, with any possibly fancier variants done on top of it" * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: dump_common_audit_data(): fix racy accesses to ->d_name iov_iter: fix the uaccess area in copy_compat_iovec_from_user umount(2): move the flag validity checks first
2021-01-16Merge tag 'io_uring-5.11-2021-01-16' of git://git.kernel.dk/linux-blockLinus Torvalds1-5/+41
Pull io_uring fixes from Jens Axboe: "We still have a pending fix for a cancelation issue, but it's still being investigated. In the meantime: - Dead mm handling fix (Pavel) - SQPOLL setup error handling (Pavel) - Flush timeout sequence fix (Marcelo) - Missing finish_wait() for one exit case" * tag 'io_uring-5.11-2021-01-16' of git://git.kernel.dk/linux-block: io_uring: ensure finish_wait() is always called in __io_uring_task_cancel() io_uring: flush timeouts that should already have expired io_uring: do sqo disable on install_fd error io_uring: fix null-deref in io_disable_sqo_submit io_uring: don't take files/mm for a dead task io_uring: drop mm and files after task_work_run
2021-01-16mm: don't play games with pinned pages in clear_page_refsLinus Torvalds1-0/+21
Turning a pinned page read-only breaks the pinning after COW. Don't do it. The whole "track page soft dirty" state doesn't work with pinned pages anyway, since the page might be dirtied by the pinning entity without ever being noticed in the page tables. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-01-16mm: fix clear_refs_write lockingLinus Torvalds1-23/+9
Turning page table entries read-only requires the mmap_sem held for writing. So stop doing the odd games with turning things from read locks to write locks and back. Just get the write lock. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-01-15io_uring: ensure finish_wait() is always called in __io_uring_task_cancel()Jens Axboe1-0/+1
If we enter with requests pending and performm cancelations, we'll have a different inflight count before and after calling prepare_to_wait(). This causes the loop to restart. If we actually ended up canceling everything, or everything completed in-between, then we'll break out of the loop without calling finish_wait() on the waitqueue. This can trigger a warning on exit_signals(), as we leave the task state in TASK_UNINTERRUPTIBLE. Put a finish_wait() after the loop to catch that case. Cc: stable@vger.kernel.org # 5.9+ Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-15Merge tag 'ext4_for_linus_stable' of ↵Linus Torvalds10-129/+186
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "A number of bug fixes for ext4: - Fix for the new fast_commit feature - Fix some error handling codepaths in whiteout handling and mountpoint sampling - Fix how we write ext4_error information so it goes through the journal when journalling is active, to avoid races that can lead to lost error information, superblock checksum failures, or DIF/DIX features" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: remove expensive flush on fast commit ext4: fix bug for rename with RENAME_WHITEOUT ext4: fix wrong list_splice in ext4_fc_cleanup ext4: use IS_ERR instead of IS_ERR_OR_NULL and set inode null when IS_ERR ext4: don't leak old mountpoint samples ext4: drop ext4_handle_dirty_super() ext4: fix superblock checksum failure when setting password salt ext4: use sbi instead of EXT4_SB(sb) in ext4_update_super() ext4: save error info to sb through journal if available ext4: protect superblock modifications with a buffer lock ext4: drop sync argument of ext4_commit_super() ext4: combine ext4_handle_error() and save_error_info()
2021-01-15Merge tag '5.11-rc3-smb3' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds5-7/+6
Pull cifs fixes from Steve French: "Two small cifs fixes for stable (including an important handle leak fix) and three small cleanup patches" * tag '5.11-rc3-smb3' of git://git.samba.org/sfrench/cifs-2.6: cifs: style: replace one-element array with flexible-array cifs: connect: style: Simplify bool comparison fs: cifs: remove unneeded variable in smb3_fs_context_dup cifs: fix interrupted close commands cifs: check pointer before freeing
2021-01-15ext4: remove expensive flush on fast commitDaejun Park1-5/+5
In the fast commit, it adds REQ_FUA and REQ_PREFLUSH on each fast commit block when barrier is enabled. However, in recovery phase, ext4 compares CRC value in the tail. So it is sufficient to add REQ_FUA and REQ_PREFLUSH on the block that has tail. Signed-off-by: Daejun Park <daejun7.park@samsung.com> Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20210106013242epcms2p5b6b4ed8ca86f29456fdf56aa580e74b4@epcms2p5 Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-01-15ext4: fix bug for rename with RENAME_WHITEOUTyangerkun1-8/+9
We got a "deleted inode referenced" warning cross our fsstress test. The bug can be reproduced easily with following steps: cd /dev/shm mkdir test/ fallocate -l 128M img mkfs.ext4 -b 1024 img mount img test/ dd if=/dev/zero of=test/foo bs=1M count=128 mkdir test/dir/ && cd test/dir/ for ((i=0;i<1000;i++)); do touch file$i; done # consume all block cd ~ && renameat2(AT_FDCWD, /dev/shm/test/dir/file1, AT_FDCWD, /dev/shm/test/dir/dst_file, RENAME_WHITEOUT) # ext4_add_entry in ext4_rename will return ENOSPC!! cd /dev/shm/ && umount test/ && mount img test/ && ls -li test/dir/file1 We will get the output: "ls: cannot access 'test/dir/file1': Structure needs cleaning" and the dmesg show: "EXT4-fs error (device loop0): ext4_lookup:1626: inode #2049: comm ls: deleted inode referenced: 139" ext4_rename will create a special inode for whiteout and use this 'ino' to replace the source file's dir entry 'ino'. Once error happens latter(the error above was the ENOSPC return from ext4_add_entry in ext4_rename since all space has been consumed), the cleanup do drop the nlink for whiteout, but forget to restore 'ino' with source file. This will trigger the bug describle as above. Signed-off-by: yangerkun <yangerkun@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: stable@vger.kernel.org Fixes: cd808deced43 ("ext4: support RENAME_WHITEOUT") Link: https://lore.kernel.org/r/20210105062857.3566-1-yangerkun@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-01-15ext4: fix wrong list_splice in ext4_fc_cleanupDaejun Park1-1/+1
After full/fast commit, entries in staging queue are promoted to main queue. In ext4_fs_cleanup function, it splice to staging queue to staging queue. Fixes: aa75f4d3daaeb ("ext4: main fast-commit commit path") Signed-off-by: Daejun Park <daejun7.park@samsung.com> Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20201230094851epcms2p6eeead8cc984379b37b2efd21af90fd1a@epcms2p6 Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2021-01-15ext4: use IS_ERR instead of IS_ERR_OR_NULL and set inode null when IS_ERRYi Li1-11/+12
1: ext4_iget/ext4_find_extent never returns NULL, use IS_ERR instead of IS_ERR_OR_NULL to fix this. 2: ext4_fc_replay_inode should set the inode to NULL when IS_ERR. and go to call iput properly. Fixes: 8016e29f4362 ("ext4: fast commit recovery path") Signed-off-by: Yi Li <yili@winhong.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20201230033827.3996064-1-yili@winhong.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2021-01-15io_uring: flush timeouts that should already have expiredMarcelo Diop-Gonzalez1-4/+30
Right now io_flush_timeouts() checks if the current number of events is equal to ->timeout.target_seq, but this will miss some timeouts if there have been more than 1 event added since the last time they were flushed (possible in io_submit_flush_completions(), for example). Fix it by recording the last sequence at which timeouts were flushed so that the number of events seen can be compared to the number of events needed without overflow. Signed-off-by: Marcelo Diop-Gonzalez <marcelo827@gmail.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-13cifs: style: replace one-element array with flexible-arrayYANG LI1-1/+1
There is a regular need in the kernel to provide a way to declare having a dynamically sized set of trailing elements in a structure. Kernel code should always use "flexible array members"[1] for these cases. The older style of one-element or zero-length arrays should no longer be used[2]. [1] https://en.wikipedia.org/wiki/Flexible_array_member [2] https://www.kernel.org/doc/html/v5.9/process/ deprecated.html#zero-length-and-one-element-arrays Signed-off-by: YANG LI <abaci-bugfix@linux.alibaba.com> Reported-by: Abaci <abaci@linux.alibaba.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-01-13cifs: connect: style: Simplify bool comparisonYANG LI1-1/+1
Fix the following coccicheck warning: ./fs/cifs/connect.c:3740:6-21: WARNING: Comparison of 0/1 to bool variable Signed-off-by: YANG LI <abaci-bugfix@linux.alibaba.com> Reported-by: Abaci Robot<abaci@linux.alibaba.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-01-13fs: cifs: remove unneeded variable in smb3_fs_context_dupMenglong Dong1-3/+1
'rc' in smb3_fs_context_dup is not used and can be removed. Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-01-13cifs: fix interrupted close commandsPaulo Alcantara1-1/+1
Retry close command if it gets interrupted to not leak open handles on the server. Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Reported-by: Duncan Findlay <duncf@duncf.ca> Suggested-by: Pavel Shilovsky <pshilov@microsoft.com> Fixes: 6988a619f5b7 ("cifs: allow syscalls to be restarted in __smb_send_rqst()") Cc: stable@vger.kernel.org Reviewd-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-01-13cifs: check pointer before freeingTom Rix1-1/+2
clang static analysis reports this problem dfs_cache.c:591:2: warning: Argument to kfree() is a constant address (18446744073709551614), which is not memory allocated by malloc() kfree(vi); ^~~~~~~~~ In dfs_cache_del_vol() the volume info pointer 'vi' being freed is the return of a call to find_vol(). The large constant address is find_vol() returning an error. Add an error check to dfs_cache_del_vol() similar to the one done in dfs_cache_update_vol(). Fixes: 54be1f6c1c37 ("cifs: Add DFS cache routines") Signed-off-by: Tom Rix <trix@redhat.com> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> CC: <stable@vger.kernel.org> # v5.0+ Signed-off-by: Steve French <stfrench@microsoft.com>
2021-01-13fs: fix lazytime expiration handling in __writeback_single_inode()Eric Biggers1-11/+13
When lazytime is enabled and an inode is being written due to its in-memory updated timestamps having expired, either due to a sync() or syncfs() system call or due to dirtytime_expire_interval having elapsed, the VFS needs to inform the filesystem so that the filesystem can copy the inode's timestamps out to the on-disk data structures. This is done by __writeback_single_inode() calling mark_inode_dirty_sync(), which then calls ->dirty_inode(I_DIRTY_SYNC). However, this occurs after __writeback_single_inode() has already cleared the dirty flags from ->i_state. This causes two bugs: - mark_inode_dirty_sync() redirties the inode, causing it to remain dirty. This wastefully causes the inode to be written twice. But more importantly, it breaks cases where sync_filesystem() is expected to clean dirty inodes. This includes the FS_IOC_REMOVE_ENCRYPTION_KEY ioctl (as reported at https://lore.kernel.org/r/20200306004555.GB225345@gmail.com), as well as possibly filesystem freezing (freeze_super()). - Since ->i_state doesn't contain I_DIRTY_TIME when ->dirty_inode() is called from __writeback_single_inode() for lazytime expiration, xfs_fs_dirty_inode() ignores the notification. (XFS only cares about lazytime expirations, and it assumes that i_state will contain I_DIRTY_TIME during those.) Therefore, lazy timestamps aren't persisted by sync(), syncfs(), or dirtytime_expire_interval on XFS. Fix this by moving the call to mark_inode_dirty_sync() to earlier in __writeback_single_inode(), before the dirty flags are cleared from i_state. This makes filesystems be properly notified of the timestamp expiration, and it avoids incorrectly redirtying the inode. This fixes xfstest generic/580 (which tests FS_IOC_REMOVE_ENCRYPTION_KEY) when run on ext4 or f2fs with lazytime enabled. It also fixes the new lazytime xfstest I've proposed, which reproduces the above-mentioned XFS bug (https://lore.kernel.org/r/20210105005818.92978-1-ebiggers@kernel.org). Alternatively, we could call ->dirty_inode(I_DIRTY_SYNC) directly. But due to the introduction of I_SYNC_QUEUED, mark_inode_dirty_sync() is the right thing to do because mark_inode_dirty_sync() now knows not to move the inode to a writeback list if it is currently queued for sync. Fixes: 0ae45f63d4ef ("vfs: add support for a lazytime mount option") Cc: stable@vger.kernel.org Depends-on: 5afced3bf281 ("writeback: Avoid skipping inode writeback") Link: https://lore.kernel.org/r/20210112190253.64307-2-ebiggers@kernel.org Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jan Kara <jack@suse.cz>
2021-01-13io_uring: do sqo disable on install_fd errorPavel Begunkov1-0/+1
WARNING: CPU: 0 PID: 8494 at fs/io_uring.c:8717 io_ring_ctx_wait_and_kill+0x4f2/0x600 fs/io_uring.c:8717 Call Trace: io_uring_release+0x3e/0x50 fs/io_uring.c:8759 __fput+0x283/0x920 fs/file_table.c:280 task_work_run+0xdd/0x190 kernel/task_work.c:140 tracehook_notify_resume include/linux/tracehook.h:189 [inline] exit_to_user_mode_loop kernel/entry/common.c:174 [inline] exit_to_user_mode_prepare+0x249/0x250 kernel/entry/common.c:201 __syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline] syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:302 entry_SYSCALL_64_after_hwframe+0x44/0xa9 failed io_uring_install_fd() is a special case, we don't do io_ring_ctx_wait_and_kill() directly but defer it to fput, though still need to io_disable_sqo_submit() before. note: it doesn't fix any real problem, just a warning. That's because sqring won't be available to the userspace in this case and so SQPOLL won't submit anything. Reported-by: syzbot+9c9c35374c0ecac06516@syzkaller.appspotmail.com Fixes: d9d05217cb69 ("io_uring: stop SQPOLL submit on creator's death") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-13io_uring: fix null-deref in io_disable_sqo_submitPavel Begunkov1-1/+2
general protection fault, probably for non-canonical address 0xdffffc0000000022: 0000 [#1] KASAN: null-ptr-deref in range [0x0000000000000110-0x0000000000000117] RIP: 0010:io_ring_set_wakeup_flag fs/io_uring.c:6929 [inline] RIP: 0010:io_disable_sqo_submit+0xdb/0x130 fs/io_uring.c:8891 Call Trace: io_uring_create fs/io_uring.c:9711 [inline] io_uring_setup+0x12b1/0x38e0 fs/io_uring.c:9739 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 io_disable_sqo_submit() might be called before user rings were allocated, don't do io_ring_set_wakeup_flag() in those cases. Reported-by: syzbot+ab412638aeb652ded540@syzkaller.appspotmail.com Fixes: d9d05217cb69 ("io_uring: stop SQPOLL submit on creator's death") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-12Merge tag 'nfs-for-5.11-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds7-81/+98
Pull NFS client fixes from Trond Myklebust: "Highlights include: - Fix parsing of link-local IPv6 addresses - Fix confusing logging of mount errors that was introduced by the fsopen() patchset. - Fix a tracing use after free in _nfs4_do_setlk() - Layout return-on-close fixes when called from nfs4_evict_inode() - Layout segments were being leaked in pnfs_generic_clear_request_commit() - Don't leak DS commits in pnfs_generic_retry_commit() - Fix an Oopsable use-after-free when nfs_delegation_find_inode_server() calls iput() on an inode after the super block has gone away" * tag 'nfs-for-5.11-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFS: nfs_igrab_and_active must first reference the superblock NFS: nfs_delegation_find_inode_server must first reference the superblock NFS/pNFS: Fix a leak of the layout 'plh_outstanding' counter NFS/pNFS: Don't leak DS commits in pnfs_generic_retry_commit() NFS/pNFS: Don't call pnfs_free_bucket_lseg() before removing the request pNFS: Stricter ordering of layoutget and layoutreturn pNFS: Clean up pnfs_layoutreturn_free_lsegs() pNFS: We want return-on-close to complete when evicting the inode pNFS: Mark layout for return if return-on-close was not sent net: sunrpc: interpret the return value of kstrtou32 correctly NFS: Adjust fs_context error logging NFS4: Fix use-after-free in trace_event_raw_event_nfs4_set_lock
2021-01-12btrfs: send: fix invalid clone operations when cloning from the same file ↵Filipe Manana1-0/+15
and root When an incremental send finds an extent that is shared, it checks which file extent items in the range refer to that extent, and for those it emits clone operations, while for others it emits regular write operations to avoid corruption at the destination (as described and fixed by commit d906d49fc5f4 ("Btrfs: send, fix file corruption due to incorrect cloning operations")). However when the root we are cloning from is the send root, we are cloning from the inode currently being processed and the source file range has several extent items that partially point to the desired extent, with an offset smaller than the offset in the file extent item for the range we want to clone into, it can cause the algorithm to issue a clone operation that starts at the current eof of the file being processed in the receiver side, in which case the receiver will fail, with EINVAL, when attempting to execute the clone operation. Example reproducer: $ cat test-send-clone.sh #!/bin/bash DEV=/dev/sdi MNT=/mnt/sdi mkfs.btrfs -f $DEV >/dev/null mount $DEV $MNT # Create our test file with a single and large extent (1M) and with # different content for different file ranges that will be reflinked # later. xfs_io -f \ -c "pwrite -S 0xab 0 128K" \ -c "pwrite -S 0xcd 128K 128K" \ -c "pwrite -S 0xef 256K 256K" \ -c "pwrite -S 0x1a 512K 512K" \ $MNT/foobar btrfs subvolume snapshot -r $MNT $MNT/snap1 btrfs send -f /tmp/snap1.send $MNT/snap1 # Now do a series of changes to our file such that we end up with # different parts of the extent reflinked into different file offsets # and we overwrite a large part of the extent too, so no file extent # items refer to that part that was overwritten. This used to confuse # the algorithm used by the kernel to figure out which file ranges to # clone, making it attempt to clone from a source range starting at # the current eof of the file, resulting in the receiver to fail since # it is an invalid clone operation. # xfs_io -c "reflink $MNT/foobar 64K 1M 960K" \ -c "reflink $MNT/foobar 0K 512K 256K" \ -c "reflink $MNT/foobar 512K 128K 256K" \ -c "pwrite -S 0x73 384K 640K" \ $MNT/foobar btrfs subvolume snapshot -r $MNT $MNT/snap2 btrfs send -f /tmp/snap2.send -p $MNT/snap1 $MNT/snap2 echo -e "\nFile digest in the original filesystem:" md5sum $MNT/snap2/foobar # Now unmount the filesystem, create a new one, mount it and try to # apply both send streams to recreate both snapshots. umount $DEV mkfs.btrfs -f $DEV >/dev/null mount $DEV $MNT btrfs receive -f /tmp/snap1.send $MNT btrfs receive -f /tmp/snap2.send $MNT # Must match what we got in the original filesystem of course. echo -e "\nFile digest in the new filesystem:" md5sum $MNT/snap2/foobar umount $MNT When running the reproducer, the incremental send operation fails due to an invalid clone operation: $ ./test-send-clone.sh wrote 131072/131072 bytes at offset 0 128 KiB, 32 ops; 0.0015 sec (80.906 MiB/sec and 20711.9741 ops/sec) wrote 131072/131072 bytes at offset 131072 128 KiB, 32 ops; 0.0013 sec (90.514 MiB/sec and 23171.6148 ops/sec) wrote 262144/262144 bytes at offset 262144 256 KiB, 64 ops; 0.0025 sec (98.270 MiB/sec and 25157.2327 ops/sec) wrote 524288/524288 bytes at offset 524288 512 KiB, 128 ops; 0.0052 sec (95.730 MiB/sec and 24506.9883 ops/sec) Create a readonly snapshot of '/mnt/sdi' in '/mnt/sdi/snap1' At subvol /mnt/sdi/snap1 linked 983040/983040 bytes at offset 1048576 960 KiB, 1 ops; 0.0006 sec (1.419 GiB/sec and 1550.3876 ops/sec) linked 262144/262144 bytes at offset 524288 256 KiB, 1 ops; 0.0020 sec (120.192 MiB/sec and 480.7692 ops/sec) linked 262144/262144 bytes at offset 131072 256 KiB, 1 ops; 0.0018 sec (133.833 MiB/sec and 535.3319 ops/sec) wrote 655360/655360 bytes at offset 393216 640 KiB, 160 ops; 0.0093 sec (66.781 MiB/sec and 17095.8436 ops/sec) Create a readonly snapshot of '/mnt/sdi' in '/mnt/sdi/snap2' At subvol /mnt/sdi/snap2 File digest in the original filesystem: 9c13c61cb0b9f5abf45344375cb04dfa /mnt/sdi/snap2/foobar At subvol snap1 At snapshot snap2 ERROR: failed to clone extents to foobar: Invalid argument File digest in the new filesystem: 132f0396da8f48d2e667196bff882cfc /mnt/sdi/snap2/foobar The clone operation is invalid because its source range starts at the current eof of the file in the receiver, causing the receiver to get an EINVAL error from the clone operation when attempting it. For the example above, what happens is the following: 1) When processing the extent at file offset 1M, the algorithm checks that the extent is shared and can be (fully or partially) found at file offset 0. At this point the file has a size (and eof) of 1M at the receiver; 2) It finds that our extent item at file offset 1M has a data offset of 64K and, since the file extent item at file offset 0 has a data offset of 0, it issues a clone operation, from the same file and root, that has a source range offset of 64K, destination offset of 1M and a length of 64K, since the extent item at file offset 0 refers only to the first 128K of the shared extent. After this clone operation, the file size (and eof) at the receiver is increased from 1M to 1088K (1M + 64K); 3) Now there's still 896K (960K - 64K) of data left to clone or write, so it checks for the next file extent item, which starts at file offset 128K. This file extent item has a data offset of 0 and a length of 256K, so a clone operation with a source range offset of 256K, a destination offset of 1088K (1M + 64K) and length of 128K is issued. After this operation the file size (and eof) at the receiver increases from 1088K to 1216K (1088K + 128K); 4) Now there's still 768K (896K - 128K) of data left to clone or write, so it checks for the next file extent item, located at file offset 384K. This file extent item points to a different extent, not the one we want to clone, with a length of 640K. So we issue a write operation into the file range 1216K (1088K + 128K, end of the last clone operation), with a length of 640K and with a data matching the one we can find for that range in send root. After this operation, the file size (and eof) at the receiver increases from 1216K to 1856K (1216K + 640K); 5) Now there's still 128K (768K - 640K) of data left to clone or write, so we look into the file extent item, which is for file offset 1M and it points to the extent we want to clone, with a data offset of 64K and a length of 960K. However this matches the file offset we started with, the start of the range to clone into. So we can't for sure find any file extent item from here onwards with the rest of the data we want to clone, yet we proceed and since the file extent item points to the shared extent, with a data offset of 64K, we issue a clone operation with a source range starting at file offset 1856K, which matches the file extent item's offset, 1M, plus the amount of data cloned and written so far, which is 64K (step 2) + 128K (step 3) + 640K (step 4). This clone operation is invalid since the source range offset matches the current eof of the file in the receiver. We should have stopped looking for extents to clone at this point and instead fallback to write, which would simply the contain the data in the file range from 1856K to 1856K + 128K. So fix this by stopping the loop that looks for file ranges to clone at clone_range() when we reach the current eof of the file being processed, if we are cloning from the same file and using the send root as the clone root. This ensures any data not yet cloned will be sent to the receiver through a write operation. A test case for fstests will follow soon. Reported-by: Massimo B. <massimo.b@gmx.net> Link: https://lore.kernel.org/linux-btrfs/6ae34776e85912960a253a8327068a892998e685.camel@gmx.net/ Fixes: 11f2069c113e ("Btrfs: send, allow clone operations within the same file") CC: stable@vger.kernel.org # 5.5+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-01-12btrfs: no need to run delayed refs after commit_fs_roots during commitDavid Sterba1-8/+0
The inode number cache has been removed in this dev cycle, there's one more leftover. We don't need to run the delayed refs again after commit_fs_roots as stated in the comment, because btrfs_save_ino_cache is no more since 5297199a8bca ("btrfs: remove inode number cache feature"). Nothing else between commit_fs_roots and btrfs_qgroup_account_extents could create new delayed refs so the qgroup consistency should be safe. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-01-12nfsd4: readdirplus shouldn't return parent of exportJ. Bruce Fields1-1/+6
If you export a subdirectory of a filesystem, a READDIRPLUS on the root of that export will return the filehandle of the parent with the ".." entry. The filehandle is optional, so let's just not return the filehandle for ".." if we're at the root of an export. Note that once the client learns one filehandle outside of the export, they can trivially access the rest of the export using further lookups. However, it is also not very difficult to guess filehandles outside of the export. So exporting a subdirectory of a filesystem should considered equivalent to providing access to the entire filesystem. To avoid confusion, we recommend only exporting entire filesystems. Reported-by: Youjipeng <wangzhibei1999@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-11Merge tag 'for-5.11-rc3-tag' of ↵Linus Torvalds8-29/+67
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "More material for stable trees. - tree-checker: check item end overflow - fix false warning during relocation regarding extent type - fix inode flushing logic, caused notable performance regression (since 5.10) - debugging fixups: - print correct offset for reloc tree key - pass reliable fs_info pointer to error reporting helper" * tag 'for-5.11-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: shrink delalloc pages instead of full inodes btrfs: reloc: fix wrong file extent type check to avoid false ENOENT btrfs: tree-checker: check if chunk item end overflows btrfs: prevent NULL pointer dereference in extent_io_tree_panic btrfs: print the actual offset in btrfs_root_name
2021-01-11Merge tag 'nfsd-5.11-1' of git://git.linux-nfs.org/projects/cel/cel-2.6Linus Torvalds4-27/+41
Pull nfsd fixes from Chuck Lever: - Fix major TCP performance regression - Get NFSv4.2 READ_PLUS regression tests to pass - Improve NFSv4 COMPOUND memory allocation - Fix sparse warning * tag 'nfsd-5.11-1' of git://git.linux-nfs.org/projects/cel/cel-2.6: NFSD: Restore NFSv4 decoding's SAVEMEM functionality SUNRPC: Handle TCP socket sends with kernel_sendpage() again NFSD: Fix sparse warning in nfssvc.c nfsd: Don't set eof on a truncated READ_PLUS nfsd: Fixes for nfsd4_encode_read_plus_data()
2021-01-11io_uring: don't take files/mm for a dead taskPavel Begunkov1-0/+5
In rare cases a task may be exiting while io_ring_exit_work() trying to cancel/wait its requests. It's ok for __io_sq_thread_acquire_mm() because of SQPOLL check, but is not for __io_sq_thread_acquire_files(). Play safe and fail for both of them. Cc: stable@vger.kernel.org # 5.5+ Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-11io_uring: drop mm and files after task_work_runPavel Begunkov1-0/+2
__io_req_task_submit() run by task_work can set mm and files, but io_sq_thread() in some cases, and because __io_sq_thread_acquire_mm() and __io_sq_thread_acquire_files() do a simple current->mm/files check it may end up submitting IO with mm/files of another task. We also need to drop it after in the end to drop potentially grabbed references to them. Cc: stable@vger.kernel.org # 5.9+ Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-10NFS: nfs_igrab_and_active must first reference the superblockTrond Myklebust1-5/+7
Before referencing the inode, we must ensure that the superblock can be referenced. Otherwise, we can end up with iput() calling superblock operations that are no longer valid or accessible. Fixes: ea7c38fef0b7 ("NFSv4: Ensure we reference the inode for return-on-close in delegreturn") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-01-10NFS: nfs_delegation_find_inode_server must first reference the superblockTrond Myklebust1-5/+7
Before referencing the inode, we must ensure that the superblock can be referenced. Otherwise, we can end up with iput() calling superblock operations that are no longer valid or accessible. Fixes: e39d8a186ed0 ("NFSv4: Fix an Oops during delegation callbacks") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>