summaryrefslogtreecommitdiffstats
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2022-08-09Merge tag 'fscache-fixes-20220809' of ↵Linus Torvalds1-2/+7
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull fscache updates from David Howells: - Fix a cookie access ref leak if a cookie is invalidated a second time before the first invalidation is actually processed. - Add a tracepoint to log cookie lookup failure * tag 'fscache-fixes-20220809' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: fscache: add tracepoint when failing cookie fscache: don't leak cookie access refs if invalidation is in progress or failed
2022-08-09Merge tag 'afs-fixes-20220802' of ↵Linus Torvalds8-98/+106
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull AFS fixes from David Howells: "Fix AFS refcount handling. The first patch converts afs to use refcount_t for its refcounts and the second patch fixes afs_put_call() and afs_put_server() to save the values they're going to log in the tracepoint before decrementing the refcount" * tag 'afs-fixes-20220802' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: afs: Fix access after dec in put functions afs: Use refcount_t rather than atomic_t
2022-08-09Merge tag 'fs.setgid.v6.0' of ↵Linus Torvalds4-19/+100
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull setgid updates from Christian Brauner: "This contains the work to move setgid stripping out of individual filesystems and into the VFS itself. Creating files that have both the S_IXGRP and S_ISGID bit raised in directories that themselves have the S_ISGID bit set requires additional privileges to avoid security issues. When a filesystem creates a new inode it needs to take care that the caller is either in the group of the newly created inode or they have CAP_FSETID in their current user namespace and are privileged over the parent directory of the new inode. If any of these two conditions is true then the S_ISGID bit can be raised for an S_IXGRP file and if not it needs to be stripped. However, there are several key issues with the current implementation: - S_ISGID stripping logic is entangled with umask stripping. For example, if the umask removes the S_IXGRP bit from the file about to be created then the S_ISGID bit will be kept. The inode_init_owner() helper is responsible for S_ISGID stripping and is called before posix_acl_create(). So we can end up with two different orderings: 1. FS without POSIX ACL support First strip umask then strip S_ISGID in inode_init_owner(). In other words, if a filesystem doesn't support or enable POSIX ACLs then umask stripping is done directly in the vfs before calling into the filesystem: 2. FS with POSIX ACL support First strip S_ISGID in inode_init_owner() then strip umask in posix_acl_create(). In other words, if the filesystem does support POSIX ACLs then unmask stripping may be done in the filesystem itself when calling posix_acl_create(). Note that technically filesystems are free to impose their own ordering between posix_acl_create() and inode_init_owner() meaning that there's additional ordering issues that influence S_ISGID inheritance. (Note that the commit message of commit 1639a49ccdce ("fs: move S_ISGID stripping into the vfs_*() helpers") gets the ordering between inode_init_owner() and posix_acl_create() the wrong way around. I realized this too late.) - Filesystems that don't rely on inode_init_owner() don't get S_ISGID stripping logic. While that may be intentional (e.g. network filesystems might just defer setgid stripping to a server) it is often just a security issue. Note that mandating the use of inode_init_owner() was proposed as an alternative solution but that wouldn't fix the ordering issues and there are examples such as afs where the use of inode_init_owner() isn't possible. In any case, we should also try the cleaner and generalized solution first before resorting to this approach. - We still have S_ISGID inheritance bugs years after the initial round of S_ISGID inheritance fixes: e014f37db1a2 ("xfs: use setattr_copy to set vfs inode attributes") 01ea173e103e ("xfs: fix up non-directory creation in SGID directories") fd84bfdddd16 ("ceph: fix up non-directory creation in SGID directories") All of this led us to conclude that the current state is too messy. While we won't be able to make it completely clean as posix_acl_create() is still a filesystem specific call we can improve the S_SIGD stripping situation quite a bit by hoisting it out of inode_init_owner() and into the respective vfs creation operations. The obvious advantage is that we don't need to rely on individual filesystems getting S_ISGID stripping right and instead can standardize the ordering between S_ISGID and umask stripping directly in the VFS. A few short implementation notes: - The stripping logic needs to happen in vfs_*() helpers for the sake of stacking filesystems such as overlayfs that rely on these helpers taking care of S_ISGID stripping. - Security hooks have never seen the mode as it is ultimately seen by the filesystem because of the ordering issue we mentioned. Nothing is changed for them. We simply continue to strip the umask before passing the mode down to the security hooks. - The following filesystems use inode_init_owner() and thus relied on S_ISGID stripping: spufs, 9p, bfs, btrfs, ext2, ext4, f2fs, hfsplus, hugetlbfs, jfs, minix, nilfs2, ntfs3, ocfs2, omfs, overlayfs, ramfs, reiserfs, sysv, ubifs, udf, ufs, xfs, zonefs, bpf, tmpfs. We've audited all callchains as best as we could. More details can be found in the commit message to 1639a49ccdce ("fs: move S_ISGID stripping into the vfs_*() helpers")" * tag 'fs.setgid.v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: ceph: rely on vfs for setgid stripping fs: move S_ISGID stripping into the vfs_*() helpers fs: Add missing umask strip in vfs_tmpfile fs: add mode_strip_sgid() helper
2022-08-09fscache: add tracepoint when failing cookieJeff Layton1-0/+2
Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: David Howells <dhowells@redhat.com>
2022-08-09fscache: don't leak cookie access refs if invalidation is in progress or failedJeff Layton1-2/+5
It's possible for a request to invalidate a fscache_cookie will come in while we're already processing an invalidation. If that happens we currently take an extra access reference that will leak. Only call __fscache_begin_cookie_access if the FSCACHE_COOKIE_DO_INVALIDATE bit was previously clear. Also, ensure that we attempt to clear the bit when the cookie is "FAILED" and put the reference to avoid an access leak. Fixes: 85e4ea1049c7 ("fscache: Fix invalidation/lookup race") Suggested-by: David Howells <dhowells@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: David Howells <dhowells@redhat.com>
2022-08-08Merge tag '5.20-rc-ksmbd-server-fixes' of git://git.samba.org/ksmbdLinus Torvalds19-220/+321
Pull ksmbd updates from Steve French: - fixes for memory access bugs (out of bounds access, oops, leak) - multichannel fixes - session disconnect performance improvement, and session register improvement - cleanup * tag '5.20-rc-ksmbd-server-fixes' of git://git.samba.org/ksmbd: ksmbd: fix heap-based overflow in set_ntacl_dacl() ksmbd: prevent out of bound read for SMB2_TREE_CONNNECT ksmbd: prevent out of bound read for SMB2_WRITE ksmbd: fix use-after-free bug in smb2_tree_disconect ksmbd: fix memory leak in smb2_handle_negotiate ksmbd: fix racy issue while destroying session on multichannel ksmbd: use wait_event instead of schedule_timeout() ksmbd: fix kernel oops from idr_remove() ksmbd: add channel rwlock ksmbd: replace sessions list in connection with xarray MAINTAINERS: ksmbd: add entry for documentation ksmbd: remove unused ksmbd_share_configs_cleanup function
2022-08-08Merge tag 'pull-work.iov_iter-rebased' of ↵Linus Torvalds13-90/+48
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull more iov_iter updates from Al Viro: - more new_sync_{read,write}() speedups - ITER_UBUF introduction - ITER_PIPE cleanups - unification of iov_iter_get_pages/iov_iter_get_pages_alloc and switching them to advancing semantics - making ITER_PIPE take high-order pages without splitting them - handling copy_page_from_iter() for high-order pages properly * tag 'pull-work.iov_iter-rebased' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (32 commits) fix copy_page_from_iter() for compound destinations hugetlbfs: copy_page_to_iter() can deal with compound pages copy_page_to_iter(): don't split high-order page in case of ITER_PIPE expand those iov_iter_advance()... pipe_get_pages(): switch to append_pipe() get rid of non-advancing variants ceph: switch the last caller of iov_iter_get_pages_alloc() 9p: convert to advancing variant of iov_iter_get_pages_alloc() af_alg_make_sg(): switch to advancing variant of iov_iter_get_pages() iter_to_pipe(): switch to advancing variant of iov_iter_get_pages() block: convert to advancing variants of iov_iter_get_pages{,_alloc}() iov_iter: advancing variants of iov_iter_get_pages{,_alloc}() iov_iter: saner helper for page array allocation fold __pipe_get_pages() into pipe_get_pages() ITER_XARRAY: don't open-code DIV_ROUND_UP() unify the rest of iov_iter_get_pages()/iov_iter_get_pages_alloc() guts unify xarray_get_pages() and xarray_get_pages_alloc() unify pipe_get_pages() and pipe_get_pages_alloc() iov_iter_get_pages(): sanity-check arguments iov_iter_get_pages_alloc(): lift freeing pages array on failure exits into wrapper ...
2022-08-08hugetlbfs: copy_page_to_iter() can deal with compound pagesAl Viro1-30/+1
... since April 2021 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08ceph: switch the last caller of iov_iter_get_pages_alloc()Al Viro1-1/+1
here nothing even looks at the iov_iter after the call, so we couldn't care less whether it advances or not. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08iter_to_pipe(): switch to advancing variant of iov_iter_get_pages()Al Viro1-23/+24
... and untangle the cleanup on failure to add into pipe. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08iov_iter: advancing variants of iov_iter_get_pages{,_alloc}()Al Viro7-18/+9
Most of the users immediately follow successful iov_iter_get_pages() with advancing by the amount it had returned. Provide inline wrappers doing that, convert trivial open-coded uses of those. BTW, iov_iter_get_pages() never returns more than it had been asked to; such checks in cifs ought to be removed someday... Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08splice: stop abusing iov_iter_advance() to flush a pipeAl Viro1-5/+2
Use pipe_discard_from() explicitly in generic_file_read_iter(); don't bother with rather non-obvious use of iov_iter_advance() in there. Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08switch new_sync_{read,write}() to ITER_UBUFAl Viro1-4/+2
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08new iov_iter flavour - ITER_UBUFAl Viro8-9/+9
Equivalent of single-segment iovec. Initialized by iov_iter_ubuf(), checked for by iter_is_ubuf(), otherwise behaves like ITER_IOVEC ones. We are going to expose the things like ->write_iter() et.al. to those in subsequent commits. New predicate (user_backed_iter()) that is true for ITER_IOVEC and ITER_UBUF; places like direct-IO handling should use that for checking that pages we modify after getting them from iov_iter_get_pages() would need to be dirtied. DO NOT assume that replacing iter_is_iovec() with user_backed_iter() will solve all problems - there's code that uses iter_is_iovec() to decide how to poke around in iov_iter guts and for that the predicate replacement obviously won't suffice. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08mm: hugetlb_vmemmap: introduce the name HVOMuchun Song1-7/+5
It it inconvenient to mention the feature of optimizing vmemmap pages associated with HugeTLB pages when communicating with others since there is no specific or abbreviated name for it when it is first introduced. Let us give it a name HVO (HugeTLB Vmemmap Optimization) from now. This commit also updates the document about "hugetlb_free_vmemmap" by the way discussed in thread [1]. Link: https://lore.kernel.org/all/21aae898-d54d-cc4b-a11f-1bb7fddcfffa@redhat.com/ [1] Link: https://lkml.kernel.org/r/20220628092235.91270-4-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Will Deacon <will@kernel.org> Cc: Xiongchun Duan <duanxiongchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-08-08NFS: don't unhash dentry during unlink/renameNeilBrown1-18/+54
NFS unlink() (and rename over existing target) must determine if the file is open, and must perform a "silly rename" instead of an unlink (or before rename) if it is. Otherwise the client might hold a file open which has been removed on the server. Consequently if it determines that the file isn't open, it must block any subsequent opens until the unlink/rename has been completed on the server. This is currently achieved by unhashing the dentry. This forces any open attempt to the slow-path for lookup which will block on i_rwsem on the directory until the unlink/rename completes. A future patch will change the VFS to only get a shared lock on i_rwsem for unlink, so this will no longer work. Instead we introduce an explicit interlock. A special value is stored in dentry->d_fsdata while the unlink/rename is running and ->d_revalidate blocks while that value is present. When ->d_revalidate unblocks, the dentry will be invalid. This closes the race without requiring exclusion on i_rwsem. d_fsdata is already used in two different ways. 1/ an IS_ROOT directory dentry might have a "devname" stored in d_fsdata. Such a dentry doesn't have a name and so cannot be the target of unlink or rename. For safety we check if an old devname is still stored, and remove it if it is. 2/ a dentry with DCACHE_NFSFS_RENAMED set will have a 'struct nfs_unlinkdata' stored in d_fsdata. While this is set maydelete() will fail, so an unlink or rename will never proceed on such a dentry. Neither of these can be in effect when a dentry is the target of unlink or rename. So we can expect d_fsdata to be NULL, and store a special value ((void*)1) which is given the name NFS_FSDATA_BLOCKED to indicate that any lookup will be blocked. The d_count() is incremented under d_lock() when a lookup finds the dentry, so we check d_count() is low, and set NFS_FSDATA_BLOCKED under the same lock to avoid any races. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-08-08Merge tag 'f2fs-for-6.0' of ↵Linus Torvalds13-253/+526
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this cycle, we mainly fixed some corner cases that manipulate a per-file compression flag inappropriately. And, we found f2fs counted valid blocks in a section incorrectly when zone capacity is set, and thus, fixed it with additional sysfs entry to check it easily. Lastly, this series includes several patches with respect to the new atomic write support such as a couple of bug fixes and re-adding atomic_write_abort support that we removed by mistake in the previous release. Enhancements: - add sysfs entries to understand atomic write operations and zone capacity - introduce memory mode to get a hint for low-memory devices - adjust the waiting time of foreground GC - decompress clusters under softirq to avoid non-deterministic latency - do not skip updating inode when retrying to flush node page - enforce single zone capacity Bug fixes: - set the compression/no-compression flags correctly - revive F2FS_IOC_ABORT_VOLATILE_WRITE - check inline_data during compressed inode conversion - understand zone capacity when calculating valid block count As usual, the series includes several minor clean-ups and sanity checks" * tag 'f2fs-for-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (29 commits) f2fs: use onstack pages instead of pvec f2fs: intorduce f2fs_all_cluster_page_ready f2fs: clean up f2fs_abort_atomic_write() f2fs: handle decompress only post processing in softirq f2fs: do not allow to decompress files have FI_COMPRESS_RELEASED f2fs: do not set compression bit if kernel doesn't support f2fs: remove device type check for direct IO f2fs: fix null-ptr-deref in f2fs_get_dnode_of_data f2fs: revive F2FS_IOC_ABORT_VOLATILE_WRITE f2fs: fix to do sanity check on segment type in build_sit_entries() f2fs: obsolete unused MAX_DISCARD_BLOCKS f2fs: fix to avoid use f2fs_bug_on() in f2fs_new_node_page() f2fs: fix to remove F2FS_COMPR_FL and tag F2FS_NOCOMP_FL at the same time f2fs: introduce sysfs atomic write statistics f2fs: don't bother wait_ms by foreground gc f2fs: invalidate meta pages only for post_read required inode f2fs: allow compression of files without blocks f2fs: fix to check inline_data during compressed inode conversion f2fs: Delete f2fs_copy_page() and replace with memcpy_page() f2fs: fix to invalidate META_MAPPING before DIO write ...
2022-08-08Merge tag 'fuse-update-6.0' of ↵Linus Torvalds8-28/+106
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse updates from Miklos Szeredi: - Fix an issue with reusing the bdi in case of block based filesystems - Allow root (in init namespace) to access fuse filesystems in user namespaces if expicitly enabled with a module param - Misc fixes * tag 'fuse-update-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: retire block-device-based superblock on force unmount vfs: function to prevent re-use of block-device-based superblocks virtio_fs: Modify format for virtio_fs_direct_access virtiofs: delete unused parameter for virtio_fs_cleanup_vqs fuse: Add module param for CAP_SYS_ADMIN access bypassing allow_other fuse: Remove the control interface for virtio-fs fuse: ioctl: translate ENOSYS fuse: limit nsec fuse: avoid unnecessary spinlock bump fuse: fix deadlock between atomic O_TRUNC and page invalidation fuse: write inode in fuse_release()
2022-08-08Merge tag 'ovl-update-6.0' of ↵Linus Torvalds5-8/+21
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs update from Miklos Szeredi: "Just a small update" * tag 'ovl-update-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: fix spelling mistakes ovl: drop WARN_ON() dentry is NULL in ovl_encode_fh() ovl: improve ovl_get_acl() if POSIX ACL support is off ovl: fix some kernel-doc comments ovl: warn if trusted xattr creation fails
2022-08-08Merge tag 'exfat-for-5.20-rc1' of ↵Linus Torvalds8-138/+53
git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat Pull exfat updates from Namjae Jeon: - fix the error code of rename syscall - cleanup and suppress the superfluous error messages - remove duplicate directory entry update - add exfat git tree in MAINTAINERS * tag 'exfat-for-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat: MAINTAINERS: Add Namjae's exfat git tree exfat: Drop superfluous new line for error messages exfat: Downgrade ENAMETOOLONG error message to debug messages exfat: Expand exfat_err() and co directly to pr_*() macro exfat: Define NLS_NAME_* as bit flags explicitly exfat: Return ENAMETOOLONG consistently for oversized paths exfat: remove duplicate write inode for extending dir/file exfat: remove duplicate write inode for truncating file exfat: reuse __exfat_write_inode() to update directory entry
2022-08-08vfs: Check the truncate maximum size in inode_newsize_ok()David Howells1-0/+2
If something manages to set the maximum file size to MAX_OFFSET+1, this can cause the xfs and ext4 filesystems at least to become corrupt. Ordinarily, the kernel protects against userspace trying this by checking the value early in the truncate() and ftruncate() system calls calls - but there are at least two places that this check is bypassed: (1) Cachefiles will round up the EOF of the backing file to DIO block size so as to allow DIO on the final block - but this might push the offset negative. It then calls notify_change(), but this inadvertently bypasses the checking. This can be triggered if someone puts an 8EiB-1 file on a server for someone else to try and access by, say, nfs. (2) ksmbd doesn't check the value it is given in set_end_of_file_info() and then calls vfs_truncate() directly - which also bypasses the check. In both cases, it is potentially possible for a network filesystem to cause a disk filesystem to be corrupted: cachefiles in the client's cache filesystem; ksmbd in the server's filesystem. nfsd is okay as it checks the value, but we can then remove this check too. Fix this by adding a check to inode_newsize_ok(), as called from setattr_prepare(), thereby catching the issue as filesystems set up to perform the truncate with minimal opportunity for bypassing the new check. Fixes: 1f08c925e7a3 ("cachefiles: Implement backing file wrangling") Fixes: f44158485826 ("cifsd: add file operations") Signed-off-by: David Howells <dhowells@redhat.com> Reported-by: Jeff Layton <jlayton@kernel.org> Tested-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Namjae Jeon <linkinjeon@kernel.org> Cc: stable@kernel.org Acked-by: Alexander Viro <viro@zeniv.linux.org.uk> cc: Steve French <sfrench@samba.org> cc: Hyunchul Lee <hyc.lee@gmail.com> cc: Chuck Lever <chuck.lever@oracle.com> cc: Dave Wysochanski <dwysocha@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-08-07Merge tag '5.20-rc-smb3-client-fixes-part1' of ↵Linus Torvalds26-905/+1099
git://git.samba.org/sfrench/cifs-2.6 Pull cifs updates from Steve French: "Mostly cleanup, including smb1 refactoring: - multichannel perf improvement - move additional SMB1 code to not be compiled in when legacy support is disabled. - bug fixes, including one important one for memory leak - various cleanup patches We are still working on and testing some deferred close improvements including an important lease break fix for case when multiple deferred closes are still open, and also some additional perf improvements - those are not included here" * tag '5.20-rc-smb3-client-fixes-part1' of git://git.samba.org/sfrench/cifs-2.6: cifs: update internal module number cifs: alloc_mid function should be marked as static cifs: remove "cifs_" prefix from init/destroy mids functions cifs: remove useless DeleteMidQEntry() cifs: when insecure legacy is disabled shrink amount of SMB1 code cifs: trivial style fixup cifs: fix wrong unlock before return from cifs_tree_connect() cifs: avoid use of global locks for high contention data cifs: remove remaining build warnings cifs: list_for_each() -> list_for_each_entry() cifs: update MAINTAINERS file with reviewers smb2: small refactor in smb2_check_message() cifs: Fix memory leak when using fscache cifs: remove minor build warning cifs: remove some camelCase and also some static build warnings cifs: remove unnecessary (void*) conversions. cifs: remove unnecessary type castings cifs: remove redundant initialization to variable mnt_sign_enabled smb3: check xattr value length earlier
2022-08-07Merge tag 'mm-nonmm-stable-2022-08-06-2' of ↵Linus Torvalds32-340/+636
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc updates from Andrew Morton: "Updates to various subsystems which I help look after. lib, ocfs2, fatfs, autofs, squashfs, procfs, etc. A relatively small amount of material this time" * tag 'mm-nonmm-stable-2022-08-06-2' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (72 commits) scripts/gdb: ensure the absolute path is generated on initial source MAINTAINERS: kunit: add David Gow as a maintainer of KUnit mailmap: add linux.dev alias for Brendan Higgins mailmap: update Kirill's email profile: setup_profiling_timer() is moslty not implemented ocfs2: fix a typo in a comment ocfs2: use the bitmap API to simplify code ocfs2: remove some useless functions lib/mpi: fix typo 'the the' in comment proc: add some (hopefully) insightful comments bdi: remove enum wb_congested_state kernel/hung_task: fix address space of proc_dohung_task_timeout_secs lib/lzo/lzo1x_compress.c: replace ternary operator with min() and min_t() squashfs: support reading fragments in readahead call squashfs: implement readahead squashfs: always build "file direct" version of page actor Revert "squashfs: provide backing_dev_info in order to disable read-ahead" fs/ocfs2: Fix spelling typo in comment ia64: old_rr4 added under CONFIG_HUGETLB_PAGE proc: fix test for "vsyscall=xonly" boot option ...
2022-08-06Merge tag '9p-for-5.20' of https://github.com/martinetd/linuxLinus Torvalds10-149/+124
Pull 9p updates from Dominique Martinet: - a couple of fixes - add a tracepoint for fid refcounting - some cleanup/followup on fid lookup - some cleanup around req refcounting * tag '9p-for-5.20' of https://github.com/martinetd/linux: net/9p: Initialize the iounit field during fid creation net: 9p: fix refcount leak in p9_read_work() error handling 9p: roll p9_tag_remove into p9_req_put 9p: Add client parameter to p9_req_put() 9p: Drop kref usage 9p: Fix some kernel-doc comments 9p fid refcount: cleanup p9_fid_put calls 9p fid refcount: add a 9p_fid_ref tracepoint 9p fid refcount: add p9_fid_get/put wrappers 9p: Fix minor typo in code comment 9p: Remove unnecessary variable for old fids while walking from d_parent 9p: Make the path walk logic more clear about when cloning is required 9p: Track the root fid with its own variable during lookups
2022-08-06Merge tag 'gfs2-v5.19-rc4-fixes' of ↵Linus Torvalds14-187/+111
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 updates from Andreas Gruenbacher: - Instantiate glocks ouside of the glock state engine, in the contect of the process taking the glock. This moves unnecessary complexity out of the core glock code. Clean up the instantiate logic to be more sensible. - In gfs2_glock_async_wait(), cancel pending locking request upon failure. Make sure all glocks are left in a consistent state. - Various other minor cleanups and fixes. * tag 'gfs2-v5.19-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: List traversal in do_promote is safe gfs2: do_promote glock holder stealing fix gfs2: Use better variable name gfs2: Make go_instantiate take a glock gfs2: Add new go_held glock operation gfs2: Revert 'Fix "truncate in progress" hang' gfs2: Instantiate glocks ouside of glock state engine gfs2: Fix up gfs2_glock_async_wait gfs2: Minor gfs2_glock_nq_m cleanup gfs2: Fix spelling mistake in comment gfs2: Rewrap overlong comment in do_promote gfs2: Remove redundant NULL check before kfree
2022-08-05xfs: Fix false ENOSPC when performing direct write on a delalloc extent in ↵Chandan Babu R1-35/+163
cow fork On a higly fragmented filesystem a Direct IO write can fail with -ENOSPC error even though the filesystem has sufficient number of free blocks. This occurs if the file offset range on which the write operation is being performed has a delalloc extent in the cow fork and this delalloc extent begins much before the Direct IO range. In such a scenario, xfs_reflink_allocate_cow() invokes xfs_bmapi_write() to allocate the blocks mapped by the delalloc extent. The extent thus allocated may not cover the beginning of file offset range on which the Direct IO write was issued. Hence xfs_reflink_allocate_cow() ends up returning -ENOSPC. The following script reliably recreates the bug described above. #!/usr/bin/bash device=/dev/loop0 shortdev=$(basename $device) mntpnt=/mnt/ file1=${mntpnt}/file1 file2=${mntpnt}/file2 fragmentedfile=${mntpnt}/fragmentedfile punchprog=/root/repos/xfstests-dev/src/punch-alternating errortag=/sys/fs/xfs/${shortdev}/errortag/bmap_alloc_minlen_extent umount $device > /dev/null 2>&1 echo "Create FS" mkfs.xfs -f -m reflink=1 $device > /dev/null 2>&1 if [[ $? != 0 ]]; then echo "mkfs failed." exit 1 fi echo "Mount FS" mount $device $mntpnt > /dev/null 2>&1 if [[ $? != 0 ]]; then echo "mount failed." exit 1 fi echo "Create source file" xfs_io -f -c "pwrite 0 32M" $file1 > /dev/null 2>&1 sync echo "Create Reflinked file" xfs_io -f -c "reflink $file1" $file2 &>/dev/null echo "Set cowextsize" xfs_io -c "cowextsize 16M" $file1 > /dev/null 2>&1 echo "Fragment FS" xfs_io -f -c "pwrite 0 64M" $fragmentedfile > /dev/null 2>&1 sync $punchprog $fragmentedfile echo "Allocate block sized extent from now onwards" echo -n 1 > $errortag echo "Create 16MiB delalloc extent in CoW fork" xfs_io -c "pwrite 0 4k" $file1 > /dev/null 2>&1 sync echo "Direct I/O write at offset 12k" xfs_io -d -c "pwrite 12k 8k" $file1 This commit fixes the bug by invoking xfs_bmapi_write() in a loop until disk blocks are allocated for atleast the starting file offset of the Direct IO write range. Fixes: 3c68d44a2b49 ("xfs: allocate direct I/O COW blocks in iomap_begin") Reported-and-Root-caused-by: Wengang Wang <wen.gang.wang@oracle.com> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> [djwong: slight editing to make the locking less grody, and fix some style things] Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-08-05xfs: fix intermittent hang during quotacheckDarrick J. Wong1-0/+5
Every now and then, I see the following hang during mount time quotacheck when running fstests. Turning on KASAN seems to make it happen somewhat more frequently. I've edited the backtrace for brevity. XFS (sdd): Quotacheck needed: Please wait. XFS: Assertion failed: bp->b_flags & _XBF_DELWRI_Q, file: fs/xfs/xfs_buf.c, line: 2411 ------------[ cut here ]------------ WARNING: CPU: 0 PID: 1831409 at fs/xfs/xfs_message.c:104 assfail+0x46/0x4a [xfs] CPU: 0 PID: 1831409 Comm: mount Tainted: G W 5.19.0-rc6-xfsx #rc6 09911566947b9f737b036b4af85e399e4b9aef64 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014 RIP: 0010:assfail+0x46/0x4a [xfs] Code: a0 8f 41 a0 e8 45 fe ff ff 8a 1d 2c 36 10 00 80 fb 01 76 0f 0f b6 f3 48 c7 c7 c0 f0 4f a0 e8 10 f0 02 e1 80 e3 01 74 02 0f 0b <0f> 0b 5b c3 48 8d 45 10 48 89 e2 4c 89 e6 48 89 1c 24 48 89 44 24 RSP: 0018:ffffc900078c7b30 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff8880099ac000 RCX: 000000007fffffff RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffa0418fa0 RBP: ffff8880197bc1c0 R08: 0000000000000000 R09: 000000000000000a R10: 000000000000000a R11: f000000000000000 R12: ffffc900078c7d20 R13: 00000000fffffff5 R14: ffffc900078c7d20 R15: 0000000000000000 FS: 00007f0449903800(0000) GS:ffff88803ec00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005610ada631f0 CR3: 0000000014dd8002 CR4: 00000000001706f0 Call Trace: <TASK> xfs_buf_delwri_pushbuf+0x150/0x160 [xfs 4561f5b32c9bfb874ec98d58d0719464e1f87368] xfs_qm_flush_one+0xd6/0x130 [xfs 4561f5b32c9bfb874ec98d58d0719464e1f87368] xfs_qm_dquot_walk.isra.0+0x109/0x1e0 [xfs 4561f5b32c9bfb874ec98d58d0719464e1f87368] xfs_qm_quotacheck+0x319/0x490 [xfs 4561f5b32c9bfb874ec98d58d0719464e1f87368] xfs_qm_mount_quotas+0x65/0x2c0 [xfs 4561f5b32c9bfb874ec98d58d0719464e1f87368] xfs_mountfs+0x6b5/0xab0 [xfs 4561f5b32c9bfb874ec98d58d0719464e1f87368] xfs_fs_fill_super+0x781/0x990 [xfs 4561f5b32c9bfb874ec98d58d0719464e1f87368] get_tree_bdev+0x175/0x280 vfs_get_tree+0x1a/0x80 path_mount+0x6f5/0xaa0 __x64_sys_mount+0x103/0x140 do_syscall_64+0x2b/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 I /think/ this can happen if xfs_qm_flush_one is racing with xfs_qm_dquot_isolate (i.e. dquot reclaim) when the second function has taken the dquot flush lock but xfs_qm_dqflush hasn't yet locked the dquot buffer, let alone queued it to the delwri list. In this case, flush_one will fail to get the dquot flush lock, but it can lock the incore buffer, but xfs_buf_delwri_pushbuf will then trip over this ASSERT, which checks that the buffer isn't on a delwri list. The hang results because the _delwri_submit_buffers ignores non DELWRI_Q buffers, which means that xfs_buf_iowait waits forever for an IO that has not yet been scheduled. AFAICT, a reasonable solution here is to detect a dquot buffer that is not on a DELWRI list, drop it, and return -EAGAIN to try the flush again. It's not /that/ big of a deal if quotacheck writes the dquot buffer repeatedly before we even set QUOTA_CHKD. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-08-05xfs: check return codes when flushing block devicesDarrick J. Wong2-10/+24
If a blkdev_issue_flush fails, fsync needs to report that to upper levels. Modify xfs_file_fsync to capture the errors, while trying to flush as much data and log updates to disk as possible. If log writes cannot flush the data device, we need to shut down the log immediately because we've violated a log invariant. Modify this code to check the return value of blkdev_issue_flush as well. This behavior seems to go back to about 2.6.15 or so, which makes this fixes tag a bit misleading. Link: https://elixir.bootlin.com/linux/v2.6.15/source/fs/xfs/xfs_vnodeops.c#L1187 Fixes: b5071ada510a ("xfs: remove xfs_blkdev_issue_flush") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-08-05Merge tag 'mm-stable-2022-08-03' of ↵Linus Torvalds38-128/+806
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "Most of the MM queue. A few things are still pending. Liam's maple tree rework didn't make it. This has resulted in a few other minor patch series being held over for next time. Multi-gen LRU still isn't merged as we were waiting for mapletree to stabilize. The current plan is to merge MGLRU into -mm soon and to later reintroduce mapletree, with a view to hopefully getting both into 6.1-rc1. Summary: - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe Lin, Yang Shi, Anshuman Khandual and Mike Rapoport - Some kmemleak fixes from Patrick Wang and Waiman Long - DAMON updates from SeongJae Park - memcg debug/visibility work from Roman Gushchin - vmalloc speedup from Uladzislau Rezki - more folio conversion work from Matthew Wilcox - enhancements for coherent device memory mapping from Alex Sierra - addition of shared pages tracking and CoW support for fsdax, from Shiyang Ruan - hugetlb optimizations from Mike Kravetz - Mel Gorman has contributed some pagealloc changes to improve latency and realtime behaviour. - mprotect soft-dirty checking has been improved by Peter Xu - Many other singleton patches all over the place" [ XFS merge from hell as per Darrick Wong in https://lore.kernel.org/all/YshKnxb4VwXycPO8@magnolia/ ] * tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (282 commits) tools/testing/selftests/vm/hmm-tests.c: fix build mm: Kconfig: fix typo mm: memory-failure: convert to pr_fmt() mm: use is_zone_movable_page() helper hugetlbfs: fix inaccurate comment in hugetlbfs_statfs() hugetlbfs: cleanup some comments in inode.c hugetlbfs: remove unneeded header file hugetlbfs: remove unneeded hugetlbfs_ops forward declaration hugetlbfs: use helper macro SZ_1{K,M} mm: cleanup is_highmem() mm/hmm: add a test for cross device private faults selftests: add soft-dirty into run_vmtests.sh selftests: soft-dirty: add test for mprotect mm/mprotect: fix soft-dirty check in can_change_pte_writable() mm: memcontrol: fix potential oom_lock recursion deadlock mm/gup.c: fix formatting in check_and_migrate_movable_page() xfs: fail dax mount if reflink is enabled on a partition mm/memcontrol.c: remove the redundant updating of stats_flush_threshold userfaultfd: don't fail on unrecognized features hugetlb_cgroup: fix wrong hugetlb cgroup numa stat ...
2022-08-05Merge part of branch 'for-next.instantiate' into for-nextAndreas Gruenbacher393-9255/+12055
2022-08-05cifs: update internal module numberSteve French1-2/+2
To 2.38 Signed-off-by: Steve French <stfrench@microsoft.com>
2022-08-05cifs: alloc_mid function should be marked as staticSteve French2-4/+1
It is only used in transport.c. Signed-off-by: Steve French <stfrench@microsoft.com>
2022-08-05cifs: remove "cifs_" prefix from init/destroy mids functionsEnzo Matsumiya1-7/+5
Rename generic mid functions to same style, i.e. without "cifs_" prefix. cifs_{init,destroy}_mids() -> {init,destroy}_mids() Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-08-05cifs: remove useless DeleteMidQEntry()Enzo Matsumiya7-49/+43
DeleteMidQEntry() was just a proxy for cifs_mid_q_entry_release(). - remove DeleteMidQEntry() - rename cifs_mid_q_entry_release() to release_mid() - rename kref_put() callback _cifs_mid_q_entry_release to __release_mid - rename AllocMidQEntry() to alloc_mid() - rename cifs_delete_mid() to delete_mid() Update callers to use new names. Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-08-05cifs: when insecure legacy is disabled shrink amount of SMB1 codeSteve French13-463/+582
Currently much of the smb1 code is built even when CONFIG_CIFS_ALLOW_INSECURE_LEGACY is disabled. Move cifssmb.c to only be compiled when insecure legacy is disabled, and move various SMB1/CIFS helper functions to that ifdef. Some functions that were not SMB1/CIFS specific needed to be moved out of cifssmb.c This shrinks cifs.ko by more than 10% which is good - but also will help with the eventual movement of the legacy code to a distinct module. Follow on patches can shrink the number of ifdefs by code restructuring where smb1 code is wedged in functions that should be calling dialect specific helper functions instead, and also by moving some functions from file.c/dir.c/inode.c into smb1 specific c files. Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-08-05f2fs: use onstack pages instead of pvecFengnan Chang3-14/+14
Since pvec have 15 pages, it not a multiple of 4, when write compressed pages, write in 64K as a unit, it will call pagevec_lookup_range_tag agagin, sometimes this will take a lot of time. Use onstack pages instead of pvec to mitigate this problem. Signed-off-by: Fengnan Chang <fengnanchang@gmail.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-08-05f2fs: intorduce f2fs_all_cluster_page_readyFengnan Chang3-11/+22
When write total cluster, all pages is uptodate, there is not need to call f2fs_prepare_compress_overwrite, intorduce f2fs_all_cluster_page_ready to avoid this. Signed-off-by: Fengnan Chang <changfengnan@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-08-05f2fs: clean up f2fs_abort_atomic_write()Chao Yu4-23/+19
f2fs_abort_atomic_write() has checked whether current inode is atomic_write one or not, it's redundant to check in its caller, remove it for cleanup. Signed-off-by: Chao Yu <chao.yu@oppo.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-08-05f2fs: handle decompress only post processing in softirqDaeho Jeong3-93/+179
Now decompression is being handled in workqueue and it makes read I/O latency non-deterministic, because of the non-deterministic scheduling nature of workqueues. So, I made it handled in softirq context only if possible, not in low memory devices, since this modification will maintain decompresion related memory a little longer. Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-08-05f2fs: do not allow to decompress files have FI_COMPRESS_RELEASEDJaewook Kim1-0/+10
If a file has FI_COMPRESS_RELEASED, all writes for it should not be allowed. However, as of now, in case of compress_mode=user, writes triggered by IOCTLs like F2FS_IOC_DE/COMPRESS_FILE are allowed unexpectly, which could crash that file. To fix it, let's do not allow F2FS_IOC_DE/COMPRESS_IOCTL if a file already has FI_COMPRESS_RELEASED flag. This is the reproduction process: 1. $ touch ./file 2. $ chattr +c ./file 3. $ dd if=/dev/random of=./file bs=4096 count=30 conv=notrunc 4. $ dd if=/dev/zero of=./file bs=4096 count=34 seek=30 conv=notrunc 5. $ sync 6. $ do_compress ./file ; call F2FS_IOC_COMPRESS_FILE 7. $ get_compr_blocks ./file ; call F2FS_IOC_GET_COMPRESS_BLOCKS 8. $ release ./file ; call F2FS_IOC_RELEASE_COMPRESS_BLOCKS 9. $ do_compress ./file ; call F2FS_IOC_COMPRESS_FILE again 10. $ get_compr_blocks ./file ; call F2FS_IOC_GET_COMPRESS_BLOCKS again This reproduction process is tested in 128kb cluster size. You can find compr_blocks has a negative value. Fixes: 5fdb322ff2c2b ("f2fs: add F2FS_IOC_DECOMPRESS_FILE and F2FS_IOC_COMPRESS_FILE") Signed-off-by: Junbeom Yeom <junbeom.yeom@samsung.com> Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Youngjin Gil <youngjin.gil@samsung.com> Signed-off-by: Jaewook Kim <jw5454.kim@samsung.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-08-05f2fs: do not set compression bit if kernel doesn't supportJaegeuk Kim2-3/+8
If kernel doesn't have CONFIG_F2FS_FS_COMPRESSION, a file having FS_COMPR_FL via ioctl(FS_IOC_SETFLAGS) is unaccessible due to f2fs_is_compress_backend_ready(). Let's avoid it. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-08-05f2fs: remove device type check for direct IOEunhee Rho1-6/+1
To ensure serialized IOs, f2fs allows only LFS mode for zoned device. Remove redundant check for direct IO. Signed-off-by: Eunhee Rho <eunhee83.rho@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-08-05f2fs: fix null-ptr-deref in f2fs_get_dnode_of_dataYe Bin3-3/+9
There is issue as follows when test f2fs atomic write: F2FS-fs (loop0): Can't find valid F2FS filesystem in 2th superblock F2FS-fs (loop0): invalid crc_offset: 0 F2FS-fs (loop0): f2fs_check_nid_range: out-of-range nid=1, run fsck to fix. F2FS-fs (loop0): f2fs_check_nid_range: out-of-range nid=2, run fsck to fix. ================================================================== BUG: KASAN: null-ptr-deref in f2fs_get_dnode_of_data+0xac/0x16d0 Read of size 8 at addr 0000000000000028 by task rep/1990 CPU: 4 PID: 1990 Comm: rep Not tainted 5.19.0-rc6-next-20220715 #266 Call Trace: <TASK> dump_stack_lvl+0x6e/0x91 print_report.cold+0x49a/0x6bb kasan_report+0xa8/0x130 f2fs_get_dnode_of_data+0xac/0x16d0 f2fs_do_write_data_page+0x2a5/0x1030 move_data_page+0x3c5/0xdf0 do_garbage_collect+0x2015/0x36c0 f2fs_gc+0x554/0x1d30 f2fs_balance_fs+0x7f5/0xda0 f2fs_write_single_data_page+0xb66/0xdc0 f2fs_write_cache_pages+0x716/0x1420 f2fs_write_data_pages+0x84f/0x9a0 do_writepages+0x130/0x3a0 filemap_fdatawrite_wbc+0x87/0xa0 file_write_and_wait_range+0x157/0x1c0 f2fs_do_sync_file+0x206/0x12d0 f2fs_sync_file+0x99/0xc0 vfs_fsync_range+0x75/0x140 f2fs_file_write_iter+0xd7b/0x1850 vfs_write+0x645/0x780 ksys_write+0xf1/0x1e0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd As 3db1de0e582c commit changed atomic write way which new a cow_inode for atomic write file, and also mark cow_inode as FI_ATOMIC_FILE. When f2fs_do_write_data_page write cow_inode will use cow_inode's cow_inode which is NULL. Then will trigger null-ptr-deref. To solve above issue, introduce FI_COW_FILE flag for COW inode. Fiexes: 3db1de0e582c("f2fs: change the current atomic write way") Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-08-05f2fs: revive F2FS_IOC_ABORT_VOLATILE_WRITEDaeho Jeong1-2/+28
F2FS_IOC_ABORT_VOLATILE_WRITE was used to abort a atomic write before. However it was removed accidentally. So revive it by changing the name, since volatile write had gone. Signed-off-by: Daeho Jeong <daehojeong@google.com> Fiexes: 7bc155fec5b3("f2fs: kill volatile write support") Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-08-04Merge tag 'xfs-5.20-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds85-1657/+1984
Pull xfs updates from Darrick Wong: "The biggest changes for this release are the log scalability improvements, lockless lookups for the buffer cache, and making the attr fork a permanent part of the incore inode in preparation for directory parent pointers. There's also a bunch of bug fixes that have accumulated since -rc5. I might send you a second pull request with some more bug fixes that I'm still working on. Once the merge window ends, I will hand maintainership back to Dave Chinner until the 6.1-rc1 release so that I can conduct the design review for the online fsck feature, and try to get it merged. Summary: - Improve scalability of the XFS log by removing spinlocks and global synchronization points. - Add security labels to whiteout inodes to match the other filesystems. - Clean up per-ag pointer passing to simplify call sites. - Reduce verifier overhead by precalculating more AG geometry. - Implement fast-path lockless lookups in the buffer cache to reduce spinlock hammering. - Make attr forks a permanent part of the inode structure to fix a UAF bug and because most files these days tend to have security labels and soon will have parent pointers too. - Clean up XFS_IFORK_Q usage and give it a better name. - Fix more UAF bugs in the xattr code. - SOB my tags. - Fix some typos in the timestamp range documentation. - Fix a few more memory leaks. - Code cleanups and typo fixes. - Fix an unlocked inode fork pointer access in getbmap" * tag 'xfs-5.20-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (61 commits) xfs: delete extra space and tab in blank line xfs: fix NULL pointer dereference in xfs_getbmap() xfs: Fix typo 'the the' in comment xfs: Fix comment typo xfs: don't leak memory when attr fork loading fails xfs: fix for variable set but not used warning xfs: xfs_buf cache destroy isn't RCU safe xfs: delete unnecessary NULL checks xfs: fix comment for start time value of inode with bigtime enabled xfs: fix use-after-free in xattr node block inactivation xfs: lockless buffer lookup xfs: remove a superflous hash lookup when inserting new buffers xfs: reduce the number of atomic when locking a buffer after lookup xfs: merge xfs_buf_find() and xfs_buf_get_map() xfs: break up xfs_buf_find() into individual pieces xfs: add in-memory iunlink log item xfs: add log item precommit operation xfs: combine iunlink inode update functions xfs: clean up xfs_iunlink_update_inode() xfs: double link the unlinked inode list ...
2022-08-04Merge tag 'ext4_for_linus' of ↵Linus Torvalds25-372/+651
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "Add new ioctls to set and get the file system UUID in the ext4 superblock and improved the performance of the online resizing of file systems with bigalloc enabled. Fixed a lot of bugs, in particular for the inline data feature, potential races when creating and deleting inodes with shared extended attribute blocks, and the handling of directory blocks which are corrupted" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (37 commits) ext4: add ioctls to get/set the ext4 superblock uuid ext4: avoid resizing to a partial cluster size ext4: reduce computation of overhead during resize jbd2: fix assertion 'jh->b_frozen_data == NULL' failure when journal aborted ext4: block range must be validated before use in ext4_mb_clear_bb() mbcache: automatically delete entries from cache on freeing mbcache: Remove mb_cache_entry_delete() ext2: avoid deleting xattr block that is being reused ext2: unindent codeblock in ext2_xattr_set() ext2: factor our freeing of xattr block reference ext4: fix race when reusing xattr blocks ext4: unindent codeblock in ext4_xattr_block_set() ext4: remove EA inode entry from mbcache on inode eviction mbcache: add functions to delete entry if unused mbcache: don't reclaim used entries ext4: make sure ext4_append() always allocates new block ext4: check if directory block is within i_size ext4: reflect mb_optimize_scan value in options file ext4: avoid remove directory when directory is corrupted ext4: aligned '*' in comments ...
2022-08-04Merge tag 'driver-core-6.0-rc1' of ↵Linus Torvalds4-73/+162
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core / kernfs updates from Greg KH: "Here is the set of driver core and kernfs changes for 6.0-rc1. The "biggest" thing in here is some scalability improvements for kernfs for large systems. Other than that, included in here are: - arch topology and cache info changes that have been reviewed and discussed a lot. - potential error path cleanup fixes - deferred driver probe cleanups - firmware loader cleanups and tweaks - documentation updates - other small things All of these have been in the linux-next tree for a while with no reported problems" * tag 'driver-core-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (63 commits) docs: embargoed-hardware-issues: fix invalid AMD contact email firmware_loader: Replace kmap() with kmap_local_page() sysfs docs: ABI: Fix typo in comment kobject: fix Kconfig.debug "its" grammar kernfs: Fix typo 'the the' in comment docs: driver-api: firmware: add driver firmware guidelines. (v3) arch_topology: Fix cache attributes detection in the CPU hotplug path ACPI: PPTT: Leave the table mapped for the runtime usage cacheinfo: Use atomic allocation for percpu cache attributes drivers/base: fix userspace break from using bin_attributes for cpumap and cpulist MAINTAINERS: Change mentions of mpm to olivia docs: ABI: sysfs-devices-soc: Update Lee Jones' email address docs: ABI: sysfs-class-pwm: Update Lee Jones' email address Documentation/process: Add embargoed HW contact for LLVM Revert "kernfs: Change kernfs_notify_list to llist." ACPI: Remove the unused find_acpi_cpu_cache_topology() arch_topology: Warn that topology for nested clusters is not supported arch_topology: Add support for parsing sockets in /cpu-map arch_topology: Set cluster identifier in each core/thread from /cpu-map arch_topology: Limit span of cpu_clustergroup_mask() ...
2022-08-04ksmbd: fix heap-based overflow in set_ntacl_dacl()Namjae Jeon4-57/+119
The testcase use SMB2_SET_INFO_HE command to set a malformed file attribute under the label `security.NTACL`. SMB2_QUERY_INFO_HE command in testcase trigger the following overflow. [ 4712.003781] ================================================================== [ 4712.003790] BUG: KASAN: slab-out-of-bounds in build_sec_desc+0x842/0x1dd0 [ksmbd] [ 4712.003807] Write of size 1060 at addr ffff88801e34c068 by task kworker/0:0/4190 [ 4712.003813] CPU: 0 PID: 4190 Comm: kworker/0:0 Not tainted 5.19.0-rc5 #1 [ 4712.003850] Workqueue: ksmbd-io handle_ksmbd_work [ksmbd] [ 4712.003867] Call Trace: [ 4712.003870] <TASK> [ 4712.003873] dump_stack_lvl+0x49/0x5f [ 4712.003935] print_report.cold+0x5e/0x5cf [ 4712.003972] ? ksmbd_vfs_get_sd_xattr+0x16d/0x500 [ksmbd] [ 4712.003984] ? cmp_map_id+0x200/0x200 [ 4712.003988] ? build_sec_desc+0x842/0x1dd0 [ksmbd] [ 4712.004000] kasan_report+0xaa/0x120 [ 4712.004045] ? build_sec_desc+0x842/0x1dd0 [ksmbd] [ 4712.004056] kasan_check_range+0x100/0x1e0 [ 4712.004060] memcpy+0x3c/0x60 [ 4712.004064] build_sec_desc+0x842/0x1dd0 [ksmbd] [ 4712.004076] ? parse_sec_desc+0x580/0x580 [ksmbd] [ 4712.004088] ? ksmbd_acls_fattr+0x281/0x410 [ksmbd] [ 4712.004099] smb2_query_info+0xa8f/0x6110 [ksmbd] [ 4712.004111] ? psi_group_change+0x856/0xd70 [ 4712.004148] ? update_load_avg+0x1c3/0x1af0 [ 4712.004152] ? asym_cpu_capacity_scan+0x5d0/0x5d0 [ 4712.004157] ? xas_load+0x23/0x300 [ 4712.004162] ? smb2_query_dir+0x1530/0x1530 [ksmbd] [ 4712.004173] ? _raw_spin_lock_bh+0xe0/0xe0 [ 4712.004179] handle_ksmbd_work+0x30e/0x1020 [ksmbd] [ 4712.004192] process_one_work+0x778/0x11c0 [ 4712.004227] ? _raw_spin_lock_irq+0x8e/0xe0 [ 4712.004231] worker_thread+0x544/0x1180 [ 4712.004234] ? __cpuidle_text_end+0x4/0x4 [ 4712.004239] kthread+0x282/0x320 [ 4712.004243] ? process_one_work+0x11c0/0x11c0 [ 4712.004246] ? kthread_complete_and_exit+0x30/0x30 [ 4712.004282] ret_from_fork+0x1f/0x30 This patch add the buffer validation for security descriptor that is stored by malformed SMB2_SET_INFO_HE command. and allocate large response buffer about SMB2_O_INFO_SECURITY file info class. Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3") Cc: stable@vger.kernel.org Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-17771 Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-08-04lockd: detect and reject lock arguments that overflowJeff Layton2-17/+10
lockd doesn't currently vet the start and length in nlm4 requests like it should, and can end up generating lock requests with arguments that overflow when passed to the filesystem. The NLM4 protocol uses unsigned 64-bit arguments for both start and length, whereas struct file_lock tracks the start and end as loff_t values. By the time we get around to calling nlm4svc_retrieve_args, we've lost the information that would allow us to determine if there was an overflow. Start tracking the actual start and len for NLM4 requests in the nlm_lock. In nlm4svc_retrieve_args, vet these values to ensure they won't cause an overflow, and return NLM4_FBIG if they do. Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=392 Reported-by: Jan Kasiak <j.kasiak@gmail.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: <stable@vger.kernel.org> # 5.14+
2022-08-04NFSD: discard fh_locked flag and fh_lock/fh_unlockNeilBrown3-70/+6
As all inode locking is now fully balanced, fh_put() does not need to call fh_unlock(). fh_lock() and fh_unlock() are no longer used, so discard them. These are the only real users of ->fh_locked, so discard that too. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>