summaryrefslogtreecommitdiffstats
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2020-06-12proc: Use new_inode not new_inode_pseudoEric W. Biederman3-3/+3
Recently syzbot reported that unmounting proc when there is an ongoing inotify watch on the root directory of proc could result in a use after free when the watch is removed after the unmount of proc when the watcher exits. Commit 69879c01a0c3 ("proc: Remove the now unnecessary internal mount of proc") made it easier to unmount proc and allowed syzbot to see the problem, but looking at the code it has been around for a long time. Looking at the code the fsnotify watch should have been removed by fsnotify_sb_delete in generic_shutdown_super. Unfortunately the inode was allocated with new_inode_pseudo instead of new_inode so the inode was not on the sb->s_inodes list. Which prevented fsnotify_unmount_inodes from finding the inode and removing the watch as well as made it so the "VFS: Busy inodes after unmount" warning could not find the inodes to warn about them. Make all of the inodes in proc visible to generic_shutdown_super, and fsnotify_sb_delete by using new_inode instead of new_inode_pseudo. The only functional difference is that new_inode places the inodes on the sb->s_inodes list. I wrote a small test program and I can verify that without changes it can trigger this issue, and by replacing new_inode_pseudo with new_inode the issues goes away. Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/000000000000d788c905a7dfa3f4@google.com Reported-by: syzbot+7d2debdcdb3cb93c1e5e@syzkaller.appspotmail.com Fixes: 0097875bd415 ("proc: Implement /proc/thread-self to point at the directory of the current thread") Fixes: 021ada7dff22 ("procfs: switch /proc/self away from proc_dir_entry") Fixes: 51f0885e5415 ("vfs,proc: guarantee unique inodes in /proc") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2020-06-12ext4, jbd2: ensure panic by fix a race between jbd2 abort and ext4 error ↵zhangyi (F)2-16/+17
handlers In the ext4 filesystem with errors=panic, if one process is recording errno in the superblock when invoking jbd2_journal_abort() due to some error cases, it could be raced by another __ext4_abort() which is setting the SB_RDONLY flag but missing panic because errno has not been recorded. jbd2_journal_commit_transaction() jbd2_journal_abort() journal->j_flags |= JBD2_ABORT; jbd2_journal_update_sb_errno() | ext4_journal_check_start() | __ext4_abort() | sb->s_flags |= SB_RDONLY; | if (!JBD2_REC_ERR) | return; journal->j_flags |= JBD2_REC_ERR; Finally, it will no longer trigger panic because the filesystem has already been set read-only. Fix this by introduce j_abort_mutex to make sure journal abort is completed before panic, and remove JBD2_REC_ERR flag. Fixes: 4327ba52afd03 ("ext4, jbd2: ensure entering into panic after recording an error in superblock") Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20200609073540.3810702-1-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-12cifs: fix chown and chgrp when idsfromsid mount option enabledSteve French1-15/+42
idsfromsid was ignored in chown and chgrp causing it to fail when upcalls were not configured for lookup. idsfromsid allows mapping users when setting user or group ownership using "special SID" (reserved for this). Add support for chmod and chgrp when idsfromsid mount option is enabled. Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2020-06-12smb3: allow uid and gid owners to be set on create with idsfromsid mount optionSteve French4-21/+129
Currently idsfromsid mount option allows querying owner information from the special sids used to represent POSIX uids and gids but needed changes to populate the security descriptor context with the owner information when idsfromsid mount option was used. Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2020-06-12ext4: support xattr gnu.* namespace for the HurdJan (janneke) Nieuwenhuizen4-1/+56
The Hurd gained[0] support for moving the translator and author fields out of the inode and into the "gnu.*" xattr namespace. In anticipation of that, an xattr INDEX was reserved[1]. The Hurd has now been brought into compliance[2] with that. This patch adds support for reading and writing such attributes from Linux; you can now do something like mkdir -p hurd-root/servers/socket touch hurd-root/servers/socket/1 setfattr --name=gnu.translator --value='"/hurd/pflocal\0"' \ hurd-root/servers/socket/1 getfattr --name=gnu.translator hurd-root/servers/socket/1 # file: 1 gnu.translator="/hurd/pflocal" to setup a pipe translator, which is being used to create[3] a vm-image for the Hurd from GNU Guix. [0] https://summerofcode.withgoogle.com/projects/#5869799859027968 [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3980bd3b406addb327d858aebd19e229ea340b9a [2] https://git.savannah.gnu.org/cgit/hurd/hurd.git/commit/?id=a04c7bf83172faa7cb080fbe3b6c04a8415ca645 [3] https://git.savannah.gnu.org/cgit/guix.git/log/?h=wip-hurd-vm Signed-off-by: Jan Nieuwenhuizen <janneke@gnu.org> Link: https://lore.kernel.org/r/20200525193940.878-1-janneke@gnu.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-12smb311: Add tracepoints for new compound posix query infoSteve French2-6/+6
Add dynamic tracepoints for new SMB3.1.1. posix extensions query info level (100) Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com>
2020-06-12smb311: add support for using info level for posix extensions querySteve French4-4/+15
Adds calls to the newer info level for query info using SMB3.1.1 posix extensions. The remaining two places that call the older query info (non-SMB3.1.1 POSIX) require passing in the fid and can be updated in a later patch. Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com>
2020-06-12smb311: Add support for lookup with posix extensions query infoSteve French1-1/+3
Improve support for lookup when using SMB3.1.1 posix mounts. Use new info level 100 (posix query info) Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com>
2020-06-12smb311: Add support for SMB311 query info (non-compounded)Steve French2-0/+15
Add worker function for non-compounded SMB3.1.1 POSIX Extensions query info. This is needed for revalidate of root (cached) directory for example. Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com>
2020-06-12SMB311: Add support for query info using posix extensions (level 100)Steve French6-2/+316
Adds support for better query info on dentry revalidation (using the SMB3.1.1 POSIX extensions level 100). Followon patch will add support for translating the UID/GID from the SID and also will add support for using the posix query info on lookup. Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com>
2020-06-12smb3: add indatalen that can be a non-zero value to calculation of credit ↵Namjae Jeon1-1/+3
charge in smb2 ioctl Some of tests in xfstests failed with cifsd kernel server since commit e80ddeb2f70e. cifsd kernel server validates credit charge from client by calculating it base on max((InputCount + OutputCount) and (MaxInputResponse + MaxOutputResponse)) according to specification. MS-SMB2 specification describe credit charge calculation of smb2 ioctl : If Connection.SupportsMultiCredit is TRUE, the server MUST validate CreditCharge based on the maximum of (InputCount + OutputCount) and (MaxInputResponse + MaxOutputResponse), as specified in section 3.3.5.2.5. If the validation fails, it MUST fail the IOCTL request with STATUS_INVALID_PARAMETER. This patch add indatalen that can be a non-zero value to calculation of credit charge in SMB2_ioctl_init(). Fixes: e80ddeb2f70e ("smb3: fix incorrect number of credits when ioctl MaxOutputResponse > 64K") Cc: Stable <stable@vger.kernel.org> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Cc: Steve French <smfrench@gmail.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2020-06-11Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-1/+1
Pull updates from Andrew Morton: "A few fixes and stragglers. Subsystems affected by this patch series: mm/memory-failure, ocfs2, lib/lzo, misc" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: amdgpu: a NULL ->mm does not mean a thread is a kthread lib/lzo: fix ambiguous encoding bug in lzo-rle ocfs2: fix build failure when TCP/IP is disabled mm/memory-failure: send SIGBUS(BUS_MCEERR_AR) only to current thread mm/memory-failure: prioritize prctl(PR_MCE_KILL) over vm.memory_failure_early_kill
2020-06-11ocfs2: fix build failure when TCP/IP is disabledTom Seewald1-1/+1
After commit 12abc5ee7873 ("tcp: add tcp_sock_set_nodelay") and commit c488aeadcbd0 ("tcp: add tcp_sock_set_user_timeout"), building the kernel with OCFS2_FS=y but without INET=y causes it to fail with: ld: fs/ocfs2/cluster/tcp.o: in function `o2net_accept_many': tcp.c:(.text+0x21b1): undefined reference to `tcp_sock_set_nodelay' ld: tcp.c:(.text+0x21c1): undefined reference to `tcp_sock_set_user_timeout' ld: fs/ocfs2/cluster/tcp.o: in function `o2net_start_connect': tcp.c:(.text+0x2633): undefined reference to `tcp_sock_set_nodelay' ld: tcp.c:(.text+0x2643): undefined reference to `tcp_sock_set_user_timeout' This is due to tcp_sock_set_nodelay() and tcp_sock_set_user_timeout() being declared in linux/tcp.h and defined in net/ipv4/tcp.c, which depend on TCP/IP being enabled. To fix this, make OCFS2_FS depend on INET=y which already requires NET=y. Fixes: 12abc5ee7873 ("tcp: add tcp_sock_set_nodelay") Fixes: c488aeadcbd0 ("tcp: add tcp_sock_set_user_timeout") Signed-off-by: Tom Seewald <tseewald@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Acked-by: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: David S. Miller <davem@davemloft.net> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Link: http://lkml.kernel.org/r/20200606190827.23954-1-tseewald@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-11Merge tag 'io_uring-5.8-2020-06-11' of git://git.kernel.dk/linux-blockLinus Torvalds3-241/+201
Pull io_uring fixes from Jens Axboe: "A few late stragglers in here. In particular: - Validate full range for provided buffers (Bijan) - Fix bad use of kfree() in buffer registration failure (Denis) - Don't allow close of ring itself, it's not fully safe. Making it fully safe would require making the system call more expensive, which isn't worth it. - Buffer selection fix - Regression fix for O_NONBLOCK retry - Make IORING_OP_ACCEPT honor O_NONBLOCK (Jiufei) - Restrict opcode handling for SQ/IOPOLL (Pavel) - io-wq work handling cleanups and improvements (Pavel, Xiaoguang) - IOPOLL race fix (Xiaoguang)" * tag 'io_uring-5.8-2020-06-11' of git://git.kernel.dk/linux-block: io_uring: fix io_kiocb.flags modification race in IOPOLL mode io_uring: check file O_NONBLOCK state for accept io_uring: avoid unnecessary io_wq_work copy for fast poll feature io_uring: avoid whole io_wq_work copy for requests completed inline io_uring: allow O_NONBLOCK async retry io_wq: add per-wq work handler instead of per work io_uring: don't arm a timeout through work.func io_uring: remove custom ->func handlers io_uring: don't derive close state from ->func io_uring: use kvfree() in io_sqe_buffer_register() io_uring: validate the full range of provided buffers for access io_uring: re-set iov base/len for buffer select retry io_uring: move send/recv IOPOLL check into prep io_uring: deduplicate io_openat{,2}_prep() io_uring: do build_open_how() only once io_uring: fix {SQ,IO}POLL with unsupported opcodes io_uring: disallow close of ring itself
2020-06-11afs: Fix afs_store_data() to set mtime in new operation descriptorDavid Howells1-0/+1
Fix afs_store_data() so that it sets the mtime in the new operation descriptor otherwise the mtime on the server gets set to 0 when a write is stored to the server. Fixes: e49c7b2f6de7 ("afs: Build an abstraction around an "operation" concept") Reported-by: Dave Botsch <botsch@cnf.cornell.edu> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-11Merge branch 'akpm' (patches from Andrew)Linus Torvalds5-19/+10
Merge some more updates from Andrew Morton: - various hotfixes and minor things - hch's use_mm/unuse_mm clearnups Subsystems affected by this patch series: mm/hugetlb, scripts, kcov, lib, nilfs, checkpatch, lib, mm/debug, ocfs2, lib, misc. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: kernel: set USER_DS in kthread_use_mm kernel: better document the use_mm/unuse_mm API contract kernel: move use_mm/unuse_mm to kthread.c kernel: move use_mm/unuse_mm to kthread.c stacktrace: cleanup inconsistent variable type lib: test get_count_order/long in test_bitops.c mm: add comments on pglist_data zones ocfs2: fix spelling mistake and grammar mm/debug_vm_pgtable: fix kernel crash by checking for THP support lib: fix bitmap_parse() on 64-bit big endian archs checkpatch: correct check for kernel parameters doc nilfs2: fix null pointer dereference at nilfs_segctor_do_construct() lib/lz4/lz4_decompress.c: document deliberate use of `&' kcov: check kcov_softirq in kcov_remote_stop() scripts/spelling: add a few more typos khugepaged: selftests: fix timeout condition in wait_for_scan()
2020-06-11Merge tag 'nfs-for-5.8-1' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds9-10/+125
Pull NFS client updates from Anna Schumaker: "New features and improvements: - Sunrpc receive buffer sizes only change when establishing a GSS credentials - Add more sunrpc tracepoints - Improve on tracepoints to capture internal NFS I/O errors Other bugfixes and cleanups: - Move a dprintk() to after a call to nfs_alloc_fattr() - Fix off-by-one issues in rpc_ntop6 - Fix a few coccicheck warnings - Use the correct SPDX license identifiers - Fix rpc_call_done assignment for BIND_CONN_TO_SESSION - Replace zero-length array with flexible array - Remove duplicate headers - Set invalid blocks after NFSv4 writes to update space_used attribute - Fix direct WRITE throughput regression" * tag 'nfs-for-5.8-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (27 commits) NFS: Fix direct WRITE throughput regression SUNRPC: rpc_xprt lifetime events should record xprt->state xprtrdma: Make xprt_rdma_slot_table_entries static nfs: set invalid blocks after NFSv4 writes NFS: remove redundant initialization of variable result sunrpc: add missing newline when printing parameter 'auth_hashtable_size' by sysfs NFS: Add a tracepoint in nfs_set_pgio_error() NFS: Trace short NFS READs NFS: nfs_xdr_status should record the procedure name SUNRPC: Set SOFTCONN when destroying GSS contexts SUNRPC: rpc_call_null_helper() should set RPC_TASK_SOFT SUNRPC: rpc_call_null_helper() already sets RPC_TASK_NULLCREDS SUNRPC: trace RPC client lifetime events SUNRPC: Trace transport lifetime events SUNRPC: Split the xdr_buf event class SUNRPC: Add tracepoint to rpc_call_rpcerror() SUNRPC: Update the RPC_SHOW_SOCKET() macro SUNRPC: Update the rpc_show_task_flags() macro SUNRPC: Trace GSS context lifetimes SUNRPC: receive buffer size estimation values almost never change ...
2020-06-11Merge tag 'vfs-5.8-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds1-88/+20
Pull DAX updates part three from Darrick Wong: "Now that the xfs changes have landed, this third piece changes the FS_XFLAG_DAX ioctl code in xfs to request that the inode be reloaded after the last program closes the file, if doing so would make a S_DAX change happen. The goal here is to make dax access mode switching quicker when possible. Summary: - Teach XFS to ask the VFS to drop an inode if the administrator changes the FS_XFLAG_DAX inode flag such that the S_DAX state would change. This can result in files changing access modes without requiring an unmount cycle" * tag 'vfs-5.8-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: fs/xfs: Update xfs_ioctl_setattr_dax_invalidate() fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags() fs/xfs: Create function xfs_inode_should_enable_dax() fs/xfs: Make DAX mount option a tri-state fs/xfs: Change XFS_MOUNT_DAX to XFS_MOUNT_DAX_ALWAYS fs/xfs: Remove unnecessary initialization of i_rwsem
2020-06-11NFS: Fix direct WRITE throughput regressionChuck Lever1-0/+2
I measured a 50% throughput regression for large direct writes. The observed on-the-wire behavior is that the client sends every NFS WRITE twice: once as an UNSTABLE WRITE plus a COMMIT, and once as a FILE_SYNC WRITE. This is because the nfs_write_match_verf() check in nfs_direct_commit_complete() fails for every WRITE. Buffered writes use nfs_write_completion(), which sets req->wb_verf correctly. Direct writes use nfs_direct_write_completion(), which does not set req->wb_verf at all. This leaves req->wb_verf set to all zeroes for every direct WRITE, and thus nfs_direct_commit_completion() always sets NFS_ODIRECT_RESCHED_WRITES. This fix appears to restore nearly all of the lost performance. Fixes: 1f28476dcb98 ("NFS: Fix O_DIRECT commit verifier handling") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-06-11nfs: set invalid blocks after NFSv4 writesZheng Bin1-3/+11
Use the following command to test nfsv4(size of file1M is 1MB): mount -t nfs -o vers=4.0,actimeo=60 127.0.0.1/dir1 /mnt cp file1M /mnt du -h /mnt/file1M -->0 within 60s, then 1M When write is done(cp file1M /mnt), will call this: nfs_writeback_done nfs4_write_done nfs4_write_done_cb nfs_writeback_update_inode nfs_post_op_update_inode_force_wcc_locked(change, ctime, mtime nfs_post_op_update_inode_force_wcc_locked nfs_set_cache_invalid nfs_refresh_inode_locked nfs_update_inode nfsd write response contains change, ctime, mtime, the flag will be clear after nfs_update_inode. Howerver, write response does not contain space_used, previous open response contains space_used whose value is 0, so inode->i_blocks is still 0. nfs_getattr -->called by "du -h" do_update |= force_sync || nfs_attribute_cache_expired -->false in 60s cache_validity = READ_ONCE(NFS_I(inode)->cache_validity) do_update |= cache_validity & (NFS_INO_INVALID_ATTR -->false if (do_update) { __nfs_revalidate_inode } Within 60s, does not send getattr request to nfsd, thus "du -h /mnt/file1M" is 0. Add a NFS_INO_INVALID_BLOCKS flag, set it when nfsv4 write is done. Fixes: 16e143751727 ("NFS: More fine grained attribute tracking") Signed-off-by: Zheng Bin <zhengbin13@huawei.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-06-11NFS: remove redundant initialization of variable resultColin Ian King1-1/+1
The variable result is being initialized with a value that is never read and it is being updated later with a new value. The initialization is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-06-11NFS: Add a tracepoint in nfs_set_pgio_error()Chuck Lever2-0/+46
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-06-11NFS: Trace short NFS READsChuck Lever2-0/+49
A short read can generate an -EIO error without there being an error on the wire. This tracepoint acts as an eyecatcher when there is no obvious I/O error. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-06-11NFS: nfs_xdr_status should record the procedure nameChuck Lever1-2/+13
When sunrpc trace points are not enabled, the recorded task ID information alone is not helpful. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-06-11Merge tag 'nfsd-5.8' of git://linux-nfs.org/~bfields/linuxLinus Torvalds12-132/+569
Pull nfsd updates from Bruce Fields: "Highlights: - Keep nfsd clients from unnecessarily breaking their own delegations. Note this requires a small kthreadd addition. The result is Tejun Heo's suggestion (see link), and he was OK with this going through my tree. - Patch nfsd/clients/ to display filenames, and to fix byte-order when displaying stateid's. - fix a module loading/unloading bug, from Neil Brown. - A big series from Chuck Lever with RPC/RDMA and tracing improvements, and lay some groundwork for RPC-over-TLS" Link: https://lore.kernel.org/r/1588348912-24781-1-git-send-email-bfields@redhat.com * tag 'nfsd-5.8' of git://linux-nfs.org/~bfields/linux: (49 commits) sunrpc: use kmemdup_nul() in gssp_stringify() nfsd: safer handling of corrupted c_type nfsd4: make drc_slab global, not per-net SUNRPC: Remove unreachable error condition in rpcb_getport_async() nfsd: Fix svc_xprt refcnt leak when setup callback client failed sunrpc: clean up properly in gss_mech_unregister() sunrpc: svcauth_gss_register_pseudoflavor must reject duplicate registrations. sunrpc: check that domain table is empty at module unload. NFSD: Fix improperly-formatted Doxygen comments NFSD: Squash an annoying compiler warning SUNRPC: Clean up request deferral tracepoints NFSD: Add tracepoints for monitoring NFSD callbacks NFSD: Add tracepoints to the NFSD state management code NFSD: Add tracepoints to NFSD's duplicate reply cache SUNRPC: svc_show_status() macro should have enum definitions SUNRPC: Restructure svc_udp_recvfrom() SUNRPC: Refactor svc_recvfrom() SUNRPC: Clean up svc_release_skb() functions SUNRPC: Refactor recvfrom path dealing with incomplete TCP receives SUNRPC: Replace dprintk() call sites in TCP receive path ...
2020-06-11io_uring: fix io_kiocb.flags modification race in IOPOLL modeXiaoguang Wang1-6/+6
While testing io_uring in arm, we found sometimes io_sq_thread() keeps polling io requests even though there are not inflight io requests in block layer. After some investigations, found a possible race about io_kiocb.flags, see below race codes: 1) in the end of io_write() or io_read() req->flags &= ~REQ_F_NEED_CLEANUP; kfree(iovec); return ret; 2) in io_complete_rw_iopoll() if (res != -EAGAIN) req->flags |= REQ_F_IOPOLL_COMPLETED; In IOPOLL mode, io requests still maybe completed by interrupt, then above codes are not safe, concurrent modifications to req->flags, which is not protected by lock or is not atomic modifications. I also had disassemble io_complete_rw_iopoll() in arm: req->flags |= REQ_F_IOPOLL_COMPLETED; 0xffff000008387b18 <+76>: ldr w0, [x19,#104] 0xffff000008387b1c <+80>: orr w0, w0, #0x1000 0xffff000008387b20 <+84>: str w0, [x19,#104] Seems that the "req->flags |= REQ_F_IOPOLL_COMPLETED;" is load and modification, two instructions, which obviously is not atomic. To fix this issue, add a new iopoll_completed in io_kiocb to indicate whether io request is completed. Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-11ext4: mballoc: Use this_cpu_read instead of this_cpu_ptrRitesh Harjani1-1/+1
Simplify reading a seq variable by directly using this_cpu_read API instead of doing this_cpu_ptr and then dereferencing it. This also avoid the below kernel BUG: which happens when CONFIG_DEBUG_PREEMPT is enabled BUG: using smp_processor_id() in preemptible [00000000] code: syz-fuzzer/6927 caller is ext4_mb_new_blocks+0xa4d/0x3b70 fs/ext4/mballoc.c:4711 CPU: 1 PID: 6927 Comm: syz-fuzzer Not tainted 5.7.0-next-20200602-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x18f/0x20d lib/dump_stack.c:118 check_preemption_disabled+0x20d/0x220 lib/smp_processor_id.c:48 ext4_mb_new_blocks+0xa4d/0x3b70 fs/ext4/mballoc.c:4711 ext4_ext_map_blocks+0x201b/0x33e0 fs/ext4/extents.c:4244 ext4_map_blocks+0x4cb/0x1640 fs/ext4/inode.c:626 ext4_getblk+0xad/0x520 fs/ext4/inode.c:833 ext4_bread+0x7c/0x380 fs/ext4/inode.c:883 ext4_append+0x153/0x360 fs/ext4/namei.c:67 ext4_init_new_dir fs/ext4/namei.c:2757 [inline] ext4_mkdir+0x5e0/0xdf0 fs/ext4/namei.c:2802 vfs_mkdir+0x419/0x690 fs/namei.c:3632 do_mkdirat+0x21e/0x280 fs/namei.c:3655 do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:359 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 42f56b7a4a7d ("ext4: mballoc: introduce pcpu seqcnt for freeing PA to improve ENOSPC handling") Suggested-by: Borislav Petkov <bp@alien8.de> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reported-by: syzbot+82f324bb69744c5f6969@syzkaller.appspotmail.com Link: https://lore.kernel.org/r/534f275016296996f54ecf65168bb3392b6f653d.1591699601.git.riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-11ext4: avoid utf8_strncasecmp() with unstable nameEric Biggers1-0/+16
If the dentry name passed to ->d_compare() fits in dentry::d_iname, then it may be concurrently modified by a rename. This can cause undefined behavior (possibly out-of-bounds memory accesses or crashes) in utf8_strncasecmp(), since fs/unicode/ isn't written to handle strings that may be concurrently modified. Fix this by first copying the filename to a stack buffer if needed. This way we get a stable snapshot of the filename. Fixes: b886ee3e778e ("ext4: Support case-insensitive file name lookups") Cc: <stable@vger.kernel.org> # v5.2+ Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Daniel Rosenberg <drosen@google.com> Cc: Gabriel Krisman Bertazi <krisman@collabora.co.uk> Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20200601200543.59417-1-ebiggers@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-11ext4: stop overwrite the errcode in ext4_setup_superyangerkun1-0/+1
Now the errcode from ext4_commit_super will overwrite EROFS exists in ext4_setup_super. Actually, no need to call ext4_commit_super since we will return EROFS. Fix it by goto done directly. Fixes: c89128a00838 ("ext4: handle errors on ext4_commit_super") Signed-off-by: yangerkun <yangerkun@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20200601073404.3712492-1-yangerkun@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-11ext4: fix partial cluster initialization when splitting extentJeffle Xu1-1/+1
Fix the bug when calculating the physical block number of the first block in the split extent. This bug will cause xfstests shared/298 failure on ext4 with bigalloc enabled occasionally. Ext4 error messages indicate that previously freed blocks are being freed again, and the following fsck will fail due to the inconsistency of block bitmap and bg descriptor. The following is an example case: 1. First, Initialize a ext4 filesystem with cluster size '16K', block size '4K', in which case, one cluster contains four blocks. 2. Create one file (e.g., xxx.img) on this ext4 filesystem. Now the extent tree of this file is like: ... 36864:[0]4:220160 36868:[0]14332:145408 51200:[0]2:231424 ... 3. Then execute PUNCH_HOLE fallocate on this file. The hole range is like: .. ext4_ext_remove_space: dev 254,16 ino 12 since 49506 end 49506 depth 1 ext4_ext_remove_space: dev 254,16 ino 12 since 49544 end 49546 depth 1 ext4_ext_remove_space: dev 254,16 ino 12 since 49605 end 49607 depth 1 ... 4. Then the extent tree of this file after punching is like ... 49507:[0]37:158047 49547:[0]58:158087 ... 5. Detailed procedure of punching hole [49544, 49546] 5.1. The block address space: ``` lblk ~49505 49506 49507~49543 49544~49546 49547~ ---------+------+-------------+----------------+-------- extent | hole | extent | hole | extent ---------+------+-------------+----------------+-------- pblk ~158045 158046 158047~158083 158084~158086 158087~ ``` 5.2. The detailed layout of cluster 39521: ``` cluster 39521 <-------------------------------> hole extent <----------------------><-------- lblk 49544 49545 49546 49547 +-------+-------+-------+-------+ | | | | | +-------+-------+-------+-------+ pblk 158084 1580845 158086 158087 ``` 5.3. The ftrace output when punching hole [49544, 49546]: - ext4_ext_remove_space (start 49544, end 49546) - ext4_ext_rm_leaf (start 49544, end 49546, last_extent [49507(158047), 40], partial [pclu 39522 lblk 0 state 2]) - ext4_remove_blocks (extent [49507(158047), 40], from 49544 to 49546, partial [pclu 39522 lblk 0 state 2] - ext4_free_blocks: (block 158084 count 4) - ext4_mballoc_free (extent 1/6753/1) 5.4. Ext4 error message in dmesg: EXT4-fs error (device vdb): mb_free_blocks:1457: group 1, block 158084:freeing already freed block (bit 6753); block bitmap corrupt. EXT4-fs error (device vdb): ext4_mb_generate_buddy:747: group 1, block bitmap and bg descriptor inconsistent: 19550 vs 19551 free clusters In this case, the whole cluster 39521 is freed mistakenly when freeing pblock 158084~158086 (i.e., the first three blocks of this cluster), although pblock 158087 (the last remaining block of this cluster) has not been freed yet. The root cause of this isuue is that, the pclu of the partial cluster is calculated mistakenly in ext4_ext_remove_space(). The correct partial_cluster.pclu (i.e., the cluster number of the first block in the next extent, that is, lblock 49597 (pblock 158086)) should be 39521 rather than 39522. Fixes: f4226d9ea400 ("ext4: fix partial cluster initialization") Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com> Reviewed-by: Eric Whitney <enwlinux@gmail.com> Cc: stable@kernel.org # v3.19+ Link: https://lore.kernel.org/r/1590121124-37096-1-git-send-email-jefflexu@linux.alibaba.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-11ext4: avoid race conditions when remounting with options that change daxTheodore Ts'o1-18/+24
Trying to change dax mount options when remounting could allow mount options to be enabled for a small amount of time, and then the mount option change would be reverted. In the case of "mount -o remount,dax", this can cause a race where files would temporarily treated as DAX --- and then not. Cc: stable@kernel.org Reported-by: syzbot+bca9799bf129256190da@syzkaller.appspotmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-11Enable ext4 support for per-file/directory dax operationsTheodore Ts'o11-47/+194
This adds the same per-file/per-directory DAX support for ext4 as was done for xfs, now that we finally have consensus over what the interface should be.
2020-06-10kernel: set USER_DS in kthread_use_mmChristoph Hellwig2-10/+2
Some architectures like arm64 and s390 require USER_DS to be set for kernel threads to access user address space, which is the whole purpose of kthread_use_mm, but other like x86 don't. That has lead to a huge mess where some callers are fixed up once they are tested on said architectures, while others linger around and yet other like io_uring try to do "clever" optimizations for what usually is just a trivial asignment to a member in the thread_struct for most architectures. Make kthread_use_mm set USER_DS, and kthread_unuse_mm restore to the previous value instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Jens Axboe <axboe@kernel.dk> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Felipe Balbi <balbi@kernel.org> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Zhi Wang <zhi.a.wang@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: http://lkml.kernel.org/r/20200404094101.672954-7-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-10kernel: better document the use_mm/unuse_mm API contractChristoph Hellwig2-5/+5
Switch the function documentation to kerneldoc comments, and add WARN_ON_ONCE asserts that the calling thread is a kernel thread and does not have ->mm set (or has ->mm set in the case of unuse_mm). Also give the functions a kthread_ prefix to better document the use case. [hch@lst.de: fix a comment typo, cover the newly merged use_mm/unuse_mm caller in vfio] Link: http://lkml.kernel.org/r/20200416053158.586887-3-hch@lst.de [sfr@canb.auug.org.au: powerpc/vas: fix up for {un}use_mm() rename] Link: http://lkml.kernel.org/r/20200422163935.5aa93ba5@canb.auug.org.au Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Jens Axboe <axboe@kernel.dk> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [usb] Acked-by: Haren Myneni <haren@linux.ibm.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Felipe Balbi <balbi@kernel.org> Cc: Jason Wang <jasowang@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Zhi Wang <zhi.a.wang@intel.com> Link: http://lkml.kernel.org/r/20200404094101.672954-6-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-10kernel: move use_mm/unuse_mm to kthread.cChristoph Hellwig3-3/+0
Patch series "improve use_mm / unuse_mm", v2. This series improves the use_mm / unuse_mm interface by better documenting the assumptions, and my taking the set_fs manipulations spread over the callers into the core API. This patch (of 3): Use the proper API instead. Link: http://lkml.kernel.org/r/20200404094101.672954-1-hch@lst.de These helpers are only for use with kernel threads, and I will tie them more into the kthread infrastructure going forward. Also move the prototypes to kthread.h - mmu_context.h was a little weird to start with as it otherwise contains very low-level MM bits. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Jens Axboe <axboe@kernel.dk> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Felipe Balbi <balbi@kernel.org> Cc: Jason Wang <jasowang@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Zhi Wang <zhi.a.wang@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: http://lkml.kernel.org/r/20200404094101.672954-1-hch@lst.de Link: http://lkml.kernel.org/r/20200416053158.586887-1-hch@lst.de Link: http://lkml.kernel.org/r/20200404094101.672954-5-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-10ocfs2: fix spelling mistake and grammarKeyur Patel1-1/+1
./ocfs2/mmap.c:65: bebongs ==> belonging Signed-off-by: Keyur Patel <iamkeyur96@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Link: http://lkml.kernel.org/r/20200608014818.102358-1-iamkeyur96@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-10nilfs2: fix null pointer dereference at nilfs_segctor_do_construct()Ryusuke Konishi1-0/+2
After commit c3aab9a0bd91 ("mm/filemap.c: don't initiate writeback if mapping has no dirty pages"), the following null pointer dereference has been reported on nilfs2: BUG: kernel NULL pointer dereference, address: 00000000000000a8 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI ... RIP: 0010:percpu_counter_add_batch+0xa/0x60 ... Call Trace: __test_set_page_writeback+0x2d3/0x330 nilfs_segctor_do_construct+0x10d3/0x2110 [nilfs2] nilfs_segctor_construct+0x168/0x260 [nilfs2] nilfs_segctor_thread+0x127/0x3b0 [nilfs2] kthread+0xf8/0x130 ... This crash turned out to be caused by set_page_writeback() call for segment summary buffers at nilfs_segctor_prepare_write(). set_page_writeback() can call inc_wb_stat(inode_to_wb(inode), WB_WRITEBACK) where inode_to_wb(inode) is NULL if the inode of underlying block device does not have an associated wb. This fixes the issue by calling inode_attach_wb() in advance to ensure to associate the bdev inode with its wb. Fixes: c3aab9a0bd91 ("mm/filemap.c: don't initiate writeback if mapping has no dirty pages") Reported-by: Walton Hoops <me@waltonhoops.com> Reported-by: Tomas Hlavaty <tom@logand.com> Reported-by: ARAI Shun-ichi <hermes@ceres.dti.ne.jp> Reported-by: Hideki EIRAKU <hdk1983@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> [5.4+] Link: http://lkml.kernel.org/r/20200608.011819.1399059588922299158.konishi.ryusuke@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-10Merge branch 'work.epoll' of ↵Linus Torvalds1-26/+38
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull epoll update from Al Viro: "epoll conversion to read_iter from Jens; I thought there might be more epoll stuff this cycle, but uaccess took too much time" * 'work.epoll' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: eventfd: convert to f_op->read_iter()
2020-06-10io_uring: check file O_NONBLOCK state for acceptJiufei Xue1-0/+3
If the socket is O_NONBLOCK, we should complete the accept request with -EAGAIN when data is not ready. Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-10io_uring: avoid unnecessary io_wq_work copy for fast poll featureXiaoguang Wang1-4/+9
Basically IORING_OP_POLL_ADD command and async armed poll handlers for regular commands don't touch io_wq_work, so only REQ_F_WORK_INITIALIZED is set, can we do io_wq_work copy and restore. Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-10io_uring: avoid whole io_wq_work copy for requests completed inlineXiaoguang Wang2-9/+36
If requests can be submitted and completed inline, we don't need to initialize whole io_wq_work in io_init_req(), which is an expensive operation, add a new 'REQ_F_WORK_INITIALIZED' to determine whether io_wq_work is initialized and add a helper io_req_init_async(), users must call io_req_init_async() for the first time touching any members of io_wq_work. I use /dev/nullb0 to evaluate performance improvement in my physical machine: modprobe null_blk nr_devices=1 completion_nsec=0 sudo taskset -c 60 fio -name=fiotest -filename=/dev/nullb0 -iodepth=128 -thread -rw=read -ioengine=io_uring -direct=1 -bs=4k -size=100G -numjobs=1 -time_based -runtime=120 before this patch: Run status group 0 (all jobs): READ: bw=724MiB/s (759MB/s), 724MiB/s-724MiB/s (759MB/s-759MB/s), io=84.8GiB (91.1GB), run=120001-120001msec With this patch: Run status group 0 (all jobs): READ: bw=761MiB/s (798MB/s), 761MiB/s-761MiB/s (798MB/s-798MB/s), io=89.2GiB (95.8GB), run=120001-120001msec About 5% improvement. Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-10Merge branch 'work.misc' of ↵Linus Torvalds2-5/+2
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: "A couple of trivial patches that fell through the cracks last cycle" * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: fix indentation in deactivate_super() vfs: Remove duplicated d_mountpoint check in __is_local_mountpoint
2020-06-10Merge branch 'work.sysctl' of ↵Linus Torvalds1-0/+4
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull sysctl fixes from Al Viro: "Fixups to regressions in sysctl series" * 'work.sysctl' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: sysctl: reject gigantic reads/write to sysctl files cdrom: fix an incorrect __user annotation on cdrom_sysctl_info trace: fix an incorrect __user annotation on stack_trace_sysctl random: fix an incorrect __user annotation on proc_do_entropy net/sysctl: remove leftover __user annotations on neigh_proc_dointvec* net/sysctl: use cpumask_parse in flow_limit_cpu_sysctl
2020-06-10Merge branch 'uaccess.misc' of ↵Linus Torvalds4-76/+105
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc uaccess updates from Al Viro: "Assorted uaccess patches for this cycle - the stuff that didn't fit into thematic series" * 'uaccess.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: bpf: make bpf_check_uarg_tail_zero() use check_zeroed_user() x86: kvm_hv_set_msr(): use __put_user() instead of 32bit __clear_user() user_regset_copyout_zero(): use clear_user() TEST_ACCESS_OK _never_ had been checked anywhere x86: switch cp_stat64() to unsafe_put_user() binfmt_flat: don't use __put_user() binfmt_elf_fdpic: don't use __... uaccess primitives binfmt_elf: don't bother with __{put,copy_to}_user() pselect6() and friends: take handling the combined 6th/7th args into helper
2020-06-10Merge branch 'proc-linus' of ↵Linus Torvalds1-4/+6
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull proc fix from Eric Biederman: "Syzbot found a NULL pointer dereference if kzalloc of s_fs_info fails" * 'proc-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: proc: s_fs_info may be NULL when proc_kill_sb is called
2020-06-10proc: s_fs_info may be NULL when proc_kill_sb is calledAlexey Gladkov1-4/+6
syzbot found that proc_fill_super() fails before filling up sb->s_fs_info, deactivate_locked_super() will be called and sb->s_fs_info will be NULL. The proc_kill_sb() does not expect fs_info to be NULL which is wrong. Link: https://lore.kernel.org/lkml/0000000000002d7ca605a7b8b1c5@google.com Reported-by: syzbot+4abac52934a48af5ff19@syzkaller.appspotmail.com Fixes: fa10fed30f25 ("proc: allow to mount many instances of proc in one pid namespace") Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2020-06-10sysctl: reject gigantic reads/write to sysctl filesChristoph Hellwig1-0/+4
Instead of triggering a WARN_ON deep down in the page allocator just give up early on allocations that are way larger than the usual sysctl values. Fixes: 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler") Reported-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-06-10smb3: fix typo in mount options displayed in /proc/mountsSteve French1-1/+1
Missing the final 's' in "max_channels" mount option when displayed in /proc/mounts (or by mount command) CC: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Shyam Prasad N <nspmangalore@gmail.com>
2020-06-09io_uring: allow O_NONBLOCK async retryJens Axboe1-3/+7
We can assume that O_NONBLOCK is always honored, even if we don't have a ->read/write_iter() for the file type. Also unify the read/write checking for allowing async punt, having the write side factoring in the REQ_F_NOWAIT flag as well. Cc: stable@vger.kernel.org Fixes: 490e89676a52 ("io_uring: only force async punt if poll based retry can't handle it") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-09Merge tag 'fuse-update-5.8' of ↵Linus Torvalds6-85/+205
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse updates from Miklos Szeredi: - Fix a rare deadlock in virtiofs - Fix st_blocks in writeback cache mode - Fix wrong checks in splice move causing spurious warnings - Fix a race between a GETATTR request and a FUSE_NOTIFY_INVAL_INODE notification - Use rb-tree instead of linear search for pages currently under writeout by userspace - Fix copy_file_range() inconsistencies * tag 'fuse-update-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: copy_file_range should truncate cache fuse: fix copy_file_range cache issues fuse: optimize writepages search fuse: update attr_version counter on fuse_notify_inval_inode() fuse: don't check refcount after stealing page fuse: fix weird page warning fuse: use dump_page virtiofs: do not use fuse_fill_super_common() for device installation fuse: always allow query of st_dev fuse: always flush dirty data on close(2) fuse: invalidate inode attr in writeback cache mode fuse: Update stale comment in queue_interrupt() fuse: BUG_ON correction in fuse_dev_splice_write() virtiofs: Add mount option and atime behavior to the doc virtiofs: schedule blocking async replies in separate worker