summaryrefslogtreecommitdiffstats
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2020-11-21Merge tag 'xfs-5.10-fixes-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds8-41/+124
Pull xfs fixes from Darrick Wong: "The critical fixes are for a crash that someone reported in the xattr code on 32-bit arm last week; and a revert of the rmap key comparison change from last week as it was totally wrong. I need a vacation. :( Summary: - Fix various deficiencies in online fsck's metadata checking code - Fix an integer casting bug in the xattr code on 32-bit systems - Fix a hang in an inode walk when the inode index is corrupt - Fix error codes being dropped when initializing per-AG structures - Fix nowait directio writes that partially succeed but return EAGAIN - Revert last week's rmap comparison patch because it was wrong" * tag 'xfs-5.10-fixes-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: revert "xfs: fix rmap key and record comparison functions" xfs: don't allow NOWAIT DIO across extent boundaries xfs: return corresponding errcode if xfs_initialize_perag() fail xfs: ensure inobt record walks always make forward progress xfs: fix forkoff miscalculation related to XFS_LITINO(mp) xfs: directory scrub should check the null bestfree entries too xfs: strengthen rmap record flags checking xfs: fix the minrecs logic when dealing with inode root child blocks
2020-11-21Merge tag 'fsnotify_for_v5.10-rc5' of ↵Linus Torvalds1-5/+7
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fanotify fix from Jan Kara: "A single fanotify fix from Amir" * tag 'fsnotify_for_v5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fanotify: fix logic of reporting name info with watched parent
2020-11-20Merge tag 'io_uring-5.10-2020-11-20' of git://git.kernel.dk/linux-blockLinus Torvalds2-15/+49
Pull io_uring fixes from Jens Axboe: "Mostly regression or stable fodder: - Disallow async path resolution of /proc/self - Tighten constraints for segmented async buffered reads - Fix double completion for a retry error case - Fix for fixed file life times (Pavel)" * tag 'io_uring-5.10-2020-11-20' of git://git.kernel.dk/linux-block: io_uring: order refnode recycling io_uring: get an active ref_node from files_data io_uring: don't double complete failed reissue request mm: never attempt async page lock if we've transferred data already io_uring: handle -EOPNOTSUPP on path resolution proc: don't allow async path resolution of /proc/self components
2020-11-19ext4: fix bogus warning in ext4_update_dx_flag()Jan Kara1-1/+2
The idea of the warning in ext4_update_dx_flag() is that we should warn when we are clearing EXT4_INODE_INDEX on a filesystem with metadata checksums enabled since after clearing the flag, checksums for internal htree nodes will become invalid. So there's no need to warn (or actually do anything) when EXT4_INODE_INDEX is not set. Link: https://lore.kernel.org/r/20201118153032.17281-1-jack@suse.cz Fixes: 48a34311953d ("ext4: fix checksum errors with indexed dirs") Reported-by: Eric Biggers <ebiggers@kernel.org> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2020-11-19jbd2: fix kernel-doc markupsMauro Carvalho Chehab2-31/+34
Kernel-doc markup should use this format: identifier - description They should not have any type before that, as otherwise the parser won't do the right thing. Also, some identifiers have different names between their prototypes and the kernel-doc markup. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Link: https://lore.kernel.org/r/72f5c6628f5f278d67625f60893ffbc2ca28d46e.1605521731.git.mchehab+huawei@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-11-19xfs: revert "xfs: fix rmap key and record comparison functions"Darrick J. Wong1-8/+8
This reverts commit 6ff646b2ceb0eec916101877f38da0b73e3a5b7f. Your maintainer committed a major braino in the rmap code by adding the attr fork, bmbt, and unwritten extent usage bits into rmap record key comparisons. While XFS uses the usage bits *in the rmap records* for cross-referencing metadata in xfs_scrub and xfs_repair, it only needs the owner and offset information to distinguish between reverse mappings of the same physical extent into the data fork of a file at multiple offsets. The other bits are not important for key comparisons for index lookups, and never have been. Eric Sandeen reports that this causes regressions in generic/299, so undo this patch before it does more damage. Reported-by: Eric Sandeen <sandeen@sandeen.net> Fixes: 6ff646b2ceb0 ("xfs: fix rmap key and record comparison functions") Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
2020-11-19ext4: drop fast_commit from /proc/mountsTheodore Ts'o1-4/+0
The options in /proc/mounts must be valid mount options --- and fast_commit is not a mount option. Otherwise, command sequences like this will fail: # mount /dev/vdc /vdc # mkdir -p /vdc/phoronix_test_suite /pts # mount --bind /vdc/phoronix_test_suite /pts # mount -o remount,nodioread_nolock /pts mount: /pts: mount point not mounted or bad option. And in the system logs, you'll find: EXT4-fs (vdc): Unrecognized mount option "fast_commit" or missing value Fixes: 995a3ed67fc8 ("ext4: add fast_commit feature and handling for extended mount options") Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-11-19xfs: don't allow NOWAIT DIO across extent boundariesDave Chinner1-0/+29
Jens has reported a situation where partial direct IOs can be issued and completed yet still return -EAGAIN. We don't want this to report a short IO as we want XFS to complete user DIO entirely or not at all. This partial IO situation can occur on a write IO that is split across an allocated extent and a hole, and the second mapping is returning EAGAIN because allocation would be required. The trivial reproducer: $ sudo xfs_io -fdt -c "pwrite 0 4k" -c "pwrite -V 1 -b 8k -N 0 8k" /mnt/scr/foo wrote 4096/4096 bytes at offset 0 4 KiB, 1 ops; 0.0001 sec (27.509 MiB/sec and 7042.2535 ops/sec) pwrite: Resource temporarily unavailable $ The pwritev2(0, 8kB, RWF_NOWAIT) call returns EAGAIN having done the first 4kB write: xfs_file_direct_write: dev 259:1 ino 0x83 size 0x1000 offset 0x0 count 0x2000 iomap_apply: dev 259:1 ino 0x83 pos 0 length 8192 flags WRITE|DIRECT|NOWAIT (0x31) ops xfs_direct_write_iomap_ops caller iomap_dio_rw actor iomap_dio_actor xfs_ilock_nowait: dev 259:1 ino 0x83 flags ILOCK_SHARED caller xfs_ilock_for_iomap xfs_iunlock: dev 259:1 ino 0x83 flags ILOCK_SHARED caller xfs_direct_write_iomap_begin xfs_iomap_found: dev 259:1 ino 0x83 size 0x1000 offset 0x0 count 8192 fork data startoff 0x0 startblock 24 blockcount 0x1 iomap_apply_dstmap: dev 259:1 ino 0x83 bdev 259:1 addr 102400 offset 0 length 4096 type MAPPED flags DIRTY Here the first iomap loop has mapped the first 4kB of the file and issued the IO, and we enter the second iomap_apply loop: iomap_apply: dev 259:1 ino 0x83 pos 4096 length 4096 flags WRITE|DIRECT|NOWAIT (0x31) ops xfs_direct_write_iomap_ops caller iomap_dio_rw actor iomap_dio_actor xfs_ilock_nowait: dev 259:1 ino 0x83 flags ILOCK_SHARED caller xfs_ilock_for_iomap xfs_iunlock: dev 259:1 ino 0x83 flags ILOCK_SHARED caller xfs_direct_write_iomap_begin And we exit with -EAGAIN out because we hit the allocate case trying to make the second 4kB block. Then IO completes on the first 4kB and the original IO context completes and unlocks the inode, returning -EAGAIN to userspace: xfs_end_io_direct_write: dev 259:1 ino 0x83 isize 0x1000 disize 0x1000 offset 0x0 count 4096 xfs_iunlock: dev 259:1 ino 0x83 flags IOLOCK_SHARED caller xfs_file_dio_aio_write There are other vectors to the same problem when we re-enter the mapping code if we have to make multiple mappinfs under NOWAIT conditions. e.g. failing trylocks, COW extents being found, allocation being required, and so on. Avoid all these potential problems by only allowing IOMAP_NOWAIT IO to go ahead if the mapping we retrieve for the IO spans an entire allocated extent. This avoids the possibility of subsequent mappings to complete the IO from triggering NOWAIT semantics by any means as NOWAIT IO will now only enter the mapping code once per NOWAIT IO. Reported-and-tested-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-11-18xfs: return corresponding errcode if xfs_initialize_perag() failYu Kuai1-3/+8
In xfs_initialize_perag(), if kmem_zalloc(), xfs_buf_hash_init(), or radix_tree_preload() failed, the returned value 'error' is not set accordingly. Reported-as-fixing: 8b26c5825e02 ("xfs: handle ENOMEM correctly during initialisation of perag structures") Fixes: 9b2471797942 ("xfs: cache unlinked pointers in an rhashtable") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-11-18xfs: ensure inobt record walks always make forward progressDarrick J. Wong1-3/+24
The aim of the inode btree record iterator function is to call a callback on every record in the btree. To avoid having to tear down and recreate the inode btree cursor around every callback, it caches a certain number of records in a memory buffer. After each batch of callback invocations, we have to perform a btree lookup to find the next record after where we left off. However, if the keys of the inode btree are corrupt, the lookup might put us in the wrong part of the inode btree, causing the walk function to loop forever. Therefore, we add extra cursor tracking to make sure that we never go backwards neither when performing the lookup nor when jumping to the next inobt record. This also fixes an off by one error where upon resume the lookup should have been for the inode /after/ the point at which we stopped. Found by fuzzing xfs/460 with keys[2].startino = ones causing bulkstat and quotacheck to hang. Fixes: a211432c27ff ("xfs: create simplified inode walk function") Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
2020-11-18xfs: fix forkoff miscalculation related to XFS_LITINO(mp)Gao Xiang1-1/+7
Currently, commit e9e2eae89ddb dropped a (int) decoration from XFS_LITINO(mp), and since sizeof() expression is also involved, the result of XFS_LITINO(mp) is simply as the size_t type (commonly unsigned long). Considering the expression in xfs_attr_shortform_bytesfit(): offset = (XFS_LITINO(mp) - bytes) >> 3; let "bytes" be (int)340, and "XFS_LITINO(mp)" be (unsigned long)336. on 64-bit platform, the expression is offset = ((unsigned long)336 - (int)340) >> 3 = (int)(0xfffffffffffffffcUL >> 3) = -1 but on 32-bit platform, the expression is offset = ((unsigned long)336 - (int)340) >> 3 = (int)(0xfffffffcUL >> 3) = 0x1fffffff instead. so offset becomes a large positive number on 32-bit platform, and cause xfs_attr_shortform_bytesfit() returns maxforkoff rather than 0. Therefore, one result is "ASSERT(new_size <= XFS_IFORK_SIZE(ip, whichfork));" assertion failure in xfs_idata_realloc(), which was also the root cause of the original bugreport from Dennis, see: https://bugzilla.redhat.com/show_bug.cgi?id=1894177 And it can also be manually triggered with the following commands: $ touch a; $ setfattr -n user.0 -v "`seq 0 80`" a; $ setfattr -n user.1 -v "`seq 0 80`" a on 32-bit platform. Fix the case in xfs_attr_shortform_bytesfit() by bailing out "XFS_LITINO(mp) < bytes" in advance suggested by Eric and a misleading comment together with this bugfix suggested by Darrick. It seems the other users of XFS_LITINO(mp) are not impacted. Fixes: e9e2eae89ddb ("xfs: only check the superblock version for dinode size calculation") Cc: <stable@vger.kernel.org> # 5.7+ Reported-and-tested-by: Dennis Gilmore <dgilmore@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Gao Xiang <hsiangkao@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-11-18xfs: directory scrub should check the null bestfree entries tooDarrick J. Wong1-4/+17
Teach the directory scrubber to check all the bestfree entries, including the null ones. We want to be able to detect the case where the entry is null but there actually /is/ a directory data block. Found by fuzzing lbests[0] = ones in xfs/391. Fixes: df481968f33b ("xfs: scrub directory freespace") Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-11-18xfs: strengthen rmap record flags checkingDarrick J. Wong1-4/+4
We always know the correct state of the rmap record flags (attr, bmbt, unwritten) so check them by direct comparison. Fixes: d852657ccfc0 ("xfs: cross-reference reverse-mapping btree") Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-11-18xfs: fix the minrecs logic when dealing with inode root child blocksDarrick J. Wong1-18/+27
The comment and logic in xchk_btree_check_minrecs for dealing with inode-rooted btrees isn't quite correct. While the direct children of the inode root are allowed to have fewer records than what would normally be allowed for a regular ondisk btree block, this is only true if there is only one child block and the number of records don't fit in the inode root. Fixes: 08a3a692ef58 ("xfs: btree scrub should check minrecs") Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-11-18gfs2: Fix regression in freeze_go_syncBob Peterson1-1/+12
Patch 541656d3a513 ("gfs2: freeze should work on read-only mounts") changed the check for glock state in function freeze_go_sync() from "gl->gl_state == LM_ST_SHARED" to "gl->gl_req == LM_ST_EXCLUSIVE". That's wrong and it regressed gfs2's freeze/thaw mechanism because it caused only the freezing node (which requests the glock in EX) to queue freeze work. All nodes go through this go_sync code path during the freeze to drop their SHared hold on the freeze glock, allowing the freezing node to acquire it in EXclusive mode. But all the nodes must freeze access to the file system locally, so they ALL must queue freeze work. The freeze_work calls freeze_func, which makes a request to reacquire the freeze glock in SH, effectively blocking until the thaw from the EX holder. Once thawed, the freezing node drops its EX hold on the freeze glock, then the (blocked) freeze_func reacquires the freeze glock in SH again (on all nodes, including the freezer) so all nodes go back to a thawed state. This patch changes the check back to gl_state == LM_ST_SHARED like it was prior to 541656d3a513. Fixes: 541656d3a513 ("gfs2: freeze should work on read-only mounts") Cc: stable@vger.kernel.org # v5.8+ Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-11-18io_uring: order refnode recyclingPavel Begunkov1-10/+23
Don't recycle a refnode until we're done with all requests of nodes ejected before. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Cc: stable@vger.kernel.org # v5.7+ Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-11-18io_uring: get an active ref_node from files_dataPavel Begunkov1-3/+1
An active ref_node always can be found in ctx->files_data, it's much safer to get it this way instead of poking into files_data->ref_list. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Cc: stable@vger.kernel.org # v5.7+ Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-11-17io_uring: don't double complete failed reissue requestJens Axboe1-1/+0
Zorro reports that an xfstest test case is failing, and it turns out that for the reissue path we can potentially issue a double completion on the request for the failure path. There's an issue around the retry as well, but for now, at least just make sure that we handle the error path correctly. Cc: stable@vger.kernel.org Fixes: b63534c41e20 ("io_uring: re-issue block requests that failed because of resources") Reported-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-11-15smb3: Handle error case during offload read pathRohith Surabattula1-1/+19
Mid callback needs to be called only when valid data is read into pages. These patches address a problem found during decryption offload: CIFS: VFS: trying to dequeue a deleted mid that could cause a refcount use after free: Workqueue: smb3decryptd smb2_decrypt_offload [cifs] Signed-off-by: Rohith Surabattula <rohiths@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> CC: Stable <stable@vger.kernel.org> #5.4+ Signed-off-by: Steve French <stfrench@microsoft.com>
2020-11-15smb3: Avoid Mid pending list corruptionRohith Surabattula1-9/+46
When reconnect happens Mid queue can be corrupted when both demultiplex and offload thread try to dequeue the MID from the pending list. These patches address a problem found during decryption offload: CIFS: VFS: trying to dequeue a deleted mid that could cause a refcount use after free: Workqueue: smb3decryptd smb2_decrypt_offload [cifs] Signed-off-by: Rohith Surabattula <rohiths@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> CC: Stable <stable@vger.kernel.org> #5.4+ Signed-off-by: Steve French <stfrench@microsoft.com>
2020-11-15smb3: Call cifs reconnect from demultiplex threadRohith Surabattula1-5/+8
cifs_reconnect needs to be called only from demultiplex thread. skip cifs_reconnect in offload thread. So, cifs_reconnect will be called by demultiplex thread in subsequent request. These patches address a problem found during decryption offload: CIFS: VFS: trying to dequeue a deleted mid that can cause a refcount use after free: [ 1271.389453] Workqueue: smb3decryptd smb2_decrypt_offload [cifs] [ 1271.389456] RIP: 0010:refcount_warn_saturate+0xae/0xf0 [ 1271.389457] Code: fa 1d 6a 01 01 e8 c7 44 b1 ff 0f 0b 5d c3 80 3d e7 1d 6a 01 00 75 91 48 c7 c7 d8 be 1d a2 c6 05 d7 1d 6a 01 01 e8 a7 44 b1 ff <0f> 0b 5d c3 80 3d c5 1d 6a 01 00 0f 85 6d ff ff ff 48 c7 c7 30 bf [ 1271.389458] RSP: 0018:ffffa4cdc1f87e30 EFLAGS: 00010286 [ 1271.389458] RAX: 0000000000000000 RBX: ffff9974d2809f00 RCX: ffff9974df898cc8 [ 1271.389459] RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff9974df898cc0 [ 1271.389460] RBP: ffffa4cdc1f87e30 R08: 0000000000000004 R09: 00000000000002c0 [ 1271.389460] R10: 0000000000000000 R11: 0000000000000001 R12: ffff9974b7fdb5c0 [ 1271.389461] R13: ffff9974d2809f00 R14: ffff9974ccea0a80 R15: ffff99748e60db80 [ 1271.389462] FS: 0000000000000000(0000) GS:ffff9974df880000(0000) knlGS:0000000000000000 [ 1271.389462] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1271.389463] CR2: 000055c60f344fe4 CR3: 0000001031a3c002 CR4: 00000000003706e0 [ 1271.389465] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1271.389465] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 1271.389466] Call Trace: [ 1271.389483] cifs_mid_q_entry_release+0xce/0x110 [cifs] [ 1271.389499] smb2_decrypt_offload+0xa9/0x1c0 [cifs] [ 1271.389501] process_one_work+0x1e8/0x3b0 [ 1271.389503] worker_thread+0x50/0x370 [ 1271.389504] kthread+0x12f/0x150 [ 1271.389506] ? process_one_work+0x3b0/0x3b0 [ 1271.389507] ? __kthread_bind_mask+0x70/0x70 [ 1271.389509] ret_from_fork+0x22/0x30 Signed-off-by: Rohith Surabattula <rohiths@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> CC: Stable <stable@vger.kernel.org> #5.4+ Signed-off-by: Steve French <stfrench@microsoft.com>
2020-11-15cifs: fix a memleak with modefromsidNamjae Jeon1-0/+1
kmemleak reported a memory leak allocated in query_info() when cifs is working with modefromsid. backtrace: [<00000000aeef6a1e>] slab_post_alloc_hook+0x58/0x510 [<00000000b2f7a440>] __kmalloc+0x1a0/0x390 [<000000006d470ebc>] query_info+0x5b5/0x700 [cifs] [<00000000bad76ce0>] SMB2_query_acl+0x2b/0x30 [cifs] [<000000001fa09606>] get_smb2_acl_by_path+0x2f3/0x720 [cifs] [<000000001b6ebab7>] get_smb2_acl+0x75/0x90 [cifs] [<00000000abf43904>] cifs_acl_to_fattr+0x13b/0x1d0 [cifs] [<00000000a5372ec3>] cifs_get_inode_info+0x4cd/0x9a0 [cifs] [<00000000388e0a04>] cifs_revalidate_dentry_attr+0x1cd/0x510 [cifs] [<0000000046b6b352>] cifs_getattr+0x8a/0x260 [cifs] [<000000007692c95e>] vfs_getattr_nosec+0xa1/0xc0 [<00000000cbc7d742>] vfs_getattr+0x36/0x40 [<00000000de8acf67>] vfs_statx_fd+0x4a/0x80 [<00000000a58c6adb>] __do_sys_newfstat+0x31/0x70 [<00000000300b3b4e>] __x64_sys_newfstat+0x16/0x20 [<000000006d8e9c48>] do_syscall_64+0x37/0x80 This patch add missing kfree for pntsd when mounting modefromsid option. Cc: Stable <stable@vger.kernel.org> # v5.4+ Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2020-11-15fix return values of seq_read_iter()Al Viro1-30/+27
Unlike ->read(), ->read_iter() instances *must* return the amount of data they'd left in iterator. For ->read() returning less than it has actually copied is a QoI issue; read(fd, unmapped_page - 5, 8) is allowed to fill all 5 bytes of destination and return 4; it's not nice to caller, but POSIX allows pretty much anything in such situation, up to and including a SIGSEGV. generic_file_splice_read() uses pipe-backed iterator as destination; there a short copy comes from pipe being full, not from running into an un{mapped,writable} page in the middle of destination as we have for iovec-backed iterators read(2) uses. And there we rely upon the ->read_iter() reporting the actual amount it has left in destination. Conversion of a ->read() instance into ->read_iter() has to watch out for that. If you really need an "all or nothing" kind of behaviour somewhere, you need to do iov_iter_revert() to prune the partial copy. In case of seq_read_iter() we can handle short copy just fine; the data is in m->buf and next call will fetch it from there. Fixes: d4d50710a8b4 (seq_file: add seq_read_iter) Tested-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-11-14Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-0/+1
Merge fixes from Andrew Morton: "14 patches. Subsystems affected by this patch series: mm (migration, vmscan, slub, gup, memcg, hugetlbfs), mailmap, kbuild, reboot, watchdog, panic, and ocfs2" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: ocfs2: initialize ip_next_orphan panic: don't dump stack twice on warn hugetlbfs: fix anon huge page migration race mm: memcontrol: fix missing wakeup polling thread kernel/watchdog: fix watchdog_allowed_mask not used warning reboot: fix overflow parsing reboot cpu number Revert "kernel/reboot.c: convert simple_strtoul to kstrtoint" compiler.h: fix barrier_data() on clang mm/gup: use unpin_user_pages() in __gup_longterm_locked() mm/slub: fix panic in slab_alloc_node() mailmap: fix entry for Dmitry Baryshkov/Eremin-Solenikov mm/vmscan: fix NR_ISOLATED_FILE corruption on 64-bit mm/compaction: stop isolation if too many pages are isolated and we have pages to migrate mm/compaction: count pages and stop correctly during page isolation
2020-11-14afs: Fix afs_write_end() when called with copied == 0 [ver #3]David Howells1-1/+4
When afs_write_end() is called with copied == 0, it tries to set the dirty region, but there's no way to actually encode a 0-length region in the encoding in page->private. "0,0", for example, indicates a 1-byte region at offset 0. The maths miscalculates this and sets it incorrectly. Fix it to just do nothing but unlock and put the page in this case. We don't actually need to mark the page dirty as nothing presumably changed. Fixes: 65dd2d6072d3 ("afs: Alter dirty range encoding in page->private") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-11-14ocfs2: initialize ip_next_orphanWengang Wang1-0/+1
Though problem if found on a lower 4.1.12 kernel, I think upstream has same issue. In one node in the cluster, there is the following callback trace: # cat /proc/21473/stack __ocfs2_cluster_lock.isra.36+0x336/0x9e0 [ocfs2] ocfs2_inode_lock_full_nested+0x121/0x520 [ocfs2] ocfs2_evict_inode+0x152/0x820 [ocfs2] evict+0xae/0x1a0 iput+0x1c6/0x230 ocfs2_orphan_filldir+0x5d/0x100 [ocfs2] ocfs2_dir_foreach_blk+0x490/0x4f0 [ocfs2] ocfs2_dir_foreach+0x29/0x30 [ocfs2] ocfs2_recover_orphans+0x1b6/0x9a0 [ocfs2] ocfs2_complete_recovery+0x1de/0x5c0 [ocfs2] process_one_work+0x169/0x4a0 worker_thread+0x5b/0x560 kthread+0xcb/0xf0 ret_from_fork+0x61/0x90 The above stack is not reasonable, the final iput shouldn't happen in ocfs2_orphan_filldir() function. Looking at the code, 2067 /* Skip inodes which are already added to recover list, since dio may 2068 * happen concurrently with unlink/rename */ 2069 if (OCFS2_I(iter)->ip_next_orphan) { 2070 iput(iter); 2071 return 0; 2072 } 2073 The logic thinks the inode is already in recover list on seeing ip_next_orphan is non-NULL, so it skip this inode after dropping a reference which incremented in ocfs2_iget(). While, if the inode is already in recover list, it should have another reference and the iput() at line 2070 should not be the final iput (dropping the last reference). So I don't think the inode is really in the recover list (no vmcore to confirm). Note that ocfs2_queue_orphans(), though not shown up in the call back trace, is holding cluster lock on the orphan directory when looking up for unlinked inodes. The on disk inode eviction could involve a lot of IOs which may need long time to finish. That means this node could hold the cluster lock for very long time, that can lead to the lock requests (from other nodes) to the orhpan directory hang for long time. Looking at more on ip_next_orphan, I found it's not initialized when allocating a new ocfs2_inode_info structure. This causes te reflink operations from some nodes hang for very long time waiting for the cluster lock on the orphan directory. Fix: initialize ip_next_orphan as NULL. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20201109171746.27884-1-wen.gang.wang@oracle.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-11-14io_uring: handle -EOPNOTSUPP on path resolutionJens Axboe1-1/+18
Any attempt to do path resolution on /proc/self from an async worker will yield -EOPNOTSUPP. We can safely do that resolution from the task itself, and without blocking, so retry it from there. Ideally io_uring would know this upfront and not have to go through the worker thread to find out, but that doesn't currently seem feasible. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-11-13Merge tag 'vfs-5.10-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds3-52/+2
Pull fs freeze fix and cleanups from Darrick Wong: "A single vfs fix for 5.10, along with two subsequent cleanups. A very long time ago, a hack was added to the vfs fs freeze protection code to work around lockdep complaints about XFS, which would try to run a transaction (which requires intwrite protection) to finalize an xfs freeze (by which time the vfs had already taken intwrite). Fast forward a few years, and XFS fixed the recursive intwrite problem on its own, and the hack became unnecessary. Fast forward almost a decade, and latent bugs in the code converting this hack from freeze flags to freeze locks combine with lockdep bugs to make this reproduce frequently enough to notice page faults racing with freeze. Since the hack is unnecessary and causes thread race errors, just get rid of it completely. Making this kind of vfs change midway through a cycle makes me nervous, but a large enough number of the usual VFS/ext4/XFS/btrfs suspects have said this looks good and solves a real problem vector. And once that removal is done, __sb_start_write is now simple enough that it becomes possible to refactor the function into smaller, simpler static inline helpers in linux/fs.h. The cleanup is straightforward. Summary: - Finally remove the "convert to trylock" weirdness in the fs freezer code. It was necessary 10 years ago to deal with nested transactions in XFS, but we've long since removed that; and now this is causing subtle race conditions when lockdep goes offline and sb_start_* aren't prepared to retry a trylock failure. - Minor cleanups of the sb_start_* fs freeze helpers" * tag 'vfs-5.10-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: vfs: move __sb_{start,end}_write* to fs.h vfs: separate __sb_start_write into blocking and non-blocking helpers vfs: remove lockdep bogosity in __sb_start_write
2020-11-13Merge tag 'xfs-5.10-fixes-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds5-15/+15
Pull xfs fixes from Darrick Wong: - Fix a fairly serious problem where the reverse mapping btree key comparison functions were silently ignoring parts of the keyspace when doing comparisons - Fix a thinko in the online refcount scrubber - Fix a missing unlock in the pnfs code * tag 'xfs-5.10-fixes-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: fix a missing unlock on error in xfs_fs_map_blocks xfs: fix brainos in the refcount scrubber's rmap fragment processor xfs: fix rmap key and record comparison functions xfs: set the unwritten bit in rmap lookup flags in xchk_bmap_get_rmapextents xfs: fix flags argument to rmap lookup when converting shared file rmaps
2020-11-13proc: don't allow async path resolution of /proc/self componentsJens Axboe1-0/+7
If this is attempted by a kthread, then return -EOPNOTSUPP as we don't currently support that. Once we can get task_pid_ptr() doing the right thing, then this can go away again. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-11-13Merge tag 'io_uring-5.10-2020-11-13' of git://git.kernel.dk/linux-blockLinus Torvalds1-1/+1
Pull io_uring fix from Jens Axboe: "A single fix in here, for a missed rounding case at setup time, which caused an otherwise legitimate setup case to return -EINVAL if used with unaligned ring size values" * tag 'io_uring-5.10-2020-11-13' of git://git.kernel.dk/linux-block: io_uring: round-up cq size before comparing with rounded sq size
2020-11-13btrfs: tree-checker: add missing return after error in root_itemDaniel Xu1-0/+1
There's a missing return statement after an error is found in the root_item, this can cause further problems when a crafted image triggers the error. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=210181 Fixes: 259ee7754b67 ("btrfs: tree-checker: Add ROOT_ITEM check") CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-11-13btrfs: qgroup: don't commit transaction when we already hold the handleQu Wenruo1-1/+19
[BUG] When running the following script, btrfs will trigger an ASSERT(): #/bin/bash mkfs.btrfs -f $dev mount $dev $mnt xfs_io -f -c "pwrite 0 1G" $mnt/file sync btrfs quota enable $mnt btrfs quota rescan -w $mnt # Manually set the limit below current usage btrfs qgroup limit 512M $mnt $mnt # Crash happens touch $mnt/file The dmesg looks like this: assertion failed: refcount_read(&trans->use_count) == 1, in fs/btrfs/transaction.c:2022 ------------[ cut here ]------------ kernel BUG at fs/btrfs/ctree.h:3230! invalid opcode: 0000 [#1] SMP PTI RIP: 0010:assertfail.constprop.0+0x18/0x1a [btrfs] btrfs_commit_transaction.cold+0x11/0x5d [btrfs] try_flush_qgroup+0x67/0x100 [btrfs] __btrfs_qgroup_reserve_meta+0x3a/0x60 [btrfs] btrfs_delayed_update_inode+0xaa/0x350 [btrfs] btrfs_update_inode+0x9d/0x110 [btrfs] btrfs_dirty_inode+0x5d/0xd0 [btrfs] touch_atime+0xb5/0x100 iterate_dir+0xf1/0x1b0 __x64_sys_getdents64+0x78/0x110 do_syscall_64+0x33/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7fb5afe588db [CAUSE] In try_flush_qgroup(), we assume we don't hold a transaction handle at all. This is true for data reservation and mostly true for metadata. Since data space reservation always happens before we start a transaction, and for most metadata operation we reserve space in start_transaction(). But there is an exception, btrfs_delayed_inode_reserve_metadata(). It holds a transaction handle, while still trying to reserve extra metadata space. When we hit EDQUOT inside btrfs_delayed_inode_reserve_metadata(), we will join current transaction and commit, while we still have transaction handle from qgroup code. [FIX] Let's check current->journal before we join the transaction. If current->journal is unset or BTRFS_SEND_TRANS_STUB, it means we are not holding a transaction, thus are able to join and then commit transaction. If current->journal is a valid transaction handle, we avoid committing transaction and just end it This is less effective than committing current transaction, as it won't free metadata reserved space, but we may still free some data space before new data writes. Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1178634 Fixes: c53e9653605d ("btrfs: qgroup: try to flush qgroup space when we get -EDQUOT") Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-11-13btrfs: fix missing delalloc new bit for new delalloc rangesFilipe Manana3-61/+66
When doing a buffered write, through one of the write family syscalls, we look for ranges which currently don't have allocated extents and set the 'delalloc new' bit on them, so that we can report a correct number of used blocks to the stat(2) syscall until delalloc is flushed and ordered extents complete. However there are a few other places where we can do a buffered write against a range that is mapped to a hole (no extent allocated) and where we do not set the 'new delalloc' bit. Those places are: - Doing a memory mapped write against a hole; - Cloning an inline extent into a hole starting at file offset 0; - Calling btrfs_cont_expand() when the i_size of the file is not aligned to the sector size and is located in a hole. For example when cloning to a destination offset beyond EOF. So after such cases, until the corresponding delalloc range is flushed and the respective ordered extents complete, we can report an incorrect number of blocks used through the stat(2) syscall. In some cases we can end up reporting 0 used blocks to stat(2), which is a particular bad value to report as it may mislead tools to think a file is completely sparse when its i_size is not zero, making them skip reading any data, an undesired consequence for tools such as archivers and other backup tools, as reported a long time ago in the following thread (and other past threads): https://lists.gnu.org/archive/html/bug-tar/2016-07/msg00001.html Example reproducer: $ cat reproducer.sh #!/bin/bash MNT=/mnt/sdi DEV=/dev/sdi mkfs.btrfs -f $DEV > /dev/null # mkfs.xfs -f $DEV > /dev/null # mkfs.ext4 -F $DEV > /dev/null # mkfs.f2fs -f $DEV > /dev/null mount $DEV $MNT xfs_io -f -c "truncate 64K" \ -c "mmap -w 0 64K" \ -c "mwrite -S 0xab 0 64K" \ -c "munmap" \ $MNT/foo blocks_used=$(stat -c %b $MNT/foo) echo "blocks used: $blocks_used" if [ $blocks_used -eq 0 ]; then echo "ERROR: blocks used is 0" fi umount $DEV $ ./reproducer.sh blocks used: 0 ERROR: blocks used is 0 So move the logic that decides to set the 'delalloc bit' bit into the function btrfs_set_extent_delalloc(), since that is what we use for all those missing cases as well as for the cases that currently work well. This change is also preparatory work for an upcoming patch that fixes other problems related to tracking and reporting the number of bytes used by an inode. CC: stable@vger.kernel.org # 4.19+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-11-13Merge tag 'ext4_for_linus_bugfixes' of ↵Linus Torvalds2-14/+3
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "Two ext4 bug fixes, one being a revert of a commit sent during the merge window" * tag 'ext4_for_linus_bugfixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: Revert "ext4: fix superblock checksum calculation race" ext4: handle dax mount option collision
2020-11-12Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscryptLinus Torvalds1-1/+1
Pull fscrypt fix from Eric Biggers: "Fix a regression where new files weren't using inline encryption when they should be" * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt: fscrypt: fix inline encryption not used on new files
2020-11-12Merge tag 'gfs2-v5.10-rc3-fixes' of ↵Linus Torvalds4-12/+10
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 fixes from Andreas Gruenbacher: "Fix jdata data corruption and glock reference leak" * tag 'gfs2-v5.10-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: Fix case in which ail writes are done to jdata holes Revert "gfs2: Ignore journal log writes for jdata holes" gfs2: fix possible reference leak in gfs2_check_blk_type
2020-11-12Merge tag 'nfs-for-5.10-2' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds4-12/+15
Pull NFS client bugfixes from Anna Schumaker: "Stable fixes: - Fix failure to unregister shrinker Other fixes: - Fix unnecessary locking to clear up some contention - Fix listxattr receive buffer size - Fix default mount options for nfsroot" * tag 'nfs-for-5.10-2' of git://git.linux-nfs.org/projects/anna/linux-nfs: NFS: Remove unnecessary inode lock in nfs_fsync_dir() NFS: Remove unnecessary inode locking in nfs_llseek_dir() NFS: Fix listxattr receive buffer size NFSv4.2: fix failure to unregister shrinker nfsroot: Default mount option should ask for built-in NFS version
2020-11-12gfs2: Fix case in which ail writes are done to jdata holesBob Peterson2-1/+3
Patch b2a846dbef4e ("gfs2: Ignore journal log writes for jdata holes") tried (unsuccessfully) to fix a case in which writes were done to jdata blocks, the blocks are sent to the ail list, then a punch_hole or truncate operation caused the blocks to be freed. In other words, the ail items are for jdata holes. Before b2a846dbef4e, the jdata hole caused function gfs2_block_map to return -EIO, which was eventually interpreted as an IO error to the journal, and then withdraw. This patch changes function gfs2_get_block_noalloc, which is only used for jdata writes, so it returns -ENODATA rather than -EIO, and when -ENODATA is returned to gfs2_ail1_start_one, the error is ignored. We can safely ignore it because gfs2_ail1_start_one is only called when the jdata pages have already been written and truncated, so the ail1 content no longer applies. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-11-12Revert "gfs2: Ignore journal log writes for jdata holes"Bob Peterson1-6/+2
This reverts commit b2a846dbef4ef54ef032f0f5ee188c609a0278a7. That commit changed the behavior of function gfs2_block_map to return -ENODATA in cases where a hole (IOMAP_HOLE) is encountered and create is false. While that fixed the intended problem for jdata, it also broke other callers of gfs2_block_map such as some jdata block reads. Before the patch, an encountered hole would be skipped and the buffer seen as unmapped by the caller. The patch changed the behavior to return -ENODATA, which is interpreted as an error by the caller. The -ENODATA return code should be restricted to the specific case where jdata holes are encountered during ail1 writes. That will be done in a later patch. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-11-12NFS: Remove unnecessary inode lock in nfs_fsync_dir()Trond Myklebust1-5/+1
nfs_inc_stats() is already thread-safe, and there are no other reasons to hold the inode lock here. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-11-12NFS: Remove unnecessary inode locking in nfs_llseek_dir()Trond Myklebust1-5/+4
Remove the contentious inode lock, and instead provide thread safety using the file->f_lock spinlock. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-11-12NFS: Fix listxattr receive buffer sizeChuck Lever1-2/+2
Certain NFSv4.2/RDMA tests fail with v5.9-rc1. rpcrdma_convert_kvec() runs off the end of the rl_segments array because rq_rcv_buf.tail[0].iov_len holds a very large positive value. The resultant kernel memory corruption is enough to crash the client system. Callers of rpc_prepare_reply_pages() must reserve an extra XDR_UNIT in the maximum decode size for a possible XDR pad of the contents of the xdr_buf's pages. That guarantees the allocated receive buffer will be large enough to accommodate the usual contents plus that XDR pad word. encode_op_hdr() cannot add that extra word. If it does, xdr_inline_pages() underruns the length of the tail iovec. Fixes: 3e1f02123fba ("NFSv4.2: add client side XDR handling for extended attributes") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-11-12NFSv4.2: fix failure to unregister shrinkerJ. Bruce Fields1-0/+2
We forgot to unregister the nfs4_xattr_large_entry_shrinker. That leaves the global list of shrinkers corrupted after unload of the nfs module, after which possibly unrelated code that calls register_shrinker() or unregister_shrinker() gets a BUG() with "supervisor write access in kernel mode". And similarly for the nfs4_xattr_large_entry_lru. Reported-by: Kris Karas <bugs-a17@moonlit-rail.com> Tested-By: Kris Karas <bugs-a17@moonlit-rail.com> Fixes: 95ad37f90c33 "NFSv4.2: add client side xattr caching." Signed-off-by: J. Bruce Fields <bfields@redhat.com> CC: stable@vger.kernel.org Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-11-12gfs2: fix possible reference leak in gfs2_check_blk_typeZhang Qilong1-5/+5
In the fail path of gfs2_check_blk_type, forgetting to call gfs2_glock_dq_uninit will result in rgd_gh reference leak. Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-11-11fscrypt: fix inline encryption not used on new filesEric Biggers1-1/+1
The new helper function fscrypt_prepare_new_inode() runs before S_ENCRYPTED has been set on the new inode. This accidentally made fscrypt_select_encryption_impl() never enable inline encryption on newly created files, due to its use of fscrypt_needs_contents_encryption() which only returns true when S_ENCRYPTED is set. Fix this by using S_ISREG() directly instead of fscrypt_needs_contents_encryption(), analogous to what select_encryption_mode() does. I didn't notice this earlier because by design, the user-visible behavior is the same (other than performance, potentially) regardless of whether inline encryption is used or not. Fixes: a992b20cd4ee ("fscrypt: add fscrypt_prepare_new_inode() and fscrypt_set_context()") Reviewed-by: Satya Tangirala <satyat@google.com> Link: https://lore.kernel.org/r/20201111015224.303073-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-11-11Revert "ext4: fix superblock checksum calculation race"Theodore Ts'o1-11/+0
This reverts commit acaa532687cdc3a03757defafece9c27aa667546 which can result in a ext4_superblock_csum_set() trying to sleep while a spinlock is being held. For more discussion of this issue, please see: https://lore.kernel.org/r/000000000000f50cb705b313ed70@google.com Reported-by: syzbot+7a4ba6a239b91a126c28@syzkaller.appspotmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-11-11ext4: handle dax mount option collisionHarshad Shirwadkar1-3/+3
Mount options dax=inode and dax=never collided with fast_commit and journal checksum. Redefine the mount flags to remove the collision. Reported-by: Murphy Zhou <jencce.kernel@gmail.com> Fixes: 9cb20f94afcd2 ("fs/ext4: Make DAX mount option a tri-state") Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20201111183209.447175-1-harshads@google.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-11-11io_uring: round-up cq size before comparing with rounded sq sizeJens Axboe1-1/+1
If an application specifies IORING_SETUP_CQSIZE to set the CQ ring size to a specific size, we ensure that the CQ size is at least that of the SQ ring size. But in doing so, we compare the already rounded up to power of two SQ size to the as-of yet unrounded CQ size. This means that if an application passes in non power of two sizes, we can return -EINVAL when the final value would've been fine. As an example, an application passing in 100/100 for sq/cq size should end up with 128 for both. But since we round the SQ size first, we compare the CQ size of 100 to 128, and return -EINVAL as that is too small. Cc: stable@vger.kernel.org Fixes: 33a107f0a1b8 ("io_uring: allow application controlled CQ ring size") Reported-by: Dan Melnic <dmm@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-11-11xfs: fix a missing unlock on error in xfs_fs_map_blocksChristoph Hellwig1-1/+1
We also need to drop the iolock when invalidate_inode_pages2 fails, not only on all other error or successful cases. Fixes: 527851124d10 ("xfs: implement pNFS export operations") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>