|
For debugging purposes, user space tools (e.g. cifs-utils's smbinfo)
often need to query the server for additional information that is only
available via SMB3 FSCTL. See the MS-FSCC and MS-SMB2 protocol
specifications for more details.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
smb2_set_sparse does not return -errno, it returns a boolean where
true means success.
Change this to just ignore the return value, like the other call sites do.
Additionally, add code to handle the case where we must set the file sparse
and possibly also extend it.
Fixes xfstests: generic/236 generic/350 generic/420
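The pitfall described above can be illustrated with a minimal user-space sketch. Everything here is hypothetical: set_sparse_model() only mimics the "true means success" convention and is not the kernel's smb2_set_sparse(), and the buggy caller is an assumed shape of the mistake, not the actual code that was changed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in: returns true on success, NOT a negative errno. */
bool set_sparse_model(bool server_supports_sparse)
{
	return server_supports_sparse;
}

/*
 * Assumed buggy pattern: the boolean is treated as an errno-style
 * return code, so success (true == 1) leaks out as a non-zero result.
 */
int punch_hole_buggy(bool server_supports_sparse)
{
	int rc = set_sparse_model(server_supports_sparse);

	if (rc)
		return rc;	/* wrongly taken when the call succeeded */
	return 0;
}

/* Pattern the message describes: ignore the boolean and carry on. */
int punch_hole_fixed(bool server_supports_sparse)
{
	(void)set_sparse_model(server_supports_sparse);
	assert(true);	/* nothing to propagate in this model */
	return 0;
}
```

With these definitions punch_hole_buggy(true) returns 1, which a caller expecting 0 or -errno would read as a failure, while punch_hole_fixed(true) returns 0.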
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
As Sergey Senozhatsky pointed out, __constant_cpu_to_le32()
is misspelled as __constanst_cpu_to_le32() in a few definitions
in the list of status codes in smb2status.h.
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
|
|
This cleanup removes cifs specific code from SMB2/SMB3 code paths
which is cleaner and easier to maintain as the code to handle
special files is improved. Below is an example creating special files
using 'sfu' mount option over SMB3 to Windows (with this patch)
(Note that against a Samba server, support for saving DOS attributes
has to be enabled for the SFU mount option to work).
In the future this will also make implementation of creating
special files as reparse points easier (as Windows NFS server does
for example).
root@smf-Thinkpad-P51:~# stat -c "%F" /mnt2/char
character special file
root@smf-Thinkpad-P51:~# stat -c "%F" /mnt2/block
block special file
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
|
|
Also updated a comment describing use of the GlobalMid_Lock
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Detected by CoverityScan CID#1438719 ("Unused Value")
buf is reset again before being used so these two lines of code
are useless.
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
|
|
The passthrough queries from user space tools like smbinfo can be either
SMB3 QUERY_INFO or SMB3 FSCTL, but we are not checking for the latter.
For now we return EOPNOTSUPP for SMB3 FSCTL passthrough requests;
they can be enabled once compounding of FSCTLs is fixed.
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
|
|
Can be helpful in debugging various xfstests that are currently
skipped or failing due to missing features in our current
implementation of fallocate.
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
|
|
This allows fallocate -z to work against a Windows 2016 share.
This is needed because the SMB3 ZERO_RANGE command does not modify the file size.
To address this we now append a compounded SET-INFO to update the
end-of-file information.
This brings xfstests generic/469 closer to working against a Windows share.
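As a rough sketch of the size arithmetic involved, the end-of-file value that a follow-up SET-INFO would have to carry can be computed as below. The helper name zero_range_new_eof() is hypothetical, and keep_size is assumed to model the FALLOC_FL_KEEP_SIZE flag of fallocate(2); this is not the code added by the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: size the file should have after zeroing
 * [offset, offset + len). keep_size models FALLOC_FL_KEEP_SIZE.
 */
uint64_t zero_range_new_eof(uint64_t cur_size, uint64_t offset,
			    uint64_t len, bool keep_size)
{
	assert(len <= UINT64_MAX - offset);	/* range must not wrap */
	uint64_t end = offset + len;

	if (keep_size || end <= cur_size)
		return cur_size;	/* no end-of-file update needed */
	return end;			/* extend the file to cover the range */
}
```

Only when the returned value differs from the current size is there anything for the extra request to do, which is why it can ride along in the same compound.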
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Define an _init() and a _free() function for SMB2_init so that we will
be able to use it with compounds.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Adds trace points for enter and exit (done vs. error) for:
compounded query and setinfo, hardlink, rename,
mkdir, rmdir, set_eof, delete (unlink)
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
|
|
When we open the shared root handle also ask for FILE_ALL_INFORMATION since
we can do this at zero cost as part of a compound.
Cache this information for as long as the lease is held and serve any
future requests from the cache.
This allows us to serve "stat /<mountpoint>" directly from the cache and avoid
a network roundtrip. Since clients do this quite often,
this improves performance slightly.
As an example: xfstest generic/533 performs 43 stat operations on the root
of the share while it runs, all of which are eliminated with this patch.
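The caching rule above (valid only while the lease is held) can be modelled in a few lines of user-space C. The struct and function names are illustrative assumptions and do not correspond to the cifs data structures; a single attribute stands in for the full FILE_ALL_INFORMATION payload.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cached view of the share root. */
struct root_cache {
	bool has_lease;		/* lease on the shared root handle is held */
	bool info_valid;	/* attributes were filled in at open time */
	uint64_t end_of_file;	/* one example attribute */
};

/* Fill the cache when the root handle is opened (open + query compound). */
void root_cache_fill(struct root_cache *c, uint64_t end_of_file)
{
	assert(c != NULL);
	c->has_lease = true;
	c->info_valid = true;
	c->end_of_file = end_of_file;
}

/* A lease break invalidates everything cached under that lease. */
void root_cache_lease_break(struct root_cache *c)
{
	assert(c != NULL);
	c->has_lease = false;
	c->info_valid = false;
}

/* True (and *eof filled) if stat can be answered without a round trip. */
bool root_cache_stat(const struct root_cache *c, uint64_t *eof)
{
	assert(c != NULL && eof != NULL);
	if (!c->has_lease || !c->info_valid)
		return false;	/* caller must query the server */
	*eof = c->end_of_file;
	return true;
}
```

Tying validity to the lease keeps the model honest: once the server breaks the lease, the next stat has to go back over the wire.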
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
|
|
It can be helpful for debugging. According to MS-FSCC:
"A 32-bit unsigned integer that contains the serial number of the
volume. The serial number is an opaque value generated by the file
system at format time"
Signed-off-by: Steve French <stfrench@microsoft.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
|
|
Since we can now wait for multiple requests atomically in
wait_for_free_request() we can now greatly simplify the handling
of the credits in this function.
This fixes a potential deadlock where many concurrent compound requests
could each have reserved 1 or 2 credits each but are all blocked
waiting for the final credits they need to be able to issue the requests
to the server.
Set a default timeout of 60 seconds for compounded requests.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
|
|
To help debug credit starvation problems where we time out
waiting for the server to grant the client credits.
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
|
|
When the server required encryption (but we didn't connect to it with the
"seal" mount option) we weren't displaying in /proc/fs/cifs/DebugData that
the tcon for that share was encrypted. Similarly we were not displaying
that signing was required when ses->sign was enabled (we only
checked ses->server->sign). This makes it easier to debug when in
fact the connection is signed (or sealed), whether for performance
or security questions.
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
|
|
A negative timeout is the same as the current behaviour, i.e. no timeout.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
|
|
Reserve the last MAX_COMPOUND credits for any request asking for >1 credit.
This is to prevent future compound requests from becoming starved while waiting
for potentially many requests if there is a large number of concurrent
single-credit requests.
However, we need to protect against servers that are very slow to hand out
new credits on new sessions, so we only do this IFF there are 2*MAX_COMPOUND
(arbitrary) credits already in flight.
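The reservation rule can be sketched as a single predicate. This is a user-space model under stated assumptions: MAX_COMPOUND_MODEL is an illustrative value, may_send_now() is a hypothetical name, and whether the in-flight comparison is strict or not is a guess; the real check lives in the cifs credit-waiting code.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative value only; the real constant lives in the cifs sources. */
#define MAX_COMPOUND_MODEL 5

/*
 * Returns true if a request asking for num_credits may be sent now,
 * false if it has to wait for the server to grant more credits.
 */
bool may_send_now(int available, int in_flight, int num_credits)
{
	assert(num_credits >= 1);

	if (available < num_credits)
		return false;	/* not enough credits for this request */

	/*
	 * Single-credit requests leave the last MAX_COMPOUND credits alone,
	 * but only once plenty of credits are already in flight, so that a
	 * server which is slow to grant credits on a new session does not
	 * stall everything. (">=" here is an assumption.)
	 */
	if (num_credits == 1 &&
	    in_flight >= 2 * MAX_COMPOUND_MODEL &&
	    available <= MAX_COMPOUND_MODEL)
		return false;

	return true;
}
```

In this model a three-credit compound can still proceed with three credits left, while a single-credit request in the same situation waits.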
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
|
|
Change wait_for_free_credits() to allow waiting for >=1 credits instead of just
a single credit.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
|
|
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
|
|
and compute timeout and optyp from it.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
|
|
Android uses pin_file for uncrypt during OTA, and that should be managed by
CAP_SYS_ADMIN only.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Architectures like ppc64 use the deposited page table to store hardware
page table slot information. Make sure we deposit a page table when
using zero page at the pmd level for hash.
Without this we hit
Unable to handle kernel paging request for data at address 0x00000000
Faulting instruction address: 0xc000000000082a74
Oops: Kernel access of bad area, sig: 11 [#1]
....
NIP [c000000000082a74] __hash_page_thp+0x224/0x5b0
LR [c0000000000829a4] __hash_page_thp+0x154/0x5b0
Call Trace:
hash_page_mm+0x43c/0x740
do_hash_page+0x2c/0x3c
copy_from_iter_flushcache+0xa4/0x4a0
pmem_copy_from_iter+0x2c/0x50 [nd_pmem]
dax_copy_from_iter+0x40/0x70
dax_iomap_actor+0x134/0x360
iomap_apply+0xfc/0x1b0
dax_iomap_rw+0xac/0x130
ext4_file_write_iter+0x254/0x460 [ext4]
__vfs_write+0x120/0x1e0
vfs_write+0xd8/0x220
SyS_write+0x6c/0x110
system_call+0x3c/0x130
Fixes: b5beae5e224f ("powerpc/pseries: Add driver for PAPR SCM regions")
Cc: <stable@vger.kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull filesystem-dax updates from Dan Williams:
- Fix handling of PMD-sized entries in the Xarray that lead to a crash
scenario
- Miscellaneous cleanups and small fixes
* tag 'fsdax-for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
dax: Flush partial PMDs correctly
fs/dax: NIT fix comment regarding start/end vs range
fs/dax: Convert to use vmf_error()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs
Pull UBI and UBIFS updates from Richard Weinberger:
- A new interface for UBI to deal better with read disturb
- Reject unsupported ioctl flags in UBIFS (xfstests found it)
* tag 'upstream-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
ubi: wl: Silence uninitialized variable warning
ubifs: Reject unsupported ioctl flags explicitly
ubi: Expose the bitrot interface
ubi: Introduce in_pq()
|
|
As readahead is an optimization, all errors are usually filtered out,
but still properly handled when the real read call is done. The commit
5e9d398240b2 ("btrfs: readpages() should submit IO as read-ahead") added
REQ_RAHEAD to readpages() because that's only used for readahead
(despite what one would expect from the callback name).
This causes a flood of messages and inflated read error stats, so skip
reporting in case it's readahead.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202403
Reported-by: LimeTech <tomm@lime-technology.com>
Fixes: 5e9d398240b2 ("btrfs: readpages() should submit IO as read-ahead")
CC: stable@vger.kernel.org # 4.19+
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
When we are mixing buffered writes with direct IO writes against the same
file and snapshotting is happening concurrently, we can end up with
corrupt file content in the snapshot. Example:
1) Inode/file is empty.
2) Snapshotting starts.
3) Buffered write at offset 0 length 256Kb. This updates the i_size of the
   inode to 256Kb, disk_i_size remains zero. This happens after the task
   doing the snapshot flushes all existing delalloc.
4) DIO write at offset 256Kb length 768Kb. Once the ordered extent
   completes it sets the inode's disk_i_size to 1Mb (256Kb + 768Kb) and
   updates the inode item in the fs tree with a size of 1Mb (which is
   the value of disk_i_size).
5) The delalloc for the range [0, 256Kb[ did not start yet.
6) The transaction used in the DIO ordered extent completion, which updated
   the inode item, is committed by the snapshotting task.
7) Snapshot creation completes.
8) Delalloc for the range [0, 256Kb[ is flushed.
After that when reading the file from the snapshot we always get zeroes for
the range [0, 256Kb[, the file has a size of 1Mb and the data written by
the direct IO write is found. From an application's point of view this is
a corruption, since in the source subvolume it could never read a version
of the file that included the data from the direct IO write without the
data from the buffered write included as well. In the snapshot's tree,
file extent items are missing for the range [0, 256Kb[.
The issue, obviously, does not happen when using the -o flushoncommit
mount option.
Fix this by flushing delalloc for all the roots that are about to be
snapshotted when committing a transaction. This guarantees total ordering
when updating the disk_i_size of an inode since the flush for delalloc is
done when a transaction is in the TRANS_STATE_COMMIT_START state and wait
is done once no more external writers exist. This is similar to what we
do when using the flushoncommit mount option, but we do it only if the
transaction has snapshots to create and only for the roots of the
subvolumes to be snapshotted. The bulk of the delalloc is flushed in the
snapshot creation ioctl, so the flush work we do inside the transaction
is minimized.
This issue, involving buffered and direct IO writes with snapshotting, is
often triggered by fstest btrfs/078, and got reported by fsck when not
using the NO_HOLES feature, for example:
$ cat results/btrfs/078.full
(...)
_check_btrfs_filesystem: filesystem on /dev/sdc is inconsistent
*** fsck.btrfs output ***
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space cache
[4/7] checking fs roots
root 258 inode 264 errors 100, file extent discount
Found file extent holes:
start: 524288, len: 65536
ERROR: errors found in fs roots
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
When Filipe added the recursive directory logging stuff in
2f2ff0ee5e430 ("Btrfs: fix metadata inconsistencies after directory
fsync") he specifically didn't take the directory i_mutex for the
children directories that we need to log because of lockdep. This is
generally fine, but can lead to this WARN_ON() tripping if we happen to
run delayed deletions in between our first search and our second search
of dir_item/dir_indexes for this directory. We expect this to happen,
so the WARN_ON() isn't necessary. Drop the WARN_ON() and add a comment
so we know why this case can happen.
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
If we do a shrinking truncate against an inode which is already present
in the respective log tree and then rename it, as part of logging the new
name we end up logging an inode item that reflects the old size of the
file (the one which we previously logged) and not the new smaller size.
The decision to preserve the size previously logged was added by commit
1a4bcf470c886b ("Btrfs: fix fsync data loss after adding hard link to
inode") in order to avoid data loss after replaying the log. However that
decision is only needed for the case the logged inode size is smaller than
the current size of the inode, as explained in that commit's change log.
If the current size of the inode is smaller than the previously logged
size, we know a shrinking truncate happened and therefore need to use
that smaller size.
Example to trigger the problem:
$ mkfs.btrfs -f /dev/sdb
$ mount /dev/sdb /mnt
$ xfs_io -f -c "pwrite -S 0xab 0 8000" /mnt/foo
$ xfs_io -c "fsync" /mnt/foo
$ xfs_io -c "truncate 3000" /mnt/foo
$ mv /mnt/foo /mnt/bar
$ xfs_io -c "fsync" /mnt/bar
<power failure>
$ mount /dev/sdb /mnt
$ od -t x1 -A d /mnt/bar
0000000 ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab
*
0008000
Once we rename the file, we log its name (and inode item), and because
the inode was already logged before in the current transaction, we log it
with a size of 8000 bytes because that is the size we previously logged
(with the first fsync). As part of the rename, besides logging the inode,
we do also sync the log, which is done since commit d4682ba03ef618
("Btrfs: sync log after logging new name"), so the next fsync against our
inode is effectively a no-op, since no new changes happened since the
rename operation. Even if we did not sync the log during the rename
operation, the same problem (file size of 8000 bytes instead of 3000
bytes) would be visible after replaying the log if the log ended up
getting synced to disk through some other means, such as for example by
fsyncing some other modified file. In the example above the fsync after
the rename operation is there just because not every filesystem may
guarantee logging/journalling the inode (and syncing the log/journal)
during the rename operation, for example it is needed for f2fs, but not
for ext4 and xfs.
Fix this scenario by, when logging a new name (which is triggered by
rename and link operations), using the current size of the inode instead
of the previously logged inode size.
A test case for fstests follows soon.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202695
CC: stable@vger.kernel.org # 4.4+
Reported-by: Seulbae Kim <seulbae@gatech.edu>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
As Jiqun Li reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=202883
A deadlock sometimes occurs when one process makes the SYS_getdents64
system call while another process calls fsync().
Monkey running on Android 9.0:
1. task 9785 holds sbi->cp_rwsem and waits on lock_page()
2. task 10349 holds mm_sem and waits on sbi->cp_rwsem
3. task 9709 holds lock_page() and waits on mm_sem
so this is a deadlock scenario.
The task stacks, as shown by the crash tool, are as follows:
crash_arm64> bt ffffffc03c354080
PID: 9785 TASK: ffffffc03c354080 CPU: 1 COMMAND: "RxIoScheduler-3"
>> #7 [ffffffc01b50fac0] __lock_page at ffffff80081b11e8
crash-arm64> bt 10349
PID: 10349 TASK: ffffffc018b83080 CPU: 1 COMMAND: "BUGLY_ASYNC_UPL"
>> #3 [ffffffc01f8cfa40] rwsem_down_read_failed at ffffff8008a93afc
PC: 00000033 LR: 00000000 SP: 00000000 PSTATE: ffffffffffffffff
crash-arm64> bt 9709
PID: 9709 TASK: ffffffc03e7f3080 CPU: 1 COMMAND: "IntentService[A"
>> #3 [ffffffc001e67850] rwsem_down_read_failed at ffffff8008a93afc
>> #8 [ffffffc001e67b80] el1_ia at ffffff8008084fc4
PC: ffffff8008274114 [compat_filldir64+120]
LR: ffffff80083584d4 [f2fs_fill_dentries+448]
SP: ffffffc001e67b80 PSTATE: 80400145
X29: ffffffc001e67b80 X28: 0000000000000000 X27: 000000000000001a
X26: 00000000000093d7 X25: ffffffc070d52480 X24: 0000000000000008
X23: 0000000000000028 X22: 00000000d43dfd60 X21: ffffffc001e67e90
X20: 0000000000000011 X19: ffffff80093a4000 X18: 0000000000000000
X17: 0000000000000000 X16: 0000000000000000 X15: 0000000000000000
X14: ffffffffffffffff X13: 0000000000000008 X12: 0101010101010101
X11: 7f7f7f7f7f7f7f7f X10: 6a6a6a6a6a6a6a6a X9: 7f7f7f7f7f7f7f7f
X8: 0000000080808000 X7: ffffff800827409c X6: 0000000080808000
X5: 0000000000000008 X4: 00000000000093d7 X3: 000000000000001a
X2: 0000000000000011 X1: ffffffc070d52480 X0: 0000000000800238
>> #9 [ffffffc001e67be0] f2fs_fill_dentries at ffffff80083584d0
PC: 0000003c LR: 00000000 SP: 00000000 PSTATE: 000000d9
X12: f48a02ff X11: d4678960 X10: d43dfc00 X9: d4678ae4
X8: 00000058 X7: d4678994 X6: d43de800 X5: 000000d9
X4: d43dfc0c X3: d43dfc10 X2: d46799c8 X1: 00000000
X0: 00001068
Below potential deadlock will happen between three threads:
Thread A                                Thread B                                    Thread C
- f2fs_do_sync_file
 - f2fs_write_checkpoint
  - down_write(&sbi->node_change) -- 1)
                                        - do_page_fault
                                         - down_write(&mm->mmap_sem) -- 2)
                                          - do_wp_page
                                           - f2fs_vm_page_mkwrite
                                                                                    - getdents64
                                                                                     - f2fs_read_inline_dir
                                                                                      - lock_page -- 3)
  - f2fs_sync_node_pages
   - lock_page -- 3)
                                            - __do_map_lock
                                             - down_read(&sbi->node_change) -- 1)
                                                                                      - f2fs_fill_dentries
                                                                                       - dir_emit
                                                                                        - compat_filldir64
                                                                                         - do_page_fault
                                                                                          - down_read(&mm->mmap_sem) -- 2)
Since f2fs_readdir is protected by inode.i_rwsem, there should not be
any updates to the inode page, so it is safe to look up dents in the inode
page without holding its lock. Drop the lock to improve the concurrency
of readdir and avoid the potential deadlock.
Reported-by: Jiqun Li <jiqun.li@unisoc.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
With the test case below, we fail to find an existing xattr entry:
1. mkfs.f2fs -O extra_attr -O flexible_inline_xattr /dev/zram0
2. mount -t f2fs -o inline_xattr_size=1 /dev/zram0 /mnt/f2fs/
3. touch /mnt/f2fs/file
4. setfattr -n "user.name" -v 0 /mnt/f2fs/file
5. getfattr -n "user.name" /mnt/f2fs/file
/mnt/f2fs/file: user.name: No such attribute
The reason is that for an inode with a very small inline xattr size,
__find_inline_xattr() fails to traverse any entry because the first
entry may not have been loaded from the xattr node yet. Later, we may skip
checking the entire xattr data in __find_xattr(), which results in this
wrong condition.
This patch adds a check for this case to avoid the issue.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
As Paul Bandha reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=202709
When I run the poc on the mounted f2fs img I get a buffer overflow in
read_inline_xattr due to there being no sanity check on the value of
i_inline_xattr_size.
I created the img by just modifying the value of i_inline_xattr_size
in the inode:
i_name [test1.txt]
i_ext: fofs:0 blkaddr:0 len:0
i_extra_isize [0x 18 : 24]
i_inline_xattr_size [0x ffff : 65535]
i_addr[ofs] [0x 0 : 0]
mkdir /mnt/f2fs
mount ./f2fs1.img /mnt/f2fs
gcc poc.c -o poc
./poc
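#include <errno.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>
int main() {
	/* size-only listxattr query on the crafted file */
	int y = syscall(SYS_listxattr, "/mnt/f2fs/test1.txt", NULL, 0);
	printf("ret %d", y);
	printf("errno: %d\n", errno);
}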
BUG: KASAN: slab-out-of-bounds in read_inline_xattr+0x18f/0x260
Read of size 262140 at addr ffff88011035efd8 by task f2fs1poc/3263
CPU: 0 PID: 3263 Comm: f2fs1poc Not tainted 4.18.0-custom #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.1-0-g0551a4be2c-prebuilt.qemu-project.org 04/01/2014
Call Trace:
dump_stack+0x71/0xab
print_address_description+0x83/0x250
kasan_report+0x213/0x350
memcpy+0x1f/0x50
read_inline_xattr+0x18f/0x260
read_all_xattrs+0xba/0x190
f2fs_listxattr+0x9d/0x3f0
listxattr+0xb2/0xd0
path_listxattr+0x93/0xe0
do_syscall_64+0x9d/0x220
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Let's add a sanity check for inode.i_inline_xattr_size during f2fs_iget()
to avoid this issue.
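A bounds check of this kind might look like the sketch below. The function names are hypothetical and the limits are passed as parameters because the message does not state them. The 4-byte unit is an inference from the report above: a read of 262140 bytes equals 65535 * 4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical bounds check. min_units and max_units stand in for
 * whatever limits the on-disk inode layout imposes.
 */
bool inline_xattr_size_sane(uint32_t size_units, uint32_t min_units,
			    uint32_t max_units)
{
	assert(min_units <= max_units);
	return size_units >= min_units && size_units <= max_units;
}

/* Bytes a copy of the inline xattr area would touch (assumed 4-byte units). */
uint64_t inline_xattr_bytes(uint32_t size_units)
{
	return (uint64_t)size_units * sizeof(uint32_t);
}
```

Rejecting the inode at load time means later readers can trust the field instead of each re-validating it before copying.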
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch adds some kernel messages for when the user sets a wrong inline_xattr_size.
Fixes: 500e0b28ecd3 ("f2fs: fix to check inline_xattr_size boundary correctly")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
In f2fs_mpage_readpages(), if a page is beyond EOF, we should just
zero it out. Previously, we failed to check the file size boundary
before checking the previous mapping info; fix it.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
As Gao Xiang reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=202749
f2fs may skip pageout() due to incorrect page reference count.
The problem here is that MM defines the rule [1] very clearly: once
a page has the PG_private flag set, we should increment the
refcount of that page. Main flows like pageout() and migrate_page()
also assume there is one additional page reference if
page_has_private() returns true.
But currently, f2fs doesn't adjust the refcount when changing the
PG_private flag. f2fs should follow MM's rule so that MM's related flows
run as expected.
[1] https://lore.kernel.org/lkml/2b19b3c4-2bc4-15fa-15cc-27a13e5c7af1@aol.com/
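The rule can be illustrated with a toy page model in user-space C. The struct and helpers are assumptions made for illustration and are not the kernel's struct page or the helpers introduced by the patch; the point is only that setting and clearing the private flag are paired with taking and dropping one reference.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy page model; not the kernel's struct page. */
struct page_model {
	int refcount;
	bool has_private;
	unsigned long private_data;
};

/* Setting the private flag takes one extra reference, exactly once. */
void page_set_private(struct page_model *p, unsigned long data)
{
	if (p->has_private)
		return;
	p->refcount++;
	p->has_private = true;
	p->private_data = data;
}

/* Clearing the flag drops the reference taken above. */
void page_clear_private(struct page_model *p)
{
	if (!p->has_private)
		return;
	p->private_data = 0;
	p->has_private = false;
	assert(p->refcount > 0);
	p->refcount--;
}

/* References not attributable to the private flag. */
int page_other_refs(const struct page_model *p)
{
	return p->refcount - (p->has_private ? 1 : 0);
}
```

A caller that subtracts one reference whenever the private flag is set, as page_other_refs() does, only gets the right answer if the filesystem kept the two in step.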
Reported-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Since 8c242db9b8c0 ("f2fs: fix stale ATOMIC_WRITTEN_PAGE private pointer"),
we no longer skip clearing the private flag on atomic_write page
truncation, so remove the old, wrong comment in f2fs_invalidate_page().
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
As Jiqun Li reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=202747
The system can panic due to a mismatched allocate/free function pair
in the xattr interface:
- use kvmalloc to allocate memory
- use kzfree to free memory
Let's fix this by using kvfree instead of kzfree. It is safe to
drop kzfree: no confidential data is stored
as xattr, so we don't need to zero the memory before freeing it.
Fixes: 5222595d093e ("f2fs: use kvmalloc, if kmalloc is failed")
Reported-by: Jiqun Li <jiqun.li@unisoc.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch adds support for tracing f2fs_ioc_shutdown.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Thread A                                Thread B
- __fput
 - f2fs_release_file
  - drop_inmem_pages
   - mutex_lock(&fi->inmem_lock)
   - __revoke_inmem_pages
    - lock_page(page)
                                        - open
                                        - f2fs_setattr
                                        - truncate_setsize
                                         - truncate_inode_pages_range
                                          - lock_page(page)
                                          - truncate_cleanup_page
                                           - f2fs_invalidate_page
                                            - drop_inmem_page
                                            - mutex_lock(&fi->inmem_lock);
We may encounter above ABBA deadlock as reported by Kyungtae Kim:
I'm reporting a bug in linux-4.17.19: "INFO: task hung in
drop_inmem_page" (no reproducer)
I think this might be somehow related to the following:
https://groups.google.com/forum/#!searchin/syzkaller-bugs/INFO$3A$20task$20hung$20in$20%7Csort:date/syzkaller-bugs/c6soBTrdaIo/AjAzPeIzCgAJ
=========================================
INFO: task syz-executor7:10822 blocked for more than 120 seconds.
Not tainted 4.17.19 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor7 D27024 10822 6346 0x00000004
Call Trace:
context_switch kernel/sched/core.c:2867 [inline]
__schedule+0x721/0x1e60 kernel/sched/core.c:3515
schedule+0x88/0x1c0 kernel/sched/core.c:3559
schedule_preempt_disabled+0x18/0x30 kernel/sched/core.c:3617
__mutex_lock_common kernel/locking/mutex.c:833 [inline]
__mutex_lock+0x5bd/0x1410 kernel/locking/mutex.c:893
mutex_lock_nested+0x1b/0x20 kernel/locking/mutex.c:908
drop_inmem_page+0xcb/0x810 fs/f2fs/segment.c:327
f2fs_invalidate_page+0x337/0x5e0 fs/f2fs/data.c:2401
do_invalidatepage mm/truncate.c:165 [inline]
truncate_cleanup_page+0x261/0x330 mm/truncate.c:187
truncate_inode_pages_range+0x552/0x1610 mm/truncate.c:367
truncate_inode_pages mm/truncate.c:478 [inline]
truncate_pagecache+0x6d/0x90 mm/truncate.c:801
truncate_setsize+0x81/0xa0 mm/truncate.c:826
f2fs_setattr+0x44f/0x1270 fs/f2fs/file.c:781
notify_change+0xa62/0xe80 fs/attr.c:313
do_truncate+0x12e/0x1e0 fs/open.c:63
do_last fs/namei.c:2955 [inline]
path_openat+0x2042/0x29f0 fs/namei.c:3505
do_filp_open+0x1bd/0x2c0 fs/namei.c:3540
do_sys_open+0x35e/0x4e0 fs/open.c:1101
__do_sys_open fs/open.c:1119 [inline]
__se_sys_open fs/open.c:1114 [inline]
__x64_sys_open+0x89/0xc0 fs/open.c:1114
do_syscall_64+0xc4/0x4e0 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4497b9
RSP: 002b:00007f734e459c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000002
RAX: ffffffffffffffda RBX: 00007f734e45a6cc RCX: 00000000004497b9
RDX: 0000000000000104 RSI: 00000000000a8280 RDI: 0000000020000080
RBP: 000000000071bea0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 0000000000007230 R14: 00000000006f02d0 R15: 00007f734e45a700
INFO: task syz-executor7:10858 blocked for more than 120 seconds.
Not tainted 4.17.19 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor7 D28880 10858 6346 0x00000004
Call Trace:
context_switch kernel/sched/core.c:2867 [inline]
__schedule+0x721/0x1e60 kernel/sched/core.c:3515
schedule+0x88/0x1c0 kernel/sched/core.c:3559
__rwsem_down_write_failed_common kernel/locking/rwsem-xadd.c:565 [inline]
rwsem_down_write_failed+0x5e6/0xc90 kernel/locking/rwsem-xadd.c:594
call_rwsem_down_write_failed+0x17/0x30 arch/x86/lib/rwsem.S:117
__down_write arch/x86/include/asm/rwsem.h:142 [inline]
down_write+0x58/0xa0 kernel/locking/rwsem.c:72
inode_lock include/linux/fs.h:713 [inline]
do_truncate+0x120/0x1e0 fs/open.c:61
do_last fs/namei.c:2955 [inline]
path_openat+0x2042/0x29f0 fs/namei.c:3505
do_filp_open+0x1bd/0x2c0 fs/namei.c:3540
do_sys_open+0x35e/0x4e0 fs/open.c:1101
__do_sys_open fs/open.c:1119 [inline]
__se_sys_open fs/open.c:1114 [inline]
__x64_sys_open+0x89/0xc0 fs/open.c:1114
do_syscall_64+0xc4/0x4e0 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4497b9
RSP: 002b:00007f734e3b4c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000002
RAX: ffffffffffffffda RBX: 00007f734e3b56cc RCX: 00000000004497b9
RDX: 0000000000000104 RSI: 00000000000a8280 RDI: 0000000020000080
RBP: 000000000071c238 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 0000000000007230 R14: 00000000006f02d0 R15: 00007f734e3b5700
INFO: task syz-executor5:10829 blocked for more than 120 seconds.
Not tainted 4.17.19 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor5 D28760 10829 6308 0x80000002
Call Trace:
context_switch kernel/sched/core.c:2867 [inline]
__schedule+0x721/0x1e60 kernel/sched/core.c:3515
schedule+0x88/0x1c0 kernel/sched/core.c:3559
io_schedule+0x21/0x80 kernel/sched/core.c:5179
wait_on_page_bit_common mm/filemap.c:1100 [inline]
__lock_page+0x2b5/0x390 mm/filemap.c:1273
lock_page include/linux/pagemap.h:483 [inline]
__revoke_inmem_pages+0xb35/0x11c0 fs/f2fs/segment.c:231
drop_inmem_pages+0xa3/0x3e0 fs/f2fs/segment.c:306
f2fs_release_file+0x2c7/0x330 fs/f2fs/file.c:1556
__fput+0x2c7/0x780 fs/file_table.c:209
____fput+0x1a/0x20 fs/file_table.c:243
task_work_run+0x151/0x1d0 kernel/task_work.c:113
exit_task_work include/linux/task_work.h:22 [inline]
do_exit+0x8ba/0x30a0 kernel/exit.c:865
do_group_exit+0x13b/0x3a0 kernel/exit.c:968
get_signal+0x6bb/0x1650 kernel/signal.c:2482
do_signal+0x84/0x1b70 arch/x86/kernel/signal.c:810
exit_to_usermode_loop+0x155/0x190 arch/x86/entry/common.c:162
prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
do_syscall_64+0x445/0x4e0 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4497b9
RSP: 002b:00007f1c68e74ce8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
RAX: fffffffffffffe00 RBX: 000000000071bf80 RCX: 00000000004497b9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000071bf80
RBP: 000000000071bf80 R08: 0000000000000000 R09: 000000000071bf58
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 00007f1c68e759c0 R15: 00007f1c68e75700
This patch uses trylock_page to mitigate this deadlock condition.
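The idea of not sleeping on the second lock can be sketched in user space with a C11 atomic_flag standing in for the page lock. All names below are hypothetical, the list lock is not modelled, and the skip-and-retry policy is an assumption about how a try-lock would be used here rather than a copy of the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy page with a one-bit lock; not the kernel's page lock. */
struct inmem_page {
	atomic_flag locked;
	bool revoked;
};

void page_init(struct inmem_page *p)
{
	atomic_flag_clear(&p->locked);
	p->revoked = false;
}

/* Try to take the page lock without sleeping; true if we now own it. */
bool page_trylock(struct inmem_page *p)
{
	return !atomic_flag_test_and_set(&p->locked);
}

void page_unlock(struct inmem_page *p)
{
	atomic_flag_clear(&p->locked);
}

/*
 * Meant to be called with the list lock held (not modelled). Pages whose
 * lock is busy are skipped instead of waited for, so the caller never
 * sleeps on a page lock while holding the list lock.
 * Returns how many pages are still left to revoke.
 */
size_t revoke_pages(struct inmem_page *pages, size_t n)
{
	size_t remaining = 0;

	assert(pages != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (pages[i].revoked)
			continue;
		if (!page_trylock(&pages[i])) {
			remaining++;	/* busy: retry on a later pass */
			continue;
		}
		pages[i].revoked = true;
		page_unlock(&pages[i]);
	}
	return remaining;
}
```

A caller would drop the list lock between passes and call revoke_pages() again until it returns 0, which gives the thread holding the page lock a chance to take the list lock and make progress.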
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
As Seulbae Kim reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=202637
We didn't recover the permission field correctly after a sudden power cut.
The reason is that in setattr we didn't add the inode to the global dirty
list once i_mode is changed, so a later checkpoint triggered by fsync would
not flush the latest i_mode to disk. Fix it.
Reported-by: Seulbae Kim <seulbae@gatech.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This follows commit 232530680290b ("ext4: improve smp scalability for
inode generation") in assigning a random number to i_generation.
This can be used for DUN for UFS HW encryption.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
VFS will take inode_lock for readdir, therefore no need to
take page lock in readdir at all just as the majority of
other generic filesystems.
This patch improves concurrency since .iterate_shared
was introduced to VFS years ago.
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
In error path of IPU, we didn't account iostat correctly, fix it.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
For IPU path of f2fs_do_write_data_page(), in its error path, we
need to release encrypted page and fscrypt context, otherwise it
will cause memory leak.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch changes to allow failure of f2fs_bio_alloc() in
__submit_flush_wait(), which can simulate flush error in checkpoint()
for covering more error paths.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
With the current retry mechanism in f2fs_fill_super, if the first fill_super
fails due to lack of memory, the second fill_super runs without recovery.
If that succeeds, we may lose fsynced data, which doesn't make sense.
Let's retry fill_super only if a non-ENOMEM error occurs during
recovery.
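The retry policy reads naturally as a small predicate. The sketch below is a user-space model: the function name and the already_retried flag are assumptions made for illustration, and err is taken to be a negative errno from the failed attempt.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical predicate: should a failed mount attempt be retried
 * with recovery skipped?
 */
bool should_retry_without_recovery(int err, bool failed_in_recovery,
				   bool already_retried)
{
	assert(err < 0);

	if (already_retried || !failed_in_recovery)
		return false;
	/* Running out of memory says nothing about the data to recover. */
	return err != -ENOMEM;
}
```

Under this model an -ENOMEM failure simply fails the mount, so fsynced data is never silently skipped just because memory was tight.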
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Note that __GFP_ZERO is not supported for mempool_alloc,
which is also documented in the mempool_alloc comments.
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
We have to cover the whole header file with the last #endif.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Pull NFS server updates from Bruce Fields:
"Miscellaneous NFS server fixes.
Probably the most visible bug is one that could artificially limit
NFSv4.1 performance by limiting the number of outstanding rpcs from a
single client.
Neil Brown also gets a special mention for fixing a 14.5-year-old
memory-corruption bug in the encoding of NFSv3 readdir responses"
* tag 'nfsd-5.1' of git://linux-nfs.org/~bfields/linux:
nfsd: allow nfsv3 readdir request to be larger.
nfsd: fix wrong check in write_v4_end_grace()
nfsd: fix memory corruption caused by readdir
nfsd: fix performance-limiting session calculation
svcrpc: fix UDP on servers with lots of threads
svcrdma: Remove syslog warnings in work completion handlers
svcrdma: Squelch compiler warning when SUNRPC_DEBUG is disabled
svcrdma: Use struct_size() in kmalloc()
svcrpc: fix unlikely races preventing queueing of sockets
svcrpc: svc_xprt_has_something_to_do seems a little long
SUNRPC: Don't allow compiler optimisation of svc_xprt_release_slot()
nfsd: fix an IS_ERR() vs NULL check
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"A large number of bug fixes and cleanups.
One new feature to allow users to more easily find the jbd2 journal
thread for a particular ext4 file system"
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (25 commits)
jbd2: jbd2_get_transaction does not need to return a value
jbd2: fix invalid descriptor block checksum
ext4: fix bigalloc cluster freeing when hole punching under load
ext4: add sysfs attr /sys/fs/ext4/<disk>/journal_task
ext4: Change debugging support help prefix from EXT4 to Ext4
ext4: fix compile error when using BUFFER_TRACE
jbd2: fix compile warning when using JBUFFER_TRACE
ext4: fix some error pointer dereferences
ext4: annotate more implicit fall throughs
ext4: annotate implicit fall throughs
ext4: don't update s_rev_level if not required
jbd2: fold jbd2_superblock_csum_{verify,set} into their callers
jbd2: fix race when writing superblock
ext4: fix crash during online resizing
ext4: disallow files with EXT4_JOURNAL_DATA_FL from EXT4_IOC_SWAP_BOOT
ext4: add mask of ext4 flags to swap
ext4: update quota information while swapping boot loader inode
ext4: cleanup pagecache before swap i_data
ext4: fix check of inode in swap_inode_boot_loader
ext4: unlock unused_pages timely when doing writeback
...
|