Age | Commit message (Collapse) | Author | Files | Lines |
|
Every now and then, I see the following hang during mount time
quotacheck when running fstests. Turning on KASAN seems to make it
happen somewhat more frequently. I've edited the backtrace for brevity.
XFS (sdd): Quotacheck needed: Please wait.
XFS: Assertion failed: bp->b_flags & _XBF_DELWRI_Q, file: fs/xfs/xfs_buf.c, line: 2411
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1831409 at fs/xfs/xfs_message.c:104 assfail+0x46/0x4a [xfs]
CPU: 0 PID: 1831409 Comm: mount Tainted: G W 5.19.0-rc6-xfsx #rc6 09911566947b9f737b036b4af85e399e4b9aef64
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
RIP: 0010:assfail+0x46/0x4a [xfs]
Code: a0 8f 41 a0 e8 45 fe ff ff 8a 1d 2c 36 10 00 80 fb 01 76 0f 0f b6 f3 48 c7 c7 c0 f0 4f a0 e8 10 f0 02 e1 80 e3 01 74 02 0f 0b <0f> 0b 5b c3 48 8d 45 10 48 89 e2 4c 89 e6 48 89 1c 24 48 89 44 24
RSP: 0018:ffffc900078c7b30 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8880099ac000 RCX: 000000007fffffff
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffa0418fa0
RBP: ffff8880197bc1c0 R08: 0000000000000000 R09: 000000000000000a
R10: 000000000000000a R11: f000000000000000 R12: ffffc900078c7d20
R13: 00000000fffffff5 R14: ffffc900078c7d20 R15: 0000000000000000
FS: 00007f0449903800(0000) GS:ffff88803ec00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005610ada631f0 CR3: 0000000014dd8002 CR4: 00000000001706f0
Call Trace:
<TASK>
xfs_buf_delwri_pushbuf+0x150/0x160 [xfs 4561f5b32c9bfb874ec98d58d0719464e1f87368]
xfs_qm_flush_one+0xd6/0x130 [xfs 4561f5b32c9bfb874ec98d58d0719464e1f87368]
xfs_qm_dquot_walk.isra.0+0x109/0x1e0 [xfs 4561f5b32c9bfb874ec98d58d0719464e1f87368]
xfs_qm_quotacheck+0x319/0x490 [xfs 4561f5b32c9bfb874ec98d58d0719464e1f87368]
xfs_qm_mount_quotas+0x65/0x2c0 [xfs 4561f5b32c9bfb874ec98d58d0719464e1f87368]
xfs_mountfs+0x6b5/0xab0 [xfs 4561f5b32c9bfb874ec98d58d0719464e1f87368]
xfs_fs_fill_super+0x781/0x990 [xfs 4561f5b32c9bfb874ec98d58d0719464e1f87368]
get_tree_bdev+0x175/0x280
vfs_get_tree+0x1a/0x80
path_mount+0x6f5/0xaa0
__x64_sys_mount+0x103/0x140
do_syscall_64+0x2b/0x80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
I /think/ this can happen if xfs_qm_flush_one is racing with
xfs_qm_dquot_isolate (i.e. dquot reclaim) when the second function has
taken the dquot flush lock but xfs_qm_dqflush hasn't yet locked the
dquot buffer, let alone queued it to the delwri list. In this case,
flush_one will fail to get the dquot flush lock, but it can lock the
incore buffer, but xfs_buf_delwri_pushbuf will then trip over this
ASSERT, which checks that the buffer isn't on a delwri list. The hang
results because the _delwri_submit_buffers ignores non DELWRI_Q buffers,
which means that xfs_buf_iowait waits forever for an IO that has not yet
been scheduled.
AFAICT, a reasonable solution here is to detect a dquot buffer that is
not on a DELWRI list, drop it, and return -EAGAIN to try the flush
again. It's not /that/ big of a deal if quotacheck writes the dquot
buffer repeatedly before we even set QUOTA_CHKD.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
|
If a blkdev_issue_flush fails, fsync needs to report that to upper
levels. Modify xfs_file_fsync to capture the errors, while trying to
flush as much data and log updates to disk as possible.
If log writes cannot flush the data device, we need to shut down the log
immediately because we've violated a log invariant. Modify this code to
check the return value of blkdev_issue_flush as well.
This behavior seems to go back to about 2.6.15 or so, which makes this
fixes tag a bit misleading.
Link: https://elixir.bootlin.com/linux/v2.6.15/source/fs/xfs/xfs_vnodeops.c#L1187
Fixes: b5071ada510a ("xfs: remove xfs_blkdev_issue_flush")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
"Most of the MM queue. A few things are still pending.
Liam's maple tree rework didn't make it. This has resulted in a few
other minor patch series being held over for next time.
Multi-gen LRU still isn't merged as we were waiting for mapletree to
stabilize. The current plan is to merge MGLRU into -mm soon and to
later reintroduce mapletree, with a view to hopefully getting both
into 6.1-rc1.
Summary:
- The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe
Lin, Yang Shi, Anshuman Khandual and Mike Rapoport
- Some kmemleak fixes from Patrick Wang and Waiman Long
- DAMON updates from SeongJae Park
- memcg debug/visibility work from Roman Gushchin
- vmalloc speedup from Uladzislau Rezki
- more folio conversion work from Matthew Wilcox
- enhancements for coherent device memory mapping from Alex Sierra
- addition of shared pages tracking and CoW support for fsdax, from
Shiyang Ruan
- hugetlb optimizations from Mike Kravetz
- Mel Gorman has contributed some pagealloc changes to improve
latency and realtime behaviour.
- mprotect soft-dirty checking has been improved by Peter Xu
- Many other singleton patches all over the place"
[ XFS merge from hell as per Darrick Wong in
https://lore.kernel.org/all/YshKnxb4VwXycPO8@magnolia/ ]
* tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (282 commits)
tools/testing/selftests/vm/hmm-tests.c: fix build
mm: Kconfig: fix typo
mm: memory-failure: convert to pr_fmt()
mm: use is_zone_movable_page() helper
hugetlbfs: fix inaccurate comment in hugetlbfs_statfs()
hugetlbfs: cleanup some comments in inode.c
hugetlbfs: remove unneeded header file
hugetlbfs: remove unneeded hugetlbfs_ops forward declaration
hugetlbfs: use helper macro SZ_1{K,M}
mm: cleanup is_highmem()
mm/hmm: add a test for cross device private faults
selftests: add soft-dirty into run_vmtests.sh
selftests: soft-dirty: add test for mprotect
mm/mprotect: fix soft-dirty check in can_change_pte_writable()
mm: memcontrol: fix potential oom_lock recursion deadlock
mm/gup.c: fix formatting in check_and_migrate_movable_page()
xfs: fail dax mount if reflink is enabled on a partition
mm/memcontrol.c: remove the redundant updating of stats_flush_threshold
userfaultfd: don't fail on unrecognized features
hugetlb_cgroup: fix wrong hugetlb cgroup numa stat
...
|
|
|
|
To 2.38
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
It is only used in transport.c.
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Rename generic mid functions to same style, i.e. without "cifs_"
prefix.
cifs_{init,destroy}_mids() -> {init,destroy}_mids()
Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
DeleteMidQEntry() was just a proxy for cifs_mid_q_entry_release().
- remove DeleteMidQEntry()
- rename cifs_mid_q_entry_release() to release_mid()
- rename kref_put() callback _cifs_mid_q_entry_release to __release_mid
- rename AllocMidQEntry() to alloc_mid()
- rename cifs_delete_mid() to delete_mid()
Update callers to use new names.
Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Currently much of the smb1 code is built even when
CONFIG_CIFS_ALLOW_INSECURE_LEGACY is disabled.
Move cifssmb.c to only be compiled when insecure legacy is disabled,
and move various SMB1/CIFS helper functions to that ifdef. Some
functions that were not SMB1/CIFS specific needed to be moved out of
cifssmb.c
This shrinks cifs.ko by more than 10% which is good - but also will
help with the eventual movement of the legacy code to a distinct
module. Follow on patches can shrink the number of ifdefs by
code restructuring where smb1 code is wedged in functions that
should be calling dialect specific helper functions instead,
and also by moving some functions from file.c/dir.c/inode.c into
smb1 specific c files.
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Since pvec have 15 pages, it not a multiple of 4, when write compressed
pages, write in 64K as a unit, it will call pagevec_lookup_range_tag
agagin, sometimes this will take a lot of time.
Use onstack pages instead of pvec to mitigate this problem.
Signed-off-by: Fengnan Chang <fengnanchang@gmail.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
When write total cluster, all pages is uptodate, there is not need to call
f2fs_prepare_compress_overwrite, intorduce f2fs_all_cluster_page_ready
to avoid this.
Signed-off-by: Fengnan Chang <changfengnan@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
f2fs_abort_atomic_write() has checked whether current inode is
atomic_write one or not, it's redundant to check in its caller,
remove it for cleanup.
Signed-off-by: Chao Yu <chao.yu@oppo.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Now decompression is being handled in workqueue and it makes read I/O
latency non-deterministic, because of the non-deterministic scheduling
nature of workqueues. So, I made it handled in softirq context only if
possible, not in low memory devices, since this modification will
maintain decompresion related memory a little longer.
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
If a file has FI_COMPRESS_RELEASED, all writes for it should not be
allowed. However, as of now, in case of compress_mode=user, writes
triggered by IOCTLs like F2FS_IOC_DE/COMPRESS_FILE are allowed unexpectly,
which could crash that file.
To fix it, let's do not allow F2FS_IOC_DE/COMPRESS_IOCTL if a file already
has FI_COMPRESS_RELEASED flag.
This is the reproduction process:
1. $ touch ./file
2. $ chattr +c ./file
3. $ dd if=/dev/random of=./file bs=4096 count=30 conv=notrunc
4. $ dd if=/dev/zero of=./file bs=4096 count=34 seek=30 conv=notrunc
5. $ sync
6. $ do_compress ./file ; call F2FS_IOC_COMPRESS_FILE
7. $ get_compr_blocks ./file ; call F2FS_IOC_GET_COMPRESS_BLOCKS
8. $ release ./file ; call F2FS_IOC_RELEASE_COMPRESS_BLOCKS
9. $ do_compress ./file ; call F2FS_IOC_COMPRESS_FILE again
10. $ get_compr_blocks ./file ; call F2FS_IOC_GET_COMPRESS_BLOCKS again
This reproduction process is tested in 128kb cluster size.
You can find compr_blocks has a negative value.
Fixes: 5fdb322ff2c2b ("f2fs: add F2FS_IOC_DECOMPRESS_FILE and F2FS_IOC_COMPRESS_FILE")
Signed-off-by: Junbeom Yeom <junbeom.yeom@samsung.com>
Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Youngjin Gil <youngjin.gil@samsung.com>
Signed-off-by: Jaewook Kim <jw5454.kim@samsung.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
If kernel doesn't have CONFIG_F2FS_FS_COMPRESSION, a file having FS_COMPR_FL via
ioctl(FS_IOC_SETFLAGS) is unaccessible due to f2fs_is_compress_backend_ready().
Let's avoid it.
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
To ensure serialized IOs, f2fs allows only LFS mode for zoned
device. Remove redundant check for direct IO.
Signed-off-by: Eunhee Rho <eunhee83.rho@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
There is issue as follows when test f2fs atomic write:
F2FS-fs (loop0): Can't find valid F2FS filesystem in 2th superblock
F2FS-fs (loop0): invalid crc_offset: 0
F2FS-fs (loop0): f2fs_check_nid_range: out-of-range nid=1, run fsck to fix.
F2FS-fs (loop0): f2fs_check_nid_range: out-of-range nid=2, run fsck to fix.
==================================================================
BUG: KASAN: null-ptr-deref in f2fs_get_dnode_of_data+0xac/0x16d0
Read of size 8 at addr 0000000000000028 by task rep/1990
CPU: 4 PID: 1990 Comm: rep Not tainted 5.19.0-rc6-next-20220715 #266
Call Trace:
<TASK>
dump_stack_lvl+0x6e/0x91
print_report.cold+0x49a/0x6bb
kasan_report+0xa8/0x130
f2fs_get_dnode_of_data+0xac/0x16d0
f2fs_do_write_data_page+0x2a5/0x1030
move_data_page+0x3c5/0xdf0
do_garbage_collect+0x2015/0x36c0
f2fs_gc+0x554/0x1d30
f2fs_balance_fs+0x7f5/0xda0
f2fs_write_single_data_page+0xb66/0xdc0
f2fs_write_cache_pages+0x716/0x1420
f2fs_write_data_pages+0x84f/0x9a0
do_writepages+0x130/0x3a0
filemap_fdatawrite_wbc+0x87/0xa0
file_write_and_wait_range+0x157/0x1c0
f2fs_do_sync_file+0x206/0x12d0
f2fs_sync_file+0x99/0xc0
vfs_fsync_range+0x75/0x140
f2fs_file_write_iter+0xd7b/0x1850
vfs_write+0x645/0x780
ksys_write+0xf1/0x1e0
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
As 3db1de0e582c commit changed atomic write way which new a cow_inode for
atomic write file, and also mark cow_inode as FI_ATOMIC_FILE.
When f2fs_do_write_data_page write cow_inode will use cow_inode's cow_inode
which is NULL. Then will trigger null-ptr-deref.
To solve above issue, introduce FI_COW_FILE flag for COW inode.
Fiexes: 3db1de0e582c("f2fs: change the current atomic write way")
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
F2FS_IOC_ABORT_VOLATILE_WRITE was used to abort a atomic write before.
However it was removed accidentally. So revive it by changing the name,
since volatile write had gone.
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Fiexes: 7bc155fec5b3("f2fs: kill volatile write support")
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Pull xfs updates from Darrick Wong:
"The biggest changes for this release are the log scalability
improvements, lockless lookups for the buffer cache, and making the
attr fork a permanent part of the incore inode in preparation for
directory parent pointers.
There's also a bunch of bug fixes that have accumulated since -rc5. I
might send you a second pull request with some more bug fixes that I'm
still working on.
Once the merge window ends, I will hand maintainership back to Dave
Chinner until the 6.1-rc1 release so that I can conduct the design
review for the online fsck feature, and try to get it merged.
Summary:
- Improve scalability of the XFS log by removing spinlocks and global
synchronization points.
- Add security labels to whiteout inodes to match the other
filesystems.
- Clean up per-ag pointer passing to simplify call sites.
- Reduce verifier overhead by precalculating more AG geometry.
- Implement fast-path lockless lookups in the buffer cache to reduce
spinlock hammering.
- Make attr forks a permanent part of the inode structure to fix a
UAF bug and because most files these days tend to have security
labels and soon will have parent pointers too.
- Clean up XFS_IFORK_Q usage and give it a better name.
- Fix more UAF bugs in the xattr code.
- SOB my tags.
- Fix some typos in the timestamp range documentation.
- Fix a few more memory leaks.
- Code cleanups and typo fixes.
- Fix an unlocked inode fork pointer access in getbmap"
* tag 'xfs-5.20-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (61 commits)
xfs: delete extra space and tab in blank line
xfs: fix NULL pointer dereference in xfs_getbmap()
xfs: Fix typo 'the the' in comment
xfs: Fix comment typo
xfs: don't leak memory when attr fork loading fails
xfs: fix for variable set but not used warning
xfs: xfs_buf cache destroy isn't RCU safe
xfs: delete unnecessary NULL checks
xfs: fix comment for start time value of inode with bigtime enabled
xfs: fix use-after-free in xattr node block inactivation
xfs: lockless buffer lookup
xfs: remove a superflous hash lookup when inserting new buffers
xfs: reduce the number of atomic when locking a buffer after lookup
xfs: merge xfs_buf_find() and xfs_buf_get_map()
xfs: break up xfs_buf_find() into individual pieces
xfs: add in-memory iunlink log item
xfs: add log item precommit operation
xfs: combine iunlink inode update functions
xfs: clean up xfs_iunlink_update_inode()
xfs: double link the unlinked inode list
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"Add new ioctls to set and get the file system UUID in the ext4
superblock and improved the performance of the online resizing of file
systems with bigalloc enabled.
Fixed a lot of bugs, in particular for the inline data feature,
potential races when creating and deleting inodes with shared extended
attribute blocks, and the handling of directory blocks which are
corrupted"
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (37 commits)
ext4: add ioctls to get/set the ext4 superblock uuid
ext4: avoid resizing to a partial cluster size
ext4: reduce computation of overhead during resize
jbd2: fix assertion 'jh->b_frozen_data == NULL' failure when journal aborted
ext4: block range must be validated before use in ext4_mb_clear_bb()
mbcache: automatically delete entries from cache on freeing
mbcache: Remove mb_cache_entry_delete()
ext2: avoid deleting xattr block that is being reused
ext2: unindent codeblock in ext2_xattr_set()
ext2: factor our freeing of xattr block reference
ext4: fix race when reusing xattr blocks
ext4: unindent codeblock in ext4_xattr_block_set()
ext4: remove EA inode entry from mbcache on inode eviction
mbcache: add functions to delete entry if unused
mbcache: don't reclaim used entries
ext4: make sure ext4_append() always allocates new block
ext4: check if directory block is within i_size
ext4: reflect mb_optimize_scan value in options file
ext4: avoid remove directory when directory is corrupted
ext4: aligned '*' in comments
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core / kernfs updates from Greg KH:
"Here is the set of driver core and kernfs changes for 6.0-rc1.
The "biggest" thing in here is some scalability improvements for
kernfs for large systems. Other than that, included in here are:
- arch topology and cache info changes that have been reviewed and
discussed a lot.
- potential error path cleanup fixes
- deferred driver probe cleanups
- firmware loader cleanups and tweaks
- documentation updates
- other small things
All of these have been in the linux-next tree for a while with no
reported problems"
* tag 'driver-core-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (63 commits)
docs: embargoed-hardware-issues: fix invalid AMD contact email
firmware_loader: Replace kmap() with kmap_local_page()
sysfs docs: ABI: Fix typo in comment
kobject: fix Kconfig.debug "its" grammar
kernfs: Fix typo 'the the' in comment
docs: driver-api: firmware: add driver firmware guidelines. (v3)
arch_topology: Fix cache attributes detection in the CPU hotplug path
ACPI: PPTT: Leave the table mapped for the runtime usage
cacheinfo: Use atomic allocation for percpu cache attributes
drivers/base: fix userspace break from using bin_attributes for cpumap and cpulist
MAINTAINERS: Change mentions of mpm to olivia
docs: ABI: sysfs-devices-soc: Update Lee Jones' email address
docs: ABI: sysfs-class-pwm: Update Lee Jones' email address
Documentation/process: Add embargoed HW contact for LLVM
Revert "kernfs: Change kernfs_notify_list to llist."
ACPI: Remove the unused find_acpi_cpu_cache_topology()
arch_topology: Warn that topology for nested clusters is not supported
arch_topology: Add support for parsing sockets in /cpu-map
arch_topology: Set cluster identifier in each core/thread from /cpu-map
arch_topology: Limit span of cpu_clustergroup_mask()
...
|
|
The testcase use SMB2_SET_INFO_HE command to set a malformed file attribute
under the label `security.NTACL`. SMB2_QUERY_INFO_HE command in testcase
trigger the following overflow.
[ 4712.003781] ==================================================================
[ 4712.003790] BUG: KASAN: slab-out-of-bounds in build_sec_desc+0x842/0x1dd0 [ksmbd]
[ 4712.003807] Write of size 1060 at addr ffff88801e34c068 by task kworker/0:0/4190
[ 4712.003813] CPU: 0 PID: 4190 Comm: kworker/0:0 Not tainted 5.19.0-rc5 #1
[ 4712.003850] Workqueue: ksmbd-io handle_ksmbd_work [ksmbd]
[ 4712.003867] Call Trace:
[ 4712.003870] <TASK>
[ 4712.003873] dump_stack_lvl+0x49/0x5f
[ 4712.003935] print_report.cold+0x5e/0x5cf
[ 4712.003972] ? ksmbd_vfs_get_sd_xattr+0x16d/0x500 [ksmbd]
[ 4712.003984] ? cmp_map_id+0x200/0x200
[ 4712.003988] ? build_sec_desc+0x842/0x1dd0 [ksmbd]
[ 4712.004000] kasan_report+0xaa/0x120
[ 4712.004045] ? build_sec_desc+0x842/0x1dd0 [ksmbd]
[ 4712.004056] kasan_check_range+0x100/0x1e0
[ 4712.004060] memcpy+0x3c/0x60
[ 4712.004064] build_sec_desc+0x842/0x1dd0 [ksmbd]
[ 4712.004076] ? parse_sec_desc+0x580/0x580 [ksmbd]
[ 4712.004088] ? ksmbd_acls_fattr+0x281/0x410 [ksmbd]
[ 4712.004099] smb2_query_info+0xa8f/0x6110 [ksmbd]
[ 4712.004111] ? psi_group_change+0x856/0xd70
[ 4712.004148] ? update_load_avg+0x1c3/0x1af0
[ 4712.004152] ? asym_cpu_capacity_scan+0x5d0/0x5d0
[ 4712.004157] ? xas_load+0x23/0x300
[ 4712.004162] ? smb2_query_dir+0x1530/0x1530 [ksmbd]
[ 4712.004173] ? _raw_spin_lock_bh+0xe0/0xe0
[ 4712.004179] handle_ksmbd_work+0x30e/0x1020 [ksmbd]
[ 4712.004192] process_one_work+0x778/0x11c0
[ 4712.004227] ? _raw_spin_lock_irq+0x8e/0xe0
[ 4712.004231] worker_thread+0x544/0x1180
[ 4712.004234] ? __cpuidle_text_end+0x4/0x4
[ 4712.004239] kthread+0x282/0x320
[ 4712.004243] ? process_one_work+0x11c0/0x11c0
[ 4712.004246] ? kthread_complete_and_exit+0x30/0x30
[ 4712.004282] ret_from_fork+0x1f/0x30
This patch add the buffer validation for security descriptor that is
stored by malformed SMB2_SET_INFO_HE command. and allocate large
response buffer about SMB2_O_INFO_SECURITY file info class.
Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3")
Cc: stable@vger.kernel.org
Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-17771
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
lockd doesn't currently vet the start and length in nlm4 requests like
it should, and can end up generating lock requests with arguments that
overflow when passed to the filesystem.
The NLM4 protocol uses unsigned 64-bit arguments for both start and
length, whereas struct file_lock tracks the start and end as loff_t
values. By the time we get around to calling nlm4svc_retrieve_args,
we've lost the information that would allow us to determine if there was
an overflow.
Start tracking the actual start and len for NLM4 requests in the
nlm_lock. In nlm4svc_retrieve_args, vet these values to ensure they
won't cause an overflow, and return NLM4_FBIG if they do.
Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=392
Reported-by: Jan Kasiak <j.kasiak@gmail.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: <stable@vger.kernel.org> # 5.14+
|
|
As all inode locking is now fully balanced, fh_put() does not need to
call fh_unlock().
fh_lock() and fh_unlock() are no longer used, so discard them.
These are the only real users of ->fh_locked, so discard that too.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
When locking a file to access ACLs and xattrs etc, use explicit locking
with inode_lock() instead of fh_lock(). This means that the calls to
fh_fill_pre/post_attr() are also explicit which improves readability and
allows us to place them only where they are needed. Only the xattr
calls need pre/post information.
When locking a file we don't need I_MUTEX_PARENT as the file is not a
parent of anything, so we can use inode_lock() directly rather than the
inode_lock_nested() call that fh_lock() uses.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
When creating or unlinking a name in a directory use explicit
inode_lock_nested() instead of fh_lock(), and explicit calls to
fh_fill_pre_attrs() and fh_fill_post_attrs(). This is already done
for renames, with lock_rename() as the explicit locking.
Also move the 'fill' calls closer to the operation that might change the
attributes. This way they are avoided on some error paths.
For the v2-only code in nfsproc.c, the fill calls are not replaced as
they aren't needed.
Making the locking explicit will simplify proposed future changes to
locking for directories. It also makes it easily visible exactly where
pre/post attributes are used - not all callers of fh_lock() actually
need the pre/post attributes.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
nfsd_lookup() takes an exclusive lock on the parent inode, but no
callers want the lock and it may not be needed at all if the
result is in the dcache.
Change nfsd_lookup_dentry() to not take the lock, and call
lookup_one_len_locked() which takes lock only if needed.
nfsd4_open() currently expects the lock to still be held, but that isn't
necessary as nfsd_validate_delegated_dentry() provides required
guarantees without the lock.
NOTE: NFSv4 requires directory changeinfo for OPEN even when a create
wasn't requested and no change happened. Now that nfsd_lookup()
doesn't use fh_lock(), we need to explicitly fill the attributes
when no create happens. A new fh_fill_both_attrs() is provided
for that task.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
On non-error paths, nfsd_link() calls fh_unlock() twice. This is safe
because fh_unlock() records that the unlock has been done and doesn't
repeat it.
However it makes the code a little confusing and interferes with changes
that are planned for directory locking.
So rearrange the code to ensure fh_unlock() is called exactly once if
fh_lock() was called.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Some error paths in nfsd_unlink() allow it to exit without unlocking the
directory. This is not a problem in practice as the directory will be
locked with an fh_put(), but it is untidy and potentially confusing.
This allows us to remove all the fh_unlock() calls that are immediately
after nfsd_unlink() calls.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
nfsd_create() usually returns with the directory still locked.
nfsd_symlink() usually returns with it unlocked. This is clumsy.
Until recently nfsd_create() needed to keep the directory locked until
ACLs and security label had been set. These are now set inside
nfsd_create() (in nfsd_setattr()) so this need is gone.
So change nfsd_create() and nfsd_symlink() to always unlock, and remove
any fh_unlock() calls that follow calls to these functions.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
pacl and dpacl pointers are added to struct nfsd_attrs, which requires
that we have an nfsd_attrs_free() function to free them.
Those nfsv4 functions that can set ACLs now set up these pointers
based on the passed in NFSv4 ACL.
nfsd_setattr() sets the acls as appropriate.
Errors are handled as with security labels.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs
Pull zonefs update from Damien Le Moal:
"A single change for this cycle to simplify handling of the memory page
used as super block buffer during mount (from Fabio)"
* tag 'zonefs-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
zonefs: Call page_address() on page acquired with GFP_KERNEL flag
|
|
Pull iomap updates from Darrick Wong:
"The most notable change in this first batch is that we no longer
schedule pages beyond i_size for writeback, preferring instead to let
truncate deal with those pages.
Next week, there may be a second pull request to remove
iomap_writepage from the other two filesystems (gfs2/zonefs) that use
iomap for buffered IO. This follows in the same vein as the recent
removal of writepage from XFS, since it hasn't been triggered in a few
years; it does nothing during direct reclaim; and as far as the people
who examined the patchset can tell, it's moving the codebase in the
right direction.
However, as it was a late addition to for-next, I'm holding off on
that section for another week of testing to see if anyone can come up
with a solid reason for holding off in the meantime.
Summary:
- Skip writeback for pages that are completely beyond EOF
- Minor code cleanups"
* tag 'iomap-5.20-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
dax: set did_zero to true when zeroing successfully
iomap: set did_zero to true when zeroing successfully
iomap: skip pages past eof in iomap_do_writepage()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull affs fix from David Sterba:
"One update to AFFS, switching away from the kmap/kmap_atomic API"
* tag 'affs-5.20-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
affs: use memcpy_to_page and remove replace kmap_atomic()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
"This brings some long awaited changes, the send protocol bump,
otherwise lots of small improvements and fixes. The main core part is
reworking bio handling, cleaning up the submission and endio and
improving error handling.
There are some changes outside of btrfs adding helpers or updating
API, listed at the end of the changelog.
Features:
- sysfs:
- export chunk size, in debug mode add tunable for setting its size
- show zoned among features (was only in debug mode)
- show commit stats (number, last/max/total duration)
- send protocol updated to 2
- new commands:
- ability write larger data chunks than 64K
- send raw compressed extents (uses the encoded data ioctls),
ie. no decompression on send side, no compression needed on
receive side if supported
- send 'otime' (inode creation time) among other timestamps
- send file attributes (a.k.a file flags and xflags)
- this is first version bump, backward compatibility on send and
receive side is provided
- there are still some known and wanted commands that will be
implemented in the near future, another version bump will be
needed, however we want to minimize that to avoid causing
usability issues
- print checksum type and implementation at mount time
- don't print some messages at mount (mentioned as people asked about
it), we want to print messages namely for new features so let's
make some space for that
- big metadata - this has been supported for a long time and is
not a feature that's worth mentioning
- skinny metadata - same reason, set by default by mkfs
Performance improvements:
- reduced amount of reserved metadata for delayed items
- when inserted items can be batched into one leaf
- when deleting batched directory index items
- when deleting delayed items used for deletion
- overall improved count of files/sec, decreased subvolume lock
contention
- metadata item access bounds checker micro-optimized, with a few
percent of improved runtime for metadata-heavy operations
- increase direct io limit for read to 256 sectors, improved
throughput by 3x on sample workload
Notable fixes:
- raid56
- reduce parity writes, skip sectors of stripe when there are no
data updates
- restore reading from on-disk data instead of using stripe cache,
this reduces chances to damage correct data due to RMW cycle
- refuse to replay log with unknown incompat read-only feature bit
set
- zoned
- fix page locking when COW fails in the middle of allocation
- improved tracking of active zones, ZNS drives may limit the
number and there are ENOSPC errors due to that limit and not
actual lack of space
- adjust maximum extent size for zone append so it does not cause
late ENOSPC due to underreservation
- mirror reading error messages show the mirror number
- don't fallback to buffered IO for NOWAIT direct IO writes, we don't
have the NOWAIT semantics for buffered io yet
- send, fix sending link commands for existing file paths when there
are deleted and created hardlinks for same files
- repair all mirrors for profiles with more than 1 copy (raid1c34)
- fix repair of compressed extents, unify where error detection and
repair happen
Core changes:
- bio completion cleanups
- don't double defer compression bios
- simplify endio workqueues
- add more data to btrfs_bio to avoid allocation for read requests
- rework bio error handling so it's same what block layer does,
the submission works and errors are consumed in endio
- when asynchronous bio offload fails fall back to synchronous
checksum calculation to avoid errors under writeback or memory
pressure
- new trace points
- raid56 events
- ordered extent operations
- super block log_root_transid deprecated (never used)
- mixed_backref and big_metadata sysfs feature files removed, they've
been default for sufficiently long time, there are no known users
and mixed_backref could be confused with mixed_groups
Non-btrfs changes, API updates:
- minor highmem API update to cover const arguments
- switch all kmap/kmap_atomic to kmap_local
- remove redundant flush_dcache_page()
- address_space_operations::writepage callback removed
- add bdev_max_segments() helper"
* tag 'for-5.20-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (163 commits)
btrfs: don't call btrfs_page_set_checked in finish_compressed_bio_read
btrfs: fix repair of compressed extents
btrfs: remove the start argument to check_data_csum and export
btrfs: pass a btrfs_bio to btrfs_repair_one_sector
btrfs: simplify the pending I/O counting in struct compressed_bio
btrfs: repair all known bad mirrors
btrfs: merge btrfs_dev_stat_print_on_error with its only caller
btrfs: join running log transaction when logging new name
btrfs: simplify error handling in btrfs_lookup_dentry
btrfs: send: always use the rbtree based inode ref management infrastructure
btrfs: send: fix sending link commands for existing file paths
btrfs: send: introduce recorded_ref_alloc and recorded_ref_free
btrfs: zoned: wait until zone is finished when allocation didn't progress
btrfs: zoned: write out partially allocated region
btrfs: zoned: activate necessary block group
btrfs: zoned: activate metadata block group on flush_space
btrfs: zoned: disable metadata overcommit for zoned
btrfs: zoned: introduce space_info->active_total_bytes
btrfs: zoned: finish least available block group on data bg allocation
btrfs: let can_allocate_chunk return error
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull efivars sysfs interface removal from Ard Biesheuvel:
"Remove the obsolete 'efivars' sysfs based interface to the EFI
variable store, now that all users have moved to the efivarfs pseudo
file system, which was created ~10 years ago to address some
fundamental shortcomings in the sysfs based driver.
Move the 'business logic' related to which EFI variables are important
and may affect the boot flow from the efivars support layer into the
efivarfs pseudo file system, so it is no longer exposed to other parts
of the kernel"
* tag 'efi-efivars-removal-for-v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
efi: vars: Move efivar caching layer into efivarfs
efi: vars: Switch to new wrapper layer
efi: vars: Remove deprecated 'efivars' sysfs interface
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI updates from Ard Biesheuvel:
- Enable mirrored memory for arm64
- Fix up several abuses of the efivar API
- Refactor the efivar API in preparation for moving the 'business
logic' part of it into efivarfs
- Enable ACPI PRM on arm64
* tag 'efi-next-for-v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: (24 commits)
ACPI: Move PRM config option under the main ACPI config
ACPI: Enable Platform Runtime Mechanism(PRM) support on ARM64
ACPI: PRM: Change handler_addr type to void pointer
efi: Simplify arch_efi_call_virt() macro
drivers: fix typo in firmware/efi/memmap.c
efi: vars: Drop __efivar_entry_iter() helper which is no longer used
efi: vars: Use locking version to iterate over efivars linked lists
efi: pstore: Omit efivars caching EFI varstore access layer
efi: vars: Add thin wrapper around EFI get/set variable interface
efi: vars: Don't drop lock in the middle of efivar_init()
pstore: Add priv field to pstore_record for backend specific use
Input: applespi - avoid efivars API and invoke EFI services directly
selftests/kexec: remove broken EFI_VARS secure boot fallback check
brcmfmac: Switch to appropriate helper to load EFI variable contents
iwlwifi: Switch to proper EFI variable store interface
media: atomisp_gmin_platform: stop abusing efivar API
efi: efibc: avoid efivar API for setting variables
efi: avoid efivars layer when loading SSDTs from variables
efi: Correct comment on efi_memmap_alloc
memblock: Disable mirror feature if kernelcore is not specified
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs iov_iter updates from Al Viro:
"Part 1 - isolated cleanups and optimizations.
One of the goals is to reduce the overhead of using ->read_iter() and
->write_iter() instead of ->read()/->write().
new_sync_{read,write}() has a surprising amount of overhead, in
particular inside iocb_flags(). That's the explanation for the
beginning of the series is in this pile; it's not directly
iov_iter-related, but it's a part of the same work..."
* tag 'pull-work.iov_iter-base' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
first_iovec_segment(): just return address
iov_iter: massage calling conventions for first_{iovec,bvec}_segment()
iov_iter: first_{iovec,bvec}_segment() - simplify a bit
iov_iter: lift dealing with maxpages out of first_{iovec,bvec}_segment()
iov_iter_get_pages{,_alloc}(): cap the maxsize with MAX_RW_COUNT
iov_iter_bvec_advance(): don't bother with bvec_iter
copy_page_{to,from}_iter(): switch iovec variants to generic
keep iocb_flags() result cached in struct file
iocb: delay evaluation of IS_SYNC(...) until we want to check IOCB_DSYNC
struct file: use anonymous union member for rcuhead and llist
btrfs: use IOMAP_DIO_NOSYNC
teach iomap_dio_rw() to suppress dsync
No need of likely/unlikely on calls of check_copy_size()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs dcache updates from Al Viro:
"The main part here is making parallel lookups safe for RT - making
sure preemption is disabled in start_dir_add()/ end_dir_add() sections
(on non-RT it's automatic, on RT it needs to to be done explicitly)
and moving wakeups from __d_lookup_done() inside of such to the end of
those sections.
Wakeups can be safely delayed for as long as ->d_lock on in-lookup
dentry is held; proving that has caught a bug in d_add_ci() that
allows memory corruption when sufficiently bogus ntfs (or
case-insensitive xfs) image is mounted. Easily fixed, fortunately"
* tag 'pull-work.dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs/dcache: Move wakeup out of i_seq_dir write held region.
fs/dcache: Move the wakeup from __d_lookup_done() to the caller.
fs/dcache: Disable preemption on i_dir_seq write side on PREEMPT_RT
d_add_ci(): make sure we don't miss d_lookup_done()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs lseek updates from Al Viro:
"Jason's lseek series.
Saner handling of 'lseek should fail with ESPIPE' - this gets rid of
the magical no_llseek thing and makes checks consistent.
In particular, the ad-hoc "can we do splice via internal pipe" checks
got saner (and somewhat more permissive, which is what Jason had been
after, AFAICT)"
* tag 'pull-work.lseek' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: remove no_llseek
fs: check FMODE_LSEEK to control internal pipe splicing
vfio: do not set FMODE_LSEEK flag
dma-buf: remove useless FMODE_LSEEK flag
fs: do not compare against ->llseek
fs: clear or set FMODE_LSEEK based on llseek function
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs namei updates from Al Viro:
"RCU pathwalk cleanups.
Storing sampled ->d_seq of the next dentry in nameidata simplifies
life considerably, especially if we delay fetching ->d_inode until
step_into()"
* tag 'pull-work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
step_into(): move fetching ->d_inode past handle_mounts()
lookup_fast(): don't bother with inode
follow_dotdot{,_rcu}(): don't bother with inode
step_into(): lose inode argument
namei: stash the sampled ->d_seq into nameidata
namei: move clearing LOOKUP_RCU towards rcu_read_unlock()
switch try_to_unlazy_next() to __legitimize_mnt()
follow_dotdot{,_rcu}(): change calling conventions
namei: get rid of pointless unlikely(read_seqcount_retry(...))
__follow_mount_rcu(): verify that mount_lock remains unchanged
|
|
Pull folio updates from Matthew Wilcox:
- Fix an accounting bug that made NR_FILE_DIRTY grow without limit
when running xfstests
- Convert more of mpage to use folios
- Remove add_to_page_cache() and add_to_page_cache_locked()
- Convert find_get_pages_range() to filemap_get_folios()
- Improvements to the read_cache_page() family of functions
- Remove a few unnecessary checks of PageError
- Some straightforward filesystem conversions to use folios
- Split PageMovable users out from address_space_operations into
their own movable_operations
- Convert aops->migratepage to aops->migrate_folio
- Remove nobh support (Christoph Hellwig)
* tag 'folio-6.0' of git://git.infradead.org/users/willy/pagecache: (78 commits)
fs: remove the NULL get_block case in mpage_writepages
fs: don't call ->writepage from __mpage_writepage
fs: remove the nobh helpers
jfs: stop using the nobh helper
ext2: remove nobh support
ntfs3: refactor ntfs_writepages
mm/folio-compat: Remove migration compatibility functions
fs: Remove aops->migratepage()
secretmem: Convert to migrate_folio
hugetlb: Convert to migrate_folio
aio: Convert to migrate_folio
f2fs: Convert to filemap_migrate_folio()
ubifs: Convert to filemap_migrate_folio()
btrfs: Convert btrfs_migratepage to migrate_folio
mm/migrate: Add filemap_migrate_folio()
mm/migrate: Convert migrate_page() to migrate_folio()
nfs: Convert to migrate_folio
btrfs: Convert btree_migratepage to migrate_folio
mm/migrate: Convert expected_page_refs() to folio_expected_refs()
mm/migrate: Convert buffer_migrate_page() to buffer_migrate_folio()
...
|
|
This function always returns 0, and ignores the nofail boolean. Drop the
nofail argument, make the function void return and fix up the callers.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
This fixes a race between changing the ext4 superblock uuid and operations
like mounting, resizing, changing features, etc.
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jeremy Bongio <bongiojp@gmail.com>
Link: https://lore.kernel.org/r/20220721224422.438351-1-bongiojp@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
This patch avoids an attempt to resize the filesystem to an
unaligned cluster boundary. An online resize to a size that is not
integral to cluster size results in the last iteration attempting to
grow the fs by a negative amount, which trips a BUG_ON and leaves the fs
with a corrupted in-memory superblock.
Signed-off-by: Oleg Kiselev <okiselev@amazon.com>
Link: https://lore.kernel.org/r/0E92A0AB-4F16-4F1A-94B7-702CC6504FDE@amazon.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
This patch avoids doing an O(n**2)-complexity walk through every flex group.
Instead, it uses the already computed overhead information for the newly
allocated space, and simply adds it to the previously calculated
overhead stored in the superblock. This drastically reduces the time
taken to resize very large bigalloc filesystems (from 3+ hours for a
64TB fs down to milliseconds).
Signed-off-by: Oleg Kiselev <okiselev@amazon.com>
Link: https://lore.kernel.org/r/CE4F359F-4779-45E6-B6A9-8D67FDFF5AE2@amazon.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Following process will fail assertion 'jh->b_frozen_data == NULL' in
jbd2_journal_dirty_metadata():
jbd2_journal_commit_transaction
unlink(dir/a)
jh->b_transaction = trans1
jh->b_jlist = BJ_Metadata
journal->j_running_transaction = NULL
trans1->t_state = T_COMMIT
unlink(dir/b)
handle->h_trans = trans2
do_get_write_access
jh->b_modified = 0
jh->b_frozen_data = frozen_buffer
jh->b_next_transaction = trans2
jbd2_journal_dirty_metadata
is_handle_aborted
is_journal_aborted // return false
--> jbd2 abort <--
while (commit_transaction->t_buffers)
if (is_journal_aborted)
jbd2_journal_refile_buffer
__jbd2_journal_refile_buffer
WRITE_ONCE(jh->b_transaction,
jh->b_next_transaction)
WRITE_ONCE(jh->b_next_transaction, NULL)
__jbd2_journal_file_buffer(jh, BJ_Reserved)
J_ASSERT_JH(jh, jh->b_frozen_data == NULL) // assertion failure !
The reproducer (See detail in [Link]) reports:
------------[ cut here ]------------
kernel BUG at fs/jbd2/transaction.c:1629!
invalid opcode: 0000 [#1] PREEMPT SMP
CPU: 2 PID: 584 Comm: unlink Tainted: G W
5.19.0-rc6-00115-g4a57a8400075-dirty #697
RIP: 0010:jbd2_journal_dirty_metadata+0x3c5/0x470
RSP: 0018:ffffc90000be7ce0 EFLAGS: 00010202
Call Trace:
<TASK>
__ext4_handle_dirty_metadata+0xa0/0x290
ext4_handle_dirty_dirblock+0x10c/0x1d0
ext4_delete_entry+0x104/0x200
__ext4_unlink+0x22b/0x360
ext4_unlink+0x275/0x390
vfs_unlink+0x20b/0x4c0
do_unlinkat+0x42f/0x4c0
__x64_sys_unlink+0x37/0x50
do_syscall_64+0x35/0x80
After journal aborting, __jbd2_journal_refile_buffer() is executed with
holding @jh->b_state_lock, we can fix it by moving 'is_handle_aborted()'
into the area protected by @jh->b_state_lock.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216251
Fixes: 470decc613ab20 ("[PATCH] jbd2: initial copy of files from jbd")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Link: https://lore.kernel.org/r/20220715125152.4022726-1-chengzhihao1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Block range to free is validated in ext4_free_blocks() using
ext4_inode_block_valid() and then it's passed to ext4_mb_clear_bb().
However in some situations on bigalloc file system the range might be
adjusted after the validation in ext4_free_blocks() which can lead to
troubles on corrupted file systems such as one found by syzkaller that
resulted in the following BUG
kernel BUG at fs/ext4/ext4.h:3319!
PREEMPT SMP NOPTI
CPU: 28 PID: 4243 Comm: repro Kdump: loaded Not tainted 5.19.0-rc6+ #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1.fc35 04/01/2014
RIP: 0010:ext4_free_blocks+0x95e/0xa90
Call Trace:
<TASK>
? lock_timer_base+0x61/0x80
? __es_remove_extent+0x5a/0x760
? __mod_timer+0x256/0x380
? ext4_ind_truncate_ensure_credits+0x90/0x220
ext4_clear_blocks+0x107/0x1b0
ext4_free_data+0x15b/0x170
ext4_ind_truncate+0x214/0x2c0
? _raw_spin_unlock+0x15/0x30
? ext4_discard_preallocations+0x15a/0x410
? ext4_journal_check_start+0xe/0x90
? __ext4_journal_start_sb+0x2f/0x110
ext4_truncate+0x1b5/0x460
? __ext4_journal_start_sb+0x2f/0x110
ext4_evict_inode+0x2b4/0x6f0
evict+0xd0/0x1d0
ext4_enable_quotas+0x11f/0x1f0
ext4_orphan_cleanup+0x3de/0x430
? proc_create_seq_private+0x43/0x50
ext4_fill_super+0x295f/0x3ae0
? snprintf+0x39/0x40
? sget_fc+0x19c/0x330
? ext4_reconfigure+0x850/0x850
get_tree_bdev+0x16d/0x260
vfs_get_tree+0x25/0xb0
path_mount+0x431/0xa70
__x64_sys_mount+0xe2/0x120
do_syscall_64+0x5b/0x80
? do_user_addr_fault+0x1e2/0x670
? exc_page_fault+0x70/0x170
entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7fdf4e512ace
Fix it by making sure that the block range is properly validated before
used every time it changes in ext4_free_blocks() or ext4_mb_clear_bb().
Link: https://syzkaller.appspot.com/bug?id=5266d464285a03cee9dbfda7d2452a72c3c2ae7c
Reported-by: syzbot+15cd994e273307bf5cfa@syzkaller.appspotmail.com
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Cc: Tadeusz Struk <tadeusz.struk@linaro.org>
Tested-by: Tadeusz Struk <tadeusz.struk@linaro.org>
Link: https://lore.kernel.org/r/20220714165903.58260-1-lczerner@redhat.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Use the fact that entries with elevated refcount are not removed from
the hash and just move removal of the entry from the hash to the entry
freeing time. When doing this we also change the generic code to hold
one reference to the cache entry, not two of them, which makes code
somewhat more obvious.
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220712105436.32204-10-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Nobody uses mb_cache_entry_delete() anymore. Remove it.
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220712105436.32204-9-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|