diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2021-02-21 10:34:36 -0800 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2021-02-21 10:34:36 -0800 |
commit | b52bb135aad99deea9bfe5f050c3295b049adc87 (patch) | |
tree | f5282961c881e3077cf14ae42bef9f800bfee6aa /fs/xfs/xfs_reflink.c | |
parent | 4f016a316f2243efb0d1c0e7259f07817eb99e67 (diff) | |
parent | 1cd738b13ae9b29e03d6149f0246c61f76e81fcf (diff) | |
download | linux-b52bb135aad99deea9bfe5f050c3295b049adc87.tar.bz2 |
Merge tag 'xfs-5.12-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs updates from Darrick Wong:
"There's a lot going on this time, which seems about right for this
drama-filled year.
Community developers added some code to speed up freezing when
read-only workloads are still running, refactored the logging code,
added checks to prevent file extent counter overflow, reduced iolock
cycling to speed up fsync and gc scans, and started the slow march
towards supporting filesystem shrinking.
There's a huge refactoring of the internal speculative preallocation
garbage collection code which fixes a bunch of bugs, makes the gc
scheduling per-AG and hence multithreaded, and standardizes the retry
logic when we try to reserve space or quota, can't, and want to
trigger a gc scan. We also enable multithreaded quotacheck to reduce
mount times further. This is also preparation for background file gc,
which may or may not land for 5.13.
We also fixed some deadlocks in the rename code, fixed a quota
accounting leak when FSSETXATTR fails, restored the behavior that
write faults to an mmap'd region actually cause a SIGBUS, fixed a bug
where sgid directory inheritance wasn't quite working properly, and
fixed a bug where symlinks weren't working properly in ecryptfs. We
also now advertise the inode btree counters feature that was
introduced two cycles ago.
Summary:
- Fix an ABBA deadlock when renaming files on overlayfs.
- Make sure that we can't overflow the inode extent counters when
adding to or removing extents from a file.
- Make directory sgid inheritance work the same way as all the other
filesystems.
- Don't drain the buffer cache on freeze and ro remount, which should
reduce the amount of time if read-only workloads are continuing
during the freeze.
- Fix a bug where symlink size isn't reported to the vfs in ecryptfs.
- Disentangle log cleaning from log covering. This refactoring sets
us up for future changes to the log, though for now it simply means
that we can use covering for freezes, and cleaning becomes
something we only do at unmount.
- Speed up file fsyncs by reducing iolock cycling.
- Fix delalloc blocks leaking when changing the project id fails
because of input validation errors in FSSETXATTR.
- Fix oversized quota reservation when converting unwritten extents
during a DAX write.
- Create a transaction allocation helper function to standardize the
idiom of allocating a transaction, reserving blocks, locking
inodes, and reserving quota. Replace all the open-coded logic for
file creation, file ownership changes, and file modifications to
use them.
- Actually shut down the fs if the incore quota reservations get
corrupted.
- Fix background block garbage collection scans to not block and to
actually clean out CoW staging extents properly.
- Run block gc scans when we run low on project quota.
- Use the standardized transaction allocation helpers to make it so
that ENOSPC and EDQUOT errors during reservation will back out,
invoke the block gc scanner, and try again. This is preparation for
introducing background inode garbage collection in the next cycle.
- Combine speculative post-EOF block garbage collection with
speculative copy on write block garbage collection.
- Enable multithreaded quotacheck.
- Allow sysadmins to tweak the CPU affinities and maximum concurrency
levels of quotacheck and background blockgc worker pools.
- Expose the inode btree counter feature in the fs geometry ioctl.
- Cleanups of the growfs code in preparation for starting work on
filesystem shrinking.
- Fix all the bloody gcc warnings that the maintainer knows about. :P
- Fix a RST syntax error.
- Don't trigger bmbt corruption assertions after the fs shuts down.
- Restore behavior of forcing SIGBUS on a shut down filesystem when
someone triggers a mmap write fault (or really, any buffered
write)"
* tag 'xfs-5.12-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (85 commits)
xfs: consider shutdown in bmapbt cursor delete assert
xfs: fix boolreturn.cocci warnings
xfs: restore shutdown check in mapped write fault path
xfs: fix rst syntax error in admin guide
xfs: fix incorrect root dquot corruption error when switching group/project quota types
xfs: get rid of xfs_growfs_{data,log}_t
xfs: rename `new' to `delta' in xfs_growfs_data_private()
libxfs: expose inobtcount in xfs geometry
xfs: don't bounce the iolock between free_{eof,cow}blocks
xfs: expose the blockgc workqueue knobs publicly
xfs: parallelize block preallocation garbage collection
xfs: rename block gc start and stop functions
xfs: only walk the incore inode tree once per blockgc scan
xfs: consolidate the eofblocks and cowblocks workers
xfs: consolidate incore inode radix tree posteof/cowblocks tags
xfs: remove trivial eof/cowblocks functions
xfs: hide xfs_icache_free_cowblocks
xfs: hide xfs_icache_free_eofblocks
xfs: relocate the eofb/cowb workqueue functions
xfs: set WQ_SYSFS on all workqueues in debug mode
...
Diffstat (limited to 'fs/xfs/xfs_reflink.c')
-rw-r--r-- | fs/xfs/xfs_reflink.c | 103 |
1 files changed, 62 insertions, 41 deletions
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index 6fa05fb78189..725c7d8e4438 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -376,16 +376,14 @@ xfs_reflink_allocate_cow( resblks = XFS_DIOSTRAT_SPACE_RES(mp, resaligned); xfs_iunlock(ip, *lockmode); - error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks, 0, 0, &tp); - *lockmode = XFS_ILOCK_EXCL; - xfs_ilock(ip, *lockmode); + *lockmode = 0; + error = xfs_trans_alloc_inode(ip, &M_RES(mp)->tr_write, resblks, 0, + false, &tp); if (error) return error; - error = xfs_qm_dqattach_locked(ip, false); - if (error) - goto out_trans_cancel; + *lockmode = XFS_ILOCK_EXCL; /* * Check for an overlapping extent again now that we dropped the ilock. @@ -398,20 +396,13 @@ xfs_reflink_allocate_cow( goto convert; } - error = xfs_trans_reserve_quota_nblks(tp, ip, resblks, 0, - XFS_QMOPT_RES_REGBLKS); - if (error) - goto out_trans_cancel; - - xfs_trans_ijoin(tp, ip, 0); - /* Allocate the entire reservation as unwritten blocks. */ nimaps = 1; error = xfs_bmapi_write(tp, ip, imap->br_startoff, imap->br_blockcount, XFS_BMAPI_COWFORK | XFS_BMAPI_PREALLOC, 0, cmap, &nimaps); if (error) - goto out_unreserve; + goto out_trans_cancel; xfs_inode_set_cowblocks_tag(ip); error = xfs_trans_commit(tp); @@ -436,9 +427,6 @@ convert: trace_xfs_reflink_convert_cow(ip, cmap); return xfs_reflink_convert_cow_locked(ip, offset_fsb, count_fsb); -out_unreserve: - xfs_trans_unreserve_quota_nblks(tp, ip, (long)resblks, 0, - XFS_QMOPT_RES_REGBLKS); out_trans_cancel: xfs_trans_cancel(tp); return error; @@ -508,9 +496,8 @@ xfs_reflink_cancel_cow_blocks( xfs_bmap_del_extent_cow(ip, &icur, &got, &del); /* Remove the quota reservation */ - error = xfs_trans_reserve_quota_nblks(NULL, ip, - -(long)del.br_blockcount, 0, - XFS_QMOPT_RES_REGBLKS); + error = xfs_quota_unreserve_blkres(ip, + del.br_blockcount); if (error) break; } else { @@ -628,6 +615,11 @@ xfs_reflink_end_cow_extent( xfs_ilock(ip, XFS_ILOCK_EXCL); xfs_trans_ijoin(tp, ip, 0); + error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK, + XFS_IEXT_REFLINK_END_COW_CNT); + if (error) + goto out_cancel; + /* * In case of racing, overlapping AIO writes no COW extents might be * left by the time I/O completes for the loser of the race. In that @@ -997,22 +989,47 @@ xfs_reflink_remap_extent( struct xfs_mount *mp = ip->i_mount; struct xfs_trans *tp; xfs_off_t newlen; - int64_t qres, qdelta; + int64_t qdelta = 0; unsigned int resblks; + bool quota_reserved = true; bool smap_real; bool dmap_written = xfs_bmap_is_written_extent(dmap); + int iext_delta = 0; int nimaps; int error; - /* Start a rolling transaction to switch the mappings */ + /* + * Start a rolling transaction to switch the mappings. + * + * Adding a written extent to the extent map can cause a bmbt split, + * and removing a mapped extent from the extent can cause a bmbt split. + * The two operations cannot both cause a split since they operate on + * the same index in the bmap btree, so we only need a reservation for + * one bmbt split if either thing is happening. However, we haven't + * locked the inode yet, so we reserve assuming this is the case. + * + * The first allocation call tries to reserve enough space to handle + * mapping dmap into a sparse part of the file plus the bmbt split. We + * haven't locked the inode or read the existing mapping yet, so we do + * not know for sure that we need the space. This should succeed most + * of the time. + * + * If the first attempt fails, try again but reserving only enough + * space to handle a bmbt split. This is the hard minimum requirement, + * and we revisit quota reservations later when we know more about what + * we're remapping. + */ resblks = XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK); - error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks, 0, 0, &tp); + error = xfs_trans_alloc_inode(ip, &M_RES(mp)->tr_write, + resblks + dmap->br_blockcount, 0, false, &tp); + if (error == -EDQUOT || error == -ENOSPC) { + quota_reserved = false; + error = xfs_trans_alloc_inode(ip, &M_RES(mp)->tr_write, + resblks, 0, false, &tp); + } if (error) goto out; - xfs_ilock(ip, XFS_ILOCK_EXCL); - xfs_trans_ijoin(tp, ip, 0); - /* * Read what's currently mapped in the destination file into smap. * If smap isn't a hole, we will have to remove it before we can add @@ -1060,15 +1077,9 @@ xfs_reflink_remap_extent( } /* - * Compute quota reservation if we think the quota block counter for + * Increase quota reservation if we think the quota block counter for * this file could increase. * - * Adding a written extent to the extent map can cause a bmbt split, - * and removing a mapped extent from the extent can cause a bmbt split. - * The two operations cannot both cause a split since they operate on - * the same index in the bmap btree, so we only need a reservation for - * one bmbt split if either thing is happening. - * * If we are mapping a written extent into the file, we need to have * enough quota block count reservation to handle the blocks in that * extent. We log only the delta to the quota block counts, so if the @@ -1081,19 +1092,29 @@ xfs_reflink_remap_extent( * count. This is suboptimal, but the VFS flushed the dest range * before we started. That should have removed all the delalloc * reservations, but we code defensively. + * + * xfs_trans_alloc_inode above already tried to grab an even larger + * quota reservation, and kicked off a blockgc scan if it couldn't. + * If we can't get a potentially smaller quota reservation now, we're + * done. */ - qres = qdelta = 0; - if (smap_real || dmap_written) - qres = XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK); - if (!smap_real && dmap_written) - qres += dmap->br_blockcount; - if (qres > 0) { - error = xfs_trans_reserve_quota_nblks(tp, ip, qres, 0, - XFS_QMOPT_RES_REGBLKS); + if (!quota_reserved && !smap_real && dmap_written) { + error = xfs_trans_reserve_quota_nblks(tp, ip, + dmap->br_blockcount, 0, false); if (error) goto out_cancel; } + if (smap_real) + ++iext_delta; + + if (dmap_written) + ++iext_delta; + + error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK, iext_delta); + if (error) + goto out_cancel; + if (smap_real) { /* * If the extent we're unmapping is backed by storage (written |