summaryrefslogtreecommitdiffstats
path: root/fs/xfs/xfs_iomap.c
AgeCommit message (Collapse)AuthorFilesLines
2018-08-07xfs: use WRITE_ONCE to update if_seqChristoph Hellwig1-1/+2
This adds ordering of the updates and makes sure we always see the if_seq update before the extent tree is modified. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-08-02xfs: automatic dfops inode reloggingBrian Foster1-3/+0
Inodes that are held across deferred operations are explicitly joined to the dfops structure to ensure appropriate relogging. While inodes are currently joined explicitly, we can detect the conditions that require relogging at dfops finish time by inspecting the transaction item list for inodes with ili_lock_flags == 0. Replace the xfs_defer_ijoin() infrastructure with such detection and automatic relogging of held inodes. This eliminates the need for the per-dfops inode list, replaced by an on-stack variant in xfs_defer_trans_roll(). Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-08-02xfs: add missing defer ijoins for held inodesBrian Foster1-0/+3
Log items that require relogging during deferred operations processing are explicitly joined to the associated dfops via the xfs_defer_*join() helpers. These calls imply that the associated object is "held" by the transaction such that when rolled, the item can be immediately joined to a follow up transaction. For buffers, this means the buffer remains locked and held after each roll. For inodes, this means that the inode remains locked. Failure to join a held item to the dfops structure means the associated object pins the tail of the log while dfops processing completes, because the item never relogs and is not unlocked or released until deferred processing completes. Currently, all buffers that are held in transactions (XFS_BLI_HOLD) with deferred operations are explicitly joined to the dfops. This is not the case for inodes, however, as various contexts defer operations to transactions with held inodes without explicit joins to the associated dfops (and thus not relogging). While this is not a catastrophic problem, it is not ideal. Given that we want to eventually relog such items automatically during dfops processing, start by explicitly adding these missing xfs_defer_ijoin() calls. A call is added everywhere an inode is joined to a transaction without transferring lock ownership and said transaction runs deferred operations. All xfs_defer_ijoin() calls will eventually be replaced by automatic dfops inode relogging. This patch essentially implements the behavior change that would otherwise occur due to automatic inode dfops relogging. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-31xfs: avoid COW fork extent lookups in writeback if the fork didn't changeChristoph Hellwig1-1/+4
Used the per-fork sequence counter to avoid lookups in the writeback code unless the COW fork actually changed. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-26xfs: remove all boilerplate defer init/finish codeBrian Foster1-24/+2
At this point, the transaction subsystem completely manages deferred items internally such that the common and boilerplate xfs_trans_alloc() -> xfs_defer_init() -> xfs_defer_finish() -> xfs_trans_commit() sequence can be replaced with a simple transaction allocation and commit. Remove all such boilerplate deferred ops code. In doing so, we change each case over to use the dfops in the transaction and specifically eliminate: - The on-stack dfops and associated xfs_defer_init() call, as the internal dfops is initialized on transaction allocation. - xfs_bmap_finish() calls that precede a final xfs_trans_commit() of a transaction. - xfs_defer_cancel() calls in error handlers that precede a transaction cancel. The only deferred ops calls that remain are those that are non-deterministic with respect to the final commit of the associated transaction or are open-coded due to special handling. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11xfs: remove xfs_defer_init() firstblock paramBrian Foster1-3/+3
All but one caller of xfs_defer_init() passes in the ->t_firstblock of the associated transaction. The one outlier is xlog_recover_process_intents(), which simply passes a dummy value because a valid pointer is required. This firstblock variable can simply be removed. At this point we could remove the xfs_defer_init() firstblock parameter and initialize ->t_firstblock directly. Even that is not necessary, however, because ->t_firstblock is automatically reinitialized in the new transaction on a transaction roll. Since xfs_defer_init() should never occur more than once on a particular transaction (since the corresponding finish will roll it), replace the reinit from xfs_defer_init() with an assert that verifies the transaction has a NULLFSBLOCK firstblock. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11xfs: remove xfs_bmapi_write() firstblock paramBrian Foster1-6/+4
All callers pass ->t_firstblock from the current transaction. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11xfs: use ->t_firstblock for all xfs_bmapi_write() callersBrian Foster1-11/+9
Convert all xfs_bmapi_write() users to ->t_firstblock. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11xfs: refactor dfops init to attach to transactionBrian Foster1-6/+3
Most callers of xfs_defer_init() immediately attach the dfops structure to a transaction. Add a transaction parameter to eliminate much of this boilerplate code. This also helps self-document the fact that many codepaths now expect a dfops pointer implicitly via xfs_trans->t_dfops. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11xfs: remove xfs_bmapi_write() dfops paramBrian Foster1-4/+3
Now that all callers use ->t_dfops, the xfs_bmapi_write() dfops parameter is no longer necessary. Remove it and access ->t_dfops directly. This patch does not change behavior. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11xfs: use ->t_dfops for all xfs_bmapi_write() callersBrian Foster1-9/+12
Attach ->t_dfops for all remaining callers of xfs_bmapi_write(). This prepares the latter to no longer require a separate dfops parameter. Note that xfs_symlink() already uses ->t_dfops. Fix up the local references for consistency. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11xfs: update my copyrights for the writeback and iomap codeChristoph Hellwig1-1/+1
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11xfs: add support for sub-pagesize writeback without buffer_headsChristoph Hellwig1-3/+0
Switch to using the iomap_page structure for checking sub-page uptodate status and track sub-page I/O completion status, and remove large quantities of boilerplate code working around buffer heads. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11xfs: allow writeback on pages without buffer headsChristoph Hellwig1-1/+2
Disable the IOMAP_F_BUFFER_HEAD flag on file systems with a block size equal to the page size, and deal with pages without buffer heads in writeback. Thanks to the previous refactoring this is basically trivial now. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11xfs: move locking into xfs_bmap_punch_delalloc_rangeChristoph Hellwig1-3/+0
Both callers want the same looking, so do it only once. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11Merge branch 'iomap-4.19-merge' into xfs-4.19-mergeDarrick J. Wong1-2/+4
2018-06-24xfs: recheck reflink state after grabbing ILOCK_SHARED for a writeDarrick J. Wong1-1/+14
The reflink iflag could have changed since the earlier unlocked check, so if we got ILOCK_SHARED for a write and but we're now a reflink inode we have to switch to ILOCK_EXCL and relock. This helps us avoid blowing lock assertions in things like generic/166: XFS: Assertion failed: xfs_isilocked(ip, XFS_ILOCK_EXCL), file: fs/xfs/xfs_reflink.c, line: 383 WARNING: CPU: 1 PID: 24707 at fs/xfs/xfs_message.c:104 assfail+0x25/0x30 [xfs] Modules linked in: deadline_iosched dm_snapshot dm_bufio ext4 mbcache jbd2 dm_flakey xfs libcrc32c dax_pmem device_dax nd_pmem sch_fq_codel af_packet [last unloaded: scsi_debug] CPU: 1 PID: 24707 Comm: xfs_io Not tainted 4.18.0-rc1-djw #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1 04/01/2014 RIP: 0010:assfail+0x25/0x30 [xfs] Code: ff 0f 0b c3 90 66 66 66 66 90 48 89 f1 41 89 d0 48 c7 c6 e8 ef 1b a0 48 89 fa 31 ff e8 54 f9 ff ff 80 3d fd ba 0f 00 00 75 03 <0f> 0b c3 0f 0b 66 0f 1f 44 00 00 66 66 66 66 90 48 63 f6 49 89 f9 RSP: 0018:ffffc90006423ad8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff880030b65e80 RCX: 0000000000000000 RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffffa01b0447 RBP: ffffc90006423c10 R08: 0000000000000000 R09: 0000000000000000 R10: ffff88003d43fc30 R11: f000000000000000 R12: ffff880077cda000 R13: 0000000000000000 R14: ffffc90006423c30 R15: ffffc90006423bf9 FS: 00007feba8986800(0000) GS:ffff88003ec00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000138ab58 CR3: 000000003d40a000 CR4: 00000000000006a0 Call Trace: xfs_reflink_allocate_cow+0x24c/0x3d0 [xfs] xfs_file_iomap_begin+0x6d2/0xeb0 [xfs] ? iomap_to_fiemap+0x80/0x80 iomap_apply+0x5e/0x130 iomap_dio_rw+0x2e0/0x400 ? iomap_to_fiemap+0x80/0x80 ? xfs_file_dio_aio_write+0x133/0x4a0 [xfs] xfs_file_dio_aio_write+0x133/0x4a0 [xfs] xfs_file_write_iter+0x7b/0xb0 [xfs] __vfs_write+0x16f/0x1f0 vfs_write+0xc8/0x1c0 ksys_pwrite64+0x74/0x90 do_syscall_64+0x56/0x180 entry_SYSCALL_64_after_hwframe+0x49/0xbe Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-06-20iomap: add initial support for writes without buffer headsChristoph Hellwig1-2/+4
For now just limited to blocksize == PAGE_SIZE, where we can simply read in the full page in write begin, and just set the whole page dirty after copying data into it. This code is enabled by default and XFS will now be feed pages without buffer heads in ->writepage and ->writepages. If a file system sets the IOMAP_F_BUFFER_HEAD flag on the iomap the old path will still be used, this both helps the transition in XFS and prepares for the gfs2 migration to the iomap infrastructure. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-08xfs: clean up MIN/MAXDave Chinner1-3/+3
Get rid of the MIN/MAX macros and just use the native min/max macros directly in the XFS code. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-06xfs: convert to SPDX license tagsDave Chinner1-13/+1
Remove the verbose license text from XFS files and replace them with SPDX tags. This does not change the license of any of the code, merely refers to the common, up-to-date license files in LICENSES/ This change was mostly scripted. fs/xfs/Makefile and fs/xfs/libxfs/xfs_fs.h were modified by hand, the rest were detected and modified by the following command: for f in `git grep -l "GNU General" fs/xfs/` ; do echo $f cat $f | awk -f hdr.awk > $f.new mv -f $f.new $f done And the hdr.awk script that did the modification (including detecting the difference between GPL-2.0 and GPL-2.0+ licenses) is as follows: $ cat hdr.awk BEGIN { hdr = 1.0 tag = "GPL-2.0" str = "" } /^ \* This program is free software/ { hdr = 2.0; next } /any later version./ { tag = "GPL-2.0+" next } /^ \*\// { if (hdr > 0.0) { print "// SPDX-License-Identifier: " tag print str print $0 str="" hdr = 0.0 next } print $0 next } /^ \* / { if (hdr > 1.0) next if (hdr > 0.0) { if (str != "") str = str "\n" str = str $0 next } print $0 next } /^ \*/ { if (hdr > 0.0) next print $0 next } // { if (hdr > 0.0) { if (str != "") str = str "\n" str = str $0 next } print $0 } END { } $ Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-05-10xfs: split out dqget for inodes from regular dqgetDarrick J. Wong1-1/+1
There are two uses of dqget here -- one is to return the dquot for a given type and id, and the other is to return the dquot for a given type and inode. Those are two separate things, so split them into two smaller functions. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-05-10xfs: remove unnecessary xfs_qm_dqattach parameterDarrick J. Wong1-2/+2
The flags argument is always zero, get rid of it. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-05-09xfs: clean up locking in xfs_file_iomap_beginDave Chinner1-37/+61
Rather than checking what kind of locking is needed in a helper function and then jumping through hoops to do the locking in line, move the locking to the helper function that does all the checks and rename it to xfs_ilock_for_iomap(). This also allows us to hoist all the nonblocking checks up into the locking helper, further simplifier the code flow in xfs_file_iomap_begin() and making it easier to understand. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-05-09xfs: simplify xfs_file_iomap_begin() logicDave Chinner1-36/+46
The current logic that determines whether allocation should be done has grown somewhat spaghetti like with the addition of IOMAP_NOWAIT functionality. Separate out each of the different cases into single, obvious checks to get rid most of the nested IOMAP_NOWAIT checks in the allocation logic. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-03-01xfs: don't block on the ilock for RWF_NOWAITChristoph Hellwig1-8/+19
Fix xfs_file_iomap_begin to trylock the ilock if IOMAP_NOWAIT is passed, so that we don't block io_submit callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-03-01xfs: don't start out with the exclusive ilock for direct I/OChristoph Hellwig1-4/+2
There is no reason to take the ilock exclusively at the start of xfs_file_iomap_begin for direct I/O, given that it will be demoted just before calling xfs_iomap_write_direct anyway. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-03-01xfs: don't allocate COW blocks for zeroing holes or unwritten extentsChristoph Hellwig1-1/+10
The iomap zeroing interface is smart enough to skip zeroing holes or unwritten extents. Don't subvert this logic for reflink files. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-01-02xfs: fix s_maxbytes overflow problemsDarrick J. Wong1-1/+1
Fix some integer overflow problems if offset + count happen to be large enough to cause an integer overflow. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-12-08xfs: make iomap_begin functions trim iomaps consistentlyDarrick J. Wong1-1/+1
Historically, the XFS iomap_begin function only returned mappings for exactly the range queried, i.e. it doesn't do XFS_BMAPI_ENTIRE lookups. The current vfs iomap consumers are only set up to deal with trimmed mappings. xfs_xattr_iomap_begin does BMAPI_ENTIRE lookups, which is inconsistent with the current iomap usage. Remove the flag so that both iomap_begin functions behave the same way. FWIW this also fixes a behavioral regression in xattr FIEMAP that was introduced in 4.8 wherein attr fork extents are no longer trimmed like they used to be. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-11-17Merge tag 'libnvdimm-for-4.15' of ↵Linus Torvalds1-0/+5
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm and dax updates from Dan Williams: "Save for a few late fixes, all of these commits have shipped in -next releases since before the merge window opened, and 0day has given a build success notification. The ext4 touches came from Jan, and the xfs touches have Darrick's reviewed-by. An xfstest for the MAP_SYNC feature has been through a few round of reviews and is on track to be merged. - Introduce MAP_SYNC and MAP_SHARED_VALIDATE, a mechanism to enable 'userspace flush' of persistent memory updates via filesystem-dax mappings. It arranges for any filesystem metadata updates that may be required to satisfy a write fault to also be flushed ("on disk") before the kernel returns to userspace from the fault handler. Effectively every write-fault that dirties metadata completes an fsync() before returning from the fault handler. The new MAP_SHARED_VALIDATE mapping type guarantees that the MAP_SYNC flag is validated as supported by the filesystem's ->mmap() file operation. - Add support for the standard ACPI 6.2 label access methods that replace the NVDIMM_FAMILY_INTEL (vendor specific) label methods. This enables interoperability with environments that only implement the standardized methods. - Add support for the ACPI 6.2 NVDIMM media error injection methods. - Add support for the NVDIMM_FAMILY_INTEL v1.6 DIMM commands for latch last shutdown status, firmware update, SMART error injection, and SMART alarm threshold control. - Cleanup physical address information disclosures to be root-only. - Fix revalidation of the DIMM "locked label area" status to support dynamic unlock of the label area. - Expand unit test infrastructure to mock the ACPI 6.2 Translate SPA (system-physical-address) command and error injection commands. Acknowledgements that came after the commits were pushed to -next: - 957ac8c421ad ("dax: fix PMD faults on zero-length files"): Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> - a39e596baa07 ("xfs: support for synchronous DAX faults") and 7b565c9f965b ("xfs: Implement xfs_filemap_pfn_mkwrite() using __xfs_filemap_fault()") Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>" * tag 'libnvdimm-for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (49 commits) acpi, nfit: add 'Enable Latch System Shutdown Status' command support dax: fix general protection fault in dax_alloc_inode dax: fix PMD faults on zero-length files dax: stop requiring a live device for dax_flush() brd: remove dax support dax: quiet bdev_dax_supported() fs, dax: unify IOMAP_F_DIRTY read vs write handling policy in the dax core tools/testing/nvdimm: unit test clear-error commands acpi, nfit: validate commands against the device type tools/testing/nvdimm: stricter bounds checking for error injection commands xfs: support for synchronous DAX faults xfs: Implement xfs_filemap_pfn_mkwrite() using __xfs_filemap_fault() ext4: Support for synchronous DAX faults ext4: Simplify error handling in ext4_dax_huge_fault() dax: Implement dax_finish_sync_fault() dax, iomap: Add support for synchronous faults mm: Define MAP_SYNC and VM_SYNC flags dax: Allow tuning whether dax_insert_mapping_entry() dirties entry dax: Allow dax_iomap_fault() to return pfn dax: Fix comment describing dax_iomap_fault() ...
2017-11-14Merge tag 'xfs-4.15-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds1-6/+9
Pull xfs updates from Darrick Wong: "xfs: great scads of new stuff for 4.15. This merge cycle, we're making some substantive changes to XFS. The in-core extent mappings have been refactored to use proper iterators and a btree to handle heavily fragmented files without needing high-order memory allocations; some important log recovery bug fixes; and the first part of the online fsck functionality. (The online fsck feature is disabled by default and more pieces of it will be coming in future release cycles.) This giant pile of patches has been run through a full xfstests run over the weekend and through a quick xfstests run against this morning's master, with no major failures reported. New in this version: - Refactor the incore extent map manipulations to use a cursor instead of directly modifying extent data. - Refactor the incore extent map cursor to use an in-memory btree instead of a single high-order allocation. This eliminates a major source of complaints about insufficient memory when opening a heavily fragmented file into a system whose memory is also heavily fragmented. - Fix a longstanding bug where deleting a file with a complex extended attribute btree incorrectly handled memory pointers, which could lead to memory corruption. - Improve metadata validation to eliminate crashing problems found while fuzzing xfs. - Move the error injection tag definitions into libxfs to be shared with userspace components. - Fix some log recovery bugs where we'd underflow log block position vector and incorrectly fail log recovery. - Drain the buffer lru after log recovery to force recovered buffers back through the verifiers after mount. On a v4 filesystem the log never attaches verifiers during log replay (v5 does), so we could end up with buffers marked verified but without having ever been verified. - Fix various other bugs. - Introduce the first part of a new online fsck tool. The new fsck tool will be able to iterate every piece of metadata in the filesystem to look for obvious errors and corruptions. In the next release cycle the checking will be extended to cross-reference with the other fs metadata, so this feature should only be used by the developers in the mean time" * tag 'xfs-4.15-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (131 commits) xfs: on failed mount, force-reclaim inodes after unmounting quota controls xfs: check the uniqueness of the AGFL entries xfs: remove u_int* type usage xfs: handle zero entries case in xfs_iext_rebalance_leaf xfs: add comments documenting the rebalance algorithm xfs: trivial indentation fixup for xfs_iext_remove_node xfs: remove a superflous assignment in xfs_iext_remove_node xfs: add some comments to xfs_iext_insert/xfs_iext_insert_node xfs: fix number of records handling in xfs_iext_split_leaf fs/xfs: Remove NULL check before kmem_cache_destroy xfs: only check da node header padding on v5 filesystems xfs: fix btree scrub deref check xfs: fix uninitialized return values in scrub code xfs: pass inode number to xfs_scrub_ino_set_{preen,warning} xfs: refactor the directory data block bestfree checks xfs: mark xlog_verify_dest_ptr STATIC xfs: mark xlog_recover_check_summary STATIC xfs: mark xfs_btree_check_lblock and xfs_btree_check_ptr static xfs: remove unreachable error injection code in xfs_qm_dqget xfs: remove unused debug counts for xfs_lock_inodes ...
2017-11-13fs, dax: unify IOMAP_F_DIRTY read vs write handling policy in the dax coreDan Williams1-2/+2
While reviewing whether MAP_SYNC should strengthen its current guarantee of syncing writes from the initiating process to also include third-party readers observing dirty metadata, Dave pointed out that the check of IOMAP_WRITE is misplaced. The policy of what to with IOMAP_F_DIRTY should be separated from the generic filesystem mechanism of reporting dirty metadata. Move this policy to the fs-dax core to simplify the per-filesystem iomap handlers, and further centralize code that implements the MAP_SYNC policy. This otherwise should not change behavior, it just makes it easier to change behavior in the future. Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Reported-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-11-06xfs: introduce the xfs_iext_cursor abstractionChristoph Hellwig1-6/+8
Add a new xfs_iext_cursor structure to hide the direct extent map index manipulations. In addition to the existing lookup/get/insert/ remove and update routines new primitives to get the first and last extent cursor, as well as moving up and down by one extent are provided. Also new are convenience to increment/decrement the cursor and retreive the new extent, as well as to peek into the previous/next extent without updating the cursor and last but not least a macro to iterate over all extents in a fork. [darrick: rename for_each_iext to for_each_xfs_iext] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-11-03xfs: support for synchronous DAX faultsChristoph Hellwig1-0/+5
Return IOMAP_F_DIRTY from xfs_file_iomap_begin() when asked to prepare blocks for writing and the inode is pinned, and has dirty fields other than the timestamps. In __xfs_filemap_fault() we then detect this case and call dax_finish_sync_fault() to make sure all metadata is committed, and to insert the page table entry. Note that this will also dirty corresponding radix tree entry which is what we want - fsync(2) will still provide data integrity guarantees for applications not using userspace flushing. And applications using userspace flushing can avoid calling fsync(2) and thus avoid the performance overhead. [JK: Added VM_SYNC flag handling] Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-11-01xfs: move error injection tags into their own fileDarrick J. Wong1-0/+1
Move the error injection tag names into a libxfs header so that we can share it between kernel and userspace. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2017-10-01iomap: Switch from blkno to disk offsetAndreas Gruenbacher1-3/+3
Replace iomap->blkno, the sector number, with iomap->addr, the disk offset in bytes. For invalid disk offsets, use the special value IOMAP_NULL_ADDR instead of IOMAP_NULL_BLOCK. This allows to use iomap for mappings which are not block aligned, such as inline data on ext4. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> # iomap, xfs Reviewed-by: Jan Kara <jack@suse.cz>
2017-09-26xfs: update i_size after unwritten conversion in dio completionEryu Guan1-2/+5
Since commit d531d91d6990 ("xfs: always use unwritten extents for direct I/O writes"), we start allocating unwritten extents for all direct writes to allow appending aio in XFS. But for dio writes that could extend file size we update the in-core inode size first, then convert the unwritten extents to real allocations at dio completion time in xfs_dio_write_end_io(). Thus a racing direct read could see the new i_size and find the unwritten extents first and read zeros instead of actual data, if the direct writer also takes a shared iolock. Fix it by updating the in-core inode size after the unwritten extent conversion. To do this, introduce a new boolean argument to xfs_iomap_write_unwritten() to tell if we want to update in-core i_size or not. Suggested-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Eryu Guan <eguan@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-11Merge tag 'libnvdimm-for-4.14' of ↵Linus Torvalds1-9/+1
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm from Dan Williams: "A rework of media error handling in the BTT driver and other updates. It has appeared in a few -next releases and collected some late- breaking build-error and warning fixups as a result. Summary: - Media error handling support in the Block Translation Table (BTT) driver is reworked to address sleeping-while-atomic locking and memory-allocation-context conflicts. - The dax_device lookup overhead for xfs and ext4 is moved out of the iomap hot-path to a mount-time lookup. - A new 'ecc_unit_size' sysfs attribute is added to advertise the read-modify-write boundary property of a persistent memory range. - Preparatory fix-ups for arm and powerpc pmem support are included along with other miscellaneous fixes" * tag 'libnvdimm-for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (26 commits) libnvdimm, btt: fix format string warnings libnvdimm, btt: clean up warning and error messages ext4: fix null pointer dereference on sbi libnvdimm, nfit: move the check on nd_reserved2 to the endpoint dax: fix FS_DAX=n BLOCK=y compilation libnvdimm: fix integer overflow static analysis warning libnvdimm, nd_blk: remove mmio_flush_range() libnvdimm, btt: rework error clearing libnvdimm: fix potential deadlock while clearing errors libnvdimm, btt: cache sector_size in arena_info libnvdimm, btt: ensure that flags were also unchanged during a map_read libnvdimm, btt: refactor map entry operations with macros libnvdimm, btt: fix a missed NVDIMM_IO_ATOMIC case in the write path libnvdimm, nfit: export an 'ecc_unit_size' sysfs attribute ext4: perform dax_device lookup at mount ext2: perform dax_device lookup at mount xfs: perform dax_device lookup at mount dax: introduce a fs_dax_get_by_bdev() helper libnvdimm, btt: check memory allocation failure libnvdimm, label: fix index block size calculation ...
2017-09-01xfs: remove unused flags arg from xfs_file_iomap_begin_delayEric Sandeen1-3/+1
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-01xfs: remove the ip argument to xfs_defer_finishChristoph Hellwig1-3/+3
And instead require callers to explicitly join the inode using xfs_defer_ijoin. Also consolidate the defer error handling in a few places using a goto label. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-08-31xfs: perform dax_device lookup at mountDan Williams1-9/+1
The ->iomap_begin() operation is a hot path, so cache the fs_dax_get_by_host() result at mount time to avoid the incurring the hash lookup overhead on a per-i/o basis. Reported-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-07-10Merge tag 'xfs-4.13-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds1-2/+2
Pull XFS updates from Darrick Wong: "Here are some changes for you for 4.13. For the most part it's fixes for bugs and deadlock problems, and preparation for online fsck in some future merge window. - Avoid quotacheck deadlocks - Fix transaction overflows when bunmapping fragmented files - Refactor directory readahead - Allow admin to configure if ASSERT is fatal - Improve transaction usage detail logging during overflows - Minor cleanups - Don't leak log items when the log shuts down - Remove double-underscore typedefs - Various preparation for online scrubbing - Introduce new error injection configuration sysfs knobs - Refactor dq_get_next to use extent map directly - Fix problems with iterating the page cache for unwritten data - Implement SEEK_{HOLE,DATA} via iomap - Refactor XFS to use iomap SEEK_HOLE and SEEK_DATA - Don't use MAXPATHLEN to check on-disk symlink target lengths" * tag 'xfs-4.13-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (48 commits) xfs: don't crash on unexpected holes in dir/attr btrees xfs: rename MAXPATHLEN to XFS_SYMLINK_MAXLEN xfs: fix contiguous dquot chunk iteration livelock xfs: Switch to iomap for SEEK_HOLE / SEEK_DATA vfs: Add iomap_seek_hole and iomap_seek_data helpers vfs: Add page_cache_seek_hole_data helper xfs: remove a whitespace-only line from xfs_fs_get_nextdqblk xfs: rewrite xfs_dq_get_next_id using xfs_iext_lookup_extent xfs: Check for m_errortag initialization in xfs_errortag_test xfs: grab dquots without taking the ilock xfs: fix semicolon.cocci warnings xfs: Don't clear SGID when inheriting ACLs xfs: free cowblocks and retry on buffered write ENOSPC xfs: replace log_badcrc_factor knob with error injection tag xfs: convert drop_writes to use the errortag mechanism xfs: remove unneeded parameter from XFS_TEST_ERROR xfs: expose errortag knobs via sysfs xfs: make errortag a per-mountpoint structure xfs: free uncommitted transactions during log recovery xfs: don't allow bmap on rt files ...
2017-06-27xfs: convert drop_writes to use the errortag mechanismDarrick J. Wong1-1/+1
We now have enhanced error injection that can control the frequency with which errors happen, so convert drop_writes to use this. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
2017-06-27xfs: remove unneeded parameter from XFS_TEST_ERRORDarrick J. Wong1-1/+1
Since we moved the injected error frequency controls to the mountpoint, we can get rid of the last argument to XFS_TEST_ERROR. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
2017-06-20xfs: nowait aio supportGoldwyn Rodrigues1-0/+22
If IOCB_NOWAIT is set, bail if the i_rwsem is not lockable immediately. IF IOMAP_NOWAIT is set, return EAGAIN in xfs_file_iomap_begin if it needs allocation either due to file extension, writing to a hole, or COW or waiting for other DIOs to finish. Return -EAGAIN if we don't have extent list in memory. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-05-13dax, xfs, ext4: compile out iomap-dax paths in the FS_DAX=n caseDan Williams1-2/+2
Tetsuo reports: fs/built-in.o: In function `xfs_file_iomap_end': xfs_iomap.c:(.text+0xe0ef9): undefined reference to `put_dax' fs/built-in.o: In function `xfs_file_iomap_begin': xfs_iomap.c:(.text+0xe1a7f): undefined reference to `dax_get_by_host' make: *** [vmlinux] Error 1 $ grep DAX .config CONFIG_DAX=m # CONFIG_DEV_DAX is not set # CONFIG_FS_DAX is not set When FS_DAX=n we can/must throw away the dax code in filesystems. Implement 'fs_' versions of dax_get_by_host() and put_dax() that are nops in the FS_DAX=n case. Cc: <linux-xfs@vger.kernel.org> Cc: <linux-ext4@vger.kernel.org> Cc: Jan Kara <jack@suse.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: "Darrick J. Wong" <darrick.wong@oracle.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Tested-by: Tony Luck <tony.luck@intel.com> Fixes: ef51042472f5 ("block, dax: move 'select DAX' from BLOCK to FS_DAX") Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-05-06Merge tag 'xfs-4.12-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds1-4/+4
Pull xfs updates from Darrick Wong: "Here are the XFS changes for 4.12. The big new feature for this release is the new space mapping ioctl that we've been discussing since LSF2016, but other than that most of the patches are larger bug fixes, memory corruption prevention, and other cleanups. Summary: - various code cleanups - introduce GETFSMAP ioctl - various refactoring - avoid dio reads past eof - fix memory corruption and other errors with fragmented directory blocks - fix accidental userspace memory corruptions - publish fs uuid in superblock - make fstrim terminatable - fix race between quotaoff and in-core inode creation - avoid use-after-free when finishing up w/ buffer heads - reserve enough space to handle bmap tree resizing during cow remap" * tag 'xfs-4.12-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (53 commits) xfs: fix use-after-free in xfs_finish_page_writeback xfs: reserve enough blocks to handle btree splits when remapping xfs: wait on new inodes during quotaoff dquot release xfs: update ag iterator to support wait on new inodes xfs: support ability to wait on new inodes xfs: publish UUID in struct super_block xfs: Allow user to kill fstrim process xfs: better log intent item refcount checking xfs: fix up quotacheck buffer list error handling xfs: remove xfs_trans_ail_delete_bulk xfs: don't use bool values in trace buffers xfs: fix getfsmap userspace memory corruption while setting OF_LAST xfs: fix __user annotations for xfs_ioc_getfsmap xfs: corruption needs to respect endianess too! xfs: use NULL instead of 0 to initialize a pointer in xfs_ioc_getfsmap xfs: use NULL instead of 0 to initialize a pointer in xfs_getfsmap xfs: simplify validation of the unwritten extent bit xfs: remove unused values from xfs_exntst_t xfs: remove the unused XFS_MAXLINK_1 define xfs: more do_div cleanups ...
2017-04-25ext2, ext4, xfs: retrieve dax_device for iomap operationsDan Williams1-0/+10
In preparation for converting fs/dax.c to use dax_direct_access() instead of bdev_direct_access(), add the plumbing to retrieve the dax_device associated with a given block_device. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-04-06xfs: actually report xattr extents via iomapDarrick J. Wong1-2/+2
Apparently FIEMAP for xattrs has been broken since we switched to the iomap backend because of an incorrect check for xattr presence. Also fix the broken locking. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-04-03xfs: remove the ISUNWRITTEN macroChristoph Hellwig1-2/+2
Opencoding the trivial checks makes it much easier to read (and grep..). Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>