summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2022-04-21xfs: convert AGI log flags to unsigned.Dave Chinner3-19/+19
5.18 w/ std=gnu11 compiled with gcc-5 wants flags stored in unsigned fields to be unsigned. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-04-21xfs: convert AGF log flags to unsigned.Dave Chinner3-25/+25
5.18 w/ std=gnu11 compiled with gcc-5 wants flags stored in unsigned fields to be unsigned. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-04-21xfs: convert bmapi flags to unsigned.Dave Chinner2-29/+29
5.18 w/ std=gnu11 compiled with gcc-5 wants flags stored in unsigned fields to be unsigned. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-04-21xfs: convert bmap extent type flags to unsigned.Dave Chinner2-18/+18
5.18 w/ std=gnu11 compiled with gcc-5 wants flags stored in unsigned fields to be unsigned. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-04-21xfs: convert scrub type flags to unsigned.Dave Chinner1-8/+8
5.18 w/ std=gnu11 compiled with gcc-5 wants flags stored in unsigned fields to be unsigned. This touches xfs_fs.h so affects the user API, but the user API fields are also unsigned so the flags should really be unsigned, too. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-04-21xfs: convert attr type flags to unsigned.Dave Chinner1-4/+4
5.18 w/ std=gnu11 compiled with gcc-5 wants flags stored in unsigned fields to be unsigned. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-04-21xfs: CIL context doesn't need to count iovecsDave Chinner2-17/+6
Now that we account for log opheaders in the log item formatting code, we don't actually use the aggregated count of log iovecs in the CIL for anything. Remove it and the tracking code that calculates it. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-04-21xfs: xlog_write() doesn't need optype anymoreDave Chinner3-8/+4
So remove it from the interface and callers. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-04-21xfs: xlog_write() no longer needs contwr stateDave Chinner1-18/+10
The rework of xlog_write() no longer requires xlog_get_iclog_state() to tell it about internal iclog space reservation state to direct it on what to do. Remove this parameter. $ size fs/xfs/xfs_log.o.* text data bss dec hex filename 26520 560 8 27088 69d0 fs/xfs/xfs_log.o.orig 26384 560 8 26952 6948 fs/xfs/xfs_log.o.patched Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-04-21xfs: remove xlog_verify_dest_ptrChristoph Hellwig2-38/+1
Just check that the offset in xlog_write_vec is smaller than the iclog size and remove the expensive cycling through all iclogs. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-04-21xfs: introduce xlog_write_partial()Dave Chinner2-236/+196
Re-implement writing of a log vector that does not fit into the current iclog. The iclog will already be in XLOG_STATE_WANT_SYNC because xlog_get_iclog_space() will have reserved all the remaining iclog space for us, hence we can simply iterate over the iovecs in the log vector getting more iclog space until the entire log vector is written. Handling this partial write case separately means we do need to pass unnecessary state around for the common, fast path case when the log vector fits entirely within the current iclog. It isolates the complexity and allows us to modify and improve the partial write case without impacting the simple fast path. This change includes several improvements incorporated from patches written by Christoph Hellwig. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-04-21xfs: introduce xlog_write_full()Dave Chinner1-3/+66
Introduce an optimised version of xlog_write() that is used when the entire write will fit in a single iclog. This greatly simplifies the implementation of writing a log vector chain into an iclog, and sets the ground work for a much more understandable xlog_write() implementation. This incorporates some factoring and simplifications proposed by Christoph Hellwig. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-04-21xfs: change the type of ic_datapChristoph Hellwig2-5/+4
Turn ic_datap from a char into a void pointer given that it points to arbitrary data. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> [dgc: also remove (char *) cast in xlog_alloc_log()] Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-04-21xfs: pass lv chain length into xlog_write()Dave Chinner3-39/+23
The caller of xlog_write() usually has a close accounting of the aggregated vector length contained in the log vector chain passed to xlog_write(). There is no need to iterate the chain to calculate he length of the data in xlog_write_calculate_len() if the caller is already iterating that chain to build it. Passing in the vector length avoids doing an extra chain iteration, which can be a significant amount of work given that large CIL commits can have hundreds of thousands of vectors attached to the chain. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-04-21xfs: log ticket region debug is largely uselessDave Chinner2-121/+6
xlog_tic_add_region() is used to trace the regions being added to a log ticket to provide information in the situation where a ticket reservation overrun occurs. The information gathered is stored int the ticket, and dumped if xlog_print_tic_res() is called. For a front end struct xfs_trans overrun, the ticket only contains reservation tracking information - the ticket is never handed to the log so has no regions attached to it. The overrun debug information in this case comes from xlog_print_trans(), which walks the items attached to the transaction and dumps their attached formatted log vectors directly. It also dumps the ticket state, but that only contains reservation accounting and nothing else. Hence xlog_print_tic_res() never dumps region or overrun information from this path. xlog_tic_add_region() is actually called from xlog_write(), which means it is being used to track the regions seen in a CIL checkpoint log vector chain. In looking at CIL behaviour recently, I've seen 32MB checkpoints regularly exceed 250,000 regions in the LV chain. The log ticket debug code can track *15* regions. IOWs, if there is a ticket overrun in the CIL code, the ticket region tracking code is going to be completely useless for determining what went wrong. The only thing it can tell us is how much of an overrun occurred, and we really don't need extra debug information in the log ticket to tell us that. Indeed, the main place we call xlog_tic_add_region() is also adding up the number of regions and the space used so that xlog_write() knows how much will be written to the log. This is exactly the same information that log ticket is storing once we take away the useless region tracking array. Hence xlog_tic_add_region() is not useful, but can be called 250,000 times a CIL push... Just strip all that debug "information" out of the of the log ticket and only have it report reservation space information when an overrun occurs. This also reduces the size of a log ticket down by about 150 bytes... Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-04-21xfs: reserve space and initialise xlog_op_header in item formattingDave Chinner3-83/+99
Current xlog_write() adds op headers to the log manually for every log item region that is in the vector passed to it. While xlog_write() needs to stamp the transaction ID into the ophdr, we already know it's length, flags, clientid, etc at CIL commit time. This means the only time that xlog write really needs to format and reserve space for a new ophdr is when a region is split across two iclogs. Adding the opheader and accounting for it as part of the normal formatted item region means we simplify the accounting of space used by a transaction and we don't have to special case reserving of space in for the ophdrs in xlog_write(). It also means we can largely initialise the ophdr in transaction commit instead of xlog_write, making the xlog_write formatting inner loop much tighter. xlog_prepare_iovec() is now too large to stay as an inline function, so we move it out of line and into xfs_log.c. Object sizes: text data bss dec hex filename 1125934 305951 484 1432369 15db31 fs/xfs/built-in.a.before 1123360 305951 484 1429795 15d123 fs/xfs/built-in.a.after So the code is a roughly 2.5kB smaller with xlog_prepare_iovec() now out of line, even though it grew in size itself. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-04-21xfs: move log iovec alignment to preparation functionDave Chinner1-11/+14
To include log op headers directly into the log iovec regions that the ophdrs wrap, we need to move the buffer alignment code from xlog_finish_iovec() to xlog_prepare_iovec(). This is because the xlog_op_header is only 12 bytes long, and we need the buffer that the caller formats their data into to be 8 byte aligned. Hence once we start prepending the ophdr in xlog_prepare_iovec(), we are going to need to manage the padding directly to ensure that the buffer pointer returned is correctly aligned. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-04-21xfs: log tickets don't need log client idDave Chinner6-62/+18
We currently set the log ticket client ID when we reserve a transaction. This client ID is only ever written to the log by a CIL checkpoint or unmount records, and so anything using a high level transaction allocated through xfs_trans_alloc() does not need a log ticket client ID to be set. For the CIL checkpoint, the client ID written to the journal is always XFS_TRANSACTION, and for the unmount record it is always XFS_LOG, and nothing else writes to the log. All of these operations tell xlog_write() exactly what they need to write to the log (the optype) and build their own opheaders for start, commit and unmount records. Hence we no longer need to set the client id in either the log ticket or the xfs_trans. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-04-21xfs: embed the xlog_op_header in the commit recordDave Chinner2-18/+15
Remove the final case where xlog_write() has to prepend an opheader to a log transaction. Similar to the start record, the commit record is just an empty opheader with a XLOG_COMMIT_TRANS type, so we can just make this the payload for the region being passed to xlog_write() and remove the special handling in xlog_write() for the commit record. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-04-21xfs: embed the xlog_op_header in the unmount recordDave Chinner1-11/+28
Remove another case where xlog_write() has to prepend an opheader to a log transaction. The unmount record + ophdr is smaller than the minimum amount of space guaranteed to be free in an iclog (2 * sizeof(ophdr)) and so we don't have to care about an unmount record being split across 2 iclogs. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-04-21xfs: only CIL pushes require a start recordDave Chinner2-52/+81
So move the one-off start record writing in xlog_write() out into the static header that the CIL push builds to write into the log initially. This simplifes the xlog_write() logic a lot. pahole on x86-64 confirms that the xlog_cil_trans_hdr is correctly 32 bit aligned and packed for copying the log op and transaction headers directly into the log as a single log region copy. struct xlog_cil_trans_hdr { struct xlog_op_header oph[2]; /* 0 24 */ struct xfs_trans_header thdr; /* 24 16 */ struct xfs_log_iovec lhdr[2]; /* 40 32 */ /* size: 72, cachelines: 2, members: 3 */ /* last cacheline: 8 bytes */ }; A wart is needed to handle the fact that length of the region the opheader points to doesn't include the opheader length. hence if we embed the opheader, we have to substract the opheader length from the length written into the opheader by the generic copying code. This will eventually go away when everything is converted to embedded opheaders. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-04-21xfs: factor out the CIL transaction header buildingDave Chinner1-22/+39
It is static code deep in the middle of the CIL push logic. Factor it out into a helper so that it is clear and easy to modify separately. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-04-21xfs: simplify local variable assignment in file write codeKaixu Xia1-6/+2
Get the struct inode pointer from iocb->ki_filp->f_mapping->host directly and the other variables are unnecessary, so simplify the local variables assignment. Signed-off-by: Kaixu Xia <kaixuxia@tencent.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-04-21xfs: reorder iunlink remove operation in xfs_ifreeDave Chinner1-11/+13
The O_TMPFILE creation implementation creates a specific order of operations for inode allocation/freeing and unlinked list modification. Currently both are serialised by the AGI, so the order doesn't strictly matter as long as the are both in the same transaction. However, if we want to move the unlinked list insertions largely out from under the AGI lock, then we have to be concerned about the order in which we do unlinked list modification operations. O_TMPFILE creation tells us this order is inode allocation/free, then unlinked list modification. Change xfs_ifree() to use this same ordering on unlinked list removal. This way we always guarantee that when we enter the iunlinked list removal code from this path, we already have the AGI locked and we don't have to worry about lock nesting AGI reads inside unlink list locks because it's already locked and attached to the transaction. We can do this safely as the inode freeing and unlinked list removal are done in the same transaction and hence are atomic operations with respect to log recovery. Reported-by: Frank Hofmann <fhofmann@cloudflare.com> Fixes: 298f7bec503f ("xfs: pin inode backing buffer to the inode log item") Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-04-21MAINTAINERS: update IOMAP FILESYSTEM LIBRARY and XFS FILESYSTEMTiezhu Yang1-3/+0
In IOMAP FILESYSTEM LIBRARY and XFS FILESYSTEM, the M(ail): entry is redundant with the L(ist): entry, remove the redundant M(ail): entry. Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-04-21xfs: convert buffer flags to unsigned.Dave Chinner3-25/+25
5.18 w/ std=gnu11 compiled with gcc-5 wants flags stored in unsigned fields to be unsigned. This manifests as a compiler error such as: /kisskb/src/fs/xfs/./xfs_trace.h:432:2: note: in expansion of macro 'TP_printk' TP_printk("dev %d:%d daddr 0x%llx bbcount 0x%x hold %d pincount %d " ^ /kisskb/src/fs/xfs/./xfs_trace.h:440:5: note: in expansion of macro '__print_flags' __print_flags(__entry->flags, "|", XFS_BUF_FLAGS), ^ /kisskb/src/fs/xfs/xfs_buf.h:67:4: note: in expansion of macro 'XBF_UNMAPPED' { XBF_UNMAPPED, "UNMAPPED" } ^ /kisskb/src/fs/xfs/./xfs_trace.h:440:40: note: in expansion of macro 'XFS_BUF_FLAGS' __print_flags(__entry->flags, "|", XFS_BUF_FLAGS), ^ /kisskb/src/fs/xfs/./xfs_trace.h: In function 'trace_raw_output_xfs_buf_flags_class': /kisskb/src/fs/xfs/xfs_buf.h:46:23: error: initializer element is not constant #define XBF_UNMAPPED (1 << 31)/* do not map the buffer */ as __print_flags assigns XFS_BUF_FLAGS to a structure that uses an unsigned long for the flag. Since this results in the value of XBF_UNMAPPED causing a signed integer overflow, the result is technically undefined behavior, which gcc-5 does not accept as an integer constant. This is based on a patch from Arnd Bergman <arnd@arndb.de>. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-04-13xfs: Add XFS_SB_FEAT_INCOMPAT_NREXT64 to the list of supported flagsChandan Babu R2-1/+6
This commit enables XFS module to work with fs instances having 64-bit per-inode extent counters by adding XFS_SB_FEAT_INCOMPAT_NREXT64 flag to the list of supported incompat feature flags. Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
2022-04-13xfs: Enable bulkstat ioctl to support 64-bit per-inode extent countersChandan Babu R4-7/+32
The following changes are made to enable userspace to obtain 64-bit extent counters, 1. Carve out a new 64-bit field xfs_bulkstat->bs_extents64 from xfs_bulkstat->bs_pad[] to hold 64-bit extent counter. 2. Define the new flag XFS_BULK_IREQ_BULKSTAT for userspace to indicate that it is capable of receiving 64-bit extent counters. Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Suggested-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
2022-04-13xfs: Decouple XFS_IBULK flags from XFS_IWALK flagsChandan Babu R3-3/+7
A future commit will add a new XFS_IBULK flag which will not have a corresponding XFS_IWALK flag. In preparation for the change, this commit separates XFS_IBULK_* flags from XFS_IWALK_* flags. Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
2022-04-13xfs: Conditionally upgrade existing inodes to use large extent countersChandan Babu R11-2/+75
This commit enables upgrading existing inodes to use large extent counters provided that underlying filesystem's superblock has large extent counter feature enabled. Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
2022-04-13xfs: Directory's data fork extent counter can never overflowChandan Babu R9-91/+28
The maximum file size that can be represented by the data fork extent counter in the worst case occurs when all extents are 1 block in length and each block is 1KB in size. With XFS_MAX_EXTCNT_DATA_FORK_SMALL representing maximum extent count and with 1KB sized blocks, a file can reach upto, (2^31) * 1KB = 2TB This is much larger than the theoretical maximum size of a directory i.e. XFS_DIR2_SPACE_SIZE * 3 = ~96GB. Since a directory's inode can never overflow its data fork extent counter, this commit removes all the overflow checks associated with it. xfs_dinode_verify() now performs a rough check to verify if a diretory's data fork is larger than 96GB. Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
2022-04-12xfs: use a separate frextents counter for rt extent reservationsDarrick J. Wong8-47/+99
As mentioned in the previous commit, the kernel misuses sb_frextents in the incore mount to reflect both incore reservations made by running transactions as well as the actual count of free rt extents on disk. This results in the superblock being written to the log with an underestimate of the number of rt extents that are marked free in the rtbitmap. Teaching XFS to recompute frextents after log recovery avoids operational problems in the current mount, but it doesn't solve the problem of us writing undercounted frextents which are then recovered by an older kernel that doesn't have that fix. Create an incore percpu counter to mirror the ondisk frextents. This new counter will track transaction reservations and the only time we will touch the incore super counter (i.e the one that gets logged) is when those transactions commit updates to the rt bitmap. This is in contrast to the lazysbcount counters (e.g. fdblocks), where we know that log recovery will always fix any incorrect counter that we log. As a bonus, we only take m_sb_lock at transaction commit time. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-04-12xfs: recalculate free rt extents after log recoveryDarrick J. Wong3-9/+71
I've been observing periodic corruption reports from xfs_scrub involving the free rt extent counter (frextents) while running xfs/141. That test uses an error injection knob to induce a torn write to the log, and an arbitrary number of recovery mounts, frextents will count fewer free rt extents than can be found the rtbitmap. The root cause of the problem is a combination of the misuse of sb_frextents in the incore mount to reflect both incore reservations made by running transactions as well as the actual count of free rt extents on disk. The following sequence can reproduce the undercount: Thread 1 Thread 2 xfs_trans_alloc(rtextents=3) xfs_mod_frextents(-3) <blocks> xfs_attr_set() xfs_bmap_attr_addfork() xfs_add_attr2() xfs_log_sb() xfs_sb_to_disk() xfs_trans_commit() <log flushed to disk> <log goes down> Note that thread 1 subtracts 3 from sb_frextents even though it never commits to using that space. Thread 2 writes the undercounted value to the ondisk superblock and logs it to the xattr transaction, which is then flushed to disk. At next mount, log recovery will find the logged superblock and write that back into the filesystem. At the end of log recovery, we reread the superblock and install the recovered undercounted frextents value into the incore superblock. From that point on, we've effectively leaked thread 1's transaction reservation. The correct fix for this is to separate the incore reservation from the ondisk usage, but that's a matter for the next patch. Because the kernel has been logging superblocks with undercounted frextents for a very long time and we don't demand that sysadmins run xfs_repair after a crash, fix the undercount by recomputing frextents after log recovery. Gating this on log recovery is a reasonable balance (I think) between correcting the problem and slowing down every mount attempt. Note that xfs_repair will fix undercounted frextents. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-04-12xfs: pass explicit mount pointer to rtalloc query functionsDarrick J. Wong4-14/+17
Pass an explicit xfs_mount pointer to the rtalloc query functions so that they can support transactionless queries. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-04-12xfs: Use generic_file_open()Matthew Wilcox (Oracle)1-3/+1
Remove the open-coded check of O_LARGEFILE. This changes the errno to be the same as other filesystems; it was changed generically in 2.6.24 but that fix skipped XFS. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-04-11xfs: Introduce per-inode 64-bit extent countersChandan Babu R6-30/+204
This commit introduces new fields in the on-disk inode format to support 64-bit data fork extent counters and 32-bit attribute fork extent counters. The new fields will be used only when an inode has XFS_DIFLAG2_NREXT64 flag set. Otherwise we continue to use the regular 32-bit data fork extent counters and 16-bit attribute fork extent counters. Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com> Suggested-by: Dave Chinner <dchinner@redhat.com>
2022-04-11xfs: Replace numbered inode recovery error messages with descriptive onesChandan Babu R1-30/+22
This commit also prints inode fields with invalid values instead of printing addresses of inode and buffer instances. Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com> Suggested-by: Dave Chinner <dchinner@redhat.com>
2022-04-11xfs: Introduce macros to represent new maximum extent counts for data/attr forksChandan Babu R6-15/+55
This commit defines new macros to represent maximum extent counts allowed by filesystems which have support for large per-inode extent counters. Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
2022-04-11xfs: Use uint64_t to count maximum blocks that can be used by BMBTChandan Babu R1-3/+3
Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
2022-04-11xfs: Introduce XFS_DIFLAG2_NREXT64 and associated helpersChandan Babu R4-1/+24
This commit adds the new per-inode flag XFS_DIFLAG2_NREXT64 to indicate that an inode supports 64-bit extent counters. This flag is also enabled by default on newly created inodes when the corresponding filesystem has large extent counter feature bit (i.e. XFS_FEAT_NREXT64) set. Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
2022-04-11xfs: Introduce XFS_FSOP_GEOM_FLAGS_NREXT64Chandan Babu R2-0/+3
XFS_FSOP_GEOM_FLAGS_NREXT64 indicates that the current filesystem instance supports 64-bit per-inode extent counters. Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
2022-04-11xfs: Introduce XFS_SB_FEAT_INCOMPAT_NREXT64 and associated per-fs feature bitChandan Babu R3-0/+6
XFS_SB_FEAT_INCOMPAT_NREXT64 incompat feature bit will be set on filesystems which support large per-inode extent counters. This commit defines the new incompat feature bit and the corresponding per-fs feature bit (along with inline functions to work on it). Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
2022-04-11xfs: Promote xfs_extnum_t and xfs_aextnum_t to 64 and 32-bits respectivelyChandan Babu R6-10/+10
A future commit will introduce a 64-bit on-disk data extent counter and a 32-bit on-disk attr extent counter. This commit promotes xfs_extnum_t and xfs_aextnum_t to 64 and 32-bits in order to correctly handle in-core versions of these quantities. Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
2022-04-11xfs: Use basic types to define xfs_log_dinode's di_nextents and di_anextentsChandan Babu R1-2/+2
A future commit will increase the width of xfs_extnum_t in order to facilitate larger per-inode extent counters. Hence this patch now uses basic types to define xfs_log_dinode->[di_nextents|dianextents]. Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
2022-04-11xfs: Introduce xfs_dfork_nextents() helperChandan Babu R5-21/+58
This commit replaces the macro XFS_DFORK_NEXTENTS() with the helper function xfs_dfork_nextents(). As of this commit, xfs_dfork_nextents() returns the same value as XFS_DFORK_NEXTENTS(). A future commit which extends inode's extent counter fields will add more logic to this helper. This commit also replaces direct accesses to xfs_dinode->di_[a]nextents with calls to xfs_dfork_nextents(). No functional changes have been made. Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
2022-04-11xfs: Use xfs_extnum_t instead of basic data typesChandan Babu R5-5/+5
xfs_extnum_t is the type to use to declare variables which have values obtained from xfs_dinode->di_[a]nextents. This commit replaces basic types (e.g. uint32_t) with xfs_extnum_t for such variables. Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
2022-04-11xfs: Introduce xfs_iext_max_nextents() helperChandan Babu R4-11/+16
xfs_iext_max_nextents() returns the maximum number of extents possible for one of data, cow or attribute fork. This helper will be extended further in a future commit when maximum extent counts associated with data/attribute forks are increased. Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
2022-04-11xfs: Define max extent length based on on-disk format definitionChandan Babu R8-59/+64
The maximum extent length depends on maximum block count that can be stored in a BMBT record. Hence this commit defines MAXEXTLEN based on BMBT_BLOCKCOUNT_BITLEN. While at it, the commit also renames MAXEXTLEN to XFS_MAX_BMBT_EXTLEN. Suggested-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
2022-04-11xfs: Move extent count limits to xfs_format.hChandan Babu R2-7/+7
Maximum values associated with extent counters i.e. Maximum extent length, Maximum data extents and Maximum xattr extents are dictated by the on-disk format. Hence move these definitions over to xfs_format.h. Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
2022-04-11xfs: Add XFS messages to printk indexJonathan Lassoff2-10/+21
In order for end users to quickly react to new issues that come up in production, it is proving useful to leverage the printk indexing system. This printk index enables kernel developers to use calls to printk() with changeable format strings (as they always have; no change of expectations), while enabling end users to examine format strings to detect changes. Since end users are using regular expressions to match messages printed through printk(), being able to detect changes in chosen format strings from release to release provides a useful signal to review printk()-matching regular expressions for any necessary updates. So that detailed XFS messages are captures by this printk index, this patch wraps the xfs_<level> and xfs_alert_tag functions. Signed-off-by: Jonathan Lassoff <jof@thejof.com> Reviewed-by: Chris Down <chris@chrisdown.name> Reviewed-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>