summaryrefslogtreecommitdiffstats
path: root/fs/xfs/libxfs/xfs_dir2_node.c
AgeCommit message (Collapse)AuthorFilesLines
2019-08-30xfs: reverse search directory freespace indexesDave Chinner1-8/+5
When a directory is growing rapidly, new blocks tend to get added at the end of the directory. These end up at the end of the freespace index, and when the directory gets large finding these new freespaces gets expensive. The code does a linear search across the frespace index from the first block in the directory to the last, hence meaning the newly added space is the last index searched. Instead, do a reverse order index search, starting from the last block and index in the freespace index. This makes most lookups for free space on rapidly growing directories O(1) instead of O(N), but should not have any impact on random insert workloads because the average search length is the same regardless of which end of the array we start at. The result is a major improvement in large directory grow rates: create time(sec) / rate (files/s) File count vanilla Prev commit Patched 10k 0.41 / 24.3k 0.42 / 23.8k 0.41 / 24.3k 20k 0.74 / 27.0k 0.76 / 26.3k 0.75 / 26.7k 100k 3.81 / 26.4k 3.47 / 28.8k 3.27 / 30.6k 200k 8.58 / 23.3k 7.19 / 27.8k 6.71 / 29.8k 1M 85.69 / 11.7k 48.53 / 20.6k 37.67 / 26.5k 2M 280.31 / 7.1k 130.14 / 15.3k 79.55 / 25.2k 10M 3913.26 / 2.5k 552.89 / 18.1k Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-08-30xfs: speed up directory bestfree block scanningDave Chinner1-63/+34
When running a "create millions inodes in a directory" test recently, I noticed we were spending a huge amount of time converting freespace block headers from disk format to in-memory format: 31.47% [kernel] [k] xfs_dir2_node_addname 17.86% [kernel] [k] xfs_dir3_free_hdr_from_disk 3.55% [kernel] [k] xfs_dir3_free_bests_p We shouldn't be hitting the best free block scanning code so hard when doing sequential directory creates, and it turns out there's a highly suboptimal loop searching the the best free array in the freespace block - it decodes the block header before checking each entry inside a loop, instead of decoding the header once before running the entry search loop. This makes a massive difference to create rates. Profile now looks like this: 13.15% [kernel] [k] xfs_dir2_node_addname 3.52% [kernel] [k] xfs_dir3_leaf_check_int 3.11% [kernel] [k] xfs_log_commit_cil And the wall time/average file create rate differences are just as stark: create time(sec) / rate (files/s) File count vanilla patched 10k 0.41 / 24.3k 0.42 / 23.8k 20k 0.74 / 27.0k 0.76 / 26.3k 100k 3.81 / 26.4k 3.47 / 28.8k 200k 8.58 / 23.3k 7.19 / 27.8k 1M 85.69 / 11.7k 48.53 / 20.6k 2M 280.31 / 7.1k 130.14 / 15.3k The larger the directory, the bigger the performance improvement. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-08-30xfs: factor free block index lookup from xfs_dir2_node_addname_int()Dave Chinner1-92/+102
Simplify the logic in xfs_dir2_node_addname_int() by factoring out the free block index lookup code that finds a block with enough free space for the entry to be added. The code that is moved gets a major cleanup at the same time, but there is no algorithm change here. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-08-30xfs: factor data block addition from xfs_dir2_node_addname_int()Dave Chinner1-166/+158
Factor out the code that adds a data block to a directory from xfs_dir2_node_addname_int(). This makes the code flow cleaner and more obvious and provides clear isolation of upcoming optimsations. Signed-off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-08-30xfs: move xfs_dir2_addname()Dave Chinner1-71/+69
This gets rid of the need for a forward declaration of the static function xfs_dir2_addname_int() and readies the code for factoring of xfs_dir2_addname_int(). Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-08-12xfs: remove more ondisk directory corruption assertsDarrick J. Wong1-1/+2
Continue our game of replacing ASSERTs for corrupt ondisk metadata with EFSCORRUPTED returns. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com>
2019-06-28xfs: remove unused header filesEric Sandeen1-3/+0
There are many, many xfs header files which are included but unneeded (or included twice) in the xfs code, so remove them. nb: xfs_linux.h includes about 9 headers for everyone, so those explicit includes get removed by this. I'm not sure what the preference is, but if we wanted explicit includes everywhere, a followup patch could remove those xfs_*.h includes from xfs_linux.h and move them into the files that need them. Or it could be left as-is. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28xfs: add struct xfs_mount pointer to struct xfs_bufChristoph Hellwig1-3/+3
We need to derive the mount pointer from a buffer in a lot of place. Add a direct pointer to short cut the pointer chasing. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28xfs: move xfs_ino_geometry to xfs_shared.hDarrick J. Wong1-0/+1
The inode geometry structure isn't related to ondisk format; it's support for the mount structure. Move it to xfs_shared.h. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2019-03-08xfs: clean up xfs_dir2_leafn_addDarrick J. Wong1-12/+8
Remove typedefs and consolidate local variable initialization. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
2019-03-08xfs: Zero initialize highstale and lowstale in xfs_dir2_leafn_addNathan Chancellor1-0/+2
When building with -Wsometimes-uninitialized, Clang warns: fs/xfs/libxfs/xfs_dir2_node.c:481:6: warning: variable 'lowstale' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized] fs/xfs/libxfs/xfs_dir2_node.c:481:6: warning: variable 'highstale' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized] While it isn't technically wrong, it isn't a problem in practice because highstale and lowstale are only initialized in xfs_dir2_leafn_add when compact is not zero then they are passed to xfs_dir3_leaf_find_entry, where they are initialized before use when compact is zero. Regardless, it's better not to be passing around uninitialized stack memory so zero initialize these variables, which silences this warning. Link: https://github.com/ClangBuiltLinux/linux/issues/393 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-02-11xfs: miscellaneous verifier magic value fixupsBrian Foster1-5/+5
Most buffer verifiers have hardcoded magic value checks conditionalized on the version of the filesystem. The magic value field of the verifier structure facilitates abstraction of some of this code. Populate the ->magic field of various verifiers to take advantage of this abstraction. No functional changes. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-17xfs: use swap macro in xfs_dir2_leafn_rebalanceGustavo A. R. Silva1-10/+7
Make use of the swap macro and remove unnecessary variable *tmp*. This makes the code easier to read and maintain. Also, slightly refactor some code. This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-06xfs: convert to SPDX license tagsDave Chinner1-13/+1
Remove the verbose license text from XFS files and replace them with SPDX tags. This does not change the license of any of the code, merely refers to the common, up-to-date license files in LICENSES/ This change was mostly scripted. fs/xfs/Makefile and fs/xfs/libxfs/xfs_fs.h were modified by hand, the rest were detected and modified by the following command: for f in `git grep -l "GNU General" fs/xfs/` ; do echo $f cat $f | awk -f hdr.awk > $f.new mv -f $f.new $f done And the hdr.awk script that did the modification (including detecting the difference between GPL-2.0 and GPL-2.0+ licenses) is as follows: $ cat hdr.awk BEGIN { hdr = 1.0 tag = "GPL-2.0" str = "" } /^ \* This program is free software/ { hdr = 2.0; next } /any later version./ { tag = "GPL-2.0+" next } /^ \*\// { if (hdr > 0.0) { print "// SPDX-License-Identifier: " tag print str print $0 str="" hdr = 0.0 next } print $0 next } /^ \* / { if (hdr > 1.0) next if (hdr > 0.0) { if (str != "") str = str "\n" str = str $0 next } print $0 next } /^ \*/ { if (hdr > 0.0) next print $0 next } // { if (hdr > 0.0) { if (str != "") str = str "\n" str = str $0 next } print $0 } END { } $ Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-04xfs: explicitly pass buffer size to xfs_corruption_errorDarrick J. Wong1-1/+2
Explicitly pass the buffer length to xfs_corruption_error() instead of assuming XFS_CORRUPTION_DUMP_LEN so that we avoid dumping off the end of the buffer. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-03-23xfs: sanity-check the unused space before trying to use itDarrick J. Wong1-3/+8
In xfs_dir2_data_use_free, we examine on-disk metadata and ASSERT if it doesn't make sense. Since a carefully crafted fuzzed image can cause the kernel to crash after blowing a bunch of assertions, let's move those checks into a validator function and rig everything up to return EFSCORRUPTED to userspace. Found by lastbit fuzzing ltail.bestcount via xfs/391. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-03-11xfs: convert a few more directory asserts to corruptionDarrick J. Wong1-2/+3
Yet another round of playing whack-a-mole with directory code that asserts on corrupt on-disk metadata when it really should be returning -EFSCORRUPTED instead of ASSERTing. Found by a xfs/391 crash while lastbit fuzzing of ltail.bestcount. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-01-29Split buffer's b_fspriv fieldCarlos Maiolino1-1/+1
By splitting the b_fspriv field into two different fields (b_log_item and b_li_list). It's possible to get rid of an old ABI workaround, by using the new b_log_item field to store xfs_buf_log_item separated from the log items attached to the buffer, which will be linked in the new b_li_list field. This way, there is no more need to reorder the log items list to place the buf_log_item at the beginning of the list, simplifying a bit the logic to handle buffer IO. This also opens the possibility to change buffer's log items list into a proper list_head. b_log_item field is still defined as a void *, because it is still used by the log buffers to store xlog_in_core structures, and there is no need to add an extra field on xfs_buf just for xlog_in_core. Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> [darrick: minor style changes] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-01-12xfs: use %px for data pointers when debuggingDarrick J. Wong1-1/+1
Starting with commit 57e734423ad ("vsprintf: refactor %pK code out of pointer"), the behavior of the raw '%p' printk format specifier was changed to print a 32-bit hash of the pointer value to avoid leaking kernel pointers into dmesg. For most situations that's good. This is /undesirable/ behavior when we're trying to debug XFS, however, so define a PTR_FMT that prints the actual pointer when we're in debug mode. Note that %p for tracepoints still prints the raw pointer, so in the long run we could consider rewriting some of these messages as tracepoints. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-12xfs: change 0x%p -> %p in print messagesDarrick J. Wong1-1/+1
Since %p prepends "0x" to the outputted string, we can drop the prefix. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-08xfs: create a new buf_ops pointer to verify structure metadataDarrick J. Wong1-0/+1
Expose all metadata structure buffer verifier functions via buf_ops. These will be used by the online scrub mechanism to look for problems with buffers that are already sitting around in memory. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-08xfs: refactor verifier callers to print address of failing checkDarrick J. Wong1-8/+16
Refactor the callers of verifiers to print the instruction address of a failing check. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-08xfs: have buffer verifier functions report failing addressDarrick J. Wong1-26/+35
Modify each function that checks the contents of a metadata buffer to return the instruction address of the failing test so that we can report more precise failure errors to the log. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-08xfs: refactor xfs_verifier_error and xfs_buf_ioerrorDarrick J. Wong1-9/+4
Since all verification errors also mark the buffer as having an error, we can combine these two calls. Later we'll add a xfs_failaddr_t parameter to promote the idea of reporting corruption errors and the address of the failing check to enable better debugging reports. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2017-06-20xfs: return the hash value of a leaf1 directory blockDarrick J. Wong1-4/+6
Modify the existing dir leafn lasthash function to enable us to calculate the highest hash value of a leaf1 block. This will be used by the directory scrubbing code to check the sanity of hashes in leaf1 directory blocks. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2017-02-02xfs: verify free block header fieldsDarrick J. Wong1-2/+49
Perform basic sanity checking of the directory free block header fields so that we avoid hanging the system on invalid data. (Granted that just means that now we shutdown on directory write, but that seems better than hanging...) Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-03-15xfs: always set rvalp in xfs_dir2_node_trim_freeChristoph Hellwig1-1/+3
xfs_dir2_node_trim_free can return with setting the rvalp argument pointer. Initialize it to 0 at the beginning of the function and only update it to 1 if we succeeded trimming a freespace block. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-01-04xfs: print name of verifier if it failsEric Sandeen1-0/+1
This adds a name to each buf_ops structure, so that if a verifier fails we can print the type of verifier that failed it. Should be a slight debugging aid, I hope. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-10-12xfs: validate metadata LSNs against log on v5 superblocksBrian Foster1-0/+3
Since the onset of v5 superblocks, the LSN of the last modification has been included in a variety of on-disk data structures. This LSN is used to provide log recovery ordering guarantees (e.g., to ensure an older log recovery item is not replayed over a newer target data structure). While this works correctly from the point a filesystem is formatted and mounted, userspace tools have some problematic behaviors that defeat this mechanism. For example, xfs_repair historically zeroes out the log unconditionally (regardless of whether corruption is detected). If this occurs, the LSN of the filesystem is reset and the log is now in a problematic state with respect to on-disk metadata structures that might have a larger LSN. Until either the log catches up to the highest previously used metadata LSN or each affected data structure is modified and written out without incident (which resets the metadata LSN), log recovery is susceptible to filesystem corruption. This problem is ultimately addressed and repaired in the associated userspace tools. The kernel is still responsible to detect the problem and notify the user that something is wrong. Check the superblock LSN at mount time and fail the mount if it is invalid. From that point on, trigger verifier failure on any metadata I/O where an invalid LSN is detected. This results in a filesystem shutdown and guarantees that we do not log metadata changes with invalid LSNs on disk. Since this is a known issue with a known recovery path, present a warning to instruct the user how to recover. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-08-25Merge branch 'xfs-misc-fixes-for-4.3-3' into for-nextDave Chinner1-1/+9
2015-08-25xfs: Fix file type directory corruption for btree directoriesJan Kara1-1/+9
Users have occasionally reported that file type for some directory entries is wrong. This mostly happened after updating libraries some libraries. After some debugging the problem was traced down to xfs_dir2_node_replace(). The function uses args->filetype as a file type to store in the replaced directory entry however it also calls xfs_da3_node_lookup_int() which will store file type of the current directory entry in args->filetype. Thus we fail to change file type of a directory entry to a proper type. Fix the problem by storing new file type in a local variable before calling xfs_da3_node_lookup_int(). cc: <stable@vger.kernel.org> # 3.16 - 4.x Reported-by: Giacomo Comes <comes@naic.edu> Signed-off-by: Jan Kara <jack@suse.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-07-29Merge branch 'xfs-meta-uuid' into for-nextDave Chinner1-2/+2
2015-07-29xfs: create new metadata UUID field and incompat flagEric Sandeen1-2/+2
This adds a new superblock field, sb_meta_uuid. If set, along with a new incompat flag, the code will use that field on a V5 filesystem to compare to metadata UUIDs, which allows us to change the user- visible UUID at will. Userspace handles the setting and clearing of the incompat flag as appropriate, as the UUID gets changed; i.e. setting the user-visible UUID back to the original UUID (as stored in the new field) will remove the incompatible feature flag. If the incompat flag is not set, this copies the user-visible UUID into into the meta_uuid slot in memory when the superblock is read from disk; the meta_uuid field is not written back to disk in this case. The remainder of this patch simply switches verifiers, initializers, etc to use the new sb_meta_uuid field. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-07-29xfs: Use consistent logging message prefixesJoe Perches1-2/+1
The second and subsequent lines of multi-line logging messages are not prefixed with the same information as the first line. Separate messages with newlines into multiple calls to ensure consistent prefixing and allow easier grep use. Signed-off-by: Joe Perches <joe@perches.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-12-04Merge branch 'xfs-misc-fixes-for-3.19-2' into for-nextDave Chinner1-12/+0
Conflicts: fs/xfs/xfs_iops.c
2014-12-04xfs: fix set-but-unused warningsDave Chinner1-12/+0
The kernel compile doesn't turn on these checks by default, so it's only when I do a kernel-user sync that I find that there are lots of compiler warnings waiting to be fixed. Fix up these set-but-unused warnings. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-11-28xfs: move most of xfs_sb.h to xfs_format.hChristoph Hellwig1-1/+0
More on-disk format consolidation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-11-28xfs: merge xfs_ag.h into xfs_format.hChristoph Hellwig1-1/+0
More on-disk format consolidation. A few declarations that weren't on-disk format related move into better suitable spots. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-06-25xfs: global error sign conversionDave Chinner1-20/+20
Convert all the errors the core XFs code to negative error signs like the rest of the kernel and remove all the sign conversion we do in the interface layers. Errors for conversion (and comparison) found via searches like: $ git grep " E" fs/xfs $ git grep "return E" fs/xfs $ git grep " E[A-Z].*;$" fs/xfs Negation points found via searches like: $ git grep "= -[a-z,A-Z]" fs/xfs $ git grep "return -[a-z,A-D,F-Z]" fs/xfs $ git grep " -[a-z].*;" fs/xfs [ with some bits I missed from Brian Foster ] Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-06-25libxfs: move source filesDave Chinner1-0/+2284
Move all the source files that are shared with userspace into libxfs/. This is done as one big chunk simpy to get it done quickly Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>