summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2013-10-30xfs: vectorise encoding/decoding directory headersDave Chinner11-281/+338
Conversion from on-disk structures to in-core header structures currently relies on magic number checks. If the magic number is wrong, but one of the supported values, we do the wrong thing with the encode/decode operation. Split these functions so that there are discrete operations for the specific directory format we are handling. In doing this, move all the header encode/decode functions to xfs_da_format.c as they are directly manipulating the on-disk format. It should be noted that all the growth in binary size is from xfs_da_format.c - the rest of the code actaully shrinks. text data bss dec hex filename 794490 96802 1096 892388 d9de4 fs/xfs/xfs.o.orig 792986 96802 1096 890884 d9804 fs/xfs/xfs.o.p1 792350 96802 1096 890248 d9588 fs/xfs/xfs.o.p2 789293 96802 1096 887191 d8997 fs/xfs/xfs.o.p3 789005 96802 1096 886903 d8997 fs/xfs/xfs.o.p4 789061 96802 1096 886959 d88af fs/xfs/xfs.o.p5 789733 96802 1096 887631 d8b4f fs/xfs/xfs.o.p6 791421 96802 1096 889319 d91e7 fs/xfs/xfs.o.p7 Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-30xfs: vectorise DA btree operationsDave Chinner10-81/+139
The remaining non-vectorised code for the directory structure is the node format blocks. This is shared with the attribute tree, and so is slightly more complex to vectorise. Introduce a "non-directory" directory ops structure that is attached to all non-directory inodes so that attribute operations can be vectorised for all inodes. Once we do this, we can vectorise all the da btree operations. Because this patch adds more infrastructure than it removes the binary size does not decrease: text data bss dec hex filename 794490 96802 1096 892388 d9de4 fs/xfs/xfs.o.orig 792986 96802 1096 890884 d9804 fs/xfs/xfs.o.p1 792350 96802 1096 890248 d9588 fs/xfs/xfs.o.p2 789293 96802 1096 887191 d8997 fs/xfs/xfs.o.p3 789005 96802 1096 886903 d8997 fs/xfs/xfs.o.p4 789061 96802 1096 886959 d88af fs/xfs/xfs.o.p5 789733 96802 1096 887631 d8b4f fs/xfs/xfs.o.p6 Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-30xfs: vectorise directory leaf operationsDave Chinner10-159/+218
Next step in the vectorisation process is the leaf block encode/decode operations. Most of the operations on leaves are handled by the data block vectors, so there are relatively few of them here. Because of all the shuffling of code and having to pass more state to some functions, this patch doesn't directly reduce the size of the binary. It does open up many more opportunities for factoring and optimisation, however. text data bss dec hex filename 794490 96802 1096 892388 d9de4 fs/xfs/xfs.o.orig 792986 96802 1096 890884 d9804 fs/xfs/xfs.o.p1 792350 96802 1096 890248 d9588 fs/xfs/xfs.o.p2 789293 96802 1096 887191 d8997 fs/xfs/xfs.o.p3 789005 96802 1096 886903 d8997 fs/xfs/xfs.o.p4 789061 96802 1096 886959 d88af fs/xfs/xfs.o.p5 Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-30xfs: vectorise directory data operations part 2Dave Chinner10-137/+186
Convert the rest of the directory data block encode/decode operations to vector format. This further reduces the size of the built binary: text data bss dec hex filename 794490 96802 1096 892388 d9de4 fs/xfs/xfs.o.orig 792986 96802 1096 890884 d9804 fs/xfs/xfs.o.p1 792350 96802 1096 890248 d9588 fs/xfs/xfs.o.p2 789293 96802 1096 887191 d8997 fs/xfs/xfs.o.p3 789005 96802 1096 886903 d8997 fs/xfs/xfs.o.p4 Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-30xfs: vectorise directory data operationsDave Chinner9-205/+329
Following from the initial patches to vectorise the shortform directory encode/decode operations, convert half the data block operations to use the vector. The rest will be done in a second patch. This further reduces the size of the built binary: text data bss dec hex filename 794490 96802 1096 892388 d9de4 fs/xfs/xfs.o.orig 792986 96802 1096 890884 d9804 fs/xfs/xfs.o.p1 792350 96802 1096 890248 d9588 fs/xfs/xfs.o.p2 789293 96802 1096 887191 d8997 fs/xfs/xfs.o.p3 Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-30xfs: vectorise remaining shortform dir2 opsDave Chinner6-183/+199
Following from the initial patch to introduce the directory operations vector, convert the rest of the shortform directory operations to use vectored ops rather than superblock feature checks. This further reduces the size of the built binary: text data bss dec hex filename 794490 96802 1096 892388 d9de4 fs/xfs/xfs.o.orig 792986 96802 1096 890884 d9804 fs/xfs/xfs.o.p1 792350 96802 1096 890248 d9588 fs/xfs/xfs.o.p2 Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-30xfs: abstract the differences in dir2/dir3 via an ops vectorDave Chinner12-45/+132
Lots of the dir code now goes through switches to determine what is the correct on-disk format to parse. It generally involves a "xfs_sbversion_hasfoo" check, deferencing the superblock version and feature fields and hence touching several cache lines per operation in the process. Some operations do multiple checks because they nest conditional operations and they don't pass the information in a direct fashion between each other. Hence, add an ops vector to the xfs_inode structure that is configured when the inode is initialised to point to all the correct decode and encoding operations. This will significantly reduce the branchiness and cacheline footprint of the directory object decoding and encoding. This is the first patch in a series of conversion patches. It will introduce the ops structure, the setup of it and add the first operation to the vector. Subsequent patches will convert directory ops one at a time to keep the changes simple and obvious. Just this patch shows the benefit of such an approach on code size. Just converting the two shortform dir operations as this patch does decreases the built binary size by ~1500 bytes: $ size fs/xfs/xfs.o.orig fs/xfs/xfs.o.p1 text data bss dec hex filename 794490 96802 1096 892388 d9de4 fs/xfs/xfs.o.orig 792986 96802 1096 890884 d9804 fs/xfs/xfs.o.p1 $ That's a significant decrease in the instruction cache footprint of the directory code for such a simple change, and indicates that this approach is definitely worth pursuing further. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-23xfs: split xfs_rtalloc.c for userspace sanityDave Chinner4-1248/+1292
xfs_rtalloc.c is partially shared with userspace. Split the file up into two parts - one that is kernel private and the other which is wholly shared with userspace. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-23xfs: decouple inode and bmap btree header filesDave Chinner73-583/+403
Currently the xfs_inode.h header has a dependency on the definition of the BMAP btree records as the inode fork includes an array of xfs_bmbt_rec_host_t objects in it's definition. Move all the btree format definitions from xfs_btree.h, xfs_bmap_btree.h, xfs_alloc_btree.h and xfs_ialloc_btree.h to xfs_format.h to continue the process of centralising the on-disk format definitions. With this done, the xfs inode definitions are no longer dependent on btree header files. The enables a massive culling of unnecessary includes, with close to 200 #include directives removed from the XFS kernel code base. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-23xfs: decouple log and transaction headersDave Chinner75-239/+276
xfs_trans.h has a dependency on xfs_log.h for a couple of structures. Most code that does transactions doesn't need to know anything about the log, but this dependency means that they have to include xfs_log.h. Decouple the xfs_trans.h and xfs_log.h header files and clean up the includes to be in dependency order. In doing this, remove the direct include of xfs_trans_reserve.h from xfs_trans.h so that we remove the dependency between xfs_trans.h and xfs_mount.h. Hence the xfs_trans.h include can be moved to the indicate the actual dependencies other header files have on it. Note that these are kernel only header files, so this does not translate to any userspace changes at all. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-23xfs: remove unused transaction callback variablesDave Chinner1-7/+0
We don't do callbacks at transaction commit time, no do we have any infrastructure to set up or run such callbacks, so remove the variables and typedefs for these operations. If we ever need to add callbacks, we can reintroduce the variables at that time. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-23xfs: split dquot buffer operations outDave Chinner8-266/+303
Parts of userspace want to be able to read and modify dquot buffers (e.g. xfs_db) so we need to split out the reading and writing of these buffers so it is easy to shared code with libxfs in userspace. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-23xfs: unify directory/attribute format definitionsDave Chinner33-426/+437
The on-disk format definitions for the directory and attribute structures are spread across 3 header files right now, only one of which is dedicated to defining on-disk structures and their manipulation (xfs_dir2_format.h). Pull all the format definitions into a single header file - xfs_da_format.h - and switch all the code over to point at that. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-23xfs: create a shared header file for format-related informationDave Chinner54-232/+289
All of the buffer operations structures are needed to be exported for xfs_db, so move them all to a common location rather than spreading them all over the place. They are verifying the on-disk format, so while xfs_format.h might be a good place, it is not part of the on disk format. Hence we need to create a new header file that we centralise these related definitions. Start by moving the bffer operations structures, and then also move all the other definitions that have crept into xfs_log_format.h and xfs_format.h as there was no other shared header file to put them in. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-21xfs: fold xfs_change_file_space into xfs_ioc_spaceChristoph Hellwig4-191/+126
Now that only one caller of xfs_change_file_space is left it can be merged into said caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-21xfs: simplify the fallocate pathChristoph Hellwig3-63/+56
Call xfs_alloc_file_space or xfs_free_file_space directly from xfs_file_fallocate instead of going through xfs_change_file_space. This simplified the code by removing the unessecary marshalling of the arguments into an xfs_flock64_t structure and allows removing checks that are already done in the VFS code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-21xfs: always hold the iolock when calling xfs_change_file_spaceChristoph Hellwig4-49/+27
Currently fallocate always holds the iolock when calling into xfs_change_file_space, while the ioctl path lets some of the lower level functions take it, but leave it out in others. This patch makes sure the ioctl path also always holds the iolock and thus introduces consistent locking for the preallocation operations while simplifying the code and allowing to kill the now unused XFS_ATTR_NOLOCK flag. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-21xfs: remove the unused XFS_ATTR_NONBLOCK flagChristoph Hellwig2-4/+0
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-21xfs: always take the iolock around xfs_setattr_sizeChristoph Hellwig4-22/+24
There is no reason to conditionally take the iolock inside xfs_setattr_size when we can let the caller handle it unconditionally, which just incrases the lock hold time for the case where it was previously taken internally by a few instructions. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-17xfs: don't break from growfs ag update loop on errorEric Sandeen1-9/+13
When xfs_growfs_data_private() is updating backup superblocks, it bails out on the first error encountered, whether reading or writing: * If we get an error writing out the alternate superblocks, * just issue a warning and continue. The real work is * already done and committed. This can cause a problem later during repair, because repair looks at all superblocks, and picks the most prevalent one as correct. If we bail out early in the backup superblock loop, we can end up with more "bad" matching superblocks than good, and a post-growfs repair may revert the filesystem to the old geometry. With the combination of superblock verifiers and old bugs, we're more likely to encounter read errors due to verification. And perhaps even worse, we don't even properly write any of the newly-added superblocks in the new AGs. Even with this change, growfs will still say: xfs_growfs: XFS_IOC_FSGROWFSDATA xfsctl failed: Structure needs cleaning data blocks changed from 319815680 to 335216640 which might be confusing to the user, but it at least communicates that something has gone wrong, and dmesg will probably highlight the need for an xfs_repair. And this is still best-effort; if verifiers fail on more than half the backup supers, they may still "win" - but that's probably best left to repair to more gracefully handle by doing its own strict verification as part of the backup super "voting." Signed-off-by: Eric Sandeen <sandeen@redhat.com> Acked-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-17xfs: don't emit corruption noise on fs probesEric Sandeen1-2/+3
If we get EWRONGFS due to probing of non-xfs filesystems, there's no need to issue the scary corruption error and backtrace. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-17xfs: remove newlines from strings passed to __xfs_printkEric Sandeen10-20/+20
__xfs_printk adds its own "\n". Having it in the original string leads to unintentional blank lines from these messages. Most format strings have no newline, but a few do, leading to i.e.: [ 7347.119911] XFS (sdb2): Access to block zero in inode 132 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 1a05 [ 7347.119911] [ 7347.119919] XFS (sdb2): Access to block zero in inode 132 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 1a05 [ 7347.119919] Fix them all. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-17xfs: prevent deadlock trying to cover an active logDave Chinner3-25/+47
Recent analysis of a deadlocked XFS filesystem from a kernel crash dump indicated that the filesystem was stuck waiting for log space. The short story of the hang on the RHEL6 kernel is this: - the tail of the log is pinned by an inode - the inode has been pushed by the xfsaild - the inode has been flushed to it's backing buffer and is currently flush locked and hence waiting for backing buffer IO to complete and remove it from the AIL - the backing buffer is marked for write - it is on the delayed write queue - the inode buffer has been modified directly and logged recently due to unlinked inode list modification - the backing buffer is pinned in memory as it is in the active CIL context. - the xfsbufd won't start buffer writeback because it is pinned - xfssyncd won't force the log because it sees the log as needing to be covered and hence wants to issue a dummy transaction to move the log covering state machine along. Hence there is no trigger to force the CIL to the log and hence unpin the inode buffer and therefore complete the inode IO, remove it from the AIL and hence move the tail of the log along, allowing transactions to start again. Mainline kernels also have the same deadlock, though the signature is slightly different - the inode buffer never reaches the delayed write lists because xfs_buf_item_push() sees that it is pinned and hence never adds it to the delayed write list that the xfsaild flushes. There are two possible solutions here. The first is to simply force the log before trying to cover the log and so ensure that the CIL is emptied before we try to reserve space for the dummy transaction in the xfs_log_worker(). While this might work most of the time, it is still racy and is no guarantee that we don't get stuck in xfs_trans_reserve waiting for log space to come free. Hence it's not the best way to solve the problem. The second solution is to modify xfs_log_need_covered() to be aware of the CIL. We only should be attempting to cover the log if there is no current activity in the log - covering the log is the process of ensuring that the head and tail in the log on disk are identical (i.e. the log is clean and at idle). Hence, by definition, if there are items in the CIL then the log is not at idle and so we don't need to attempt to cover it. When we don't need to cover the log because it is active or idle, we issue a log force from xfs_log_worker() - if the log is idle, then this does nothing. However, if the log is active due to there being items in the CIL, it will force the items in the CIL to the log and unpin them. In the case of the above deadlock scenario, instead of xfs_log_worker() getting stuck in xfs_trans_reserve() attempting to cover the log, it will instead force the log, thereby unpinning the inode buffer, allowing IO to be issued and complete and hence removing the inode that was pinning the tail of the log from the AIL. At that point, everything will start moving along again. i.e. the xfs_log_worker turns back into a watchdog that can alleviate deadlocks based around pinned items that prevent the tail of the log from being moved... Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-08xfs: clean up xfs_inactive() error handling, kill VN_INACTIVE_[NO]CACHEBrian Foster3-26/+12
The xfs_inactive() return value is meaningless. Turn xfs_inactive() into a void function and clean up the error handling appropriately. Kill the VN_INACTIVE_[NO]CACHE directives as they are not relevant to Linux. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-08xfs: push down inactive transaction mgmt for ifreeBrian Foster1-50/+71
Push the inode free work performed during xfs_inactive() down into a new xfs_inactive_ifree() helper. This clears xfs_inactive() from all inode locking and transaction management more directly associated with freeing the inode xattrs, extents and the inode itself. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-08xfs: push down inactive transaction mgmt for truncateBrian Foster1-49/+68
Create the new xfs_inactive_truncate() function to handle the truncate portion of xfs_inactive(). Push the locking and transaction management into the new function. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-08xfs: push down inactive transaction mgmt for remote symlinksBrian Foster3-54/+49
Push down the transaction management for remote symlinks from xfs_inactive() down to xfs_inactive_symlink_rmt(). The latter is cleaned up to avoid transaction management intended for the calling context (i.e., trans duplication, reservation, item attachment). Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-08xfs: add the inode directory type support to XFS_IOC_FSGEOMMark Tinguely2-3/+5
Add the inode type directory type support to XFS_IOC_FSGEOM so that xfs_repair/xfs_info knows if the superblock v4 filesystem enabled the feature. Signed-off-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-01xfs: remove usage of is_bad_inodeBen Myers3-17/+1
XFS never calls mark_inode_bad or iget_failed, so it will never see a bad inode. Remove all checks for is_bad_inode because they are unnecessary. Signed-off-by: Ben Myers <bpm@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2013-10-01xfs: fix the wrong new_size/rnew_size at xfs_iext_realloc_direct()Jie Liu1-7/+2
At xfs_iext_realloc_direct(), the new_size is changed by adding if_bytes if originally the extent records are stored at the inline extent buffer, and we have to switch from it to a direct extent list for those new allocated extents, this is wrong. e.g, Create a file with three extents which was showing as following, xfs_io -f -c "truncate 100m" /xfs/testme for i in $(seq 0 5 10); do offset=$(($i * $((1 << 20)))) xfs_io -c "pwrite $offset 1m" /xfs/testme done Inline ------ irec: if_bytes bytes_diff new_size 1st 0 16 16 2nd 16 16 32 Switching --------- rnew_size 3rd 32 16 48 + 32 = 80 roundup=128 In this case, the desired value of new_size should be 48, and then it will be roundup to 64 and be assigned to rnew_size. However, this issue has been covered by resetting the if_bytes to the new_size which is calculated at the begnning of xfs_iext_add() before leaving out this function, and in turn make the rnew_size correctly again. Hence, this can not be detected via xfstestes. This patch fix above problem and revise the new_size comments at xfs_iext_realloc_direct() to make it more readable. Also, fix the comments while switching from the inline extent buffer to a direct extent list to reflect this change. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-01xfs: get rid of count from xfs_iomap_write_allocate()Jie Liu3-6/+5
Get rid of function variable count from xfs_iomap_write_allocate() as it is unused. Additionally, checkpatch warn me of the following for this change: WARNING: extern prototypes should be avoided in .h files +extern int xfs_iomap_write_allocate(struct xfs_inode *, xfs_off_t, So this patch also remove all extern function prototypes at xfs_iomap.h to suppress it to make this code style in consistent manner in this file. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-01xfs: Use kmem_free() instead of free()Thierry Reding1-1/+1
This fixes a build failure caused by calling the free() function which does not exist in the Linux kernel. Signed-off-by: Thierry Reding <treding@nvidia.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-09-30xfs: fix memory leak in xlog_recover_add_to_transtinguely@sgi.com1-0/+1
Free the memory in error path of xlog_recover_add_to_trans(). Normally this memory is freed in recovery pass2, but is leaked in the error path. Signed-off-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-09-30xfs: dirent dtype presence is dependent on directory magic numbersDave Chinner4-39/+28
The determination of whether a directory entry contains a dtype field originally was dependent on the filesystem having CRCs enabled. This meant that the format for dtype beign enabled could be determined by checking the directory block magic number rather than doing a feature bit check. This was useful in that it meant that we didn't need to pass a struct xfs_mount around to functions that were already supplied with a directory block header. Unfortunately, the introduction of dtype fields into the v4 structure via a feature bit meant this "use the directory block magic number" method of discriminating the dirent entry sizes is broken. Hence we need to convert the places that use magic number checks to use feature bit checks so that they work correctly and not by chance. The current code works on v4 filesystems only because the dirent size roundup covers the extra byte needed by the dtype field in the places where this problem occurs. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-09-30xfs: lockdep needs to know about 3 dquot-deep nestingDave Chinner1-3/+16
Michael Semon reported that xfs/299 generated this lockdep warning: ============================================= [ INFO: possible recursive locking detected ] 3.12.0-rc2+ #2 Not tainted --------------------------------------------- touch/21072 is trying to acquire lock: (&xfs_dquot_other_class){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64 but task is already holding lock: (&xfs_dquot_other_class){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&xfs_dquot_other_class); lock(&xfs_dquot_other_class); *** DEADLOCK *** May be due to missing lock nesting notation 7 locks held by touch/21072: #0: (sb_writers#10){++++.+}, at: [<c11185b6>] mnt_want_write+0x1e/0x3e #1: (&type->i_mutex_dir_key#4){+.+.+.}, at: [<c11078ee>] do_last+0x245/0xe40 #2: (sb_internal#2){++++.+}, at: [<c122c9e0>] xfs_trans_alloc+0x1f/0x35 #3: (&(&ip->i_lock)->mr_lock/1){+.+...}, at: [<c126cd1b>] xfs_ilock+0x100/0x1f1 #4: (&(&ip->i_lock)->mr_lock){++++-.}, at: [<c126cf52>] xfs_ilock_nowait+0x105/0x22f #5: (&dqp->q_qlock){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64 #6: (&xfs_dquot_other_class){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64 The lockdep annotation for dquot lock nesting only understands locking for user and "other" dquots, not user, group and quota dquots. Fix the annotations to match the locking heirarchy we now have. Reported-by: Michael L. Semon <mlsemon35@gmail.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-09-26xfs: fix node forward in xfs_node_toosmallMark Tinguely1-2/+3
Commit f5ea1100 cleans up the disk to host conversions for node directory entries, but because a variable is reused in xfs_node_toosmall() the next node is not correctly found. If the original node is small enough (<= 3/8 of the node size), this change may incorrectly cause a node collapse when it should not. That will cause an assert in xfstest generic/319: Assertion failed: first <= last && last < BBTOB(bp->b_length), file: /root/newest/xfs/fs/xfs/xfs_trans_buf.c, line: 569 Keep the original node header to get the correct forward node. (When a node is considered for a merge with a sibling, it overwrites the sibling pointers of the original incore nodehdr with the sibling's pointers. This leads to loop considering the original node as a merge candidate with itself in the second pass, and so it incorrectly determines a merge should occur.) Signed-off-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com> [v3: added Dave Chinner's (slightly modified) suggestion to the commit header, cleaned up whitespace. -bpm]
2013-09-24xfs: log recovery lsn ordering needs uuid checkDave Chinner1-14/+59
After a fair number of xfstests runs, xfs/182 started to fail regularly with a corrupted directory - a directory read verifier was failing after recovery because it found a block with a XARM magic number (remote attribute block) rather than a directory data block. The first time I saw this repeated failure I did /something/ and the problem went away, so I was never able to find the underlying problem. Test xfs/182 failed again today, and I found the root cause before I did /something else/ that made it go away. Tracing indicated that the block in question was being correctly logged, the log was being flushed by sync, but the buffer was not being written back before the shutdown occurred. Tracing also indicated that log recovery was also reading the block, but then never writing it before log recovery invalidated the cache, indicating that it was not modified by log recovery. More detailed analysis of the corpse indicated that the filesystem had a uuid of "a4131074-1872-4cac-9323-2229adbcb886" but the XARM block had a uuid of "8f32f043-c3c9-e7f8-f947-4e7f989c05d3", which indicated it was a block from an older filesystem. The reason that log recovery didn't replay it was that the LSN in the XARM block was larger than the LSN of the transaction being replayed, and so the block was not overwritten by log recovery. Hence, log recovery cant blindly trust the magic number and LSN in the block - it must verify that it belongs to the filesystem being recovered before using the LSN. i.e. if the UUIDs don't match, we need to unconditionally recovery the change held in the log. This patch was first tested on a block device that was repeatedly causing xfs/182 to fail with the same failure on the same block with the same directory read corruption signature (i.e. XARM block). It did not fail, and hasn't failed since. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-09-24xfs: fix XFS_IOC_FREE_EOFBLOCKS definitionDave Chinner1-1/+1
It uses a kernel internal structure in it's definition rather than the user visible structure that is passed to the ioctl. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-09-24xfs: asserting lock not held during freeing not validDave Chinner1-5/+4
When we free an inode, we do so via RCU. As an RCU lookup can occur at any time before we free an inode, and that lookup takes the inode flags lock, we cannot safely assert that the flags lock is not held just before marking it dead and running call_rcu() to free the inode. We check on allocation of a new inode structre that the lock is not held, so we still have protection against locks being leaked and hence not correctly initialised when allocated out of the slab. Hence just remove the assert... Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-09-24xfs: lock the AIL before removing the buffer itemDave Chinner1-0/+1
Regression introduced by commit 46f9d2e ("xfs: aborted buf items can be in the AIL") which fails to lock the AIL before removing the item. Spinlock debugging throws a warning about this. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-09-16Linux 3.12-rc1v3.12-rc1Linus Torvalds1-3/+3
2013-09-16Merge branch 'timers/core' of ↵Linus Torvalds8-128/+164
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer code update from Thomas Gleixner: - armada SoC clocksource overhaul with a trivial merge conflict - Minor improvements to various SoC clocksource drivers * 'timers/core' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: clocksource: armada-370-xp: Add detailed clock requirements in devicetree binding clocksource: armada-370-xp: Get reference fixed-clock by name clocksource: armada-370-xp: Replace WARN_ON with BUG_ON clocksource: armada-370-xp: Fix device-tree binding clocksource: armada-370-xp: Introduce new compatibles clocksource: armada-370-xp: Use CLOCKSOURCE_OF_DECLARE clocksource: armada-370-xp: Simplify TIMER_CTRL register access clocksource: armada-370-xp: Use BIT() ARM: timer-sp: Set dynamic irq affinity ARM: nomadik: add dynamic irq flag to the timer clocksource: sh_cmt: 32-bit control register support clocksource: em_sti: Convert to devm_* managed helpers
2013-09-16Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds2-36/+21
Pull CIFS fixes from Steve French: "Two minor cifs fixes and a minor documentation cleanup for cifs.txt" * 'for-next' of git://git.samba.org/sfrench/cifs-2.6: cifs: update cifs.txt and remove some outdated infos cifs: Avoid calling unlock_page() twice in cifs_readpage() when using fscache cifs: Do not take a reference to the page in cifs_readpage_worker()
2013-09-16Merge tag 'upstream-3.12-rc1' of git://git.infradead.org/linux-ubiLinus Torvalds2-4/+4
Pull UBI fixes from Artem Bityutskiy: "Just a single fastmap fix plus a regression fix" * tag 'upstream-3.12-rc1' of git://git.infradead.org/linux-ubi: UBI: Fix invalidate_fastmap() UBI: Fix PEB leak in wear_leveling_worker()
2013-09-16Merge tag 'upstream-3.12-rc1' of git://git.infradead.org/linux-ubifsLinus Torvalds1-3/+4
Pull ubifs fix from Artem Bityutskiy: "Just one patch which fixes the power-cut recovery testing mode. I'll start using a single UBI/UBIFS tree instead of 2 trees from now on. So in the future you'll get 1 small pull request instead of 2 tiny ones" * tag 'upstream-3.12-rc1' of git://git.infradead.org/linux-ubifs: UBIFS: remove invalid warn msg with tst_recovery enabled
2013-09-15Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linusLinus Torvalds7-8/+50
Pull MIPS fixes from Ralf Baechle: "These are four patches for three construction sites: - Fix register decoding for the combination of multi-core processors and multi-threading. - Two more fixes that are part of the ongoing DECstation resurrection work. One of these touches a DECstation-only network driver. - Finally Markos' trivial build fix for the AP/SP support. (With this applied now all MIPS defconfigs are building again)" * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: MIPS: kernel: vpe: Make vpe_attrs an array of pointers. MIPS: Fix SMP core calculations when using MT support. MIPS: DECstation I/O ASIC DMA interrupt handling fix MIPS: DECstation HRT initialization rearrangement
2013-09-15Merge branch 'for_linus' of git://cavan.codon.org.uk/platform-drivers-x86Linus Torvalds13-114/+66
Pull x86 platform updates from Matthew Garrett: "Nothing amazing here, almost entirely cleanups and minor bugfixes and one bit of hardware enablement in the amilo-rfkill driver" * 'for_linus' of git://cavan.codon.org.uk/platform-drivers-x86: platform/x86: panasonic-laptop: reuse module_acpi_driver samsung-laptop: fix config build error platform: x86: remove unnecessary platform_set_drvdata() amilo-rfkill: Enable using amilo-rfkill with the FSC Amilo L1310. wmi: parse_wdg() should return kernel error codes hp_wmi: Fix unregister order in hp_wmi_rfkill_setup() platform: replace strict_strto*() with kstrto*() x86: irst: use module_acpi_driver to simplify the code x86: smartconnect: use module_acpi_driver to simplify the code platform samsung-q10: use ACPI instead of direct EC calls thinkpad_acpi: add the ability setting TPACPI_LED_NONE by quirk thinkpad_acpi: return -NODEV while operating uninitialized LEDs
2013-09-15Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds37-678/+1718
Pull misc SCSI driver updates from James Bottomley: "This patch set is a set of driver updates (megaraid_sas, fnic, lpfc, ufs, hpsa) we also have a couple of bug fixes (sd out of bounds and ibmvfc error handling) and the first round of esas2r checker fixes and finally the much anticipated big endian additions for megaraid_sas" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (47 commits) [SCSI] fnic: fnic Driver Tuneables Exposed through CLI [SCSI] fnic: Kernel panic while running sh/nosh with max lun cfg [SCSI] fnic: Hitting BUG_ON(io_req->abts_done) in fnic_rport_exch_reset [SCSI] fnic: Remove QUEUE_FULL handling code [SCSI] fnic: On system with >1.1TB RAM, VIC fails multipath after boot up [SCSI] fnic: FC stat param seconds_since_last_reset not getting updated [SCSI] sd: Fix potential out-of-bounds access [SCSI] lpfc 8.3.42: Update lpfc version to driver version 8.3.42 [SCSI] lpfc 8.3.42: Fixed issue of task management commands having a fixed timeout [SCSI] lpfc 8.3.42: Fixed inconsistent spin lock usage. [SCSI] lpfc 8.3.42: Fix driver's abort loop functionality to skip IOs already getting aborted [SCSI] lpfc 8.3.42: Fixed failure to allocate SCSI buffer on PPC64 platform for SLI4 devices [SCSI] lpfc 8.3.42: Fix WARN_ON when driver unloads [SCSI] lpfc 8.3.42: Avoided making pci bar ioremap call during dual-chute WQ/RQ pci bar selection [SCSI] lpfc 8.3.42: Fixed driver iocbq structure's iocb_flag field running out of space [SCSI] lpfc 8.3.42: Fix crash on driver load due to cpu affinity logic [SCSI] lpfc 8.3.42: Fixed logging format of setting driver sysfs attributes hard to interpret [SCSI] lpfc 8.3.42: Fixed back to back RSCNs discovery failure. [SCSI] lpfc 8.3.42: Fixed race condition between BSG I/O dispatch and timeout handling [SCSI] lpfc 8.3.42: Fixed function mode field defined too small for not recognizing dual-chute mode ...
2013-09-15Merge branch 'slab/next' of ↵Linus Torvalds8-371/+216
git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux Pull SLAB update from Pekka Enberg: "Nothing terribly exciting here apart from Christoph's kmalloc unification patches that brings sl[aou]b implementations closer to each other" * 'slab/next' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux: slab: Use correct GFP_DMA constant slub: remove verify_mem_not_deleted() mm/sl[aou]b: Move kmallocXXX functions to common code mm, slab_common: add 'unlikely' to size check of kmalloc_slab() mm/slub.c: beautify code for removing redundancy 'break' statement. slub: Remove unnecessary page NULL check slub: don't use cpu partial pages on UP mm/slub: beautify code for 80 column limitation and tab alignment mm/slub: remove 'per_cpu' which is useless variable
2013-09-15Merge branch 'for-linus' of ↵Linus Torvalds2-6/+32
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input update from Dmitry Torokhov: "The only change is David Hermann's new EVIOCREVOKE evdev ioctl that allows safely passing file descriptors to input devices to session processes and later being able to stop delivery of events through these fds so that inactive sessions will no longer receive user input that does not belong to them" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: evdev - add EVIOCREVOKE ioctl