summaryrefslogtreecommitdiffstats
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2012-08-29Merge branch 'for-linus' of ↵Linus Torvalds21-376/+418
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "I've split out the big send/receive update from my last pull request and now have just the fixes in my for-linus branch. The send/recv branch will wander over to linux-next shortly though. The largest patches in this pull are Josef's patches to fix DIO locking problems and his patch to fix a crash during balance. They are both well tested. The rest are smaller fixes that we've had queued. The last rc came out while I was hacking new and exciting ways to recover from a misplaced rm -rf on my dev box, so these missed rc3." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (25 commits) Btrfs: fix that repair code is spuriously executed for transid failures Btrfs: fix ordered extent leak when failing to start a transaction Btrfs: fix a dio write regression Btrfs: fix deadlock with freeze and sync V2 Btrfs: revert checksum error statistic which can cause a BUG() Btrfs: remove superblock writing after fatal error Btrfs: allow delayed refs to be merged Btrfs: fix enospc problems when deleting a subvol Btrfs: fix wrong mtime and ctime when creating snapshots Btrfs: fix race in run_clustered_refs Btrfs: don't run __tree_mod_log_free_eb on leaves Btrfs: increase the size of the free space cache Btrfs: barrier before waitqueue_active Btrfs: fix deadlock in wait_for_more_refs btrfs: fix second lock in btrfs_delete_delayed_items() Btrfs: don't allocate a seperate csums array for direct reads Btrfs: do not strdup non existent strings Btrfs: do not use missing devices when showing devname Btrfs: fix that error value is changed by mistake Btrfs: lock extents as we map them in DIO ...
2012-08-28Btrfs: fix that repair code is spuriously executed for transid failuresStefan Behrens1-2/+6
If verify_parent_transid() fails for all mirrors, the current code calls repair_io_failure() anyway which means: - that the disk block is rewritten without repairing anything and - that a kernel log message is printed which misleadingly claims that a read error was corrected. This is an example: parent transid verify failed on 615015833600 wanted 110423 found 110424 parent transid verify failed on 615015833600 wanted 110423 found 110424 btrfs read error corrected: ino 1 off 615015833600 (dev /dev/...) It is wrong to ignore the results from verify_parent_transid() and to call repair_eb_io_failure() when the verification of the transids failed. This commit fixes the issue. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-08-28Btrfs: fix ordered extent leak when failing to start a transactionLiu Bo1-2/+5
We cannot just return error before freeing ordered extent and releasing reserved space when we fail to start a transacion. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-08-28Btrfs: fix a dio write regressionLiu Bo1-4/+20
This bug is introduced by commit 3b8bde746f6f9bd36a9f05f5f3b6e334318176a9 (Btrfs: lock extents as we map them in DIO). In dio write, we should unlock the section which we didn't do IO on in case that we fall back to buffered write. But we need to not only unlock the section but also cleanup reserved space for the section. This bug was found while running xfstests 133, with this 133 no longer complains. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-08-28Btrfs: fix deadlock with freeze and sync V2Josef Bacik1-4/+9
We can deadlock with freeze right now because we unconditionally start a transaction in our ->sync_fs() call. To fix this just check and see if we have a running transaction to commit. This saves us from the deadlock because at this point we'll have the umount sem for the sb so we're safe from freezes coming in after we've done our check. With this patch the freeze xfstests no longer deadlocks. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-08-28Btrfs: revert checksum error statistic which can cause a BUG()Stefan Behrens3-39/+2
Commit 442a4f6308e694e0fa6025708bd5e4e424bbf51c added btrfs device statistic counters for detected IO and checksum errors to Linux 3.5. The statistic part that counts checksum errors in end_bio_extent_readpage() can cause a BUG() in a subfunction: "kernel BUG at fs/btrfs/volumes.c:3762!" That part is reverted with the current patch. However, the counting of checksum errors in the scrub context remains active, and the counting of detected IO errors (read, write or flush errors) in all contexts remains active. Cc: stable <stable@vger.kernel.org> # 3.5 Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-08-28Btrfs: remove superblock writing after fatal errorStefan Behrens2-33/+5
With commit acce952b0, btrfs was changed to flag the filesystem with BTRFS_SUPER_FLAG_ERROR and switch to read-only mode after a fatal error happened like a write I/O errors of all mirrors. In such situations, on unmount, the superblock is written in btrfs_error_commit_super(). This is done with the intention to be able to evaluate the error flag on the next mount. A warning is printed in this case during the next mount and the log tree is ignored. The issue is that it is possible that the superblock points to a root that was not written (due to write I/O errors). The result is that the filesystem cannot be mounted. btrfsck also does not start and all the other btrfs-progs tools fail to start as well. However, mount -o recovery is working well and does the right things to recover the filesystem (i.e., don't use the log root, clear the free space cache and use the next mountable root that is stored in the root backup array). This patch removes the writing of the superblock when BTRFS_SUPER_FLAG_ERROR is set, and removes the handling of the error flag in the mount function. These lines can be used to reproduce the issue (using /dev/sdm): SCRATCH_DEV=/dev/sdm SCRATCH_MNT=/mnt echo 0 25165824 linear $SCRATCH_DEV 0 | dmsetup create foo ls -alLF /dev/mapper/foo mkfs.btrfs /dev/mapper/foo mount /dev/mapper/foo $SCRATCH_MNT echo bar > $SCRATCH_MNT/foo sync echo 0 25165824 error | dmsetup reload foo dmsetup resume foo ls -alF $SCRATCH_MNT touch $SCRATCH_MNT/1 ls -alF $SCRATCH_MNT sleep 35 echo 0 25165824 linear $SCRATCH_DEV 0 | dmsetup reload foo dmsetup resume foo sleep 1 umount $SCRATCH_MNT btrfsck /dev/mapper/foo dmsetup remove foo Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-08-28Btrfs: allow delayed refs to be mergedJosef Bacik3-27/+142
Daniel Blueman reported a bug with fio+balance on a ramdisk setup. Basically what happens is the balance relocates a tree block which will drop the implicit refs for all of its children and adds a full backref. Once the block is relocated we have to add the implicit refs back, so when we cow the block again we add the implicit refs for its children back. The problem comes when the original drop ref doesn't get run before we add the implicit refs back. The delayed ref stuff will specifically prefer ADD operations over DROP to keep us from freeing up an extent that will have references to it, so we try to add the implicit ref before it is actually removed and we panic. This worked fine before because the add would have just canceled the drop out and we would have been fine. But the backref walking work needs to be able to freeze the delayed ref stuff in time so we have this ever increasing sequence number that gets attached to all new delayed ref updates which makes us not merge refs and we run into this issue. So to fix this we need to merge delayed refs. So everytime we run a clustered ref we need to try and merge all of its delayed refs. The backref walking stuff locks the delayed ref head before processing, so if we have it locked we are safe to merge any refs inside of the sequence number. If there is no sequence number we can merge all refs. Doing this not only fixes our bug but keeps the delayed ref code from adding and removing useless refs and batching together multiple refs into one search instead of one search per delayed ref, which will really help our commit times. I ran this with Daniels test and 276 and I haven't seen any problems. Thanks, Reported-by: Daniel J Blueman <daniel@quora.org> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-08-28Btrfs: fix enospc problems when deleting a subvolJosef Bacik1-1/+1
Subvol delete is a special kind of awful where we use the global reserve to cover the ENOSPC requirements. The problem is once we're done removing everything we do a btrfs_update_inode(), which by default will try to do the delayed update stuff which will use it's own reserve. There will be no space in this reserve and we'll return ENOSPC. So instead use btrfs_update_inode_fallback() which will just fallback to updating the inode item in the case of enospc. This is fine because the global reserve covers the space requirements for this. With this patch I can now delete a subvol on a problem image Dave Sterba sent me. Thanks, Reported-by: David Sterba <dave@jikos.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-08-28Btrfs: fix wrong mtime and ctime when creating snapshotsMiao Xie1-0/+1
When we created a new snapshot, the mtime and ctime of its parent directory were not updated. Fix it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-08-28Btrfs: fix race in run_clustered_refsArne Jansen1-0/+17
With commit commit d1270cd91f308c9d22b2804720c36ccd32dbc35e Author: Arne Jansen <sensille@gmx.net> Date: Tue Sep 13 15:16:43 2011 +0200 Btrfs: put back delayed refs that are too new I added a window where the delayed_ref's head->ref_mod code can diverge from the sum of the remaining refs, because we release the head->mutex in the middle. This leads to btrfs_lookup_extent_info returning wrong numbers. This patch fixes this by adjusting the head's ref_mod with each delayed ref we run. Signed-off-by: Arne Jansen <sensille@gmx.net> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-08-28Btrfs: don't run __tree_mod_log_free_eb on leavesChris Mason1-0/+3
When we split a leaf, we may end up inserting a new root on top of that leaf. The reflog code was incorrectly assuming the old root was always a node. This makes sure we skip over leaves. Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-08-28Btrfs: increase the size of the free space cacheJosef Bacik1-8/+7
Arne was complaining about the space cache having mismatching generation numbers when debugging a deadlock. This is because we can run out of space in our preallocated range for our space cache if you have a pretty fragmented amount of space in your pinned space. So just increase the amount of space we preallocate for space cache so we can be sure to have enough space. This will only really affect data ranges since their the only chunks that end up larger than 256MB. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-08-28Btrfs: barrier before waitqueue_activeJosef Bacik5-12/+10
We need a barrir before calling waitqueue_active otherwise we will miss wakeups. So in places that do atomic_dec(); then atomic_read() use atomic_dec_return() which imply a memory barrier (see memory-barriers.txt) and then add an explicit memory barrier everywhere else that need them. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-08-28Btrfs: fix deadlock in wait_for_more_refsArne Jansen5-73/+21
Commit a168650c introduced a waiting mechanism to prevent busy waiting in btrfs_run_delayed_refs. This can deadlock with btrfs_run_ordered_operations, where a tree_mod_seq is held while waiting for the io to complete, while the end_io calls btrfs_run_delayed_refs. This whole mechanism is unnecessary. If not enough runnable refs are available to satisfy count, just return as count is more like a guideline than a strict requirement. In case we have to run all refs, commit transaction makes sure that no other threads are working in the transaction anymore, so we just assert here that no refs are blocked. Signed-off-by: Arne Jansen <sensille@gmx.net> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-08-28btrfs: fix second lock in btrfs_delete_delayed_items()Fengguang Wu1-2/+3
Fix a real bug caught by coccinelle. fs/btrfs/delayed-inode.c:1013:1-11: second lock on line 1013 Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
2012-08-28Btrfs: don't allocate a seperate csums array for direct readsJosef Bacik3-32/+19
We've been allocating a big array for csums instead of storing them in the io_tree like we do for buffered reads because previously we were locking the entire range, so we didn't have an extent state for each sector of the range. But now that we do the range locking as we map the buffers we can limit the mapping lenght to sectorsize and use the private part of the io_tree for our csums. This allows us to avoid an extra memory allocation for direct reads which could incur latency. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-08-28Btrfs: do not strdup non existent stringsJosef Bacik1-3/+5
When we close devices we add back empty devices for some reason that escapes me. In the case of a missing dev we don't allocate an rcu_string for it's name, so check to see if the device has a name and if it doesn't don't bother strdup()'ing it. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-08-28Btrfs: do not use missing devices when showing devnameJosef Bacik1-0/+2
If you do the following mkfs.btrfs /dev/sdb /dev/sdc rmmod btrfs dd if=/dev/zero of=/dev/sdb bs=1M count=1 mount -o degraded /dev/sdc /mnt/btrfs-test the box will panic trying to deref the name for the missing dev since it is the lower numbered devid. So fix show_devname to not use missing devices. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-08-28Btrfs: fix that error value is changed by mistakeStefan Behrens1-2/+2
In iterate_inodes_from_logical() the error result from extent_from_logical() is patched by mistake. Typically ENOENT is patched to EINVAL because (-ENOENT & BTRFS_EXTENT_FLAG_TREE_BLOCK) evaluates to true. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
2012-08-28Btrfs: lock extents as we map them in DIOJosef Bacik1-129/+127
A deadlock in xfstests 113 was uncovered by commit d187663ef24cd3d033f0cbf2867e70b36a3a90b8 This is because we would not return EIOCBQUEUED for short AIO reads, instead we'd wait for the DIO to complete and then return the amount of data we transferred, which would allow our stuff to unlock the remaning amount. But with this change this no longer happens, so if we have a short AIO read (for example if we try to read past EOF), we could leave the section from EOF to the end of where we tried to read locked. Fixing this is tricky since there is no clear way to know exactly how much data DIO truly submitted for IO, so to make this less hard on ourselves and less combersome we need to lock the extents as we try to map them, and then we unlock any areas we didn't actually map. This makes us completely safe from deadlocks and reliance on a particular behavior of the DIO code. This also lays the groundwork for allowing us to use the normal csum storage method for reads which means we can remove an allocation. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-08-28Btrfs: fix some endian bugs handling the root timesDan Carpenter3-4/+4
"trans->transid" is cpu endian but we want to store the data as little endian. "item->ctime.nsec" is only 32 bits, not 64. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
2012-08-28Btrfs: unlock on error in btrfs_delalloc_reserve_metadata()Dan Carpenter1-1/+3
We should release this mutex before returning the error code. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
2012-08-28Btrfs: checking for NULL instead of IS_ERRDan Carpenter1-1/+3
add_qgroup_rb() never returns NULL, only error pointers. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
2012-08-28Btrfs: fix some error codes in btrfs_qgroup_inherit()Dan Carpenter1-2/+6
These are returning zero when it should be returning a negative error code. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
2012-08-28Btrfs: fix a misplaced address operator in a conditionStefan Behrens1-1/+1
This should obviously not be "if (&flag)" but "if (flag)". Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
2012-08-26Merge tag 'for-linus' of git://github.com/prasad-joshi/logfs_upstreamLinus Torvalds5-12/+26
Pull LogFS bugfixes from Prasad Joshi: - "logfs: query block device for number of pages to send with bio" This BUG was found when LogFS was used on KVM. The patch fixes the problem by asking for underlaying block device the number of pages to send with each BIO. - "logfs: maintain the ordering of meta-inode destruction" LogFS maintains file system meta-data in special inodes. These inodes are releated to each other, therefore they must be destroyed in a proper order. - "logfs: initialize the number of iovecs in bio" LogFS used to panic when it was created on an encrypted LVM volume. The patch fixes the problem by properly initializing the BIO. Plus a couple more: - logfs: create a pagecache page if it is not present - logfs: destroy the reserved inodes while unmounting * tag 'for-linus' of git://github.com/prasad-joshi/logfs_upstream: logfs: query block device for number of pages to send with bio logfs: maintain the ordering of meta-inode destruction logfs: create a pagecache page if it is not present logfs: initialize the number of iovecs in bio logfs: destroy the reserved inodes while unmounting
2012-08-25Merge tag 'for-linus-v3.6-rc4' of git://oss.sgi.com/xfs/xfsLinus Torvalds3-11/+14
Pull xfs bugfixes from Ben Myers: - fix uninitialised variable in xfs_rtbuf_get() - unlock the AGI buffer when looping in xfs_dialloc - check for possible overflow in xfs_ioc_trim * tag 'for-linus-v3.6-rc4' of git://oss.sgi.com/xfs/xfs: xfs: check for possible overflow in xfs_ioc_trim xfs: unlock the AGI buffer when looping in xfs_dialloc xfs: fix uninitialised variable in xfs_rtbuf_get()
2012-08-25Merge branch 'for-3.6' of git://linux-nfs.org/~bfields/linuxLinus Torvalds2-3/+2
Pull nfsd bugfixes from J. Bruce Fields: "Particular thanks to Michael Tokarev, Malahal Naineni, and Jamie Heilman for their testing and debugging help." * 'for-3.6' of git://linux-nfs.org/~bfields/linux: svcrpc: fix svc_xprt_enqueue/svc_recv busy-looping svcrpc: sends on closed socket should stop immediately svcrpc: fix BUG() in svc_tcp_clear_pages nfsd4: fix security flavor of NFSv4.0 callback
2012-08-25Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds4-41/+44
Pull block-related fixes from Jens Axboe: - Improvements to the buffered and direct write IO plugging from Fengguang. - Abstract out the mapping of a bio in a request, and use that to provide a blk_bio_map_sg() helper. Useful for mapping just a bio instead of a full request. - Regression fix from Hugh, fixing up a patch that went into the previous release cycle (and marked stable, too) attempting to prevent a loop in __getblk_slow(). - Updates to discard requests, fixing up the sizing and how we align them. Also a change to disallow merging of discard requests, since that doesn't really work properly yet. - A few drbd fixes. - Documentation updates. * 'for-linus' of git://git.kernel.dk/linux-block: block: replace __getblk_slow misfix by grow_dev_page fix drbd: Write all pages of the bitmap after an online resize drbd: Finish requests that completed while IO was frozen drbd: fix drbd wire compatibility for empty flushes Documentation: update tunable options in block/cfq-iosched.txt Documentation: update tunable options in block/cfq-iosched.txt Documentation: update missing index files in block/00-INDEX block: move down direct IO plugging block: remove plugging at buffered write time block: disable discard request merge temporarily bio: Fix potential memory leak in bio_find_or_create_slab() block: Don't use static to define "void *p" in show_partition_start() block: Add blk_bio_map_sg() helper block: Introduce __blk_segment_map_sg() helper fs/block-dev.c:fix performance regression in O_DIRECT writes to md block devices block: split discard into aligned requests block: reorganize rounding of max_discard_sectors
2012-08-23Merge branch 'for_linus' of ↵Linus Torvalds6-6/+17
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull UDF, ext3 & reiserfs fixes from Jan Kara: "A couple of fixes (udf, reiserfs, ext3) that accumulated over my vacation." * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: udf: fix retun value on error path in udf_load_logicalvol jbd: don't write superblock when unmounting an ro filesystem reiserfs: fix deadlocks with quotas quota: Move down dqptr_sem read after initializing default warn[] type at __dquot_alloc_space(). UDF: During mount free lvid_bh before rescanning with different blocksize udf: fix udf_setsize() for file data in ICB
2012-08-23Merge tag 'upstream-3.6-rc3' of git://git.infradead.org/linux-ubifsLinus Torvalds5-8/+7
Pull UBIFS fixes from Artem Bityutskiy: - Fix crash on error which prevents emulated power-cut testing. - Fix log reply regression introduced in 3.6-rc1. - Fix UBIFS complaints about too small debug buffer size which. - Fix error message spelling, and remove incorrect commentary. * tag 'upstream-3.6-rc3' of git://git.infradead.org/linux-ubifs: UBIFS: fix error messages spelling UBIFS: fix complaints about too small debug buffer size UBIFS: fix replay regression UBIFS: fix crash on error path UBIFS: remove stale commentary
2012-08-23xfs: check for possible overflow in xfs_ioc_trimTomas Racek1-2/+4
If range.start or range.minlen is bigger than filesystem size, return invalid value error. This fixes possible overflow in BTOBB macro when passed value was nearly ULLONG_MAX. Signed-off-by: Tomas Racek <tracek@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-08-23xfs: unlock the AGI buffer when looping in xfs_diallocChristoph Hellwig1-8/+9
Also update some commens in the area to make the code easier to read. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-08-23xfs: fix uninitialised variable in xfs_rtbuf_get()Dave Chinner1-1/+1
Results in this assert failure in generic/090: XFS: Assertion failed: *nmap >= 1, file: fs/xfs/xfs_bmap.c, line: 4363 ..... Call Trace: [<ffffffff814680db>] xfs_bmapi_read+0x6b/0x370 [<ffffffff814b64b2>] xfs_rtbuf_get+0x42/0x130 [<ffffffff814b6f09>] xfs_rtget_summary+0x89/0x120 [<ffffffff814b7bfe>] xfs_rtallocate_extent_size+0xce/0x340 [<ffffffff814b89f0>] xfs_rtallocate_extent+0x240/0x290 [<ffffffff81462c1a>] xfs_bmap_rtalloc+0x1ba/0x340 [<ffffffff81463a65>] xfs_bmap_alloc+0x35/0x40 [<ffffffff8146f111>] xfs_bmapi_allocate+0xf1/0x350 [<ffffffff8146f9de>] xfs_bmapi_write+0x66e/0xa60 [<ffffffff8144538a>] xfs_iomap_write_direct+0x22a/0x3f0 [<ffffffff8143707b>] __xfs_get_blocks+0x38b/0x5d0 [<ffffffff814372d4>] xfs_get_blocks_direct+0x14/0x20 [<ffffffff811b0081>] do_blockdev_direct_IO+0xf71/0x1eb0 [<ffffffff811b1015>] __blockdev_direct_IO+0x55/0x60 [<ffffffff814355ca>] xfs_vm_direct_IO+0x11a/0x1e0 [<ffffffff8112d617>] generic_file_direct_write+0xd7/0x1b0 [<ffffffff8143e16c>] xfs_file_dio_aio_write+0x13c/0x320 [<ffffffff8143e6f2>] xfs_file_aio_write+0x1c2/0x1d0 [<ffffffff81174a07>] do_sync_write+0xa7/0xe0 [<ffffffff81175288>] vfs_write+0xa8/0x160 [<ffffffff81175702>] sys_pwrite64+0x92/0xb0 [<ffffffff81b68f69>] system_call_fastpath+0x16/0x1b Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-08-23block: replace __getblk_slow misfix by grow_dev_page fixHugh Dickins1-36/+30
Commit 91f68c89d8f3 ("block: fix infinite loop in __getblk_slow") is not good: a successful call to grow_buffers() cannot guarantee that the page won't be reclaimed before the immediate next call to __find_get_block(), which is why there was always a loop there. Yesterday I got "EXT4-fs error (device loop0): __ext4_get_inode_loc:3595: inode #19278: block 664: comm cc1: unable to read itable block" on console, which pointed to this commit. I've been trying to bisect for weeks, why kbuild-on-ext4-on-loop-on-tmpfs sometimes fails from a missing header file, under memory pressure on ppc G5. I've never seen this on x86, and I've never seen it on 3.5-rc7 itself, despite that commit being in there: bisection pointed to an irrelevant pinctrl merge, but hard to tell when failure takes between 18 minutes and 38 hours (but so far it's happened quicker on 3.6-rc2). (I've since found such __ext4_get_inode_loc errors in /var/log/messages from previous weeks: why the message never appeared on console until yesterday morning is a mystery for another day.) Revert 91f68c89d8f3, restoring __getblk_slow() to how it was (plus a checkpatch nitfix). Simplify the interface between grow_buffers() and grow_dev_page(), and avoid the infinite loop beyond end of device by instead checking init_page_buffers()'s end_block there (I presume that's more efficient than a repeated call to blkdev_max_block()), returning -ENXIO to __getblk_slow() in that case. And remove akpm's ten-year-old "__getblk() cannot fail ... weird" comment, but that is worrying: are all users of __getblk() really now prepared for a NULL bh beyond end of device, or will some oops?? Signed-off-by: Hugh Dickins <hughd@google.com> Cc: stable@vger.kernel.org # 3.0 3.2 3.4 3.5 Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-08-22Merge branch 'for-linus' of ↵Linus Torvalds3-6/+13
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull ceph fixes from Sage Weil: "Jim's fix closes a narrow race introduced with the msgr changes. One fix resolves problems with debugfs initialization that Yan found when multiple client instances are created (e.g., two clusters mounted, or rbd + cephfs), another one fixes problems with mounting a nonexistent server subdirectory, and the last one fixes a divide by zero error from unsanitized ioctl input that Dan Carpenter found." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: avoid divide by zero in __validate_layout() libceph: avoid truncation due to racing banners ceph: tolerate (and warn on) extraneous dentry from mds libceph: delay debugfs initialization until we learn global_id
2012-08-22Merge tag 'nfs-for-3.6-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds15-119/+239
Pull NFS client bugfixes from Trond Myklebust: - NFSv3 mounts need to fail if the FSINFO rpc call fails - Ensure that the NFS commit cache gets torn down when we unload the NFS module. - Fix memory scribble issues when interrupting a LAYOUTGET rpc call - Fix NFSv4 legacy idmapper regressions - Fix issues with the NFSv4 getacl command - Fix a regression when using the legacy "mount -t nfs4" * tag 'nfs-for-3.6-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFSv3: Ensure that do_proc_get_root() reports errors correctly NFSv4: Ensure that nfs4_alloc_client cleans up on error. NFS: return -ENOKEY when the upcall fails to map the name NFS: Clear key construction data if the idmap upcall fails NFSv4: Don't use private xdr_stream fields in decode_getacl NFSv4: Fix the acl cache size calculation NFSv4: Fix pointer arithmetic in decode_getacl NFS: Alias the nfs module to nfs4 NFS: Fix a regression when loading the NFS v4 module NFSv4.1: Remove a bogus BUG_ON() in nfs4_layoutreturn_done pnfs-obj: Better IO pattern in case of unaligned offset NFS41: add pg_layout_private to nfs_pageio_descriptor pnfs: nfs4_proc_layoutget returns void pnfs: defer release of pages in layoutget nfs: tear down caches in nfs_init_writepagecache when allocation fails
2012-08-22Merge branch 'for-linus' of ↵Linus Torvalds2-1/+3
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull assorted fixes - mostly vfs - from Al Viro: "Assorted fixes, with an unexpected detour into vfio refcounting logics (fell out when digging in an analog of eventpoll race in there)." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: task_work: add a scheduling point in task_work_run() fs: fix fs/namei.c kernel-doc warnings eventpoll: use-after-possible-free in epoll_create1() vfio: grab vfio_device reference *before* exposing the sucker via fd_install() vfio: get rid of vfio_device_put()/vfio_group_get_device* races vfio: get rid of open-coding kref_put_mutex introduce kref_put_mutex() vfio: don't dereference after kfree... mqueue: lift mnt_want_write() outside ->i_mutex, clean up a bit
2012-08-22UBIFS: fix error messages spellingArtem Bityutskiy1-1/+1
Corruptio -> corruption. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2012-08-22fs: fix fs/namei.c kernel-doc warningsRandy Dunlap1-0/+2
Fix kernel-doc warnings in fs/namei.c: Warning(fs/namei.c:360): No description found for parameter 'inode' Warning(fs/namei.c:672): No description found for parameter 'nd' Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-08-22eventpoll: use-after-possible-free in epoll_create1()Al Viro1-1/+1
As soon as we'd installed the file into descriptor table, it can get closed by another thread. Freeing ep in process... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-08-22UBIFS: fix complaints about too small debug buffer sizeArtem Bityutskiy1-1/+1
When debugging is enabled, we use a temporary on-stack buffer for formatting the key strings like "(11368871, direntry, 0xcd0750)". The buffer size is 32 bytes and sometimes it is not enough to fit the key string - e.g., when inode numbers are high. This is not fatal, but the key strings are incomplete and UBIFS complains like this: UBIFS assert failed in dbg_snprintf_key at 137 (pid 1) This is a regression caused by "515315a UBIFS: fix key printing". Fix the issue by increasing the buffer to 48 bytes. Reported-by: Michael Hench <michaelhench@gmail.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Tested-by: Michael Hench <michaelhench@gmail.com> Cc: stable@vger.kernel.org [v3.3+]
2012-08-21ceph: avoid divide by zero in __validate_layout()Sage Weil1-1/+2
If "l->stripe_unit" is zero the the mod on the next line will cause a divide by zero bug. This comes from the copy_from_user() in ceph_ioctl_set_layout_policy(). Passing 0 is valid, though (it means "do not change") so avoid the % check in that case. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>
2012-08-21ceph: tolerate (and warn on) extraneous dentry from mdsSage Weil1-5/+10
If the MDS gives us a dentry and we weren't prepared to handle it, WARN_ON_ONCE instead of crashing. Reported-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>
2012-08-21UBIFS: fix replay regressionArtem Bityutskiy1-2/+1
Commit "d51f17e UBIFS: simplify reply code a bit" introduces a bug with the following symptoms: UBIFS error (pid 1): replay_log_leb: first CS node at LEB 3:0 has wrong commit number 0 expected 1 The issue is that we start replaying the log from UBIFS_LOG_LNUM instead of c->lhead_lnum. This patch fixes that. Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2012-08-21UBIFS: fix crash on error pathArtem Bityutskiy1-1/+4
This patch fixes a regression introduced by "4994297 UBIFS: make ubifs_lpt_init clean-up in case of failure" which I've hit while running the 'integck -p' test. When remount the file-system from R/O mode to R/W mode and 'lpt_init_wr()' fails, we free _all_ LPT resources by calling 'ubifs_lpt_free(c, 0)', even those needed for R/O mode. This leads to subsequent crashes, e.g., if we try to unmount the file-system. Cc: stable@vger.kernel.org [v3.5+] Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2012-08-21UBIFS: remove stale commentaryArtem Bityutskiy1-3/+0
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2012-08-20nfsd4: fix security flavor of NFSv4.0 callbackJ. Bruce Fields2-3/+2
Commit d5497fc693a446ce9100fcf4117c3f795ddfd0d2 "nfsd4: move rq_flavor into svc_cred" forgot to remove cl_flavor from the client, leaving two places (cl_flavor and cl_cred.cr_flavor) for the flavor to be stored. After that patch, the latter was the one that was updated, but the former was the one that the callback used. Symptoms were a long delay on utime(). This is because the utime() generated a setattr which recalled a delegation, but the cb_recall was ignored by the client because it had the wrong security flavor. Cc: stable@vger.kernel.org Tested-by: Jamie Heilman <jamie@audible.transient.net> Reported-by: Jamie Heilman <jamie@audible.transient.net> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-08-20vfs: missed source of ->f_pos racesAl Viro1-2/+8
compat_sys_{read,write}v() need the same "pass a copy of file->f_pos" thing as sys_{read,write}{,v}(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>