summaryrefslogtreecommitdiffstats
path: root/fs/btrfs
AgeCommit message (Collapse)AuthorFilesLines
2014-10-24vfs: export check_sticky()Miklos Szeredi1-19/+1
It's already duplicated in btrfs and about to be used in overlayfs too. Move the sticky bit check to an inline helper and call the out-of-line helper only in the unlikly case of the sticky bit being set. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-10-18Merge branch 'for-linus-update' of ↵Linus Torvalds2-36/+33
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs data corruption fix from Chris Mason: "I'm testing a pull with more fixes, but wanted to get this one out so Greg can pick it up. The corruption isn't easy to hit, you have to do a readonly snapshot and have orphans in the snapshot. But my review and testing missed the bug. Filipe has added a better xfstest to cover it" * 'for-linus-update' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Revert "Btrfs: race free update of commit root for ro snapshots"
2014-10-18Merge branch 'for-3.18/core' of git://git.kernel.dk/linux-blockLinus Torvalds1-1/+1
Pull core block layer changes from Jens Axboe: "This is the core block IO pull request for 3.18. Apart from the new and improved flush machinery for blk-mq, this is all mostly bug fixes and cleanups. - blk-mq timeout updates and fixes from Christoph. - Removal of REQ_END, also from Christoph. We pass it through the ->queue_rq() hook for blk-mq instead, freeing up one of the request bits. The space was overly tight on 32-bit, so Martin also killed REQ_KERNEL since it's no longer used. - blk integrity updates and fixes from Martin and Gu Zheng. - Update to the flush machinery for blk-mq from Ming Lei. Now we have a per hardware context flush request, which both cleans up the code should scale better for flush intensive workloads on blk-mq. - Improve the error printing, from Rob Elliott. - Backing device improvements and cleanups from Tejun. - Fixup of a misplaced rq_complete() tracepoint from Hannes. - Make blk_get_request() return error pointers, fixing up issues where we NULL deref when a device goes bad or missing. From Joe Lawrence. - Prep work for drastically reducing the memory consumption of dm devices from Junichi Nomura. This allows creating clone bio sets without preallocating a lot of memory. - Fix a blk-mq hang on certain combinations of queue depths and hardware queues from me. - Limit memory consumption for blk-mq devices for crash dump scenarios and drivers that use crazy high depths (certain SCSI shared tag setups). We now just use a single queue and limited depth for that" * 'for-3.18/core' of git://git.kernel.dk/linux-block: (58 commits) block: Remove REQ_KERNEL blk-mq: allocate cpumask on the home node bio-integrity: remove the needless fail handle of bip_slab creating block: include func name in __get_request prints block: make blk_update_request print prefix match ratelimited prefix blk-merge: don't compute bi_phys_segments from bi_vcnt for cloned bio block: fix alignment_offset math that assumes io_min is a power-of-2 blk-mq: Make bt_clear_tag() easier to read blk-mq: fix potential hang if rolling wakeup depth is too high block: add bioset_create_nobvec() block: use bio_clone_fast() in blk_rq_prep_clone() block: misplaced rq_complete tracepoint sd: Honor block layer integrity handling flags block: Replace strnicmp with strncasecmp block: Add T10 Protection Information functions block: Don't merge requests if integrity flags differ block: Integrity checksum flag block: Relocate bio integrity flags block: Add a disk flag to block integrity profile block: Add prefix to block integrity profile flags ...
2014-10-17Revert "Btrfs: race free update of commit root for ro snapshots"Chris Mason2-36/+33
This reverts commit 9c3b306e1c9e6be4be09e99a8fe2227d1005effc. Switching only one commit root during a transaction is wrong because it leads the fs into an inconsistent state. All commit roots should be switched at once, at transaction commit time, otherwise backref walking can often miss important references that were only accessible through the old commit root. Plus, the root item for the snapshot's root wasn't getting updated and preventing the next transaction commit to do it. This made several users get into random corruption issues after creation of readonly snapshots. A regression test for xfstests will follow soon. Cc: stable@vger.kernel.org # 3.17 Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-10-14btrfs: LLVMLinux: Remove VLAISVinícius Tinti1-9/+7
Replaced the use of a Variable Length Array In Struct (VLAIS) with a C99 compliant equivalent. This patch instead allocates the appropriate amount of memory using a char array using the SHASH_DESC_ON_STACK macro. The new code can be compiled with both gcc and clang. Signed-off-by: Vinícius Tinti <viniciustinti@gmail.com> Reviewed-by: Jan-Simon Möller <dl9pf@gmx.de> Reviewed-by: Mark Charlebois <charlebm@gmail.com> Signed-off-by: Behan Webster <behanw@converseincode.com> Acked-by: Chris Mason <clm@fb.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Cc: "David S. Miller" <davem@davemloft.net>
2014-10-13Merge branch 'for-linus' of ↵Linus Torvalds1-4/+1
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs updates from Al Viro: "The big thing in this pile is Eric's unmount-on-rmdir series; we finally have everything we need for that. The final piece of prereqs is delayed mntput() - now filesystem shutdown always happens on shallow stack. Other than that, we have several new primitives for iov_iter (Matt Wilcox, culled from his XIP-related series) pushing the conversion to ->read_iter()/ ->write_iter() a bit more, a bunch of fs/dcache.c cleanups and fixes (including the external name refcounting, which gives consistent behaviour of d_move() wrt procfs symlinks for long and short names alike) and assorted cleanups and fixes all over the place. This is just the first pile; there's a lot of stuff from various people that ought to go in this window. Starting with unionmount/overlayfs mess... ;-/" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (60 commits) fs/file_table.c: Update alloc_file() comment vfs: Deduplicate code shared by xattr system calls operating on paths reiserfs: remove pointless forward declaration of struct nameidata don't need that forward declaration of struct nameidata in dcache.h anymore take dname_external() into fs/dcache.c let path_init() failures treated the same way as subsequent link_path_walk() fix misuses of f_count() in ppp and netlink ncpfs: use list_for_each_entry() for d_subdirs walk vfs: move getname() from callers to do_mount() gfs2_atomic_open(): skip lookups on hashed dentry [infiniband] remove pointless assignments gadgetfs: saner API for gadgetfs_create_file() f_fs: saner API for ffs_sb_create_file() jfs: don't hash direct inode [s390] remove pointless assignment of ->f_op in vmlogrdr ->open() ecryptfs: ->f_op is never NULL android: ->f_op is never NULL nouveau: __iomem misannotations missing annotation in fs/file.c fs: namespace: suppress 'may be used uninitialized' warnings ...
2014-10-11Merge branch 'for-linus' of ↵Linus Torvalds48-1518/+3548
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs updates from Chris Mason: "The largest set of changes here come from Miao Xie. He's cleaning up and improving read recovery/repair for raid, and has a number of related fixes. I've merged another set of fsync fixes from Filipe, and he's also improved the way we handle metadata write errors to make sure we force the FS readonly if things go wrong. Otherwise we have a collection of fixes and cleanups. Dave Sterba gets a cookie for removing the most lines (thanks Dave)" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (139 commits) btrfs: Fix compile error when CONFIG_SECURITY is not set. Btrfs: fix compiles when CONFIG_BTRFS_FS_RUN_SANITY_TESTS is off btrfs: Make btrfs handle security mount options internally to avoid losing security label. Btrfs: send, don't delay dir move if there's a new parent inode btrfs: add more superblock checks Btrfs: fix race in WAIT_SYNC ioctl Btrfs: be aware of btree inode write errors to avoid fs corruption Btrfs: remove redundant btrfs_verify_qgroup_counts declaration. btrfs: fix shadow warning on cmp Btrfs: fix compilation errors under DEBUG Btrfs: fix crash of btrfs_release_extent_buffer_page Btrfs: add missing end_page_writeback on submit_extent_page failure btrfs: Fix the wrong condition judgment about subset extent map Btrfs: fix build_backref_tree issue with multiple shared blocks Btrfs: cleanup error handling in build_backref_tree btrfs: move checks for DUMMY_ROOT into a helper btrfs: new define for the inline extent data start btrfs: kill extent_buffer_page helper btrfs: drop constant param from btrfs_release_extent_buffer_page btrfs: hide typecast to definition of BTRFS_SEND_TRANS_STUB ...
2014-10-10Merge branch 'for-3.18' of ↵Linus Torvalds2-5/+5
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu Pull percpu updates from Tejun Heo: "A lot of activities on percpu front. Notable changes are... - percpu allocator now can take @gfp. If @gfp doesn't contain GFP_KERNEL, it tries to allocate from what's already available to the allocator and a work item tries to keep the reserve around certain level so that these atomic allocations usually succeed. This will replace the ad-hoc percpu memory pool used by blk-throttle and also be used by the planned blkcg support for writeback IOs. Please note that I noticed a bug in how @gfp is interpreted while preparing this pull request and applied the fix 6ae833c7fe0c ("percpu: fix how @gfp is interpreted by the percpu allocator") just now. - percpu_ref now uses longs for percpu and global counters instead of ints. It leads to more sparse packing of the percpu counters on 64bit machines but the overhead should be negligible and this allows using percpu_ref for refcnting pages and in-memory objects directly. - The switching between percpu and single counter modes of a percpu_ref is made independent of putting the base ref and a percpu_ref can now optionally be initialized in single or killed mode. This allows avoiding percpu shutdown latency for cases where the refcounted objects may be synchronously created and destroyed in rapid succession with only a fraction of them reaching fully operational status (SCSI probing does this when combined with blk-mq support). It's also planned to be used to implement forced single mode to detect underflow more timely for debugging. There's a separate branch percpu/for-3.18-consistent-ops which cleans up the duplicate percpu accessors. That branch causes a number of conflicts with s390 and other trees. I'll send a separate pull request w/ resolutions once other branches are merged" * 'for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (33 commits) percpu: fix how @gfp is interpreted by the percpu allocator blk-mq, percpu_ref: start q->mq_usage_counter in atomic mode percpu_ref: make INIT_ATOMIC and switch_to_atomic() sticky percpu_ref: add PERCPU_REF_INIT_* flags percpu_ref: decouple switching to percpu mode and reinit percpu_ref: decouple switching to atomic mode and killing percpu_ref: add PCPU_REF_DEAD percpu_ref: rename things to prepare for decoupling percpu/atomic mode switch percpu_ref: replace pcpu_ prefix with percpu_ percpu_ref: minor code and comment updates percpu_ref: relocate percpu_ref_reinit() Revert "blk-mq, percpu_ref: implement a kludge for SCSI blk-mq stall during probe" Revert "percpu: free percpu allocation info for uniprocessor system" percpu-refcount: make percpu_ref based on longs instead of ints percpu-refcount: improve WARN messages percpu: fix locking regression in the failure path of pcpu_alloc() percpu-refcount: add @gfp to percpu_ref_init() proportions: add @gfp to init functions percpu_counter: add @gfp to percpu_counter_init() percpu_counter: make percpu_counters_lock irq-safe ...
2014-10-09vfs: Make d_invalidate return voidEric W. Biederman1-4/+1
Now that d_invalidate can no longer fail, stop returning a useless return code. For the few callers that checked the return code update remove the handling of d_invalidate failure. Reviewed-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-08btrfs: Fix compile error when CONFIG_SECURITY is not set.Qu Wenruo1-0/+2
Fix the following compile error when CONFIG_SECURITY is not set: error: 'struct security_mnt_opts' has no member named 'num_mnt_opts' Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-10-07Merge branch 'for-linus' of ↵Linus Torvalds1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull "trivial tree" updates from Jiri Kosina: "Usual pile from trivial tree everyone is so eagerly waiting for" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits) Remove MN10300_PROC_MN2WS0038 mei: fix comments treewide: Fix typos in Kconfig kprobes: update jprobe_example.c for do_fork() change Documentation: change "&" to "and" in Documentation/applying-patches.txt Documentation: remove obsolete pcmcia-cs from Changes Documentation: update links in Changes Documentation: Docbook: Fix generated DocBook/kernel-api.xml score: Remove GENERIC_HAS_IOMAP gpio: fix 'CONFIG_GPIO_IRQCHIP' comments tty: doc: Fix grammar in serial/tty dma-debug: modify check_for_stack output treewide: fix errors in printk genirq: fix reference in devm_request_threaded_irq comment treewide: fix synchronize_rcu() in comments checkstack.pl: port to AArch64 doc: queue-sysfs: minor fixes init/do_mounts: better syntax description MIPS: fix comment spelling powerpc/simpleboot: fix comment ...
2014-10-07Btrfs: fix compiles when CONFIG_BTRFS_FS_RUN_SANITY_TESTS is offChris Mason2-3/+2
Commit fccb84c94 moved added some helpers to cleanup our sanity tests, but it looks like both Dave and I always compile with the tests enabled. This fixes things to work when they are turned off too. Signed-off-by: Chris Mason <clm@fb.com>
2014-10-06btrfs: Make btrfs handle security mount options internally to avoid losing ↵Qu Wenruo2-5/+97
security label. [BUG] Originally when mount btrfs with "-o subvol=" mount option, btrfs will lose all security lable. And if the btrfs fs is mounted somewhere else, due to the lost of security lable, SELinux will refuse to mount since the same super block is being mounted using different security lable. [REPRODUCER] With SELinux enabled: #mkfs -t btrfs /dev/sda5 #mount -o context=system_u:object_r:nfs_t:s0 /dev/sda5 /mnt/btrfs #btrfs subvolume create /mnt/btrfs/subvol #mount -o subvol=subvol,context=system_u:object_r:nfs_t:s0 /dev/sda5 /mnt/test kernel message: SELinux: mount invalid. Same superblock, different security settings for (dev sda5, type btrfs) [REASON] This happens because btrfs will call vfs_kern_mount() and then mount_subtree() to handle subvolume name lookup. First mount will cut off all the security lables and when it comes to the second vfs_kern_mount(), it has no security label now. [FIX] This patch will makes btrfs behavior much more like nfs, which has the type flag FS_BINARY_MOUNTDATA, making btrfs handles the security label internally. So security label will be set in the real mount time and won't lose label when use with "subvol=" mount option. Reported-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-10-04Merge branch 'remove-unlikely' of ↵Chris Mason6-16/+16
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus
2014-10-04Merge branch 'cleanup/blocksize-diet-part1' of ↵Chris Mason9-105/+57
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus
2014-10-04Merge branch 'cleanup/misc-for-3.18' of ↵Chris Mason17-135/+130
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus Signed-off-by: Chris Mason <clm@fb.com> Conflicts: fs/btrfs/extent_io.c
2014-10-03Btrfs: send, don't delay dir move if there's a new parent inodeFilipe Manana1-1/+1
If between two snapshots we rename an existing directory named X to Y and make it a child (direct or not) of a new inode named X, we were delaying the move/rename of the former directory unnecessarily, which would result in attempting to rename the new directory from its orphan name to name X prematurely. Minimal reproducer: $ mkfs.btrfs -f /dev/vdd $ mount /dev/vdd /mnt $ mkdir -p /mnt/merlin/RC/OSD/Source $ btrfs subvolume snapshot -r /mnt /mnt/mysnap1 $ mkdir /mnt/OSD $ mv /mnt/merlin/RC/OSD /mnt/OSD/OSD-Plane_788 $ mv /mnt/OSD /mnt/merlin/RC $ btrfs subvolume snapshot -r /mnt /mnt/mysnap2 $ btrfs send /mnt/mysnap1 -f /tmp/1.snap $ btrfs send -p /mnt/mysnap1 /mnt/mysnap2 -f /tmp/2.snap $ mkfs.btrfs -f /dev/vdc $ mount /dev/vdc /mnt2 $ btrfs receive /mnt2 -f /tmp/1.snap $ btrfs receive /mnt2 -f /tmp/2.snap The second receive (from an incremental send) failed with the following error message: "rename o261-7-0 -> merlin/RC/OSD failed". This is a regression introduced in the 3.16 kernel. A test case for xfstests follows. Reported-by: Marc Merlin <marc@merlins.org> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-10-03btrfs: add more superblock checksDavid Sterba1-2/+65
Populate btrfs_check_super_valid() with checks that try to verify consistency of superblock by additional conditions that may arise from corrupted devices or bitflips. Some of tests are only hints and issue warnings instead of failing the mount, basically when the checks are derived from the data found in the superblock. Tested on a broken image provided by Qu. Reported-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>
2014-10-03Btrfs: fix race in WAIT_SYNC ioctlSage Weil1-3/+9
We check whether transid is already committed via last_trans_committed and then search through trans_list for pending transactions. If last_trans_committed is updated by btrfs_commit_transaction after we check it (there is no locking), we will fail to find the committed transaction and return EINVAL to the caller. This has been observed occasionally by ceph-osd (which uses this ioctl heavily). Fix by rechecking whether the provided transid <= last_trans_committed after the search fails, and if so return 0. Signed-off-by: Sage Weil <sage@redhat.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-10-03Btrfs: be aware of btree inode write errors to avoid fs corruptionFilipe Manana6-12/+114
While we have a transaction ongoing, the VM might decide at any time to call btree_inode->i_mapping->a_ops->writepages(), which will start writeback of dirty pages belonging to btree nodes/leafs. This call might return an error or the writeback might finish with an error before we attempt to commit the running transaction. If this happens, we might have no way of knowing that such error happened when we are committing the transaction - because the pages might no longer be marked dirty nor tagged for writeback (if a subsequent modification to the extent buffer didn't happen before the transaction commit) which makes filemap_fdata[write|wait]_range unable to find such pages (even if they're marked with SetPageError). So if this happens we must abort the transaction, otherwise we commit a super block with btree roots that point to btree nodes/leafs whose content on disk is invalid - either garbage or the content of some node/leaf from a past generation that got cowed or deleted and is no longer valid (for this later case we end up getting error messages like "parent transid verify failed on 10826481664 wanted 25748 found 29562" when reading btree nodes/leafs from disk). Note that setting and checking AS_EIO/AS_ENOSPC in the btree inode's i_mapping would not be enough because we need to distinguish between log tree extents (not fatal) vs non-log tree extents (fatal) and because the next call to filemap_fdatawait_range() will catch and clear such errors in the mapping - and that call might be from a log sync and not from a transaction commit, which means we would not know about the error at transaction commit time. Also, checking for the eb flag EXTENT_BUFFER_IOERR at transaction commit time isn't done and would not be completely reliable, as the eb might be removed from memory and read back when trying to get it, which clears that flag right before reading the eb's pages from disk, making us not know about the previous write error. Using the new 3 flags for the btree inode also makes us achieve the goal of AS_EIO/AS_ENOSPC when writepages() returns success, started writeback for all dirty pages and before filemap_fdatawait_range() is called, the writeback for all dirty pages had already finished with errors - because we were not using AS_EIO/AS_ENOSPC, filemap_fdatawait_range() would return success, as it could not know that writeback errors happened (the pages were no longer tagged for writeback). Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-10-03Btrfs: remove redundant btrfs_verify_qgroup_counts declaration.Fabian Frederick1-2/+0
Do like disk-io function declared under CONFIG_BTRFS_FS_RUN_SANITY_TESTS and keep prototype in qgroup.h only Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Chris Mason <clm@fb.com>
2014-10-03btrfs: fix shadow warning on cmpFabian Frederick1-4/+4
cmp was declared twice in btrfs_compare_trees resulting in a shadow warning. This patch renames second internal variable. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Chris Mason <clm@fb.com>
2014-10-03Btrfs: fix compilation errors under DEBUGFabian Frederick1-2/+2
bi_sector and bi_size moved to bi_iter since commit 4f024f3797c4 ("block: Abstract out bvec iterator") Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Chris Mason <clm@fb.com>
2014-10-03Btrfs: fix crash of btrfs_release_extent_buffer_pageLiu Bo1-0/+1
This is actually inspired by Filipe's patch. When write_one_eb() fails on submit_extent_page(), it'll give up writing this eb and mark it with EXTENT_BUFFER_IOERR. So if it's not the last page that encounter the failure, there are some left pages which remain DIRTY, and if a later COW on this eb happens, ie. eb is COWed and freed, it'd run into BUG_ON in btrfs_release_extent_buffer_page() for the DIRTY page, ie. BUG_ON(PageDirty(page)); This adds the missing clear_page_dirty_for_io() for the rest pages of eb. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-10-03Btrfs: add missing end_page_writeback on submit_extent_page failureFilipe Manana1-0/+1
If submit_extent_page() fails in write_one_eb(), we end up with the current page not marked dirty anymore, unlocked and marked for writeback. But we never end up calling end_page_writeback() against the page, which will make calls to filemap_fdatawait_range (e.g. at transaction commit time) hang forever waiting for the writeback bit to be cleared from the page. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-10-03btrfs: Fix the wrong condition judgment about subset extent mapQu Wenruo1-1/+1
Previous commit: btrfs: Fix and enhance merge_extent_mapping() to insert best fitted extent map is using wrong condition to judgement whether the range is a subset of a existing extent map. This may cause bug in btrfs no-holes mode. This patch will correct the judgment and fix the bug. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-10-03Btrfs: fix build_backref_tree issue with multiple shared blocksJosef Bacik1-1/+4
Marc Merlin sent me a broken fs image months ago where it would blow up in the upper->checked BUG_ON() in build_backref_tree. This is because we had a scenario like this block a -- level 4 (not shared) | block b -- level 3 (reloc block, shared) | block c -- level 2 (not shared) | block d -- level 1 (shared) | block e -- level 0 (shared) We go to build a backref tree for block e, we notice block d is shared and add it to the list of blocks to lookup it's backrefs for. Now when we loop around we will check edges for the block, so we will see we looked up block c last time. So we lookup block d and then see that the block that points to it is block c and we can just skip that edge since we've already been up this path. The problem is because we clear need_check when we see block d (as it is shared) we never add block b as needing to be checked. And because block c is in our path already we bail out before we walk up to block b and add it to the backref check list. To fix this we need to reset need_check if we trip over a block that doesn't need to be checked. This will make sure that any subsequent blocks in the path as we're walking up afterwards are added to the list to be processed. With this patch I can now mount Marc's fs image and it'll complete the balance without panicing. Thanks, Reported-by: Marc MERLIN <marc@merlins.org> Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-10-03Btrfs: cleanup error handling in build_backref_treeJosef Bacik1-29/+59
When balance panics it tends to panic in the BUG_ON(!upper->checked); test, because it means it couldn't build the backref tree properly. This is annoying to users and frankly a recoverable error, nothing in this function is actually fatal since it is just an in-memory building of the backrefs for a given bytenr. So go through and change all the BUG_ON()'s to ASSERT()'s, and fix the BUG_ON(!upper->checked) thing to just return an error. This patch also fixes the error handling so it tears down the work we've done properly. This code was horribly broken since we always just panic'ed instead of actually erroring out, so it needed to be completely re-worked. With this patch my broken image no longer panics when I mount it. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-10-02btrfs: move checks for DUMMY_ROOT into a helperDavid Sterba5-21/+23
Signed-off-by: David Sterba <dsterba@suse.cz>
2014-10-02btrfs: new define for the inline extent data startDavid Sterba2-10/+9
Use a common definition for the inline data start so we don't have to open-code it and introduce bugs like "Btrfs: fix wrong max inline data size limit" fixed. Signed-off-by: David Sterba <dsterba@suse.cz>
2014-10-02btrfs: kill extent_buffer_page helperDavid Sterba2-35/+26
It used to be more complex but now it's just a simple array access. Signed-off-by: David Sterba <dsterba@suse.cz>
2014-10-02btrfs: drop constant param from btrfs_release_extent_buffer_pageDavid Sterba1-9/+6
All callers use the same value, simplify the function. Signed-off-by: David Sterba <dsterba@suse.cz>
2014-10-02btrfs: hide typecast to definition of BTRFS_SEND_TRANS_STUBDavid Sterba4-5/+4
Signed-off-by: David Sterba <dsterba@suse.cz>
2014-10-02btrfs: let merge_reloc_roots return voidDavid Sterba1-2/+1
Signed-off-by: David Sterba <dsterba@suse.cz>
2014-10-02btrfs: remove unused members from struct scrub_warningDavid Sterba1-15/+2
Signed-off-by: David Sterba <dsterba@suse.cz>
2014-10-02btrfs: use slab for end_io_wq structuresDavid Sterba3-10/+38
The structure is frequently reused. Rename it according to the slab name. Signed-off-by: David Sterba <dsterba@suse.cz>
2014-10-02btrfs: fix error labels in init_btrfs_fsDavid Sterba1-2/+2
btrfs_interface_init rarely fails but we could leak the prelim_ref slab. Signed-off-by: David Sterba <dsterba@suse.cz>
2014-10-02btrfs: use enum for wq endio metadata typeDavid Sterba4-18/+14
The enum exists but is not consistently used. Signed-off-by: David Sterba <dsterba@suse.cz>
2014-10-02btrfs: remove unused extent state bitsDavid Sterba1-4/+0
The last users are long gone. Signed-off-by: David Sterba <dsterba@suse.cz>
2014-10-02Btrfs: set default max_inline to 8KiB instead of 8MiBFilipe David Borba Manana3-2/+3
8MiB is way too large and likely set by mistake. This is not a significant issue as in practice the max amount of data added to an inline extent is also limited by the page cache and btree leaf sizes. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: David Sterba <dsterba@suse.cz>
2014-10-02btrfs: remove blocksize from btrfs_alloc_free_block and renameDavid Sterba5-27/+21
Rename to btrfs_alloc_tree_block as it fits to the alloc/find/free + _tree_block family. The parameter blocksize was set to the metadata block size, directly or indirectly. Signed-off-by: David Sterba <dsterba@suse.cz>
2014-10-02btrfs: remove unused parameter blocksize from btrfs_find_tree_blockDavid Sterba4-12/+9
Signed-off-by: David Sterba <dsterba@suse.cz>
2014-10-02btrfs: remove parameter blocksize from read_tree_blockDavid Sterba7-37/+18
We know the tree block size, no need to pass it around. Signed-off-by: David Sterba <dsterba@suse.cz>
2014-10-02btrfs: inline code of reada_tree_block and remove itDavid Sterba1-10/+2
It's trivial with a single user. And remove one pointless BUG_ON. Signed-off-by: David Sterba <dsterba@suse.cz>
2014-10-02btrfs: return void from readahead_tree_blockDavid Sterba3-8/+4
Errors in readahead are not fatal and ignored elsewhere in the code. Signed-off-by: David Sterba <dsterba@suse.cz>
2014-10-02btrfs: remove unused parameter from readahead_tree_blockDavid Sterba5-16/+8
The parent_transid parameter has been unused since its introduction in ca7a79ad8dbe2466 ("Pass down the expected generation number when reading tree blocks"). In reada_tree_block, it was even wrongly set to leafsize. Transid check is done in the proper read and readahead ignores errors. Signed-off-by: David Sterba <dsterba@suse.cz>
2014-10-02btrfs: remove unlikely from data-dependent branches and slow pathsDavid Sterba5-10/+10
There are the branch hints that obviously depend on the data being processed, the CPU predictor will do better job according to the actual load. It also does not make sense to use the hints in slow paths that do a lot of other operations like locking, waiting or IO. Signed-off-by: David Sterba <dsterba@suse.cz>
2014-10-02btrfs: remove unlikely from NULL checksDavid Sterba2-6/+6
Unlikely is implicit for NULL checks of pointers. Signed-off-by: David Sterba <dsterba@suse.cz>
2014-10-01btrfs: remove unused variable from btrfs_parse_optionsDavid Sterba1-3/+1
Signed-off-by: David Sterba <dsterba@suse.cz>
2014-10-01btrfs: defrag, use unsigned type for extent threshDavid Sterba1-4/+4
Signed type mismatches the ioctl structure, all extent calculations are done on unsigned types. Signed-off-by: David Sterba <dsterba@suse.cz>