summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2019-04-29btrfs: Remove redundant inode argument from btrfs_add_ordered_sumNikolay Borisov4-8/+6
Ordered csums are keyed off of a btrfs_ordered_extent, which already has a reference to the inode. This implies that an explicit inode argument is redundant. So remove it. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29btrfs: Do mandatory tree block check before submitting bioQu Wenruo1-0/+13
There are at least 2 reports about a memory bit flip sneaking into on-disk data. Currently we only have a relaxed check triggered at btrfs_mark_buffer_dirty() time, as it's not mandatory and only for CONFIG_BTRFS_FS_CHECK_INTEGRITY enabled build, it doesn't help users to detect such problem. This patch will address the hole by triggering comprehensive check on tree blocks before writing it back to disk. The design points are: - Timing of the check: Tree block write hook This timing is chosen to reduce the overhead. The comprehensive check should be as expensive as a checksum calculation. Doing full check at btrfs_mark_buffer_dirty() is too expensive for end user. - Loose empty leaf check Originally for an empty leaf, tree-checker will report error if it's not a tree root. The problem for such check at write time is: * False alert for tree root created in current transaction In that case, the commit root still needs to be written to disk. And since current root can differ from commit root, then it will cause false alert. This happens for log tree. * False alert for relocated tree block Relocated tree block can be written to disk due to memory pressure, in that case an empty csum tree root can be written to disk and cause false alert, since csum root node hasn't been updated. Previous patch of removing comprehensive empty leaf owner check has paved the way for this patch. The example error output will be something like: BTRFS critical (device dm-3): corrupt leaf: root=2 block=1350630375424 slot=68, bad key order, prev (10510212874240 169 0) current (1714119868416 169 0) BTRFS error (device dm-3): block=1350630375424 write time tree block corruption detected BTRFS: error (device dm-3) in btrfs_commit_transaction:2220: errno=-5 IO failure (Error while writing out transaction) BTRFS info (device dm-3): forced readonly BTRFS warning (device dm-3): Skipping commit of aborted transaction. BTRFS: error (device dm-3) in cleanup_transaction:1839: errno=-5 IO failure BTRFS info (device dm-3): delayed_refs has NO entry Reported-by: Leonard Lausen <leonard@lausen.nl> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29btrfs: tree-checker: Remove comprehensive root owner checkQu Wenruo1-24/+0
Commit 1ba98d086fe3 ("Btrfs: detect corruption when non-root leaf has zero item") introduced comprehensive root owner checker. However it's pretty expensive tree search to locate the owner root, especially when it get reused by mandatory read and write time tree-checker. This patch will remove that check, and completely rely on owner based empty leaf check, which is much faster and still works fine for most case. And since we skip the old root owner check, now write time tree check can be merged with btrfs_check_leaf_full(). Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29Btrfs: fix data bytes_may_use underflow with fallocate due to failed quota ↵Robbie Ko1-1/+2
reserve When doing fallocate, we first add the range to the reserve_list and then reserve the quota. If quota reservation fails, we'll release all reserved parts of reserve_list. However, cur_offset is not updated to indicate that this range is already been inserted into the list. Therefore, the same range is freed twice. Once at list_for_each_entry loop, and once at the end of the function. This will result in WARN_ON on bytes_may_use when we free the remaining space. At the end, under the 'out' label we have a call to: btrfs_free_reserved_data_space(inode, data_reserved, alloc_start, alloc_end - cur_offset); The start offset, third argument, should be cur_offset. Everything from alloc_start to cur_offset was freed by the list_for_each_entry_safe_loop. Fixes: 18513091af94 ("btrfs: update btrfs_space_info's bytes_may_use timely") Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Robbie Ko <robbieko@synology.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29btrfs: get fs_info from eb in read_one_devDavid Sterba1-3/+3
We can read fs_info from extent buffer and can drop it from the parameters. Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29btrfs: get fs_info from eb in read_one_chunkDavid Sterba1-4/+4
We can read fs_info from extent buffer and can drop it from the parameters. Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29btrfs: get fs_info from eb in btrfs_check_chunk_validDavid Sterba3-7/+5
We can read fs_info from extent buffer and can drop it from the parameters. Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29btrfs: get fs_info from eb in should_balance_chunkDavid Sterba1-4/+3
We can read fs_info from extent buffer and can drop it from the parameters. Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29btrfs: get fs_info from eb in btrfs_check_nodeDavid Sterba3-3/+4
We can read fs_info from extent buffer and can drop it from the parameters. Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29btrfs: get fs_info from eb in btrfs_check_leaf_relaxedDavid Sterba3-5/+3
We can read fs_info from extent buffer and can drop it from the parameters. Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29btrfs: get fs_info from eb in btrfs_check_leaf_fullDavid Sterba3-5/+3
We can read fs_info from extent buffer and can drop it from the parameters. Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29btrfs: Switch btrfs_trim_free_extents to find_first_clear_extent_bitNikolay Borisov1-63/+28
Instead of always calling the allocator to search for a free extent, that satisfies the input criteria, switch btrfs_trim_free_extents to using find_first_clear_extent_bit. With this change it's no longer necessary to read the device tree in order to figure out holes in the devices. Now the code always searches in-memory data structure to figure out the space range which contains the requested which should result in speed improvements. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29btrfs: Implement find_first_clear_extent_bitNikolay Borisov2-0/+75
This function is very similar to find_first_extent_bit except that it locates the first contiguous span of space which does not have bits set. It's intended use is in the freespace trimming code. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29btrfs: Optimize unallocated chunks discardNikolay Borisov3-3/+66
Currently unallocated chunks are always trimmed. For example 2 consecutive trims on large storage would trim freespace twice irrespective of whether the space was actually allocated or not between those trims. Optimise this behavior by exploiting the newly introduced alloc_state tree of btrfs_device. A new CHUNK_TRIMMED bit is used to mark those unallocated chunks which have been trimmed and have not been allocated afterwards. On chunk allocation the respective underlying devices' physical space will have its CHUNK_TRIMMED flag cleared. This avoids submitting discards for space which hasn't been changed since the last time discard was issued. This applies to the single mount period of the filesystem as the information is not stored permanently. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29btrfs: Factor out in_range macroNikolay Borisov3-2/+2
This is used in more than one places so let's factor it out in ctree.h. No functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29btrfs: Remove 'trans' argument from find_free_dev_extent(_start)Nikolay Borisov3-45/+10
Now that these functions no longer require a handle to transaction to inspect pending/pinned chunks the argument can be removed. At the same time also remove any surrounding code which acquired the handle. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29btrfs: replace pending/pinned chunks lists with io treeJeff Mahoney9-119/+63
The pending chunks list contains chunks that are allocated in the current transaction but haven't been created yet. The pinned chunks list contains chunks that are being released in the current transaction. Both describe chunks that are not reflected on disk as in use but are unavailable just the same. The pending chunks list is anchored by the transaction handle, which means that we need to hold a reference to a transaction when working with the list. The way we use them is by iterating over both lists to perform comparisons on the stripes they describe for each device. This is backwards and requires that we keep a transaction handle open while we're trimming. This patchset adds an extent_io_tree to btrfs_device that maintains the allocation state of the device. Extents are set dirty when chunks are first allocated -- when the extent maps are added to the mapping tree. They're cleared when last removed -- when the extent maps are removed from the mapping tree. This matches the lifespan of the pending and pinned chunks list and allows us to do trims on unallocated space safely without pinning the transaction for what may be a lengthy operation. We can also use this io tree to mark which chunks have already been trimmed so we don't repeat the operation. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29btrfs: Transpose btrfs_close_devices/btrfs_mapping_tree_free in close_ctreeNikolay Borisov1-1/+1
Following the introduction of the alloc_state tree, some of the callees of btrfs_mapping_tree_free will have to interact with the btrfs_device of the constituent devices. Enable this by moving the code responsible for freeing devices after the last user (btrfs_mapping_tree_free). Otherwise the kernel could crash due to use-after-free. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29btrfs: Stop using call_rcu for device freeingNikolay Borisov2-13/+8
btrfs_device structs are freed from RCU context since device iteration is protected by RCU. Currently this is achieved by using call_rcu since no blocking functions are called within btrfs_free_device. Future refactoring of pending/pinned chunks will require calling sleeping functions. This patch is in preparation for these changes by simply switching from RCU callbacks to explicit calls of synchronize_rcu and calling btrfs_free_device directly. This is functionally equivalent, making sure that there are no readers at that time. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29btrfs: Implement set_extent_bits_nowaitNikolay Borisov2-0/+9
It will be used in a future patch that will require modifying an extent_io_tree struct under a spinlock. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29btrfs: Introduce new bits for device allocation treeNikolay Borisov1-0/+3
Rather than hijacking the existing defines let's just define new bits, with more descriptive names. Instead of using yet more (currently at 18) bits for the new flags, use the fact those flags will be specific to the device allocation tree so define them using existing EXTENT_* flags. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29btrfs: Populate ->orig_block_len during read_one_chunkNikolay Borisov1-19/+22
Chunks read from disk currently don't get their ->orig_block_len member set, in contrast when a new chunk is allocated, the respective extent_map's ->orig_block_len is assigned the size of the stripe of this chunk. Let's apply the same strategy for chunks which are read from disk, not only does this codify the invariant that ->orig_block_len always contains the size of the stripe for a chunk (when the em belongs to the mapping tree). But it's also a preparatory patch for further work around tracking chunk allocation in an extent tree rather than pinned/pending lists. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29btrfs: Rename and export clear_btree_io_treeNikolay Borisov3-33/+34
This function is going to be used to clear out the device extent allocation information. Give it a more generic name and export it. This is in preparation to replacing the pending/pinned chunk lists with an extent tree. No functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29btrfs: Handle pending/pinned chunks before blockgroup relocation during ↵Nikolay Borisov1-32/+23
device shrink During device shrink pinned/pending chunks (i.e. those which have been deleted/created respectively, in the current transaction and haven't touched disk) need to be accounted when doing device shrink. Presently this happens after the main relocation loop in btrfs_shrink_device, which could lead to making another go in the body of the function. Since there is no hard requirement to perform pinned/pending chunks handling after the relocation loop, move the code before it. This leads to simplifying the code flow around - i.e. no need to use 'goto again'. A notable side effect of this change is that modification of the device's size requires a transaction to be started and committed before the relocation loop starts. This is necessary to ensure that relocation process sees the shrunk device size. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29btrfs: combine device update operations during transaction commitNikolay Borisov6-64/+51
We currently overload the pending_chunks list to handle updating btrfs_device->commit_bytes used. We don't actually care about the extent mapping or even the device mapping for the chunk - we just need the device, and we can end up processing it multiple times. The fs_devices->resized_list does more or less the same thing, but with the disk size. They are called consecutively during commit and have more or less the same purpose. We can combine the two lists into a single list that attaches to the transaction and contains a list of devices that need updating. Since we always add the device to a list when we change bytes_used or disk_total_size, there's no harm in copying both values at once. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29btrfs: Honour FITRIM range constraints during free space trimNikolay Borisov1-6/+19
Up until now trimming the freespace was done irrespective of what the arguments of the FITRIM ioctl were. For example fstrim's -o/-l arguments will be entirely ignored. Fix it by correctly handling those paramter. This requires breaking if the found freespace extent is after the end of the passed range as well as completing trim after trimming fstrim_range::len bytes. Fixes: 499f377f49f0 ("btrfs: iterate over unused chunk space in FITRIM") CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29Btrfs: send, improve clone rangeRobbie Ko1-19/+33
Improve clone_range in two scenarios. 1. Remove the limit of inode size when find clone inodes We can do partial clone, so there is no need to limit the size of the candidate inode. When clone a range, we clone the legal range only by bytenr, offset, len, inode size. 2. In the scenarios of rewrite or clone_range, data_offset rarely matches exactly, so the chance of a clone is missed. e.g. 1. Write a 1M file dd if=/dev/zero of=1M bs=1M count=1 2. Clone 1M file cp --reflink 1M clone 3. Rewrite 4k on the clone file dd if=/dev/zero of=clone bs=4k count=1 conv=notrunc The disk layout is as follows: item 16 key (257 EXTENT_DATA 0) itemoff 15353 itemsize 53 extent data disk byte 1103101952 nr 1048576 extent data offset 0 nr 1048576 ram 1048576 extent compression(none) ... item 22 key (258 EXTENT_DATA 0) itemoff 14959 itemsize 53 extent data disk byte 1104150528 nr 4096 extent data offset 0 nr 4096 ram 4096 extent compression(none) item 23 key (258 EXTENT_DATA 4096) itemoff 14906 itemsize 53 extent data disk byte 1103101952 nr 1048576 extent data offset 4096 nr 1044480 ram 1048576 extent compression(none) When send, inode 258 file offset 4096~1048576 (item 23) has a chance to clone_range, but because data_offset does not match inode 257 (item 16), it causes missed clone and can only transfer actual data. Improve the problem by judging whether the current data_offset has overlap with the file extent item, and if so, adjusting offset and extent_len so that we can clone correctly. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Robbie Ko <robbieko@synology.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29btrfs: prop: open code btrfs_set_prop in inherit_propAnand Jain1-6/+24
When an inode inherits property from its parent, we call btrfs_set_prop(). btrfs_set_prop() does an elaborate checks, which is not required in the context of inheriting a property. Instead just open-code only the required items from btrfs_set_prop() and then call btrfs_setxattr() directly. So now the only user of btrfs_set_prop() is gone, (except for the wraper function btrfs_set_prop_trans()). Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29btrfs: drop unused parameter in mount_subvolAnand Jain1-2/+2
@device_name in mount_subvol() is not used, drop it. Also see: 5bedc48a8f9e ("btrfs: drop unused parameters from mount_subvol"). Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29btrfs: tree-checker: get fs_info from eb in check_inode_itemDavid Sterba1-3/+3
We can read fs_info from extent buffer and can drop it from the parameters. Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29btrfs: tree-checker: get fs_info from eb in check_dev_itemDavid Sterba1-3/+3
We can read fs_info from extent buffer and can drop it from the parameters. Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29btrfs: tree-checker: get fs_info from eb in dev_item_errDavid Sterba1-8/+7
We can read fs_info from extent buffer and can drop it from the parameters. Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29btrfs: tree-checker: get fs_info from eb in chunk_errDavid Sterba1-14/+14
We can read fs_info from extent buffer and can drop it from the parameters. Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29btrfs: tree-checker: get fs_info from eb in check_leafDavid Sterba1-4/+4
We can read fs_info from extent buffer and can drop it from the parameters. Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29btrfs: tree-checker: get fs_info from eb in check_leaf_itemDavid Sterba1-6/+5
We can read fs_info from extent buffer and can drop it from the parameters. Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29btrfs: tree-checker: get fs_info from eb in check_extent_data_itemDavid Sterba1-3/+3
We can read fs_info from extent buffer and can drop it from the parameters. Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29btrfs: tree-checker: get fs_info from eb in check_block_group_itemDavid Sterba1-3/+2
We can read fs_info from extent buffer and can drop it from the parameters. Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29btrfs: tree-checker: get fs_info from eb in block_group_errDavid Sterba1-9/+9
We can read fs_info from extent buffer and can drop it from the parameters. Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29btrfs: tree-checker: get fs_info from eb in check_dir_itemDavid Sterba1-3/+3
We can read fs_info from extent buffer and can drop it from the parameters. Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29btrfs: tree-checker: get fs_info from eb in dir_item_errDavid Sterba1-13/+13
We can read fs_info from extent buffer and can drop it from the parameters. Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29btrfs: tree-checker: get fs_info from eb in check_csum_itemDavid Sterba1-3/+3
We can read fs_info from extent buffer and can drop it from the parameters. Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29btrfs: tree-checker: get fs_info from eb in file_extent_errDavid Sterba1-11/+11
We can read fs_info from extent buffer and can drop it from the parameters. Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29btrfs: tree-checker: get fs_info from eb in generic_errDavid Sterba1-18/+18
We can read fs_info from extent buffer and can drop it from the parameters. Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29btrfs: inode: Verify inode mode to avoid NULL pointer dereferenceQu Wenruo2-21/+47
[BUG] When accessing a file on a crafted image, btrfs can crash in block layer: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 PGD 136501067 P4D 136501067 PUD 124519067 PMD 0 CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.0.0-rc8-default #252 RIP: 0010:end_bio_extent_readpage+0x144/0x700 Call Trace: <IRQ> blk_update_request+0x8f/0x350 blk_mq_end_request+0x1a/0x120 blk_done_softirq+0x99/0xc0 __do_softirq+0xc7/0x467 irq_exit+0xd1/0xe0 call_function_single_interrupt+0xf/0x20 </IRQ> RIP: 0010:default_idle+0x1e/0x170 [CAUSE] The crafted image has a tricky corruption, the INODE_ITEM has a different type against its parent dir: item 20 key (268 INODE_ITEM 0) itemoff 2808 itemsize 160 generation 13 transid 13 size 1048576 nbytes 1048576 block group 0 mode 121644 links 1 uid 0 gid 0 rdev 0 sequence 9 flags 0x0(none) This mode number 0120000 means it's a symlink. But the dir item think it's still a regular file: item 8 key (264 DIR_INDEX 5) itemoff 3707 itemsize 32 location key (268 INODE_ITEM 0) type FILE transid 13 data_len 0 name_len 2 name: f4 item 40 key (264 DIR_ITEM 51821248) itemoff 1573 itemsize 32 location key (268 INODE_ITEM 0) type FILE transid 13 data_len 0 name_len 2 name: f4 For symlink, we don't set BTRFS_I(inode)->io_tree.ops and leave it empty, as symlink is only designed to have inlined extent, all handled by tree block read. Thus no need to trigger btrfs_submit_bio_hook() for inline file extent. However end_bio_extent_readpage() expects tree->ops populated, as it's reading regular data extent. This causes NULL pointer dereference. [FIX] This patch fixes the problem in two ways: - Verify inode mode against its dir item when looking up inode So in btrfs_lookup_dentry() if we find inode mode mismatch with dir item, we error out so that corrupted inode will not be accessed. - Verify inode mode when getting extent mapping Only regular file should have regular or preallocated extent. If we found regular/preallocated file extent for symlink or the rest, we error out before submitting the read bio. With this fix that crafted image can be rejected gracefully: BTRFS critical (device loop0): inode mode mismatch with dir: inode mode=0121644 btrfs type=7 dir type=1 Reported-by: Yoon Jungyeon <jungyeon@gatech.edu> Link: https://bugzilla.kernel.org/show_bug.cgi?id=202763 Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29btrfs: tree-checker: Verify inode itemQu Wenruo2-0/+109
There is a report in kernel bugzilla about mismatch file type in dir item and inode item. This inspires us to check inode mode in inode item. This patch will check the following members: - inode key objectid Should be ROOT_DIR_DIR or [256, (u64)-256] or FREE_INO. - inode key offset Should be 0 - inode item generation - inode item transid No newer than sb generation + 1. The +1 is for log tree. - inode item mode No unknown bits. No invalid S_IF* bit. NOTE: S_IFMT check is not enough, need to check every know type. - inode item nlink Dir should have no more link than 1. - inode item flags Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29btrfs: tree-checker: Enhance chunk checker to validate chunk profileQu Wenruo1-0/+7
Btrfs-progs already have a comprehensive type checker, to ensure there is only 0 (SINGLE profile) or 1 (DUP/RAID0/1/5/6/10) bit set for chunk profile bits. Do the same work for kernel. Reported-by: Yoon Jungyeon <jungyeon@gatech.edu> Link: https://bugzilla.kernel.org/show_bug.cgi?id=202765 Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29btrfs: tree-checker: Verify dev itemQu Wenruo3-9/+83
[BUG] For fuzzed image whose DEV_ITEM has invalid total_bytes as 0, then kernel will just panic: BUG: unable to handle kernel NULL pointer dereference at 0000000000000098 #PF error: [normal kernel read fault] PGD 800000022b2bd067 P4D 800000022b2bd067 PUD 22b2bc067 PMD 0 Oops: 0000 [#1] SMP PTI CPU: 0 PID: 1106 Comm: mount Not tainted 5.0.0-rc8+ #9 RIP: 0010:btrfs_verify_dev_extents+0x2a5/0x5a0 Call Trace: open_ctree+0x160d/0x2149 btrfs_mount_root+0x5b2/0x680 [CAUSE] If device extent verification finds a deivce with 0 total_bytes, then it assumes it's a seed dummy, then search for seed devices. But in this case, there is no seed device at all, causing NULL pointer. [FIX] Since this is caused by fuzzed image, let's go the tree-check way, just add a new verification for device item. Reported-by: Yoon Jungyeon <jungyeon@gatech.edu> Link: https://bugzilla.kernel.org/show_bug.cgi?id=202691 Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29btrfs: tree-checker: Check chunk item at tree block read timeQu Wenruo2-3/+15
Since we have btrfs_check_chunk_valid() in tree-checker, let's do chunk item verification in tree-checker too. Since the tree-checker is run at endio time, if one chunk leaf fails chunk verification, we can still retry the other copy, making btrfs more robust to fuzzed image as we may still get a good chunk item. Also since we have done chunk verification in tree block read time, skip the btrfs_check_chunk_valid() call in read_one_chunk() if we're reading chunk items from leaf. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29btrfs: tree-checker: Make btrfs_check_chunk_valid() return EUCLEAN instead ↵Qu Wenruo1-11/+11
of EIO To follow the standard behavior of tree-checker. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29btrfs: tree-checker: Make chunk item checker messages more readableQu Wenruo1-13/+68
Old error message would be something like: BTRFS error (device dm-3): invalid chunk num_stipres: 0 New error message would be: Btrfs critical (device dm-3): corrupt superblock syschunk array: chunk_start=2097152, invalid chunk num_stripes: 0 Or Btrfs critical (device dm-3): corrupt leaf: root=3 block=8388608 slot=3 chunk_start=2097152, invalid chunk num_stripes: 0 And for certain error message, also output expected value. The error message levels are changed from error to critical. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>