summaryrefslogtreecommitdiffstats
path: root/fs/btrfs/tree-checker.c
AgeCommit message (Collapse)AuthorFilesLines
2022-12-05btrfs: pass the extent buffer for the btrfs_item_nr helpersJosef Bacik1-2/+2
This is actually a change for extent tree v2, but it exists in btrfs-progs but not in the kernel. This makes it annoying to sync accessors.h with btrfs-progs, and since this is the way I need it for extent-tree v2 simply update these helpers to take the extent buffer in order to make syncing possible now, and make the extent tree v2 stuff easier moving forward. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05btrfs: move file_extent_item helpers into file-item.hJosef Bacik1-0/+1
These helpers use functions that are in multiple places, which makes it tricky to sync them into btrfs-progs. Move them to file-item.h and then include file-item.h in places that use these helpers. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05btrfs: extend btrfs_dir_item type to store encryption statusOmar Sandoval1-1/+1
For directories with encrypted files/filenames, we need to store a flag indicating this fact. There's no room in other fields, so we'll need to borrow a bit from dir_type. Since it's now a combination of type and flags, we rename it to dir_flags to reflect its new usage. The new flag, FT_ENCRYPTED, indicates a directory containing encrypted data, which is orthogonal to file type; therefore, add the new flag, and make conversion from directory type to file type strip the flag. As the file types almost never change we can afford to use the bits. Actual usage will be guarded behind an incompat bit, this patch only adds the support for later use by fscrypt. Signed-off-by: Omar Sandoval <osandov@osandov.com> Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05btrfs: move accessor helpers into accessors.hJosef Bacik1-0/+1
This is a large patch, but because they're all macros it's impossible to split up. Simply copy all of the item accessors in ctree.h and paste them in accessors.h, and then update any files to include the header so everything compiles. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> [ reformat comments, style fixups ] Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05btrfs: move the printk helpers out of ctree.hJosef Bacik1-0/+1
We have a bunch of printk helpers that are in ctree.h. These have nothing to do with ctree.c, so move them into their own header. Subsequent patches will cleanup the printk helpers. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05btrfs: move fs wide helpers out of ctree.hJosef Bacik1-0/+1
We have several fs wide related helpers in ctree.h. The bulk of these are the incompat flag test helpers, but there are things such as btrfs_fs_closing() and the read only helpers that also aren't directly related to the ctree code. Move these into a fs.h header, which will serve as the location for file system wide related helpers. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-08-17btrfs: tree-checker: check for overlapping extent itemsJosef Bacik1-2/+23
We're seeing a weird problem in production where we have overlapping extent items in the extent tree. It's unclear where these are coming from, and in debugging we realized there's no check in the tree checker for this sort of problem. Add a check to the tree-checker to make sure that the extents do not overlap each other. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-05-16btrfs: tree-checker: check extent buffer owner against owner rootidQu Wenruo1-0/+55
Btrfs doesn't check whether the tree block respects the root owner. This means, if a tree block referred by a parent in extent tree, but has owner of 5, btrfs can still continue reading the tree block, as long as it doesn't trigger other sanity checks. Normally this is fine, but combined with the empty tree check in check_leaf(), if we hit an empty extent tree, but the root node has csum tree owner, we can let such extent buffer to sneak in. Shrink the hole by: - Do extra eb owner check at tree read time - Make sure the root owner extent buffer exactly matches the root id. Unfortunately we can't yet completely patch the hole, there are several call sites can't pass all info we need: - For reloc/log trees Their owner is key::offset, not key::objectid. We need the full root key to do that accurate check. For now, we just skip the ownership check for those trees. - For add_data_references() of relocation That call site doesn't have any parent/ownership info, as all the bytenrs are all from btrfs_find_all_leafs(). - For direct backref items walk Direct backref items records the parent bytenr directly, thus unlike indirect backref item, we don't do a full tree search. Thus in that case, we don't have full parent owner to check. For the later two cases, they all pass 0 as @owner_root, thus we can skip those cases if @owner_root is 0. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-03-14btrfs: add support for multiple global rootsJosef Bacik1-2/+19
With extent tree v2 you will be able to create multiple csum, extent, and free space trees. They will be used based on the block group, which will now use the block_group_item->chunk_objectid to point to the set of global roots that it will use. When allocating new block groups we'll simply mod the gigabyte offset of the block group against the number of global roots we have and that will be the block groups global id. >From there we can take the bytenr that we're modifying in the respective tree, look up the block group and get that block groups corresponding global root id. From there we can get to the appropriate global root for that bytenr. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-03-14btrfs: tree-checker: don't fail on empty extent roots for extent tree v2Josef Bacik1-1/+13
For extent tree v2 we can definitely have empty extent roots, so skip this particular check if we have that set. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-03-02btrfs: tree-checker: use u64 for item data end to avoid overflowSu Yue1-9/+9
User reported there is an array-index-out-of-bounds access while mounting the crafted image: [350.411942 ] loop0: detected capacity change from 0 to 262144 [350.427058 ] BTRFS: device fsid a62e00e8-e94e-4200-8217-12444de93c2e devid 1 transid 8 /dev/loop0 scanned by systemd-udevd (1044) [350.428564 ] BTRFS info (device loop0): disk space caching is enabled [350.428568 ] BTRFS info (device loop0): has skinny extents [350.429589 ] [350.429619 ] UBSAN: array-index-out-of-bounds in fs/btrfs/struct-funcs.c:161:1 [350.429636 ] index 1048096 is out of range for type 'page *[16]' [350.429650 ] CPU: 0 PID: 9 Comm: kworker/u8:1 Not tainted 5.16.0-rc4 [350.429652 ] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1.1 04/01/2014 [350.429653 ] Workqueue: btrfs-endio-meta btrfs_work_helper [btrfs] [350.429772 ] Call Trace: [350.429774 ] <TASK> [350.429776 ] dump_stack_lvl+0x47/0x5c [350.429780 ] ubsan_epilogue+0x5/0x50 [350.429786 ] __ubsan_handle_out_of_bounds+0x66/0x70 [350.429791 ] btrfs_get_16+0xfd/0x120 [btrfs] [350.429832 ] check_leaf+0x754/0x1a40 [btrfs] [350.429874 ] ? filemap_read+0x34a/0x390 [350.429878 ] ? load_balance+0x175/0xfc0 [350.429881 ] validate_extent_buffer+0x244/0x310 [btrfs] [350.429911 ] btrfs_validate_metadata_buffer+0xf8/0x100 [btrfs] [350.429935 ] end_bio_extent_readpage+0x3af/0x850 [btrfs] [350.429969 ] ? newidle_balance+0x259/0x480 [350.429972 ] end_workqueue_fn+0x29/0x40 [btrfs] [350.429995 ] btrfs_work_helper+0x71/0x330 [btrfs] [350.430030 ] ? __schedule+0x2fb/0xa40 [350.430033 ] process_one_work+0x1f6/0x400 [350.430035 ] ? process_one_work+0x400/0x400 [350.430036 ] worker_thread+0x2d/0x3d0 [350.430037 ] ? process_one_work+0x400/0x400 [350.430038 ] kthread+0x165/0x190 [350.430041 ] ? set_kthread_struct+0x40/0x40 [350.430043 ] ret_from_fork+0x1f/0x30 [350.430047 ] </TASK> [350.430047 ] [350.430077 ] BTRFS warning (device loop0): bad eb member start: ptr 0xffe20f4e start 20975616 member offset 4293005178 size 2 btrfs check reports: corrupt leaf: root=3 block=20975616 physical=20975616 slot=1, unexpected item end, have 4294971193 expect 3897 The first slot item offset is 4293005033 and the size is 1966160. In check_leaf, we use btrfs_item_end() to check item boundary versus extent_buffer data size. However, return type of btrfs_item_end() is u32. (u32)(4293005033 + 1966160) == 3897, overflow happens and the result 3897 equals to leaf data size reasonably. Fix it by use u64 variable to store item data end in check_leaf() to avoid u32 overflow. This commit does solve the invalid memory access showed by the stack trace. However, its metadata profile is DUP and another copy of the leaf is fine. So the image can be mounted successfully. But when umount is called, the ASSERT btrfs_mark_buffer_dirty() will be triggered because the only node in extent tree has 0 item and invalid owner. It's solved by another commit "btrfs: check extent buffer owner against the owner rootid". Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=215299 Reported-by: Wenqing Liu <wenqingliu0120@gmail.com> CC: stable@vger.kernel.org # 4.19+ Signed-off-by: Su Yue <l@damenly.su> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-31btrfs: tree-checker: check item_size for dev_itemSu Yue1-0/+8
Check item size before accessing the device item to avoid out of bound access, similar to inode_item check. Signed-off-by: Su Yue <l@damenly.su> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-31btrfs: tree-checker: check item_size for inode_itemSu Yue1-0/+7
while mounting the crafted image, out-of-bounds access happens: [350.429619] UBSAN: array-index-out-of-bounds in fs/btrfs/struct-funcs.c:161:1 [350.429636] index 1048096 is out of range for type 'page *[16]' [350.429650] CPU: 0 PID: 9 Comm: kworker/u8:1 Not tainted 5.16.0-rc4 #1 [350.429652] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1.1 04/01/2014 [350.429653] Workqueue: btrfs-endio-meta btrfs_work_helper [btrfs] [350.429772] Call Trace: [350.429774] <TASK> [350.429776] dump_stack_lvl+0x47/0x5c [350.429780] ubsan_epilogue+0x5/0x50 [350.429786] __ubsan_handle_out_of_bounds+0x66/0x70 [350.429791] btrfs_get_16+0xfd/0x120 [btrfs] [350.429832] check_leaf+0x754/0x1a40 [btrfs] [350.429874] ? filemap_read+0x34a/0x390 [350.429878] ? load_balance+0x175/0xfc0 [350.429881] validate_extent_buffer+0x244/0x310 [btrfs] [350.429911] btrfs_validate_metadata_buffer+0xf8/0x100 [btrfs] [350.429935] end_bio_extent_readpage+0x3af/0x850 [btrfs] [350.429969] ? newidle_balance+0x259/0x480 [350.429972] end_workqueue_fn+0x29/0x40 [btrfs] [350.429995] btrfs_work_helper+0x71/0x330 [btrfs] [350.430030] ? __schedule+0x2fb/0xa40 [350.430033] process_one_work+0x1f6/0x400 [350.430035] ? process_one_work+0x400/0x400 [350.430036] worker_thread+0x2d/0x3d0 [350.430037] ? process_one_work+0x400/0x400 [350.430038] kthread+0x165/0x190 [350.430041] ? set_kthread_struct+0x40/0x40 [350.430043] ret_from_fork+0x1f/0x30 [350.430047] </TASK> [350.430077] BTRFS warning (device loop0): bad eb member start: ptr 0xffe20f4e start 20975616 member offset 4293005178 size 2 check_leaf() is checking the leaf: corrupt leaf: root=4 block=29396992 slot=1, bad key order, prev (16140901064495857664 1 0) current (1 204 12582912) leaf 29396992 items 6 free space 3565 generation 6 owner DEV_TREE leaf 29396992 flags 0x1(WRITTEN) backref revision 1 fs uuid a62e00e8-e94e-4200-8217-12444de93c2e chunk uuid cecbd0f7-9ca0-441e-ae9f-f782f9732bd8 item 0 key (16140901064495857664 INODE_ITEM 0) itemoff 3955 itemsize 40 generation 0 transid 0 size 0 nbytes 17592186044416 block group 0 mode 52667 links 33 uid 0 gid 2104132511 rdev 94223634821136 sequence 100305 flags 0x2409000(none) atime 0.0 (1970-01-01 08:00:00) ctime 2973280098083405823.4294967295 (-269783007-01-01 21:37:03) mtime 18446744071572723616.4026825121 (1902-04-16 12:40:00) otime 9249929404488876031.4294967295 (622322949-04-16 04:25:58) item 1 key (1 DEV_EXTENT 12582912) itemoff 3907 itemsize 48 dev extent chunk_tree 3 chunk_objectid 256 chunk_offset 12582912 length 8388608 chunk_tree_uuid cecbd0f7-9ca0-441e-ae9f-f782f9732bd8 The corrupted leaf of device tree has an inode item. The leaf passed checksum and others checks in validate_extent_buffer until check_leaf_item(). Because of the key type BTRFS_INODE_ITEM, check_inode_item() is called even we are in the device tree. Since the item offset + sizeof(struct btrfs_inode_item) > eb->len, out-of-bounds access is triggered. The item end vs leaf boundary check has been done before check_leaf_item(), so fix it by checking item size in check_inode_item() before access of the inode item in extent buffer. Other check functions except check_dev_item() in check_leaf_item() have their item size checks. The commit for check_dev_item() is followed. No regression observed during running fstests. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=215299 CC: stable@vger.kernel.org # 5.10+ CC: Wenqing Liu <wenqingliu0120@gmail.com> Signed-off-by: Su Yue <l@damenly.su> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-03btrfs: rename btrfs_item_end_nr to btrfs_item_data_endJosef Bacik1-4/+4
The name btrfs_item_end_nr() is a bit of a misnomer, as it's actually the offset of the end of the data the item points to. In fact all of the helpers that we use btrfs_item_end_nr() use data in their name, like BTRFS_LEAF_DATA_SIZE() and leaf_data(). Rename to btrfs_item_data_end() to make it clear what this helper is giving us. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-03btrfs: drop the _nr from the item helpersJosef Bacik1-24/+24
Now that all call sites are using the slot number to modify item values, rename the SETGET helpers to raw_item_*(), and then rework the _nr() helpers to be the btrfs_item_*() btrfs_set_item_*() helpers, and then rename all of the callers to the new helpers. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-23btrfs: add ro compat flags to inodesBoris Burkov1-4/+13
Currently, inode flags are fully backwards incompatible in btrfs. If we introduce a new inode flag, then tree-checker will detect it and fail. This can even cause us to fail to mount entirely. To make it possible to introduce new flags which can be read-only compatible, like VERITY, we add new ro flags to btrfs without treating them quite so harshly in tree-checker. A read-only file system can survive an unexpected flag, and can be mounted. As for the implementation, it unfortunately gets a little complicated. The on-disk representation of the inode, btrfs_inode_item, has an __le64 for flags but the in-memory representation, btrfs_inode, uses a u32. David Sterba had the nice idea that we could reclaim those wasted 32 bits on disk and use them for the new ro_compat flags. It turns out that the tree-checker code which checks for unknown flags is broken, and ignores the upper 32 bits we are hoping to use. The issue is that the flags use the literal 1 rather than 1ULL, so the flags are signed ints, and one of them is specifically (1 << 31). As a result, the mask which ORs the flags is a negative integer on machines where int is 32 bit twos complement. When tree-checker evaluates the expression: btrfs_inode_flags(leaf, iitem) & ~BTRFS_INODE_FLAG_MASK) The mask is something like 0x80000abc, which gets promoted to u64 with sign extension to 0xffffffff80000abc. Negating that 64 bit mask leaves all the upper bits zeroed, and we can't detect unexpected flags. This suggests that we can't use those bits after all. Luckily, we have good reason to believe that they are zero anyway. Inode flags are metadata, which is always checksummed, so any bit flips that would introduce 1s would cause a checksum failure anyway (excluding the improbable case of the checksum getting corrupted exactly badly). Further, unless the 1 << 31 flag is used, the cast to u64 of the 32 bit inode flag should preserve its value and not add leading zeroes (at least for twos complement). The only place that flag (BTRFS_INODE_ROOT_ITEM_INIT) is used is in a special inode embedded in the root item, and indeed for that inode we see 0xffffffff80000000 as the flags on disk. However, that inode is never seen by tree checker, nor is it used in a context where verity might be meaningful. Theoretically, a future ro flag might cause trouble on that inode, so we should proactively clean up that mess before it does. With the introduction of the new ro flags, keep two separate unsigned masks and check them against the appropriate u32. Since we no longer run afoul of sign extension, this also stops writing out 0xffffffff80000000 in root_item inodes going forward. Signed-off-by: Boris Burkov <boris@bur.io> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-23btrfs: tree-checker: add missing stripe checks for raid1c3/4 profilesDavid Sterba1-0/+4
The stripe checks for raid1c3/raid1c4 are missing in the sequence in btrfs_check_chunk_valid. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-23btrfs: tree-checker: use table values for stripe checksDavid Sterba1-6/+11
There are hardcoded values in several checks regarding chunks and stripe constraints. We have that defined in the raid table and ought to use it. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-04-19btrfs: tree-checker: check for BTRFS_BLOCK_FLAG_FULL_BACKREF being set ↵Josef Bacik1-0/+5
improperly We need to validate that a data extent item does not have the FULL_BACKREF flag set on its flags. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-02-22btrfs: tree-checker: do not error out if extent ref hash doesn't matchJosef Bacik1-12/+4
The tree checker checks the extent ref hash at read and write time to make sure we do not corrupt the file system. Generally extent references go inline, but if we have enough of them we need to make an item, which looks like key.objectid = <bytenr> key.type = <BTRFS_EXTENT_DATA_REF_KEY|BTRFS_TREE_BLOCK_REF_KEY> key.offset = hash(tree, owner, offset) However if key.offset collide with an unrelated extent reference we'll simply key.offset++ until we get something that doesn't collide. Obviously this doesn't match at tree checker time, and thus we error while writing out the transaction. This is relatively easy to reproduce, simply do something like the following xfs_io -f -c "pwrite 0 1M" file offset=2 for i in {0..10000} do xfs_io -c "reflink file 0 ${offset}M 1M" file offset=$(( offset + 2 )) done xfs_io -c "reflink file 0 17999258914816 1M" file xfs_io -c "reflink file 0 35998517829632 1M" file xfs_io -c "reflink file 0 53752752058368 1M" file btrfs filesystem sync And the sync will error out because we'll abort the transaction. The magic values above are used because they generate hash collisions with the first file in the main subvol. The fix for this is to remove the hash value check from tree checker, as we have no idea which offset ours should belong to. Reported-by: Tuomas Lähdekorpi <tuomas.lahdekorpi@gmail.com> Fixes: 0785a9aacf9d ("btrfs: tree-checker: Add EXTENT_DATA_REF check") CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> [ add comment] Signed-off-by: David Sterba <dsterba@suse.com>
2021-01-07btrfs: tree-checker: check if chunk item end overflowsSu Yue1-0/+7
While mounting a crafted image provided by user, kernel panics due to the invalid chunk item whose end is less than start. [66.387422] loop: module loaded [66.389773] loop0: detected capacity change from 262144 to 0 [66.427708] BTRFS: device fsid a62e00e8-e94e-4200-8217-12444de93c2e devid 1 transid 12 /dev/loop0 scanned by mount (613) [66.431061] BTRFS info (device loop0): disk space caching is enabled [66.431078] BTRFS info (device loop0): has skinny extents [66.437101] BTRFS error: insert state: end < start 29360127 37748736 [66.437136] ------------[ cut here ]------------ [66.437140] WARNING: CPU: 16 PID: 613 at fs/btrfs/extent_io.c:557 insert_state.cold+0x1a/0x46 [btrfs] [66.437369] CPU: 16 PID: 613 Comm: mount Tainted: G O 5.11.0-rc1-custom #45 [66.437374] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.14.0-1 04/01/2014 [66.437378] RIP: 0010:insert_state.cold+0x1a/0x46 [btrfs] [66.437420] RSP: 0018:ffff93e5414c3908 EFLAGS: 00010286 [66.437427] RAX: 0000000000000000 RBX: 0000000001bfffff RCX: 0000000000000000 [66.437431] RDX: 0000000000000000 RSI: ffffffffb90d4660 RDI: 00000000ffffffff [66.437434] RBP: ffff93e5414c3938 R08: 0000000000000001 R09: 0000000000000001 [66.437438] R10: ffff93e5414c3658 R11: 0000000000000000 R12: ffff8ec782d72aa0 [66.437441] R13: ffff8ec78bc71628 R14: 0000000000000000 R15: 0000000002400000 [66.437447] FS: 00007f01386a8580(0000) GS:ffff8ec809000000(0000) knlGS:0000000000000000 [66.437451] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [66.437455] CR2: 00007f01382fa000 CR3: 0000000109a34000 CR4: 0000000000750ee0 [66.437460] PKRU: 55555554 [66.437464] Call Trace: [66.437475] set_extent_bit+0x652/0x740 [btrfs] [66.437539] set_extent_bits_nowait+0x1d/0x20 [btrfs] [66.437576] add_extent_mapping+0x1e0/0x2f0 [btrfs] [66.437621] read_one_chunk+0x33c/0x420 [btrfs] [66.437674] btrfs_read_chunk_tree+0x6a4/0x870 [btrfs] [66.437708] ? kvm_sched_clock_read+0x18/0x40 [66.437739] open_ctree+0xb32/0x1734 [btrfs] [66.437781] ? bdi_register_va+0x1b/0x20 [66.437788] ? super_setup_bdi_name+0x79/0xd0 [66.437810] btrfs_mount_root.cold+0x12/0xeb [btrfs] [66.437854] ? __kmalloc_track_caller+0x217/0x3b0 [66.437873] legacy_get_tree+0x34/0x60 [66.437880] vfs_get_tree+0x2d/0xc0 [66.437888] vfs_kern_mount.part.0+0x78/0xc0 [66.437897] vfs_kern_mount+0x13/0x20 [66.437902] btrfs_mount+0x11f/0x3c0 [btrfs] [66.437940] ? kfree+0x5ff/0x670 [66.437944] ? __kmalloc_track_caller+0x217/0x3b0 [66.437962] legacy_get_tree+0x34/0x60 [66.437974] vfs_get_tree+0x2d/0xc0 [66.437983] path_mount+0x48c/0xd30 [66.437998] __x64_sys_mount+0x108/0x140 [66.438011] do_syscall_64+0x38/0x50 [66.438018] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [66.438023] RIP: 0033:0x7f0138827f6e [66.438033] RSP: 002b:00007ffecd79edf8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5 [66.438040] RAX: ffffffffffffffda RBX: 00007f013894c264 RCX: 00007f0138827f6e [66.438044] RDX: 00005593a4a41360 RSI: 00005593a4a33690 RDI: 00005593a4a3a6c0 [66.438047] RBP: 00005593a4a33440 R08: 0000000000000000 R09: 0000000000000001 [66.438050] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 [66.438054] R13: 00005593a4a3a6c0 R14: 00005593a4a41360 R15: 00005593a4a33440 [66.438078] irq event stamp: 18169 [66.438082] hardirqs last enabled at (18175): [<ffffffffb81154bf>] console_unlock+0x4ff/0x5f0 [66.438088] hardirqs last disabled at (18180): [<ffffffffb8115427>] console_unlock+0x467/0x5f0 [66.438092] softirqs last enabled at (16910): [<ffffffffb8a00fe2>] asm_call_irq_on_stack+0x12/0x20 [66.438097] softirqs last disabled at (16905): [<ffffffffb8a00fe2>] asm_call_irq_on_stack+0x12/0x20 [66.438103] ---[ end trace e114b111db64298b ]--- [66.438107] BTRFS error: found node 12582912 29360127 on insert of 37748736 29360127 [66.438127] BTRFS critical: panic in extent_io_tree_panic:679: locking error: extent tree was modified by another thread while locked (errno=-17 Object already exists) [66.441069] ------------[ cut here ]------------ [66.441072] kernel BUG at fs/btrfs/extent_io.c:679! [66.442064] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI [66.443018] CPU: 16 PID: 613 Comm: mount Tainted: G W O 5.11.0-rc1-custom #45 [66.444538] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.14.0-1 04/01/2014 [66.446223] RIP: 0010:extent_io_tree_panic.isra.0+0x23/0x25 [btrfs] [66.450878] RSP: 0018:ffff93e5414c3948 EFLAGS: 00010246 [66.451840] RAX: 0000000000000000 RBX: 0000000001bfffff RCX: 0000000000000000 [66.453141] RDX: 0000000000000000 RSI: ffffffffb90d4660 RDI: 00000000ffffffff [66.454445] RBP: ffff93e5414c3948 R08: 0000000000000001 R09: 0000000000000001 [66.455743] R10: ffff93e5414c3658 R11: 0000000000000000 R12: ffff8ec782d728c0 [66.457055] R13: ffff8ec78bc71628 R14: ffff8ec782d72aa0 R15: 0000000002400000 [66.458356] FS: 00007f01386a8580(0000) GS:ffff8ec809000000(0000) knlGS:0000000000000000 [66.459841] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [66.460895] CR2: 00007f01382fa000 CR3: 0000000109a34000 CR4: 0000000000750ee0 [66.462196] PKRU: 55555554 [66.462692] Call Trace: [66.463139] set_extent_bit.cold+0x30/0x98 [btrfs] [66.464049] set_extent_bits_nowait+0x1d/0x20 [btrfs] [66.490466] add_extent_mapping+0x1e0/0x2f0 [btrfs] [66.514097] read_one_chunk+0x33c/0x420 [btrfs] [66.534976] btrfs_read_chunk_tree+0x6a4/0x870 [btrfs] [66.555718] ? kvm_sched_clock_read+0x18/0x40 [66.575758] open_ctree+0xb32/0x1734 [btrfs] [66.595272] ? bdi_register_va+0x1b/0x20 [66.614638] ? super_setup_bdi_name+0x79/0xd0 [66.633809] btrfs_mount_root.cold+0x12/0xeb [btrfs] [66.652938] ? __kmalloc_track_caller+0x217/0x3b0 [66.671925] legacy_get_tree+0x34/0x60 [66.690300] vfs_get_tree+0x2d/0xc0 [66.708221] vfs_kern_mount.part.0+0x78/0xc0 [66.725808] vfs_kern_mount+0x13/0x20 [66.742730] btrfs_mount+0x11f/0x3c0 [btrfs] [66.759350] ? kfree+0x5ff/0x670 [66.775441] ? __kmalloc_track_caller+0x217/0x3b0 [66.791750] legacy_get_tree+0x34/0x60 [66.807494] vfs_get_tree+0x2d/0xc0 [66.823349] path_mount+0x48c/0xd30 [66.838753] __x64_sys_mount+0x108/0x140 [66.854412] do_syscall_64+0x38/0x50 [66.869673] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [66.885093] RIP: 0033:0x7f0138827f6e [66.945613] RSP: 002b:00007ffecd79edf8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5 [66.977214] RAX: ffffffffffffffda RBX: 00007f013894c264 RCX: 00007f0138827f6e [66.994266] RDX: 00005593a4a41360 RSI: 00005593a4a33690 RDI: 00005593a4a3a6c0 [67.011544] RBP: 00005593a4a33440 R08: 0000000000000000 R09: 0000000000000001 [67.028836] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 [67.045812] R13: 00005593a4a3a6c0 R14: 00005593a4a41360 R15: 00005593a4a33440 [67.216138] ---[ end trace e114b111db64298c ]--- [67.237089] RIP: 0010:extent_io_tree_panic.isra.0+0x23/0x25 [btrfs] [67.325317] RSP: 0018:ffff93e5414c3948 EFLAGS: 00010246 [67.347946] RAX: 0000000000000000 RBX: 0000000001bfffff RCX: 0000000000000000 [67.371343] RDX: 0000000000000000 RSI: ffffffffb90d4660 RDI: 00000000ffffffff [67.394757] RBP: ffff93e5414c3948 R08: 0000000000000001 R09: 0000000000000001 [67.418409] R10: ffff93e5414c3658 R11: 0000000000000000 R12: ffff8ec782d728c0 [67.441906] R13: ffff8ec78bc71628 R14: ffff8ec782d72aa0 R15: 0000000002400000 [67.465436] FS: 00007f01386a8580(0000) GS:ffff8ec809000000(0000) knlGS:0000000000000000 [67.511660] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [67.535047] CR2: 00007f01382fa000 CR3: 0000000109a34000 CR4: 0000000000750ee0 [67.558449] PKRU: 55555554 [67.581146] note: mount[613] exited with preempt_count 2 The image has a chunk item which has a logical start 37748736 and length 18446744073701163008 (-8M). The calculated end 29360127 overflows. EEXIST was caught by insert_state() because of the duplicate end and extent_io_tree_panic() was called. Add overflow check of chunk item end to tree checker so it can be detected early at mount time. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=208929 CC: stable@vger.kernel.org # 4.19+ Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Su Yue <l@damenly.su> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-12-08btrfs: tree-checker: annotate all error branches as unlikelyDavid Sterba1-160/+173
The tree checker is called many times as it verifies metadata at read/write time. The checks follow a simple pattern: if (error_condition) { report_error(); return -EUCLEAN; } All the error reporting functions are annotated as __cold that is supposed to hint the compiler to move the statement block out of the hot path. This does not seem to happen that often. As the error condition is expected to be false almost always, we can annotate it with 'unlikely' as this satisfies one of the few use cases for the annotation. The expected outcome is a stronger hint to compiler to reorder the checks test jump to exit test jump to exit ... which can be observed in asm of eg. check_dir_item, btrfs_check_chunk_valid, check_root_item or check_leaf. There's a measurable run time improvement reported by Josef, the testing workload went from 655 MiB/s to 677 MiB/s, which is about +3%. There should be no functional changes but some of the conditions have been rewritten to produce more readable result, some lines are longer than 80, for the sake of readability. Signed-off-by: David Sterba <dsterba@suse.com>
2020-12-08btrfs: switch cached fs_info::csum_size from u16 to u32David Sterba1-1/+1
The fs_info value is 32bit, switch also the local u16 variables. This leads to a better assembly code generated due to movzwl. This simple change will shave some bytes on x86_64 and release config: text data bss dec hex filename 1090000 17980 14912 1122892 11224c pre/btrfs.ko 1089794 17980 14912 1122686 11217e post/btrfs.ko DELTA: -206 Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-12-08btrfs: use cached value of fs_info::csum_size everywhereDavid Sterba1-1/+1
btrfs_get_16 shows up in the system performance profiles (helper to read 16bit values from on-disk structures). This is partially because of the checksum size that's frequently read along with data reads/writes, other u16 uses are from item size or directory entries. Replace all calls to btrfs_super_csum_size by the cached value from fs_info. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-12-08btrfs: add set/get accessors for root_item::drop_levelDavid Sterba1-2/+2
The drop_level member is used directly unlike all the other int types in root_item. Add the definition and use it everywhere. The type is u8 so there's no conversion necessary and the helpers are properly inlined, this is for consistency. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-11-23btrfs: tree-checker: add missing returns after data_ref alignment checksDavid Sterba1-0/+2
There are sectorsize alignment checks that are reported but then check_extent_data_ref continues. This was not intended, wrong alignment is not a minor problem and we should return with error. CC: stable@vger.kernel.org # 5.4+ Fixes: 0785a9aacf9d ("btrfs: tree-checker: Add EXTENT_DATA_REF check") Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-11-13btrfs: tree-checker: add missing return after error in root_itemDaniel Xu1-0/+1
There's a missing return statement after an error is found in the root_item, this can cause further problems when a crafted image triggers the error. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=210181 Fixes: 259ee7754b67 ("btrfs: tree-checker: Add ROOT_ITEM check") CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-26btrfs: tree-checker: validate number of chunk stripes and parityDaniel Xu1-0/+18
If there's no parity and num_stripes < ncopies, a crafted image can trigger a division by zero in calc_stripe_length(). The image was generated through fuzzing. CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Qu Wenruo <wqu@suse.com> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=209587 Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07btrfs: tree-checker: fix false alert caused by legacy btrfs root itemQu Wenruo1-5/+12
Commit 259ee7754b67 ("btrfs: tree-checker: Add ROOT_ITEM check") introduced btrfs root item size check, however btrfs root item has two versions, the legacy one which just ends before generation_v2 member, is smaller than current btrfs root item size. This caused btrfs kernel to reject valid but old tree root leaves. Fix this problem by also allowing legacy root item, since kernel can already handle them pretty well and upgrade to newer root item format when needed. Reported-by: Martin Steigerwald <martin@lichtvoll.de> Fixes: 259ee7754b67 ("btrfs: tree-checker: Add ROOT_ITEM check") CC: stable@vger.kernel.org # 5.4+ Tested-By: Martin Steigerwald <martin@lichtvoll.de> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-08-27btrfs: tree-checker: fix the error message for transid errorQu Wenruo1-1/+1
The error message for inode transid is the same as for inode generation, which makes us unable to detect the real problem. Reported-by: Tyler Richmond <t.d.richmond@gmail.com> Fixes: 496245cac57e ("btrfs: tree-checker: Verify inode item") CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Marcos Paulo de Souza <mpdesouza@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-05-25btrfs: tree-checker: remove duplicate definition of 'inode_item_err'Zheng Wei1-4/+0
Remove the duplicate definition of 'inode_item_err' in the file tree-checker.c that got there by accident in c23c77b097dc ("btrfs: tree-checker: Refactor inode key check into seperate function"). Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Zheng Wei <wei.zheng@vivo.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-01-20btrfs: tree-checker: Verify location key for DIR_ITEM/DIR_INDEXQu Wenruo1-0/+21
[PROBLEM] There is a user report in the mail list, showing the following corrupted tree blocks: item 62 key (486836 DIR_ITEM 2543451757) itemoff 6273 itemsize 74 location key (4065004 INODE_ITEM 1073741824) type FILE transid 21397 data_len 0 name_len 44 name: FILENAME Note that location key, its offset should be 0 for all INODE_ITEMS. This caused failed lookup of the inode. [CAUSE] That offending value, 1073741824, is 0x40000000. So this looks like a memory bit flip. [FIX] This patch will enhance tree-checker to check location key of DIR_INDEX/DIR_ITEM/XATTR_ITEM. There are several different combinations needs to check: - item_key.type == DIR_INDEX/DIR_ITEM * location_key.type == BTRFS_INODE_ITEM_KEY This location_key should follow the check in inode_item check. * location_key.type == BTRFS_ROOT_ITEM_KEY Despite the existing check, DIR_INDEX/DIR_ITEM can only points to subvolume trees. * All other keys are not allowed. - item_key.type == XATTR_ITEM location_key should be all 0. Reported-by: Mike Gilbert <floppymaster@gmail.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-01-20btrfs: tree-checker: Refactor root key check into separate functionQu Wenruo1-15/+47
ROOT_ITEM key check itself is not as simple as single line check, and will be reused for both ROOT_ITEM and DIR_ITEM/DIR_INDEX location key check, so refactor such check into check_root_key(). Also since we are here, fix a comment error about ROOT_ITEM offset, which is transid of snapshot creation, not some "older kernel behavior". Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-01-20btrfs: tree-checker: Refactor inode key check into seperate functionQu Wenruo1-18/+60
Inode key check is not as easy as several lines, and it will be called in more than one location (INODE_ITEM check and DIR_ITEM/DIR_INDEX/XATTR_ITEM location key check). So here refactor such check into check_inode_key(). And add extra checks for XATTR_ITEM. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-01-20btrfs: tree-checker: Clean up fs_info parameter from error message wrapperQu Wenruo1-13/+13
The @fs_info parameter can be extracted from extent_buffer structure, and there are already some wrappers getting rid of the @fs_info parameter. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-01-20btrfs: tree-checker: Check leaf chunk item sizeQu Wenruo1-1/+39
Inspired by btrfs-progs github issue #208, where chunk item in chunk tree has invalid num_stripes (0). Although that can already be caught by current btrfs_check_chunk_valid(), that function doesn't really check item size as it needs to handle chunk item in super block sys_chunk_array(). This patch will add two extra checks for chunk items in chunk tree: - Basic chunk item size If the item is smaller than btrfs_chunk (which already contains one stripe), exit right now as reading num_stripes may even go beyond eb boundary. - Item size check against num_stripes If item size doesn't match with calculated chunk size, then either the item size or the num_stripes is corrupted. Error out anyway. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-12-13Btrfs: make tree checker detect checksum items with overlapping rangesFilipe Manana1-2/+16
Having checksum items, either on the checksums tree or in a log tree, that represent ranges that overlap each other is a sign of a corruption. Such case confuses the checksum lookup code and can result in not being able to find checksums or find stale checksums. So add a check for such case. This is motivated by a recent fix for a case where a log tree had checksum items covering ranges that overlap each other due to extent cloning, and resulted in missing checksums after replaying the log tree. It also helps detect past issues such as stale and outdated checksums due to overlapping, commit 27b9a8122ff71a ("Btrfs: fix csum tree corruption, duplicate and outdated checksums"). CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-12-13btrfs: tree-checker: Fix error format string for size_tAndreas Färber1-1/+1
Argument BTRFS_FILE_EXTENT_INLINE_DATA_START is defined as offsetof(), which returns type size_t, so we need %zu instead of %lu. This fixes a build warning on 32-bit ARM: ../fs/btrfs/tree-checker.c: In function 'check_extent_data_item': ../fs/btrfs/tree-checker.c:230:43: warning: format '%lu' expects argument of type 'long unsigned int', but argument 5 has type 'unsigned int' [-Wformat=] 230 | "invalid item size, have %u expect [%lu, %u)", | ~~^ | long unsigned int | %u Fixes: 153a6d299956 ("btrfs: tree-checker: Check item size before reading file extent type") Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andreas Färber <afaerber@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18btrfs: tree-checker: Check item size before reading file extent typeQu Wenruo1-0/+11
In check_extent_data_item(), we read file extent type without verifying if the item size is valid. Add such check to ensure the file extent type we read is correct. The check is not as accurate as we need to cover both inline and regular extents, so it only checks if the item size is larger or equal to inline header. So the existing size checks on inline/regular extents are still needed. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18btrfs: rename block_group_item on-stack accessors to follow namingDavid Sterba1-5/+5
All accessors defined by BTRFS_SETGET_STACK_FUNCS contain _stack_ in the name, the block group ones were not following that scheme, so let's switch them. Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18btrfs: use better definition of number of compression typeChengguang Xu1-2/+2
The compression type upper limit constant is the same as the last value and this is confusing. In order to keep coding style consistent, use BTRFS_NR_COMPRESS_TYPES as the total number that follows the idom of 'NR' being one more than the last value. Signed-off-by: Chengguang Xu <cgxu519@mykernel.net> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18btrfs: use enum for extent type definesChengguang Xu1-2/+2
Use enum to replace macro definitions of extent types. Signed-off-by: Chengguang Xu <cgxu519@mykernel.net> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18btrfs: tree-checker: Refactor prev_key check for ino into a functionQu Wenruo1-41/+72
Refactor the check for prev_key->objectid of the following key types into one function, check_prev_ino(): - EXTENT_DATA - INODE_REF - DIR_INDEX - DIR_ITEM - XATTR_ITEM Also add the check of prev_key for INODE_REF. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18btrfs: use has_single_bit_set for clarityDavid Sterba1-7/+8
Replace is_power_of_2 with the helper that is self-documenting and remove the open coded call in alloc_profile_is_valid. Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18btrfs: tree-checker: Add check for INODE_REFQu Wenruo1-0/+53
For INODE_REF we will check: - Objectid (ino) against previous key To detect missing INODE_ITEM. - No overflow/padding in the data payload Much like DIR_ITEM, but with less members to check. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18btrfs: tree-checker: Try to detect missing INODE_ITEMQu Wenruo1-2/+25
For the following items, key->objectid is inode number: - DIR_ITEM - DIR_INDEX - XATTR_ITEM - EXTENT_DATA - INODE_REF So in the subvolume tree, such items must have its previous item share the same objectid, e.g.: (257 INODE_ITEM 0) (257 DIR_INDEX xxx) (257 DIR_ITEM xxx) (258 INODE_ITEM 0) (258 INODE_REF 0) (258 XATTR_ITEM 0) (258 EXTENT_DATA 0) But if we have the following sequence, then there is definitely something wrong, normally some INODE_ITEM is missing, like: (257 INODE_ITEM 0) (257 DIR_INDEX xxx) (257 DIR_ITEM xxx) (258 XATTR_ITEM 0) <<< objecitd suddenly changed to 258 (258 EXTENT_DATA 0) So just by checking the previous key for above inode based key types, we can detect a missing inode item. For INODE_REF key type, the check will be added along with INODE_REF checker. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-10-25btrfs: tree-checker: Fix wrong check on max devidQu Wenruo1-8/+0
[BUG] The following script will cause false alert on devid check. #!/bin/bash dev1=/dev/test/test dev2=/dev/test/scratch1 mnt=/mnt/btrfs umount $dev1 &> /dev/null umount $dev2 &> /dev/null umount $mnt &> /dev/null mkfs.btrfs -f $dev1 mount $dev1 $mnt _fail() { echo "!!! FAILED !!!" exit 1 } for ((i = 0; i < 4096; i++)); do btrfs dev add -f $dev2 $mnt || _fail btrfs dev del $dev1 $mnt || _fail dev_tmp=$dev1 dev1=$dev2 dev2=$dev_tmp done [CAUSE] Tree-checker uses BTRFS_MAX_DEVS() and BTRFS_MAX_DEVS_SYS_CHUNK() as upper limit for devid. But we can have devid holes just like above script. So the check for devid is incorrect and could cause false alert. [FIX] Just remove the whole devid check. We don't have any hard requirement for devid assignment. Furthermore, even devid could get corrupted by a bitflip, we still have dev extents verification at mount time, so corrupted data won't sneak in. This fixes fstests btrfs/194. Reported-by: Anand Jain <anand.jain@oracle.com> Fixes: ab4ba2e13346 ("btrfs: tree-checker: Verify dev item") CC: stable@vger.kernel.org # 5.2+ Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: Detect unbalanced tree with empty leaf before crashing btree operationsQu Wenruo1-0/+6
[BUG] With crafted image, btrfs will panic at btree operations: kernel BUG at fs/btrfs/ctree.c:3894! invalid opcode: 0000 [#1] SMP PTI CPU: 0 PID: 1138 Comm: btrfs-transacti Not tainted 5.0.0-rc8+ #9 RIP: 0010:__push_leaf_left+0x6b6/0x6e0 RSP: 0018:ffffc0bd4128b990 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffffa0a4ab8f0e38 RCX: 0000000000000000 RDX: ffffa0a280000000 RSI: 0000000000000000 RDI: ffffa0a4b3814000 RBP: ffffc0bd4128ba38 R08: 0000000000001000 R09: ffffc0bd4128b948 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000240 R13: ffffa0a4b556fb60 R14: ffffa0a4ab8f0af0 R15: ffffa0a4ab8f0af0 FS: 0000000000000000(0000) GS:ffffa0a4b7a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f2461c80020 CR3: 000000022b32a006 CR4: 00000000000206f0 Call Trace: ? _cond_resched+0x1a/0x50 push_leaf_left+0x179/0x190 btrfs_del_items+0x316/0x470 btrfs_del_csums+0x215/0x3a0 __btrfs_free_extent.isra.72+0x5a7/0xbe0 __btrfs_run_delayed_refs+0x539/0x1120 btrfs_run_delayed_refs+0xdb/0x1b0 btrfs_commit_transaction+0x52/0x950 ? start_transaction+0x94/0x450 transaction_kthread+0x163/0x190 kthread+0x105/0x140 ? btrfs_cleanup_transaction+0x560/0x560 ? kthread_destroy_worker+0x50/0x50 ret_from_fork+0x35/0x40 Modules linked in: ---[ end trace c2425e6e89b5558f ]--- [CAUSE] The offending csum tree looks like this: checksum tree key (CSUM_TREE ROOT_ITEM 0) node 29741056 level 1 items 14 free 107 generation 19 owner CSUM_TREE ... key (EXTENT_CSUM EXTENT_CSUM 85975040) block 29630464 gen 17 key (EXTENT_CSUM EXTENT_CSUM 89911296) block 29642752 gen 17 <<< key (EXTENT_CSUM EXTENT_CSUM 92274688) block 29646848 gen 17 ... leaf 29630464 items 6 free space 1 generation 17 owner CSUM_TREE item 0 key (EXTENT_CSUM EXTENT_CSUM 85975040) itemoff 3987 itemsize 8 range start 85975040 end 85983232 length 8192 ... leaf 29642752 items 0 free space 3995 generation 17 owner 0 ^ empty leaf invalid owner ^ leaf 29646848 items 1 free space 602 generation 17 owner CSUM_TREE item 0 key (EXTENT_CSUM EXTENT_CSUM 92274688) itemoff 627 itemsize 3368 range start 92274688 end 95723520 length 3448832 So we have a corrupted csum tree where one tree leaf is completely empty, causing unbalanced btree, thus leading to unexpected btree balance error. [FIX] For this particular case, we handle it in two directions to catch it: - Check if the tree block is empty through btrfs_verify_level_key() So that invalid tree blocks won't be read out through btrfs_search_slot() and its variants. - Check 0 tree owner in tree checker NO tree is using 0 as its tree owner, detect it and reject at tree block read time. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202821 Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: tree-checker: Add EXTENT_DATA_REF checkQu Wenruo1-0/+48
EXTENT_DATA_REF is a little like DIR_ITEM which contains hash in its key->offset. This patch will check the following contents: - Key->objectid Basic alignment check. - Hash Hash of each extent_data_ref item must match key->offset. - Offset Basic alignment check. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: tree-checker: Add simple keyed refs checkQu Wenruo1-1/+39
For TREE_BLOCK_REF, SHARED_DATA_REF and SHARED_BLOCK_REF we need to check: | TREE_BLOCK_REF | SHARED_BLOCK_REF | SHARED_BLOCK_REF --------------+----------------+-----------------+------------------ key->objectid | Alignment | Alignment | Alignment key->offset | Any value | Alignment | Alignment item_size | 0 | 0 | sizeof(le32) (*) *: sizeof(struct btrfs_shared_data_ref) So introduce a check to check all these 3 key types together. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>