diff options
author | Filipe Manana <fdmanana@suse.com> | 2019-11-14 18:02:43 +0000 |
---|---|---|
committer | David Sterba <dsterba@suse.com> | 2019-11-18 18:07:55 +0100 |
commit | 042528f8d840f42903d91d47e28c0e29da90c2d6 (patch) | |
tree | a5a5f6b1defc8b0618efb482e8b51001af1e8602 /fs/btrfs/scrub.c | |
parent | b12de52896c0e8213f70e3a168fde9e6eee95909 (diff) | |
download | linux-042528f8d840f42903d91d47e28c0e29da90c2d6.tar.bz2 |
Btrfs: fix block group remaining RO forever after error during device replace
When doing a device replace, while at scrub.c:scrub_enumerate_chunks(), we
set the block group to RO mode and then wait for any ongoing writes into
extents of the block group to complete. While doing that wait we overwrite
the value of the variable 'ret' and can break out of the loop if an error
happens without turning the block group back into RW mode. So what happens
is the following:
1) btrfs_inc_block_group_ro() returns 0, meaning it set the block group
to RO mode (its ->ro field set to 1 or incremented to some value > 1);
2) Then btrfs_wait_ordered_roots() returns a value > 0;
3) Then if either joining or committing the transaction fails, we break
out of the loop wihtout calling btrfs_dec_block_group_ro(), leaving
the block group in RO mode forever.
To fix this, just remove the code that waits for ongoing writes to extents
of the block group, since it's not needed because in the initial setup
phase of a device replace operation, before starting to find all chunks
and their extents, we set the target device for replace while holding
fs_info->dev_replace->rwsem, which ensures that after releasing that
semaphore, any writes into the source device are made to the target device
as well (__btrfs_map_block() guarantees that). So while at
scrub_enumerate_chunks() we only need to worry about finding and copying
extents (from the source device to the target device) that were written
before we started the device replace operation.
Fixes: f0e9b7d6401959 ("Btrfs: fix race setting block group readonly during device replace")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Diffstat (limited to 'fs/btrfs/scrub.c')
-rw-r--r-- | fs/btrfs/scrub.c | 39 |
1 files changed, 0 insertions, 39 deletions
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index bd3f2266c5f4..21de630b0730 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -3579,45 +3579,6 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx, * of btrfs_super_block::sys_chunk_array */ ret = btrfs_inc_block_group_ro(cache, false); - if (!ret && sctx->is_dev_replace) { - /* - * If we are doing a device replace wait for any tasks - * that started delalloc right before we set the block - * group to RO mode, as they might have just allocated - * an extent from it or decided they could do a nocow - * write. And if any such tasks did that, wait for their - * ordered extents to complete and then commit the - * current transaction, so that we can later see the new - * extent items in the extent tree - the ordered extents - * create delayed data references (for cow writes) when - * they complete, which will be run and insert the - * corresponding extent items into the extent tree when - * we commit the transaction they used when running - * inode.c:btrfs_finish_ordered_io(). We later use - * the commit root of the extent tree to find extents - * to copy from the srcdev into the tgtdev, and we don't - * want to miss any new extents. - */ - btrfs_wait_block_group_reservations(cache); - btrfs_wait_nocow_writers(cache); - ret = btrfs_wait_ordered_roots(fs_info, U64_MAX, - cache->start, - cache->length); - if (ret > 0) { - struct btrfs_trans_handle *trans; - - trans = btrfs_join_transaction(root); - if (IS_ERR(trans)) - ret = PTR_ERR(trans); - else - ret = btrfs_commit_transaction(trans); - if (ret) { - scrub_pause_off(fs_info); - btrfs_put_block_group(cache); - break; - } - } - } scrub_pause_off(fs_info); if (ret == 0) { |