summaryrefslogtreecommitdiffstats
path: root/fs/btrfs/disk-io.c
AgeCommit message (Collapse)AuthorFilesLines
2015-06-30Btrfs: fix crash on close_ctree() if cleaner starts new transactionFilipe Manana1-0/+29
Often when running fstests btrfs/079 I was running into the following trace during umount on one of my qemu/kvm test vms: [ 8245.682441] WARNING: CPU: 8 PID: 25064 at fs/btrfs/extent-tree.c:138 btrfs_put_block_group+0x51/0x69 [btrfs]() [ 8245.685039] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse parport_pc i2c_piix4 acpi_cpufreq processor psmouse i2c_core thermal_sys parport evdev serio_raw button pcspkr microcode ext4 crc16 jbd2 mbcache sg sr_mod cdrom sd_mod ata_generic virtio_scsi ata_piix libata floppy virtio_pci virtio_ring scsi_mod virtio e1000 [last unloaded: btrfs] [ 8245.693860] CPU: 8 PID: 25064 Comm: umount Tainted: G W 4.1.0-rc5-btrfs-next-10+ #1 [ 8245.695081] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014 [ 8245.697583] 0000000000000009 ffff88020d047ce8 ffffffff8145eec7 ffffffff81095dce [ 8245.699234] 0000000000000000 ffff88020d047d28 ffffffff8104b399 0000000000000028 [ 8245.700995] ffffffffa04db07b ffff8801c6036c00 ffff8801c6036d68 ffff880202eb40b0 [ 8245.702510] Call Trace: [ 8245.703006] [<ffffffff8145eec7>] dump_stack+0x4f/0x7b [ 8245.705393] [<ffffffff81095dce>] ? console_unlock+0x356/0x3a2 [ 8245.706569] [<ffffffff8104b399>] warn_slowpath_common+0xa1/0xbb [ 8245.707747] [<ffffffffa04db07b>] ? btrfs_put_block_group+0x51/0x69 [btrfs] [ 8245.709101] [<ffffffff8104b456>] warn_slowpath_null+0x1a/0x1c [ 8245.710274] [<ffffffffa04db07b>] btrfs_put_block_group+0x51/0x69 [btrfs] [ 8245.711823] [<ffffffffa04e3473>] btrfs_free_block_groups+0x145/0x322 [btrfs] [ 8245.713251] [<ffffffffa04ef31a>] close_ctree+0x1ef/0x325 [btrfs] [ 8245.714448] [<ffffffff8117d26e>] ? evict_inodes+0xdc/0xeb [ 8245.715539] [<ffffffffa04cb3ad>] btrfs_put_super+0x19/0x1b [btrfs] [ 8245.716835] [<ffffffff81167607>] generic_shutdown_super+0x73/0xef [ 8245.718015] [<ffffffff81167a3a>] kill_anon_super+0x13/0x1e [ 8245.719101] [<ffffffffa04cb1b6>] btrfs_kill_super+0x17/0x23 [btrfs] [ 8245.720316] [<ffffffff81167544>] deactivate_locked_super+0x3b/0x68 [ 8245.721517] [<ffffffff81167dd6>] deactivate_super+0x3f/0x43 [ 8245.722581] [<ffffffff8117fbb9>] cleanup_mnt+0x59/0x78 [ 8245.723538] [<ffffffff8117fc18>] __cleanup_mnt+0x12/0x14 [ 8245.724572] [<ffffffff81065371>] task_work_run+0x8f/0xbc [ 8245.725598] [<ffffffff810028fb>] do_notify_resume+0x45/0x53 [ 8245.726892] [<ffffffff814651ac>] int_signal+0x12/0x17 [ 8245.737887] ---[ end trace a01d038397e99b92 ]--- [ 8245.769363] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC [ 8245.770737] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse parport_pc i2c_piix4 acpi_cpufreq processor psmouse i2c_core thermal_sys parport evdev serio_raw button pcspkr microcode ext4 crc16 jbd2 mbcache sg sr_mod cdrom sd_mod ata_generic virtio_scsi ata_piix libata floppy virtio_pci virtio_ring scsi_mod virtio e1000 [last unloaded: btrfs] [ 8245.772641] CPU: 2 PID: 25064 Comm: umount Tainted: G W 4.1.0-rc5-btrfs-next-10+ #1 [ 8245.772641] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014 [ 8245.772641] task: ffff880013005810 ti: ffff88020d044000 task.ti: ffff88020d044000 [ 8245.772641] RIP: 0010:[<ffffffffa051c8e6>] [<ffffffffa051c8e6>] btrfs_queue_work+0x2c/0x14d [btrfs] [ 8245.772641] RSP: 0018:ffff88020d0478b8 EFLAGS: 00010202 [ 8245.772641] RAX: 0000000000000004 RBX: 6b6b6b6b6b6b6b6b RCX: ffffffffa0581488 [ 8245.772641] RDX: 0000000000000000 RSI: ffff880194b7bf48 RDI: ffff880144b6a7a0 [ 8245.772641] RBP: ffff88020d0478d8 R08: 0000000000000000 R09: 000000000000ffff [ 8245.772641] R10: 0000000000000004 R11: 0000000000000005 R12: ffff880194b7bf48 [ 8245.772641] R13: ffff880194b7bf48 R14: 0000000000000410 R15: 0000000000000000 [ 8245.772641] FS: 00007f991e77d840(0000) GS:ffff88023e280000(0000) knlGS:0000000000000000 [ 8245.772641] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 8245.772641] CR2: 00007fbbd325ee68 CR3: 000000021de8e000 CR4: 00000000000006e0 [ 8245.772641] Stack: [ 8245.772641] ffff880194b7bf00 ffff880202eb4000 ffff880194b7bf48 0000000000000410 [ 8245.772641] ffff88020d047958 ffffffffa04ec6d5 ffff8801629b2ee8 0000000082987570 [ 8245.772641] 0000000000a5813f 0000000000000001 ffff880013006100 0000000000000002 [ 8245.772641] Call Trace: [ 8245.772641] [<ffffffffa04ec6d5>] btrfs_wq_submit_bio+0xe1/0x17b [btrfs] [ 8245.772641] [<ffffffff81086bff>] ? check_irq_usage+0x76/0x87 [ 8245.772641] [<ffffffffa04ec825>] btree_submit_bio_hook+0xb6/0xd9 [btrfs] [ 8245.772641] [<ffffffffa04ebb7c>] ? btree_csum_one_bio+0xad/0xad [btrfs] [ 8245.772641] [<ffffffffa04eb1a6>] ? btree_io_failed_hook+0x5e/0x5e [btrfs] [ 8245.772641] [<ffffffffa050a6e7>] submit_one_bio+0x8c/0xc7 [btrfs] [ 8245.772641] [<ffffffffa050d75b>] submit_extent_page.isra.18+0x9d/0x186 [btrfs] [ 8245.772641] [<ffffffffa050d95b>] write_one_eb+0x117/0x1ae [btrfs] [ 8245.772641] [<ffffffffa050a79b>] ? end_extent_buffer_writeback+0x21/0x21 [btrfs] [ 8245.772641] [<ffffffffa0510510>] btree_write_cache_pages+0x2ab/0x385 [btrfs] [ 8245.772641] [<ffffffffa04eb2b8>] btree_writepages+0x23/0x5c [btrfs] [ 8245.772641] [<ffffffff8111c661>] do_writepages+0x23/0x2c [ 8245.772641] [<ffffffff81189cd4>] __writeback_single_inode+0xda/0x5bd [ 8245.772641] [<ffffffff8118aa60>] ? writeback_single_inode+0x2b/0x173 [ 8245.772641] [<ffffffff8118aafd>] writeback_single_inode+0xc8/0x173 [ 8245.772641] [<ffffffff8118ac95>] write_inode_now+0x8a/0x95 [ 8245.772641] [<ffffffff81247bf0>] ? _atomic_dec_and_lock+0x30/0x4e [ 8245.772641] [<ffffffff8117cc5e>] iput+0x17d/0x26a [ 8245.772641] [<ffffffffa04ef355>] close_ctree+0x22a/0x325 [btrfs] [ 8245.772641] [<ffffffff8117d26e>] ? evict_inodes+0xdc/0xeb [ 8245.772641] [<ffffffffa04cb3ad>] btrfs_put_super+0x19/0x1b [btrfs] [ 8245.772641] [<ffffffff81167607>] generic_shutdown_super+0x73/0xef [ 8245.772641] [<ffffffff81167a3a>] kill_anon_super+0x13/0x1e [ 8245.772641] [<ffffffffa04cb1b6>] btrfs_kill_super+0x17/0x23 [btrfs] [ 8245.772641] [<ffffffff81167544>] deactivate_locked_super+0x3b/0x68 [ 8245.772641] [<ffffffff81167dd6>] deactivate_super+0x3f/0x43 [ 8245.772641] [<ffffffff8117fbb9>] cleanup_mnt+0x59/0x78 [ 8245.772641] [<ffffffff8117fc18>] __cleanup_mnt+0x12/0x14 [ 8245.772641] [<ffffffff81065371>] task_work_run+0x8f/0xbc [ 8245.772641] [<ffffffff810028fb>] do_notify_resume+0x45/0x53 [ 8245.772641] [<ffffffff814651ac>] int_signal+0x12/0x17 [ 8245.772641] Code: 1f 44 00 00 55 48 89 e5 41 56 41 55 41 54 53 49 89 f4 48 8b 46 70 a8 04 74 09 48 8b 5f 08 48 85 db 75 03 48 8b 1f 49 89 5c 24 68 <83> 7b 5c ff 74 04 f0 ff 43 50 49 83 7c 24 08 00 74 2c 4c 8d 6b [ 8245.772641] RIP [<ffffffffa051c8e6>] btrfs_queue_work+0x2c/0x14d [btrfs] [ 8245.772641] RSP <ffff88020d0478b8> [ 8245.845040] ---[ end trace a01d038397e99b93 ]--- For logical reasons such as the phase of the moon, this happened more often with "-o inode_cache" than without any mount options. After some debugging it turned out to be simple to understand what was happening: 1) close_ctree() is called; 2) It then stops the transaction kthread, which commits the current transaction; 3) It asks the cleaner kthread to stop, which is currently running btrfs_delete_unused_bgs(); 4) btrfs_delete_unused_bgs() finds an unused block group, starts a new transaction, deletes the block group, which implies COWing some tree nodes and leafs and dirtying their respective pages, and then finally it ends the transaction it started, without committing it; 5) The cleaner kthread stops; 6) close_ctree() releases (from memory) the block group objects, which produces the warning in the trace pasted above; 7) Then it invalidates all pages of the btree inode, by calling invalidate_inode_pages2(), which waits for any pages under writeback, and releases any non-dirty pages; 8) All work queues are destroyed (waiting first for their current tasks to finish execution); 9) A final iput() is called against the btree inode; 10) This iput triggers a writeback of the btree inode because it still has dirty pages; 11) This starts the whole chain of callbacks for the btree inode until it eventually reaches btrfs_wq_submit_bio() where it leads to a NULL pointer dereference because the work queues were already destroyed. Fix this by making the cleaner commit any transaction that it started after the transaction kthread was stopped. Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-06-30Btrfs: fix race between balance and unused block group deletionFilipe Manana1-1/+11
We have a race between deleting an unused block group and balancing the same block group that leads to an assertion failure/BUG(), producing the following trace: [181631.208236] BTRFS: assertion failed: 0, file: fs/btrfs/volumes.c, line: 2622 [181631.220591] ------------[ cut here ]------------ [181631.222959] kernel BUG at fs/btrfs/ctree.h:4062! [181631.223932] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC [181631.224566] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse acpi_cpufreq parpor$ [181631.224566] CPU: 8 PID: 17451 Comm: btrfs Tainted: G W 4.1.0-rc5-btrfs-next-10+ #1 [181631.224566] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014 [181631.224566] task: ffff880127e09590 ti: ffff8800b5824000 task.ti: ffff8800b5824000 [181631.224566] RIP: 0010:[<ffffffffa03f19f6>] [<ffffffffa03f19f6>] assfail.constprop.50+0x1e/0x20 [btrfs] [181631.224566] RSP: 0018:ffff8800b5827ae8 EFLAGS: 00010246 [181631.224566] RAX: 0000000000000040 RBX: ffff8800109fc218 RCX: ffffffff81095dce [181631.224566] RDX: 0000000000005124 RSI: ffffffff81464819 RDI: 00000000ffffffff [181631.224566] RBP: ffff8800b5827ae8 R08: 0000000000000001 R09: 0000000000000000 [181631.224566] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800109fc200 [181631.224566] R13: ffff880020095000 R14: ffff8800b1a13f38 R15: ffff880020095000 [181631.224566] FS: 00007f70ca0b0c80(0000) GS:ffff88013ec00000(0000) knlGS:0000000000000000 [181631.224566] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [181631.224566] CR2: 00007f2872ab6e68 CR3: 00000000a717c000 CR4: 00000000000006e0 [181631.224566] Stack: [181631.224566] ffff8800b5827ba8 ffffffffa03f3916 ffff8800b5827b38 ffffffffa03d080e [181631.224566] ffffffffa03d1423 ffff880020095000 ffff88001233c000 0000000000000001 [181631.224566] ffff880020095000 ffff8800b1a13f38 0000000a69c00000 0000000000000000 [181631.224566] Call Trace: [181631.224566] [<ffffffffa03f3916>] btrfs_remove_chunk+0xa4/0x6bb [btrfs] [181631.224566] [<ffffffffa03d080e>] ? join_transaction.isra.8+0xb9/0x3ba [btrfs] [181631.224566] [<ffffffffa03d1423>] ? wait_current_trans.isra.13+0x22/0xfc [btrfs] [181631.224566] [<ffffffffa03f3fbc>] btrfs_relocate_chunk.isra.29+0x8f/0xa7 [btrfs] [181631.224566] [<ffffffffa03f54df>] btrfs_balance+0xaa4/0xc52 [btrfs] [181631.224566] [<ffffffffa03fd388>] btrfs_ioctl_balance+0x23f/0x2b0 [btrfs] [181631.224566] [<ffffffff810872f9>] ? trace_hardirqs_on+0xd/0xf [181631.224566] [<ffffffffa04019a3>] btrfs_ioctl+0xfe2/0x2220 [btrfs] [181631.224566] [<ffffffff812603ed>] ? __this_cpu_preempt_check+0x13/0x15 [181631.224566] [<ffffffff81084669>] ? arch_local_irq_save+0x9/0xc [181631.224566] [<ffffffff81138def>] ? handle_mm_fault+0x834/0xcd2 [181631.224566] [<ffffffff81138def>] ? handle_mm_fault+0x834/0xcd2 [181631.224566] [<ffffffff8103e48c>] ? __do_page_fault+0x211/0x424 [181631.224566] [<ffffffff811755e6>] do_vfs_ioctl+0x3c6/0x479 (...) The sequence of steps leading to this are: CPU 0 CPU 1 btrfs_balance() btrfs_relocate_chunk() btrfs_relocate_block_group(bg X) btrfs_lookup_block_group(bg X) cleaner_kthread locks fs_info->cleaner_mutex btrfs_delete_unused_bgs() finds bg X, which became unused in the previous transaction checks bg X ->ro == 0, so it proceeds sets bg X ->ro to 1 (btrfs_set_block_group_ro(bg X)) blocks on fs_info->cleaner_mutex btrfs_remove_chunk(bg X) unlocks fs_info->cleaner_mutex acquires fs_info->cleaner_mutex relocate_block_group() --> does nothing, no extents found in the extent tree from bg X unlocks fs_info->cleaner_mutex btrfs_relocate_block_group(bg X) returns btrfs_remove_chunk(bg X) extent map not found --> ASSERT(0) Fix this by using a new mutex to make sure these 2 operations, block group relocation and removal, are serialized. This issue is reproducible by running fstests generic/038 (which stresses chunk allocation and automatic removal of unused block groups) together with the following balance loop: while true; do btrfs balance start -dusage=0 <mountpoint> ; done Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-06-23Merge branch 'sysfs-fsdevices-4.2-part1' of ↵Chris Mason1-2/+17
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into anand
2015-06-12Btrfs: fix use-after-free in btrfs_replay_logLiu Bo1-1/+2
@log_root_tree should not be referenced after kfree. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.cz> Reported-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: Chris Mason <clm@fb.com>
2015-06-10btrfs: delayed-ref: Use list to replace the ref_root in ref_head.Qu Wenruo1-4/+4
This patch replace the rbtree used in ref_head to list. This has the following advantage: 1) Easier merge logic. With the new list implement, we only need to care merging the tail ref_node with the new ref_node. And this can be done quite easy at insert time, no need to do a indicated merge at run_delayed_refs(). Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-06-03Btrfs: fix up read_tree_block to return proper errorLiu Bo1-13/+15
The return value of read_tree_block() can confuse callers as it always returns NULL for either -ENOMEM or -EIO, so it's likely that callers parse it to a wrong error, for instance, in btrfs_read_tree_root(). This fixes the above issue. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>
2015-05-27Btrfs: sysfs: separate kobject and attribute creationAnand Jain1-1/+17
Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.cz>
2015-05-27Btrfs: sysfs: fix, fs_info kobject_unregister has init_completion() twiceAnand Jain1-1/+0
kobject_unregister is to handle the release of the kobject, its completion init is being called in btrfs_sysfs_add_one(), so we don't have to do the same in the open_ctree() again. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.cz>
2015-04-13btrfs: Fix NO_SPACE bug caused by delayed-iputZhao Lei1-1/+2
Steps to reproduce: while true; do dd if=/dev/zero of=/btrfs_dir/file count=[fs_size * 75%] rm /btrfs_dir/file sync done And we'll see dd failed because btrfs return NO_SPACE. Reason: Normally, btrfs_commit_transaction() call btrfs_run_delayed_iputs() in end to free fs space for next write, but sometimes it hadn't done work on time, because btrfs-cleaner thread get delayed-iputs from list before, but do iput() after next write. This is log: [ 2569.050776] comm=btrfs-cleaner func=btrfs_evict_inode() begin [ 2569.084280] comm=sync func=btrfs_commit_transaction() call btrfs_run_delayed_iputs() [ 2569.085418] comm=sync func=btrfs_commit_transaction() done btrfs_run_delayed_iputs() [ 2569.087554] comm=sync func=btrfs_commit_transaction() end [ 2569.191081] comm=dd begin [ 2569.790112] comm=dd func=__btrfs_buffered_write() ret=-28 [ 2569.847479] comm=btrfs-cleaner func=add_pinned_bytes() 0 + 32677888 = 32677888 [ 2569.849530] comm=btrfs-cleaner func=add_pinned_bytes() 32677888 + 23834624 = 56512512 ... [ 2569.903893] comm=btrfs-cleaner func=add_pinned_bytes() 943976448 + 21762048 = 965738496 [ 2569.908270] comm=btrfs-cleaner func=btrfs_evict_inode() end Fix: Make btrfs_commit_transaction() wait current running btrfs-cleaner's delayed-iputs() done in end. Test: Use script similar to above(more complex), before patch: 7 failed in 100 * 20 loop. after patch: 0 failed in 100 * 20 loop. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-04-10Btrfs: fix use after free when close_ctree frees the orphan_rsvChris Mason1-1/+1
Near the end of close_ctree, we're calling btrfs_free_block_rsv to free up the orphan rsv. The problem is this call updates the space_info, which has already been freed. This adds a new __ function that directly calls kfree instead of trying to update the space infos. Signed-off-by: Chris Mason <clm@fb.com>
2015-04-10Btrfs: allow block group cache writeout outside critical section in commitChris Mason1-0/+1
We loop through all of the dirty block groups during commit and write the free space cache. In order to make sure the cache is currect, we do this while no other writers are allowed in the commit. If a large number of block groups are dirty, this can introduce long stalls during the final stages of the commit, which can block new procs trying to change the filesystem. This commit changes the block group cache writeout to take appropriate locks and allow it to run earlier in the commit. We'll still have to redo some of the block groups, but it means we can get most of the work out of the way without blocking the entire FS. Signed-off-by: Chris Mason <clm@fb.com>
2015-03-26Btrfs: Remove the check for old-style mkfsLiu Bo1-6/+0
This was used to make sure that a fresh btrfs from an older mkfs.btrfs, but it also allows us to mount a buggy btrfs if this btrfs has the right superblock head part but has something wrong with chunk tree part[1], and after that we can hit BUG_ON()s set in the code to prevent something impossible. Since David has released "Btrfs progs v3.19-rc2", just remove the check, if anyone who wants to make a fresh btrfs, please use the latest one. [1]: http://www.spinics.net/lists/linux-btrfs/msg42358.html Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: Omar Sandoval <osandov@osandov.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>
2015-03-25Merge branch 'cleanups-post-3.19' of ↵Chris Mason1-253/+301
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.1 Signed-off-by: Chris Mason <clm@fb.com> Conflicts: fs/btrfs/disk-io.c
2015-03-25Merge branch 'cleanups-for-4.1-v2' of ↵Chris Mason1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.1
2015-03-21Merge branch 'for-linus' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "Most of these are fixing extent reservation accounting, or corners with tree writeback during commit. Josef's set does add a test, which isn't strictly a fix, but it'll keep us from making this same mistake again" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: fix outstanding_extents accounting in DIO Btrfs: add sanity test for outstanding_extents accounting Btrfs: just free dummy extent buffers Btrfs: account merges/splits properly Btrfs: prepare block group cache before writing Btrfs: fix ASSERT(list_empty(&cur_trans->dirty_bgs_list) Btrfs: account for the correct number of extents for delalloc reservations Btrfs: fix merge delalloc logic Btrfs: fix comp_oper to get right order Btrfs: catch transaction abortion after waiting for it btrfs: fix sizeof format specifier in btrfs_check_super_valid()
2015-03-13btrfs: fix sizeof format specifier in btrfs_check_super_valid()Fabian Frederick1-1/+1
This patch fixes mips compilation warning: fs/btrfs/disk-io.c: In function 'btrfs_check_super_valid': fs/btrfs/disk-io.c:3927:21: warning: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'unsigned int' [-Wformat] Signed-off-by: Fabian Frederick <fabf@skynet.be> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Chris Mason <clm@fb.com>
2015-03-03btrfs: cleanup, use kmalloc_array/kcalloc array helpersDavid Sterba1-1/+1
Convert kmalloc(nr * size, ..) to kmalloc_array that does additional overflow checks, the zeroing variant is kcalloc. Signed-off-by: David Sterba <dsterba@suse.cz>
2015-03-03btrfs: cleanup 64bit/32bit divs, compile time constantsDavid Sterba1-1/+1
Switch to div_u64 if the divisor is a numeric constant or sum of sizeof()s. We can remove a few instances of do_div that has the hidden semtantics of changing the 1st argument. Small power-of-two divisors are converted to bitshifts, large values are kept intact for clarity. Signed-off-by: David Sterba <dsterba@suse.cz>
2015-02-19Merge branch 'for-linus' of ↵Linus Torvalds1-27/+75
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs updates from Chris Mason: "This pull is mostly cleanups and fixes: - The raid5/6 cleanups from Zhao Lei fixup some long standing warts in the code and add improvements on top of the scrubbing support from 3.19. - Josef has round one of our ENOSPC fixes coming from large btrfs clusters here at FB. - Dave Sterba continues a long series of cleanups (thanks Dave), and Filipe continues hammering on corner cases in fsync and others This all was held up a little trying to track down a use-after-free in btrfs raid5/6. It's not clear yet if this is just made easier to trigger with this pull or if its a new bug from the raid5/6 cleanups. Dave Sterba is the only one to trigger it so far, but he has a consistent way to reproduce, so we'll get it nailed shortly" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (68 commits) Btrfs: don't remove extents and xattrs when logging new names Btrfs: fix fsync data loss after adding hard link to inode Btrfs: fix BUG_ON in btrfs_orphan_add() when delete unused block group Btrfs: account for large extents with enospc Btrfs: don't set and clear delalloc for O_DIRECT writes Btrfs: only adjust outstanding_extents when we do a short write btrfs: Fix out-of-space bug Btrfs: scrub, fix sleep in atomic context Btrfs: fix scheduler warning when syncing log Btrfs: Remove unnecessary placeholder in btrfs_err_code btrfs: cleanup init for list in free-space-cache btrfs: delete chunk allocation attemp when setting block group ro btrfs: clear bio reference after submit_one_bio() Btrfs: fix scrub race leading to use-after-free Btrfs: add missing cleanup on sysfs init failure Btrfs: fix race between transaction commit and empty block group removal btrfs: add more checks to btrfs_read_sys_array btrfs: cleanup, rename a few variables in btrfs_read_sys_array btrfs: add checks for sys_chunk_array sizes btrfs: more superblock checks, lower bounds on devices and sectorsize/nodesize ...
2015-02-16btrfs: cleanup, reduce temporary variables in btrfs_read_rootsDavid Sterba1-29/+25
Signed-off-by: David Sterba <dsterba@suse.cz>
2015-02-16btrfs: use correct type for workqueue flagsDavid Sterba1-1/+1
Through all the local wrappers to alloc_workqueue, __alloc_workqueue_key takes an unsigned int. Signed-off-by: David Sterba <dsterba@suse.cz>
2015-02-16btrfs: factor btrfs_read_roots() out of open_ctree()Eric Sandeen1-65/+66
Also, remove the two local variables create_uuid_tree and check_uuid_tree; we can use the existence of the uuid root and/or the RESCAN_UUID_TREE flag to determine what action to take. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: David Sterba <dsterba@suse.cz>
2015-02-16btrfs: factor btrfs_replay_log() out of open_ctree()Eric Sandeen1-40/+53
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: David Sterba <dsterba@suse.cz>
2015-02-16btrfs: factor btrfs_init_workqueues() out of open_ctree()Eric Sandeen1-70/+83
Signed-off-by: Eric Sandeen <sandeen@redhat.com> [renamed to btrfs_init_workqueues] Signed-off-by: David Sterba <dsterba@suse.cz>
2015-02-16btrfs: factor btrfs_init_qgroup() out of open_ctree()Eric Sandeen1-11/+15
Signed-off-by: Eric Sandeen <sandeen@redhat.com> [renamed to btrfs_init_qgroup] Signed-off-by: David Sterba <dsterba@suse.cz>
2015-02-16btrfs: factor btrfs_init_dev_replace_locks() out of open_ctree()Eric Sandeen1-6/+12
Signed-off-by: Eric Sandeen <sandeen@redhat.com> [renamed to btrfs_init_dev_replace_locks] Signed-off-by: David Sterba <dsterba@suse.cz>
2015-02-16btrfs: factor btrfs_init_btree_inode() out of open_ctree()Eric Sandeen1-25/+31
Signed-off-by: Eric Sandeen <sandeen@redhat.com> [renamed to btrfs_init_btree_inode] Signed-off-by: David Sterba <dsterba@suse.cz>
2015-02-16btrfs: factor btrfs_init_balance() out of open_ctree()Eric Sandeen1-8/+12
Signed-off-by: Eric Sandeen <sandeen@redhat.com> [renamed to btrfs_init_balance] Signed-off-by: David Sterba <dsterba@suse.cz>
2015-02-16btrfs: factor btrfs_init_scrub() out of open_ctree()Eric Sandeen1-7/+12
Signed-off-by: Eric Sandeen <sandeen@redhat.com> [renamed to btrfs_init_scrub] Signed-off-by: David Sterba <dsterba@suse.cz>
2015-02-16btrfs: consistently use fs_info in close_ctree()Eric Sandeen1-3/+3
close_ctree() has a local fs_info var for convienience; use it consistently. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: David Sterba <dsterba@suse.cz>
2015-02-16btrfs: remove unused fs_info arg from btrfs_close_extra_devices()Eric Sandeen1-2/+2
The commit: 8dabb74 Btrfs: change core code of btrfs to support the device replace operations added the fs_info argument, but never used it - just remove it again. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: David Sterba <dsterba@suse.cz>
2015-02-16btrfs: fix sizeof format specifier in btrfs_check_super_valid()Fabian Frederick1-1/+1
This patch fixes mips compilation warning: fs/btrfs/disk-io.c: In function 'btrfs_check_super_valid': fs/btrfs/disk-io.c:3927:21: warning: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'unsigned int' [-Wformat] Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David Sterba <dsterba@suse.cz>
2015-02-16btrfs: constify structs with op functions or static definitionsDavid Sterba1-2/+2
There are some op tables that can be easily made const, similarly the sysfs feature and raid tables. This is motivated by PaX CONSTIFY plugin. Signed-off-by: David Sterba <dsterba@suse.cz>
2015-02-16Btrfs: disk-io: replace root args iff only fs_info usedDaniel Dressler1-16/+16
This is the 3rd independent patch of a larger project to cleanup btrfs's internal usage of btrfs_root. Many functions take btrfs_root only to grab the fs_info struct. By requiring a root these functions cause programmer overhead. That these functions can accept any valid root is not obvious until inspection. This patch reduces the specificity of such functions to accept the fs_info directly. These patches can be applied independently and thus are not being submitted as a patch series. There should be about 26 patches by the project's completion. Each patch will cleanup between 1 and 34 functions apiece. Each patch covers a single file's functions. This patch affects the following function(s): 1) csum_tree_block 2) csum_dirty_buffer 3) check_tree_block_fsid 4) btrfs_find_tree_block 5) clean_tree_block Signed-off-by: Daniel Dressler <danieru.dressler@gmail.com> Signed-off-by: David Sterba <dsterba@suse.cz>
2015-02-02Btrfs: fix race between transaction commit and empty block group removalFilipe Manana1-0/+1
Committing a transaction can race with automatic removal of empty block groups (cleaner kthread), leading to a BUG_ON() in the transaction commit code while running btrfs_finish_extent_commit(). The following sequence diagram shows how it can happen: CPU 1 CPU 2 btrfs_commit_transaction() fs_info->running_transaction = NULL btrfs_finish_extent_commit() find_first_extent_bit() -> found range for block group X in fs_info->freed_extents[] btrfs_delete_unused_bgs() -> found block group X Removed block group X's range from fs_info->freed_extents[] btrfs_remove_chunk() btrfs_remove_block_group(bg X) unpin_extent_range(bg X range) btrfs_lookup_block_group(bg X) -> returns NULL -> BUG_ON() The trace that results from the BUG_ON() is: [48665.187808] ------------[ cut here ]------------ [48665.188032] kernel BUG at fs/btrfs/extent-tree.c:5675! [48665.188032] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC [48665.188032] Modules linked in: dm_flakey dm_mod crc32c_generic btrfs xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop parport_pc evdev microcode [48665.197388] CPU: 2 PID: 31211 Comm: kworker/u32:16 Tainted: G W 3.19.0-rc5-btrfs-next-4+ #1 [48665.197388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 [48665.197388] Workqueue: events_unbound btrfs_async_reclaim_metadata_space [btrfs] [48665.197388] task: ffff880222011810 ti: ffff8801b56a4000 task.ti: ffff8801b56a4000 [48665.197388] RIP: 0010:[<ffffffffa0350d05>] [<ffffffffa0350d05>] unpin_extent_range+0x6a/0x1ba [btrfs] [48665.197388] RSP: 0018:ffff8801b56a7b88 EFLAGS: 00010246 [48665.197388] RAX: 0000000000000000 RBX: ffff8802143a6000 RCX: ffff8802220120c8 [48665.197388] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff8800a3c140b0 [48665.197388] RBP: ffff8801b56a7bd8 R08: 0000000000000003 R09: 0000000000000000 [48665.197388] R10: 0000000000000000 R11: 000000000000bbac R12: 0000000012e8e000 [48665.197388] R13: ffff8800a3c14000 R14: 0000000000000000 R15: 0000000000000000 [48665.197388] FS: 0000000000000000(0000) GS:ffff88023ec40000(0000) knlGS:0000000000000000 [48665.197388] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [48665.197388] CR2: 00007f065e42f270 CR3: 0000000206f70000 CR4: 00000000000006e0 [48665.197388] Stack: [48665.197388] ffff8801b56a7bd8 0000000012ea0000 01ff8800a3c14138 0000000012e9ffff [48665.197388] ffff880141df3dd8 ffff8802143a6000 ffff8800a3c14138 ffff880141df3df0 [48665.197388] ffff880141df3dd8 0000000000000000 ffff8801b56a7c08 ffffffffa0354227 [48665.197388] Call Trace: [48665.197388] [<ffffffffa0354227>] btrfs_finish_extent_commit+0xb0/0xd9 [btrfs] [48665.197388] [<ffffffffa0366b4b>] btrfs_commit_transaction+0x791/0x92c [btrfs] [48665.197388] [<ffffffffa0352432>] flush_space+0x43d/0x452 [btrfs] [48665.197388] [<ffffffff814295c3>] ? _raw_spin_unlock+0x28/0x33 [48665.197388] [<ffffffffa035255f>] btrfs_async_reclaim_metadata_space+0x118/0x164 [btrfs] [48665.197388] [<ffffffff81059917>] ? process_one_work+0x14b/0x3ab [48665.197388] [<ffffffff810599ac>] process_one_work+0x1e0/0x3ab [48665.197388] [<ffffffff81079fa9>] ? trace_hardirqs_off+0xd/0xf [48665.197388] [<ffffffff8105a55b>] worker_thread+0x210/0x2d0 [48665.197388] [<ffffffff8105a34b>] ? rescuer_thread+0x2c3/0x2c3 [48665.197388] [<ffffffff8105e5c0>] kthread+0xef/0xf7 [48665.197388] [<ffffffff81429682>] ? _raw_spin_unlock_irq+0x2d/0x39 [48665.197388] [<ffffffff8105e4d1>] ? __kthread_parkme+0xad/0xad [48665.197388] [<ffffffff81429dec>] ret_from_fork+0x7c/0xb0 [48665.197388] [<ffffffff8105e4d1>] ? __kthread_parkme+0xad/0xad [48665.197388] Code: 85 f6 74 14 49 8b 06 49 03 46 09 49 39 c4 72 1d 4c 89 f7 e8 83 ec ff ff 4c 89 e6 4c 89 ef e8 1e f1 ff ff 48 85 c0 49 89 c6 75 02 <0f> 0b 49 8b 1e 49 03 5e 09 48 8b [48665.197388] RIP [<ffffffffa0350d05>] unpin_extent_range+0x6a/0x1ba [btrfs] [48665.197388] RSP <ffff8801b56a7b88> [48665.272246] ---[ end trace b9c6ab9957521376 ]--- Fix this by ensuring that unpining the block group's range in btrfs_finish_extent_commit() is done in a synchronized fashion with removing the block group's range from freed_extents[] in btrfs_delete_unused_bgs() This race got introduced with the change: Btrfs: remove empty block groups automatically commit 47ab2a6c689913db23ccae38349714edf8365e0a Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-02-02btrfs: add checks for sys_chunk_array sizesDavid Sterba1-0/+19
Verify that possible minimum and maximum size is set, validity of contents is checked in btrfs_read_sys_array. Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>
2015-02-02btrfs: more superblock checks, lower bounds on devices and sectorsize/nodesizeDavid Sterba1-0/+19
I received a few crafted images from Jiri, all got through the recently added superblock checks. The lower bounds checks for num_devices and sector/node -sizes were missing and caused a crash during mount. Tools for symbolic code execution were used to prepare the images contents. Reported-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>
2015-01-21Btrfs: fix unused members in struct btrfs_rootAnand Jain1-2/+0
There isn't any real use of following members of struct btrfs_root so delete them. struct kobject root_kobj; struct completion kobj_unregister; Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>
2015-01-21btrfs: set proper message level for skinny metadataDavid Sterba1-1/+1
This has been confusing people for too long, the message is really just informative. CC: <stable@vger.kernel.org> # 3.10+ Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>
2015-01-21btrfs: update message levels after checksum errorsDavid Sterba1-1/+1
The errors are worth noting and might get missed with INFO level. Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>
2015-01-21btrfs: update message levels during failed mountDavid Sterba1-8/+8
All error conditions from open_ctree shall be ERR. Warning would suggest that something's wrong and we can continue. Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>
2015-01-21btrfs: update message levels for errorsDavid Sterba1-4/+5
Several messages that point to some internal problem, level INFO is wrong here. Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>
2015-01-21Merge branch 'cleanup/blocksize-diet-part2' of ↵Chris Mason1-9/+8
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus
2015-01-20fs: remove default_backing_dev_infoChristoph Hellwig1-1/+1
Now that default_backing_dev_info is not used for writeback purposes we can git rid of it easily: - instead of using it's name for tracing unregistered bdi we just use "unknown" - btrfs and ceph can just assign the default read ahead window themselves like several other filesystems already do. - we can assign noop_backing_dev_info as the default one in alloc_super. All filesystems already either assigned their own or noop_backing_dev_info. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Tejun Heo <tj@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-01-20fs: remove mapping->backing_dev_infoChristoph Hellwig1-1/+0
Now that we never use the backing_dev_info pointer in struct address_space we can simply remove it and save 4 to 8 bytes in every inode. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Reviewed-by: Tejun Heo <tj@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-01-20fs: introduce f_op->mmap_capabilities for nommu mmap supportChristoph Hellwig1-2/+1
Since "BDI: Provide backing device capability information [try #3]" the backing_dev_info structure also provides flags for the kind of mmap operation available in a nommu environment, which is entirely unrelated to it's original purpose. Introduce a new nommu-only file operation to provide this information to the nommu mmap code instead. Splitting this from the backing_dev_info structure allows to remove lots of backing_dev_info instance that aren't otherwise needed, and entirely gets rid of the concept of providing a backing_dev_info for a character device. It also removes the need for the mtd_inodefs filesystem. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Tejun Heo <tj@kernel.org> Acked-by: Brian Norris <computersforpeace@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-01-14btrfs: expand btrfs_find_item if found_key is NULLDavid Sterba1-2/+6
If the found_key is NULL, then btrfs_find_item becomes a verbose wrapper for simple btrfs_search_slot. After we've removed all such callers, passing a NULL key is not valid anymore. Signed-off-by: David Sterba <dsterba@suse.cz>
2015-01-14btrfs: fix leak of path in btrfs_find_itemDavid Sterba1-1/+8
If btrfs_find_item is called with NULL path it allocates one locally but does not free it. Affected paths are inserting an orphan item for a file and for a subvol root. Move the path allocation to the callers. CC: <stable@vger.kernel.org> # 3.14+ Fixes: 3f870c289900 ("btrfs: expand btrfs_find_item() to include find_orphan_item functionality") Signed-off-by: David Sterba <dsterba@suse.cz>
2014-12-12btrfs: sink parameter len to alloc_extent_bufferDavid Sterba1-3/+2
Because we're using globally known nodesize. Do the same for the sanity test function variant. Signed-off-by: David Sterba <dsterba@suse.cz>
2014-12-12btrfs: sink blocksize parameter to btrfs_find_create_tree_blockDavid Sterba1-6/+6
Finally it's clear that the requested blocksize is always equal to nodesize, with one exception, the superblock. Superblock has fixed size regardless of the metadata block size, but uses the same helpers to initialize sys array/chunk tree and to work with the chunk items. So it pretends to be an extent_buffer for a moment, btrfs_read_sys_array is full of special cases, we're adding one more. Signed-off-by: David Sterba <dsterba@suse.cz>