summaryrefslogtreecommitdiffstats
path: root/fs/btrfs
AgeCommit message (Collapse)AuthorFilesLines
2011-05-27Merge branch 'for-linus' of ↵Linus Torvalds48-4557/+6481
git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (58 commits) Btrfs: use the device_list_mutex during write_dev_supers Btrfs: setup free ino caching in a more asynchronous way btrfs scrub: don't coalesce pages that are logically discontiguous Btrfs: return -ENOMEM in clear_extent_bit Btrfs: add mount -o auto_defrag Btrfs: using rcu lock in the reader side of devices list Btrfs: drop unnecessary device lock Btrfs: fix the race between remove dev and alloc chunk Btrfs: fix the race between reading and updating devices Btrfs: fix bh leak on __btrfs_open_devices path Btrfs: fix unsafe usage of merge_state Btrfs: allocate extent state and check the result properly fs/btrfs: Add missing btrfs_free_path Btrfs: check return value of btrfs_inc_extent_ref() Btrfs: return error to caller if read_one_inode() fails Btrfs: BUG_ON is deleted from the caller of btrfs_truncate_item & btrfs_extend_item Btrfs: return error code to caller when btrfs_del_item fails Btrfs: return error code to caller when btrfs_previous_item fails btrfs: fix typo 'testeing' -> 'testing' btrfs: typo: 'btrfS' -> 'btrfs' ...
2011-05-27Btrfs: use the device_list_mutex during write_dev_supersChris Mason1-2/+2
write_dev_supers was changed to use RCU to protect the list of devices, but it was then sleeping while it actually wrote the supers. This fixes it to just use the mutex, since we really don't any concurrency in write_dev_supers anyway. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-05-26Btrfs: setup free ino caching in a more asynchronous wayLi Zefan1-6/+22
For a filesystem that has lots of files in it, the first time we mount it with free ino caching support, it can take quite a long time to setup the caching before we can create new files. Here we fill the cache with [highest_ino, BTRFS_LAST_FREE_OBJECTID] before we start the caching thread to search through the extent tree. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-05-26btrfs scrub: don't coalesce pages that are logically discontiguousArne Jansen1-1/+2
scrub_page collects several pages into one bio as long as they are physically contiguous. As we only save one logical address for the whole bio, don't collect pages that are physically contiguous but logically discontiguous. Signed-off-by: Arne Jansen <sensille@gmx.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-05-26Btrfs: return -ENOMEM in clear_extent_bitChris Mason1-1/+2
The btrfs releasepage function depends on ENOMEM coming back when it is called atomic. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-05-26Btrfs: add mount -o auto_defragChris Mason8-135/+678
This will detect small random writes into files and queue the up for an auto defrag process. It isn't well suited to database workloads yet, but works for smaller files such as rpm, sqlite or bdb databases. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-05-26Merge branch 'for-linus' of ↵Linus Torvalds2-0/+11
git://git.kernel.org/pub/scm/linux/kernel/git/djm/tmem * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/djm/tmem: xen: cleancache shim to Xen Transcendent Memory ocfs2: add cleancache support ext4: add cleancache support btrfs: add cleancache support ext3: add cleancache support mm/fs: add hooks to support cleancache mm: cleancache core ops functions and config fs: add field to superblock to support cleancache mm/fs: cleancache documentation Fix up trivial conflict in fs/btrfs/extent_io.c due to includes
2011-05-26btrfs: add cleancache supportDan Magenheimer2-0/+11
This sixth patch of eight in this cleancache series "opts-in" cleancache for btrfs. Filesystems must explicitly enable cleancache by calling cleancache_init_fs anytime an instance of the filesystem is mounted. Btrfs uses its own readpage which must be hooked, but all other cleancache hooks are in the VFS layer including the matching cleancache_flush_fs hook which must be called on unmount. Details and a FAQ can be found in Documentation/vm/cleancache.txt [v6-v8: no changes] [v5: jeremy@goop.org: simplify init hook and any future fs init changes] Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com> Reviewed-by: Jeremy Fitzhardinge <jeremy@goop.org> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Rik Van Riel <riel@redhat.com> Cc: Jan Beulich <JBeulich@novell.com> Cc: Andreas Dilger <adilger@sun.com> Cc: Ted Ts'o <tytso@mit.edu> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <joel.becker@oracle.com> Cc: Nitin Gupta <ngupta@vflare.org>
2011-05-23Merge branch 'cleanups_and_fixes' into inode_numbersChris Mason16-145/+187
Conflicts: fs/btrfs/tree-log.c fs/btrfs/volumes.c Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-05-23Btrfs: using rcu lock in the reader side of devices listXiao Guangrong4-36/+72
fs_devices->devices is only updated on remove and add device paths, so we can use rcu to protect it in the reader side Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-05-23Btrfs: drop unnecessary device lockXiao Guangrong1-7/+6
Drop device_list_mutex for the reader side on clone_fs_devices and btrfs_rm_device pathes since the fs_info->volume_mutex can ensure the device list is not updated btrfs_close_extra_devices is the initialized path, we can not add or remove device at this time, so we can simply drop the mutex safely, like other initialized function does(add_missing_dev, __find_device, __btrfs_open_devices ...). Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-05-23Btrfs: fix the race between remove dev and alloc chunkXiao Guangrong1-0/+6
On remove device path, it updates device->dev_alloc_list but does not hold chunk lock Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-05-23Btrfs: fix the race between reading and updating devicesXiao Guangrong2-0/+9
On btrfs_congested_fn and __unplug_io_fn paths, we should hold device_list_mutex to avoid remove/add device path to update fs_devices->devices On __btrfs_close_devices and btrfs_prepare_sprout paths, the devices in fs_devices->devices or fs_devices->devices is updated, so we should hold the mutex to avoid the reader side to reach them Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-05-23Btrfs: fix bh leak on __btrfs_open_devices pathXiao Guangrong1-0/+1
'bh' is forgot to release if no error is detected Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-05-23Btrfs: fix unsafe usage of merge_stateXiao Guangrong1-8/+14
merge_state can free the current state if it can be merged with the next node, but in set_extent_bit(), after merge_state, we still use the current extent to get the next node and cache it into cached_state Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-05-23Btrfs: allocate extent state and check the result properlyXiao Guangrong1-8/+26
It doesn't allocate extent_state and check the result properly: - in set_extent_bit, it doesn't allocate extent_state if the path is not allowed wait - in clear_extent_bit, it doesn't check the result after atomic-ly allocate, we trigger BUG_ON() if it's fail - if allocate fail, we trigger BUG_ON instead of returning -ENOMEM since the return value of clear_extent_bit() is ignored by many callers Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-05-23fs/btrfs: Add missing btrfs_free_pathJulia Lawall2-1/+4
Btrfs_alloc_path should be matched with btrfs_free_path in error-handling code. A simplified version of the semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @r exists@ local idexpression struct btrfs_path * x; expression ra,rb; position p1,p2; @@ x = btrfs_alloc_path@p1(...) ... when != btrfs_free_path(x,...) when != if (...) { ... btrfs_free_path(x,...) ...} when != x = ra if(...) { ... when != x = rb when forall when != btrfs_free_path(x,...) \(return <+...x...+>; \| return@p2...; \) } @script:python@ p1 << r.p1; p2 << r.p2; @@ cocci.print_main("alloc",p1) cocci.print_secs("return",p2) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-05-23Btrfs: check return value of btrfs_inc_extent_ref()Tsutomu Itoh1-0/+1
If return value of btrfs_inc_extent_ref() is not 0, BUG() is called. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-05-23Btrfs: return error to caller if read_one_inode() failsTsutomu Itoh1-6/+18
When read_one_inode() fails, error code is returned to caller instead of BUG_ON(). Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-05-23Btrfs: BUG_ON is deleted from the caller of btrfs_truncate_item & ↵Tsutomu Itoh7-17/+2
btrfs_extend_item Currently, btrfs_truncate_item and btrfs_extend_item returns only 0. So, the check by BUG_ON in the caller is unnecessary. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-05-23Btrfs: return error code to caller when btrfs_del_item failsTsutomu Itoh4-11/+19
The error code is returned instead of calling BUG_ON when btrfs_del_item returns the error. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-05-23Btrfs: return error code to caller when btrfs_previous_item failsTsutomu Itoh1-2/+3
The error code is returned instead of calling BUG_ON when btrfs_previous_item returns the error. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-05-23btrfs: fix typo 'testeing' -> 'testing'Sergei Trofimovich1-2/+2
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-05-23btrfs: typo: 'btrfS' -> 'btrfs'Sergei Trofimovich1-1/+1
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-05-23btrfs: don't spin in shrink_delalloc if there is nothing to freeSergei Trofimovich1-0/+4
Observed as a large delay when --mixed filesystem is filled up. Test example: 1. create tiny --mixed FS: $ dd if=/dev/zero of=2G.img seek=$((2048 * 1024 * 1024 - 1)) count=1 bs=1 $ mkfs.btrfs --mixed 2G.img $ mount -oloop 2G.img /mnt/ut/ 2. Try to fill it up: $ dd if=/dev/urandom of=10M.file bs=10240 count=1024 $ seq 1 256 | while read file_no; do echo $file_no; time cp 10M.file ${file_no}.copy; done Up to '200.copy' it goes fast, but when disk fills-up each -ENOSPC message takes 3 seconds to pop-up _every_ ENOSPC (and in usermode linux it's even more: 30-60 seconds!). (Maybe, time depends on kernel's timer resolution). No IO, no CPU load, just rescheduling. Some debugging revealed busy spinning in shrink_delalloc. Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org> Reviewed-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-05-23btrfs: Delete unused version.sh script.Jamey Sharp1-43/+0
In 2008, commit b4f6c45dfbf84f47c21f73f6370ad1292b0627fd dropped the use of fs/btrfs/version.sh, but left the script behind. Kill it. Commit by Jamey Sharp and Josh Triplett. Signed-off-by: Jamey Sharp <jamey@minilop.net> Signed-off-by: Josh Triplett <josh@joshtriplett.org> Cc: Chris Mason <chris.mason@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-05-23btrfs: Ensure the tree search ioctl returns the right number of recordsHugo Mills1-3/+1
Btrfs's tree search ioctl has a field to indicate that no more than a given number of records should be returned. The ioctl doesn't honour this, as the tested value is not incremented until the end of the copy_to_sk function. This patch removes an unnecessary local variable, and updates the num_found counter as each key is found in the tree. Signed-off-by: Hugo Mills <hugo@carfax.org.uk> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-05-23BTRFS: Remove unused node_lockAndi Kleen2-4/+0
240f62c8756 replaced the node_lock with rcu_read_lock, but forgot to remove the actual lock in the data structure. Remove it here. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-05-23Merge branch 'for-linus' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits) b43: fix comment typo reqest -> request Haavard Skinnemoen has left Atmel cris: typo in mach-fs Makefile Kconfig: fix copy/paste-ism for dell-wmi-aio driver doc: timers-howto: fix a typo ("unsgined") perf: Only include annotate.h once in tools/perf/util/ui/browsers/annotate.c md, raid5: Fix spelling error in comment ('Ofcourse' --> 'Of course'). treewide: fix a few typos in comments regulator: change debug statement be consistent with the style of the rest Revert "arm: mach-u300/gpio: Fix mem_region resource size miscalculations" audit: acquire creds selectively to reduce atomic op overhead rtlwifi: don't touch with treewide double semicolon removal treewide: cleanup continuations and remove logging message whitespace ath9k_hw: don't touch with treewide double semicolon removal include/linux/leds-regulator.h: fix syntax in example code tty: fix typo in descripton of tty_termios_encode_baud_rate xtensa: remove obsolete BKL kernel option from defconfig m68k: fix comment typo 'occcured' arch:Kconfig.locks Remove unused config option. treewide: remove extra semicolons ...
2011-05-23Btrfs: do not flush csum items of unchanged file data during treelogliubo1-0/+3
The current code relogs the entire inode every time during fsync log, and it is much better suited to small files rather than large ones. During my performance test, the fsync performace of large files sucks, and we can ascribe this to the tremendous amount of csum infos of the large ones, cause we have to flush all of these csum infos into log trees even when there are only _one_ change in the whole file data. Apparently, to optimize fsync, we need to create a filter to skip the unnecessary csum ones, that is, the corresponding file data remains unchanged before this fsync. Here I have some test results to show, I use sysbench to do "random write + fsync". === sysbench --test=fileio --num-threads=1 --file-num=2 --file-block-size=4K --file-total-size=8G --file-test-mode=rndwr --file-io-mode=sync --file-extra-flags= [prepare, run] === Sysbench args: - Number of threads: 1 - Extra file open flags: 0 - 2 files, 4Gb each - Block size 4Kb - Number of random requests for random IO: 10000 - Read/Write ratio for combined random IO test: 1.50 - Periodic FSYNC enabled, calling fsync() each 100 requests. - Calling fsync() at the end of test, Enabled. - Using synchronous I/O mode - Doing random write test Sysbench results: === Operations performed: 0 Read, 10000 Write, 200 Other = 10200 Total Read 0b Written 39.062Mb Total transferred 39.062Mb === a) without patch: (*SPEED* : 451.01Kb/sec) 112.75 Requests/sec executed b) with patch: (*SPEED* : 4.7533Mb/sec) 1216.84 Requests/sec executed PS: I've made a _sub transid_ stuff patch, but it does not perform as effectively as this patch, and I'm wanderring where the problem is and trying to improve it more. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-05-23Merge branch 'for-chris' of ↵Chris Mason13-13/+1649
git://git.kernel.org/pub/scm/linux/kernel/git/arne/btrfs-unstable-arne into inode_numbers Conflicts: fs/btrfs/Makefile fs/btrfs/ctree.h fs/btrfs/volumes.h Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-05-22Merge branch 'allocator' of ↵Chris Mason3-334/+201
git://git.kernel.org/pub/scm/linux/kernel/git/arne/btrfs-unstable-arne into inode_numbers Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-05-22Merge branch 'cleanups' of git://repo.or.cz/linux-2.6/btrfs-unstable into ↵Chris Mason38-3234/+295
inode_numbers Conflicts: fs/btrfs/extent-tree.c fs/btrfs/free-space-cache.c fs/btrfs/inode.c fs/btrfs/tree-log.c Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-05-22Btrfs: update the delayed inode code to use the btrfs_ino helper.Chris Mason2-6/+7
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-05-22Merge branch 'delayed_inode' into inode_numbersChris Mason16-91/+2074
Conflicts: fs/btrfs/inode.c fs/btrfs/ioctl.c fs/btrfs/transaction.c Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-05-21btrfs: implement delayed inode items operationMiao Xie16-91/+2074
Changelog V5 -> V6: - Fix oom when the memory load is high, by storing the delayed nodes into the root's radix tree, and letting btrfs inodes go. Changelog V4 -> V5: - Fix the race on adding the delayed node to the inode, which is spotted by Chris Mason. - Merge Chris Mason's incremental patch into this patch. - Fix deadlock between readdir() and memory fault, which is reported by Itaru Kitayama. Changelog V3 -> V4: - Fix nested lock, which is reported by Itaru Kitayama, by updating space cache inode in time. Changelog V2 -> V3: - Fix the race between the delayed worker and the task which does delayed items balance, which is reported by Tsutomu Itoh. - Modify the patch address David Sterba's comment. - Fix the bug of the cpu recursion spinlock, reported by Chris Mason Changelog V1 -> V2: - break up the global rb-tree, use a list to manage the delayed nodes, which is created for every directory and file, and used to manage the delayed directory name index items and the delayed inode item. - introduce a worker to deal with the delayed nodes. Compare with Ext3/4, the performance of file creation and deletion on btrfs is very poor. the reason is that btrfs must do a lot of b+ tree insertions, such as inode item, directory name item, directory name index and so on. If we can do some delayed b+ tree insertion or deletion, we can improve the performance, so we made this patch which implemented delayed directory name index insertion/deletion and delayed inode update. Implementation: - introduce a delayed root object into the filesystem, that use two lists to manage the delayed nodes which are created for every file/directory. One is used to manage all the delayed nodes that have delayed items. And the other is used to manage the delayed nodes which is waiting to be dealt with by the work thread. - Every delayed node has two rb-tree, one is used to manage the directory name index which is going to be inserted into b+ tree, and the other is used to manage the directory name index which is going to be deleted from b+ tree. - introduce a worker to deal with the delayed operation. This worker is used to deal with the works of the delayed directory name index items insertion and deletion and the delayed inode update. When the delayed items is beyond the lower limit, we create works for some delayed nodes and insert them into the work queue of the worker, and then go back. When the delayed items is beyond the upper bound, we create works for all the delayed nodes that haven't been dealt with, and insert them into the work queue of the worker, and then wait for that the untreated items is below some threshold value. - When we want to insert a directory name index into b+ tree, we just add the information into the delayed inserting rb-tree. And then we check the number of the delayed items and do delayed items balance. (The balance policy is above.) - When we want to delete a directory name index from the b+ tree, we search it in the inserting rb-tree at first. If we look it up, just drop it. If not, add the key of it into the delayed deleting rb-tree. Similar to the delayed inserting rb-tree, we also check the number of the delayed items and do delayed items balance. (The same to inserting manipulation) - When we want to update the metadata of some inode, we cached the data of the inode into the delayed node. the worker will flush it into the b+ tree after dealing with the delayed insertion and deletion. - We will move the delayed node to the tail of the list after we access the delayed node, By this way, we can cache more delayed items and merge more inode updates. - If we want to commit transaction, we will deal with all the delayed node. - the delayed node will be freed when we free the btrfs inode. - Before we log the inode items, we commit all the directory name index items and the delayed inode update. I did a quick test by the benchmark tool[1] and found we can improve the performance of file creation by ~15%, and file deletion by ~20%. Before applying this patch: Create files: Total files: 50000 Total time: 1.096108 Average time: 0.000022 Delete files: Total files: 50000 Total time: 1.510403 Average time: 0.000030 After applying this patch: Create files: Total files: 50000 Total time: 0.932899 Average time: 0.000019 Delete files: Total files: 50000 Total time: 1.215732 Average time: 0.000024 [1] http://marc.info/?l=linux-btrfs&m=128212635122920&q=p3 Many thanks for Kitayama-san's help! Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Reviewed-by: David Sterba <dave@jikos.cz> Tested-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Tested-by: Itaru Kitayama <kitayama@cl.bb4u.ne.jp> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-05-21Merge branch 'ino-alloc' of git://repo.or.cz/linux-btrfs-devel into ↵Chris Mason19-637/+1407
inode_numbers Conflicts: fs/btrfs/free-space-cache.c Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-05-20sanitize <linux/prefetch.h> usageLinus Torvalds1-0/+1
Commit e66eed651fd1 ("list: remove prefetching from regular list iterators") removed the include of prefetch.h from list.h, which uncovered several cases that had apparently relied on that rather obscure header file dependency. So this fixes things up a bit, using grep -L linux/prefetch.h $(git grep -l '[^a-z_]prefetchw*(' -- '*.[ch]') grep -L 'prefetchw*(' $(git grep -l 'linux/prefetch.h' -- '*.[ch]') to guide us in finding files that either need <linux/prefetch.h> inclusion, or have it despite not needing it. There are more of them around (mostly network drivers), but this gets many core ones. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstableLinus Torvalds3-22/+44
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: fix FS_IOC_SETFLAGS ioctl Btrfs: fix FS_IOC_GETFLAGS ioctl fs: remove FS_COW_FL Btrfs: fix easily get into ENOSPC in mixed case Prevent oopsing in posix_acl_valid()
2011-05-14Btrfs: fix FS_IOC_SETFLAGS ioctlLi Zefan1-0/+2
Steps to reproduce the bug: - Call FS_IOC_SETLFAGS ioctl with flags=FS_COMPR_FL - Call FS_IOC_SETFLAGS ioctl with flags=0 - Call FS_IOC_GETFLAGS ioctl, and you'll see FS_COMPR_FL is still set! Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-05-14Btrfs: fix FS_IOC_GETFLAGS ioctlLi Zefan1-0/+7
As we've added per file compression/cow support. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-05-14fs: remove FS_COW_FLLi Zefan1-9/+6
FS_COW_FL and FS_NOCOW_FL were newly introduced to control per file COW in btrfs, but FS_NOCOW_FL is sufficient. The fact is we don't have corresponding BTRFS_INODE_COW flag. COW is default, and FS_NOCOW_FL can be used to switch off COW for a single file. If we mount btrfs with nodatacow, a newly created file will be set with the FS_NOCOW_FL flag. So to turn on COW for it, we can just clear the FS_NOCOW_FL flag. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-05-14Btrfs: fix easily get into ENOSPC in mixed caseliubo1-11/+26
When a btrfs disk is created by mixed data & metadata option, it will have no pure data or pure metadata space info. In btrfs's for-linus branch, commit 78b1ea13838039cd88afdd62519b40b344d6c920 (Btrfs: fix OOPS of empty filesystem after balance) initializes space infos at the very beginning. The problem is this initialization does not take the mixed case into account, which will cause btrfs will easily get into ENOSPC in mixed case. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-05-14Prevent oopsing in posix_acl_valid()Daniel J Blueman1-2/+3
If posix_acl_from_xattr() returns an error code, a negative address is dereferenced causing an oops; fix by checking for error code first. Signed-off-by: Daniel J Blueman <daniel.blueman@gmail.com> Reviewed-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-05-13btrfs: quasi-round-robin for chunk allocationArne Jansen2-305/+177
In a multi device setup, the chunk allocator currently always allocates chunks on the devices in the same order. This leads to a very uneven distribution, especially with RAID1 or RAID10 and an uneven number of devices. This patch always sorts the devices before allocating, and allocates the stripes on the devices with the most available space, as long as there is enough space available. In a low space situation, it first tries to maximize striping. The patch also simplifies the allocator and reduces the checks for corner cases. The simplification is done by several means. First, it defines the properties of each RAID type upfront. These properties are used afterwards instead of differentiating cases in several places. Second, the old allocator defined a minimum stripe size for each block group type, tried to find a large enough chunk, and if this fails just allocates a smaller one. This is now done in one step. The largest possible chunk (up to max_chunk_size) is searched and allocated. Because we now have only one pass, the allocation of the map (struct map_lookup) is moved down to the point where the number of stripes is already known. This way we avoid reallocation of the map. We still avoid allocating stripes that are not a multiple of STRIPE_SIZE.
2011-05-13btrfs: heed alloc_startArne Jansen1-4/+1
currently alloc_start is disregarded if the requested chunk size is bigger than (device size - alloc_start), but smaller than the device size. The only situation where I see this could have made sense was when a chunk equal the size of the device has been requested. This was possible as the allocator failed to take alloc_start into account when calculating the request chunk size. As this gets fixed by this patch, the workaround is not necessary anymore.
2011-05-13btrfs: move btrfs_cmp_device_free_bytes to super.cArne Jansen3-28/+26
this function won't be used here anymore, so move it super.c where it is used for df-calculation
2011-05-12btrfs: use unsigned type for single bit bitfieldDavid Sterba1-4/+4
Signed-off-by: David Sterba <dsterba@suse.cz>
2011-05-12btrfs: use printk_ratelimited instead of printk_ratelimitDavid Sterba2-24/+10
As per printk_ratelimit comment, it should not be used. Signed-off-by: David Sterba <dsterba@suse.cz>
2011-05-12btrfs: add readonly flagArne Jansen4-12/+16
setting the readonly flag prevents writes in case an error is detected Signed-off-by: Arne Jansen <sensille@gmx.net>