summaryrefslogtreecommitdiffstats
path: root/fs/btrfs/inode.c
AgeCommit message (Collapse)AuthorFilesLines
2010-05-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstableLinus Torvalds1-351/+1362
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (27 commits) Btrfs: add more error checking to btrfs_dirty_inode Btrfs: allow unaligned DIO Btrfs: drop verbose enospc printk Btrfs: Fix block generation verification race Btrfs: fix preallocation and nodatacow checks in O_DIRECT Btrfs: avoid ENOSPC errors in btrfs_dirty_inode Btrfs: move O_DIRECT space reservation to btrfs_direct_IO Btrfs: rework O_DIRECT enospc handling Btrfs: use async helpers for DIO write checksumming Btrfs: don't walk around with task->state != TASK_RUNNING Btrfs: do aio_write instead of write Btrfs: add basic DIO read/write support direct-io: do not merge logically non-contiguous requests direct-io: add a hook for the fs to provide its own submit_bio function fs: allow short direct-io reads to be completed via buffered IO Btrfs: Metadata ENOSPC handling for balance Btrfs: Pre-allocate space for data relocation Btrfs: Metadata ENOSPC handling for tree log Btrfs: Metadata reservation for orphan inodes Btrfs: Introduce global metadata reservation ...
2010-05-27Btrfs: add more error checking to btrfs_dirty_inodeChris Mason1-2/+13
The ENOSPC code will now return ENOSPC to btrfs_start_transaction. btrfs_dirty_inode needs to check for this and error out appropriately. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-26Btrfs: allow unaligned DIOChris Mason1-3/+35
In order to support DIO that isn't aligned to the filesystem blocksize, we fall back to buffered for any unaligned DIOs. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-26Btrfs: fix preallocation and nodatacow checks in O_DIRECTChris Mason1-16/+140
The O_DIRECT code wasn't checking for multiple references on preallocated or nodatacow extents. This means it wasn't honoring snapshots properly. The fix here is to add an explicit check for multiple references This also fixes the math for selecting the correct disk block, making sure not to go past the end of the extent. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-26Btrfs: avoid ENOSPC errors in btrfs_dirty_inodeChris Mason1-3/+11
btrfs_dirty_inode tries to sneak in without much waiting or space reservation, mostly for performance reasons. This usually works well but can cause problems when there are many many writers. When btrfs_update_inode fails with ENOSPC, we fallback to a slower btrfs_start_transaction call that will reserve some space. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-26Btrfs: move O_DIRECT space reservation to btrfs_direct_IOChris Mason1-1/+8
This moves the delalloc space reservation done for O_DIRECT into btrfs_direct_IO. This way we don't leak reserved space if the generic O_DIRECT write code errors out before it calls into btrfs_direct_IO. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25Btrfs: rework O_DIRECT enospc handlingChris Mason1-11/+41
This changes O_DIRECT write code to mark extents as delalloc while it is processing them. Yan Zheng has reworked the enospc accounting based on tracking delalloc extents and this makes it much easier to track enospc in the O_DIRECT code. There are a few space cases with the O_DIRECT code though, it only sets the EXTENT_DELALLOC bits, instead of doing EXTENT_DELALLOC | EXTENT_DIRTY | EXTENT_UPTODATE, because we don't want to mess with clearing the dirty and uptodate bits when things go wrong. This is important because there are no pages in the page cache, so any extent state structs that we put in the tree won't get freed by releasepage. We have to clear them ourselves as the DIO ends. With this commit, we reserve space at in btrfs_file_aio_write, and then as each btrfs_direct_IO call progresses it sets EXTENT_DELALLOC on the range. btrfs_get_blocks_direct is responsible for clearing the delalloc at the same time it drops the extent lock. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25Btrfs: use async helpers for DIO write checksummingChris Mason1-9/+31
The async helper threads offload crc work onto all the CPUs, and make streaming writes much faster. This changes the O_DIRECT write code to use them. The only small complication was that we need to pass in the logical offset in the file for each bio, because we can't find it in the bio's pages. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25Btrfs: add basic DIO read/write supportJosef Bacik1-24/+463
This provides basic DIO support for reading and writing. It does not do the work to recover from mismatching checksums, that will come later. A few design changes have been made from Jim's code (sorry Jim!) 1) Use the generic direct-io code. Jim originally re-wrote all the generic DIO code in order to account for all of BTRFS's oddities, but thanks to that work it seems like the best bet is to just ignore compression and such and just opt to fallback on buffered IO. 2) Fallback on buffered IO for compressed or inline extents. Jim's code did it's own buffering to make dio with compressed extents work. Now we just fallback onto normal buffered IO. 3) Use ordered extents for the writes so that all of the lock_extent() lookup_ordered() type checks continue to work. 4) Do the lock_extent() lookup_ordered() loop in readpage so we don't race with DIO writes. I've tested this with fsx and everything works great. This patch depends on my dio and filemap.c patches to work. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25Btrfs: Pre-allocate space for data relocationYan, Zheng1-20/+24
Pre-allocate space for data relocation. This can detect ENOPSC condition caused by fragmentation of free space. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25Btrfs: Metadata reservation for orphan inodesYan, Zheng1-42/+257
reserve metadata space for handling orphan inodes Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25Btrfs: Introduce global metadata reservationYan, Zheng1-2/+11
Reserve metadata space for extent tree, checksum tree and root tree Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25Btrfs: Update metadata reservation for delayed allocationYan, Zheng1-78/+53
Introduce metadata reservation context for delayed allocation and update various related functions. This patch also introduces EXTENT_FIRST_DELALLOC control bit for set/clear_extent_bit. It tells set/clear_bit_hook whether they are processing the first extent_state with EXTENT_DELALLOC bit set. This change is important if set/clear_extent_bit involves multiple extent_state. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25Btrfs: Integrate metadata reservation with start_transactionYan, Zheng1-146/+257
Besides simplify the code, this change makes sure all metadata reservation for normal metadata operations are released after committing transaction. Changes since V1: Add code that check if unlink and rmdir will free space. Add ENOSPC handling for clone ioctl. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25Btrfs: Kill init_btrfs_i()Yan, Zheng1-36/+28
All code in init_btrfs_i can be moved into btrfs_alloc_inode() Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25Btrfs: Shrink delay allocated space in a synchronizedYan, Zheng1-0/+32
Shrink delayed allocation space in a synchronized manner is more controllable than flushing all delay allocated space in an async thread. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-21btrfs: replace inode uid,gid,mode initialization with helper functionDmitry Monakhov1-10/+1
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-04-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstableLinus Torvalds1-55/+4
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: add check for changed leaves in setup_leaf_for_split Btrfs: create snapshot references in same commit as snapshot Btrfs: fix small race with delalloc flushing waitqueue's Btrfs: use add_to_page_cache_lru, use __page_cache_alloc Btrfs: fix chunk allocate size calculation Btrfs: kill max_extent mount option Btrfs: fail to mount if we have problems reading the block groups Btrfs: check btrfs_get_extent return for IS_ERR() Btrfs: handle kmalloc() failure in inode lookup ioctl Btrfs: dereferencing freed memory Btrfs: Simplify num_stripes's calculation logical for __btrfs_alloc_chunk() Btrfs: Add error handle for btrfs_search_slot() in btrfs_read_chunk_tree() Btrfs: Remove unnecessary finish_wait() in wait_current_trans() Btrfs: add NULL check for do_walk_down() Btrfs: remove duplicate include in ioctl.c Fix trivial conflict in fs/btrfs/compression.c due to slab.h include cleanups.
2010-03-30Btrfs: kill max_extent mount optionJosef Bacik1-55/+4
As Yan pointed out, theres not much reason for all this complicated math to account for file extents being split up into max_extent chunks, since they are likely to all end up in the same leaf anyway. Since there isn't much reason to use max_extent, just remove the option altogether so we have one less thing we need to test. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-03-30include cleanup: Update gfp.h and slab.h includes to prepare for breaking ↵Tejun Heo1-0/+1
implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstableLinus Torvalds1-53/+86
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (30 commits) Btrfs: fix the inode ref searches done by btrfs_search_path_in_tree Btrfs: allow treeid==0 in the inode lookup ioctl Btrfs: return keys for large items to the search ioctl Btrfs: fix key checks and advance in the search ioctl Btrfs: buffer results in the space_info ioctl Btrfs: use __u64 types in ioctl.h Btrfs: fix search_ioctl key advance Btrfs: fix gfp flags masking in the compression code Btrfs: don't look at bio flags after submit_bio btrfs: using btrfs_stack_device_id() get devid btrfs: use memparse Btrfs: add a "df" ioctl for btrfs Btrfs: cache the extent state everywhere we possibly can V2 Btrfs: cache ordered extent when completing io Btrfs: cache extent state in find_delalloc_range Btrfs: change the ordered tree to use a spinlock instead of a mutex Btrfs: finish read pages in the order they are submitted btrfs: fix btrfs_mkdir goto for no free objectids Btrfs: flush data on snapshot creation Btrfs: make df be a little bit more understandable ...
2010-03-15Btrfs: cache the extent state everywhere we possibly can V2Josef Bacik1-42/+69
This patch just goes through and fixes everybody that does lock_extent() blah unlock_extent() to use lock_extent_bits() blah unlock_extent_cached() and pass around a extent_state so we only have to do the searches once per function. This gives me about a 3 mb/s boots on my random write test. I have not converted some things, like the relocation and ioctl's, since they aren't heavily used and the relocation stuff is in the middle of being re-written. I also changed the clear_extent_bit() to only unset the cached state if we are clearing EXTENT_LOCKED and related stuff, so we can do things like this lock_extent_bits() clear delalloc bits unlock_extent_cached() without losing our cached state. I tested this thoroughly and turned on LEAK_DEBUG to make sure we weren't leaking extent states, everything worked out fine. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-03-15Btrfs: cache ordered extent when completing ioJosef Bacik1-3/+2
When finishing io we run btrfs_dec_test_ordered_pending, and then immediately run btrfs_lookup_ordered_extent, but btrfs_dec_test_ordered_pending does that already, so we're searching twice when we don't have to. This patch lets us pass a btrfs_ordered_extent in to btrfs_dec_test_ordered_pending so if we do complete io on that ordered extent we can just use the one we found then instead of having to do another btrfs_lookup_ordered_extent. This made my fio job with the other patch go from 24 mb/s to 29 mb/s. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-03-15btrfs: fix btrfs_mkdir goto for no free objectidsMiao Xie1-1/+1
btrfs_mkdir() must jump to the place of ending transaction after btrfs_find_free_objectid() failed. Or this transaction can't end. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-03-15Btrfs: add new defrag-range ioctl.Chris Mason1-3/+8
The btrfs defrag ioctl was limited to doing the entire file. This commit adds a new interface that can defrag a specific range inside the file. It can also force compression on the file, allowing you to selectively compress individual files after they were created, even when mount -o compress isn't turned on. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-03-15Btrfs: change how we mount subvolumesJosef Bacik1-4/+6
This work is in preperation for being able to set a different root as the default mounting root. There is currently a problem with how we mount subvolumes. We cannot currently mount a subvolume of a subvolume, you can only mount subvolumes/snapshots of the default subvolume. So say you take a snapshot of the default subvolume and call it snap1, and then take a snapshot of snap1 and call it snap2, so now you have / /snap1 /snap1/snap2 as your available volumes. Currently you can only mount / and /snap1, you cannot mount /snap1/snap2. To fix this problem instead of passing subvolid=<name> you must pass in subvolid=<treeid>, where <treeid> is the tree id that gets spit out via the subvolume listing you get from the subvolume listing patches (btrfs filesystem list). This allows us to mount /, /snap1 and /snap1/snap2 as the root volume. In addition to the above, we also now read the default dir item in the tree root to get the root key that it points to. For now this just points at what has always been the default subvolme, but later on I plan to change it to point at whatever root you want to be the new default root, so you can just set the default mount and not have to mount with -o subvolid=<treeid>. I tested this out with the above scenario and it worked perfectly. Thanks, mount -o subvol operates inside the selected subvolid. For example: mount -o subvol=snap1,subvolid=256 /dev/xxx /mnt /mnt will have the snap1 directory for the subvolume with id 256. mount -o subvol=snap /dev/xxx /mnt /mnt will be the snap directory of whatever the default subvolume is. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-03-05pass writeback_control to ->write_inodeChristoph Hellwig1-2/+2
This gives the filesystem more information about the writeback that is happening. Trond requested this for the NFS unstable write handling, and other filesystems might benefit from this too by beeing able to distinguish between the different callers in more detail. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-02-04Btrfs: apply updated fallocate i_size fixAneesh Kumar K.V1-1/+3
This version of the i_size fix for fallocate makes sure we only update the i_size when the current fallocate is really operating outside of i_size. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-02-04Btrfs: do not try and lookup the file extent when finishing ordered ioJosef Bacik1-44/+2
When running the following fio job [torrent] filename=torrent-test rw=randwrite size=4g filesize=4g bs=4k ioengine=sync you would see long stalls where no work was being done. That is because we were doing all this extra work to read in the file extent outside of the transaction, however in the random io case this ends up hurting us because the file extents are not there to begin with. So axe this logic, since we end up reading in the file extent when we go to update it anyway. This took the fio job from 11 mb/s with several ~10 second stalls to 24 mb/s to a couple of 1-2 second stalls. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-01-28Btrfs: run orphan cleanup on default fs rootJosef Bacik1-6/+0
This patch revert's commit 6c090a11e1c403b727a6a8eff0b97d5fb9e95cb5 Since it introduces this problem where we can run orphan cleanup on a volume that can have orphan entries re-added. Instead of my original fix, Yan Zheng pointed out that we can just revert my original fix and then run the orphan cleanup in open_ctree after we look up the fs_root. I have tested this with all the tests that gave me problems and this patch fixes both problems. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-01-28Btrfs: Use correct values when updating inode i_size on fallocateAneesh Kumar K.V1-4/+9
commit f2bc9dd07e3424c4ec5f3949961fe053d47bc825 Author: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Date: Wed Jan 20 12:57:53 2010 +0530 Btrfs: Use correct values when updating inode i_size on fallocate Even though we allocate more, we should be updating inode i_size as per the arguments passed Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-01-28Btrfs: Add mount -o compress-forceChris Mason1-1/+2
The default btrfs mount -o compress mode will quickly back off compressing a file if it notices that compression does not reduce the size of the data being written. This can save considerable CPU because all future writes to the file go through uncompressed. But some files are both very large and have mixed data stored in them. In that case, we want to add the ability to always try compressing data before writing it. This commit adds mount -o compress-force. A later commit will add a new inode flag that does the same thing. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-01-17Btrfs: fix regression in orphan cleanupJosef Bacik1-0/+6
Currently orphan cleanup only ever gets triggered if we cross subvolumes during a lookup, which means that if we just mount a plain jane fs that has orphans in it, they will never get cleaned up. This results in panic's like these http://www.kerneloops.org/oops.php?number=1109085 where adding an orphan entry results in -EEXIST being returned and we panic. In order to fix this, we check to see on lookup if our root has had the orphan cleanup done, and if not go ahead and do it. This is easily reproduceable by running this testcase #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <string.h> #include <unistd.h> #include <stdio.h> int main(int argc, char **argv) { char data[4096]; char newdata[4096]; int fd1, fd2; memset(data, 'a', 4096); memset(newdata, 'b', 4096); while (1) { int i; fd1 = creat("file1", 0666); if (fd1 < 0) break; for (i = 0; i < 512; i++) write(fd1, data, 4096); fsync(fd1); close(fd1); fd2 = creat("file2", 0666); if (fd2 < 0) break; ftruncate(fd2, 4096 * 512); for (i = 0; i < 512; i++) write(fd2, newdata, 4096); close(fd2); i = rename("file2", "file1"); unlink("file1"); } return 0; } and then pulling the power on the box, and then trying to run that test again when the box comes back up. I've tested this locally and it fixes the problem. Thanks to Tomas Carnecky for helping me track this down initially. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-01-17btrfs: fix missing last-entry in readdir(3)Jan Engelhardt1-1/+5
parent 49313cdac7b34c9f7ecbb1780cfc648b1c082cd7 (v2.6.32-1-g49313cd) commit ff48c08e1c05c67e8348ab6f8a24de8034e0e34d Author: Jan Engelhardt <jengelh@medozas.de> Date: Wed Dec 9 22:57:36 2009 +0100 Btrfs: fix missing last-entry in readdir(3) When one does a 32-bit readdir(3), the last entry of a directory is missing. This is however not due to passing a large value to filldir, but it seems to have to do with glibc doing telldir or something quirky. In any case, this patch fixes it in practice. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-12-17Btrfs: make sure fallocate properly starts a transactionChris Mason1-4/+9
The recent patch to make fallocate enospc friendly would send down a NULL trans handle to the allocator. This moves the transaction start to properly fix things. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-12-17Btrfs: deny sys_link across subvolumes.TARUISI Hiroaki1-0/+4
I rebased Christian Parpart's patch to deny hard link across subvolumes. Original patch modifies also btrfs_rename, but I excluded it because we can move across subvolumes now and it make no problem. ----------------- Hard link across subvolumes should not allowed in Btrfs. btrfs_link checks root of 'to' directory is same as root of 'from' file. If not same, btrfs_link returns -EPERM. Signed-off-by: TARUISI Hiroaki <taruishi.hiroak@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-12-17Btrfs: Add delayed iputYan, Zheng1-2/+53
iput() can trigger new transactions if we are dropping the final reference, so calling it in btrfs_commit_transaction may end up deadlock. This patch adds delayed iput to avoid the issue. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-12-17Btrfs: Pass transaction handle to security and ACL initialization functionsYan, Zheng1-7/+8
Pass transaction handle down to security and ACL initialization functions, so we can avoid starting nested transactions Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-12-17Btrfs: Make truncate(2) more ENOSPC friendlyYan, Zheng1-124/+192
truncating and deleting regular files are unbound operations, so it's not good to do them in a single transaction. This patch makes btrfs_truncate and btrfs_delete_inode start a new transaction after all items in a tree leaf are deleted. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-12-17Btrfs: Make fallocate(2) more ENOSPC friendlyYan, Zheng1-33/+32
fallocate(2) may allocate large number of file extents, so it's not good to do it in a single transaction. This patch make fallocate(2) start a new transaction for each file extents it allocates. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-12-17Btrfs: Avoid orphan inodes cleanup while replaying logYan, Zheng1-3/+16
We do log replay in a single transaction, so it's not good to do unbound operations. This patch cleans up orphan inodes cleanup after replaying the log. It also avoids doing other unbound operations such as truncating a file during replaying log. These unbound operations are postponed to the orphan inode cleanup stage. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-12-17Btrfs: Fix disk_i_size update corner caseYan, Zheng1-27/+44
There are some cases file extents are inserted without involving ordered struct. In these cases, we update disk_i_size directly, without checking pending ordered extent and DELALLOC bit. This patch extends btrfs_ordered_update_i_size() to handle these cases. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-12-15Btrfs: Rewrite btrfs_drop_extentsYan, Zheng1-18/+9
Rewrite btrfs_drop_extents by using btrfs_duplicate_item, so we can avoid calling lock_extent within transaction. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-11-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstableLinus Torvalds1-15/+80
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: fix panic when trying to destroy a newly allocated Btrfs: allow more metadata chunk preallocation Btrfs: fallback on uncompressed io if compressed io fails Btrfs: find ideal block group for caching Btrfs: avoid null deref in unpin_extent_cache() Btrfs: skip btrfs_release_path in btrfs_update_root and btrfs_del_root Btrfs: fix some metadata enospc issues Btrfs: fix how we set max_size for free space clusters Btrfs: cleanup transaction starting and fix journal_info usage Btrfs: fix data allocation hint start
2009-11-11Btrfs: fix panic when trying to destroy a newly allocatedJosef Bacik1-0/+10
There is a problem where iget5_locked will look for an inode, not find it, and then subsequently try to allocate it. Another CPU will have raced in and allocated the inode instead, so when iget5_locked gets the inode spin lock again and does a search, it finds the new inode. So it goes ahead and calls destroy_inode on the inode it just allocated. The problem is we don't set BTRFS_I(inode)->root until the new inode is completely initialized. This patch makes us set root to NULL when alloc'ing a new inode, so when we get to btrfs_destroy_inode and we see that root is NULL we can just free up the memory and continue on. This fixes the panic http://www.kerneloops.org/submitresult.php?number=812690 Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-11-11Btrfs: fallback on uncompressed io if compressed io failsJosef Bacik1-8/+23
Currently compressed IO does not deal with not having its entire extent able to be allocated. So if we have enough free space to allocate for the extent, but its not contiguous, it will fail spectacularly. This patch fixes this by falling back on uncompressed IO which lets us spread the delalloc extent across multiple extents. I tested this by making us randomly think the reservation had failed to make it fallback on the uncompressed io way and it seemed to work fine. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-11-11Btrfs: fix some metadata enospc issuesJosef Bacik1-5/+31
We weren't reserving metadata space for rename, rmdir and unlink, which could cause problems. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-11-11Btrfs: fix data allocation hint startJosef Bacik1-2/+16
Sometimes our start allocation hint when we cow a file can be either EXTENT_HOLE or some other such place holder, which is not optimal. So if we find that our em->block_start is one of these special values, check to see where the first block of the inode is stored, and use that as a hint. If that block is also a special value, just fallback on a hint of 0 and let the allocator figure out a good place to put the data. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-10-15Merge branch 'master' of ↵Linus Torvalds1-4/+29
git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: always pin metadata in discard mode Btrfs: enable discard support Btrfs: add -o discard option Btrfs: properly wait log writers during log sync Btrfs: fix possible ENOSPC problems with truncate Btrfs: fix btrfs acl #ifdef checks Btrfs: streamline tree-log btree block writeout Btrfs: avoid tree log commit when there are no changes Btrfs: only write one super copy during fsync
2009-10-14Btrfs: fix possible ENOSPC problems with truncateJosef Bacik1-3/+24
There's a problem where we don't do any space reservation for truncates, which can cause you to OOPs because you will be allowed to go off in the weeds a bit since we don't account for the delalloc bytes that are created as a result of the truncate. Signed-off-by: Josef Bacik <jbacik@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>