summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2010-06-11Btrfs: uninitialized data is check_path_shared()Dan Carpenter1-1/+1
refs can be used with uninitialized data if btrfs_lookup_extent_info() fails on the first pass through the loop. In the original code if that happens then check_path_shared() probably returns 1, this patch changes it to return 1 for safety. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-06-11Btrfs: fix fallocate regressionJosef Bacik1-1/+1
Seems that when btrfs_fallocate was converted to use the new ENOSPC stuff we dropped passing the mode to the function that actually does the preallocation. This breaks anybody who wants to use FALLOC_FL_KEEP_SIZE. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-06-11Btrfs: fix loop device on top of btrfsMiao Xie1-0/+1
We cannot use the loop device which has been connected to a file in the btrf The reproduce steps is following: # dd if=/dev/zero of=vdev0 bs=1M count=1024 # losetup /dev/loop0 vdev0 # mkfs.btrfs /dev/loop0 ... failed to zero device start -5 The reason is that the btrfs don't implement either ->write_begin or ->write the VFS API, so we fix it by setting ->write to do_sync_write(). Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-27Btrfs: add more error checking to btrfs_dirty_inodeChris Mason1-2/+13
The ENOSPC code will now return ENOSPC to btrfs_start_transaction. btrfs_dirty_inode needs to check for this and error out appropriately. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-26Btrfs: allow unaligned DIOChris Mason1-3/+35
In order to support DIO that isn't aligned to the filesystem blocksize, we fall back to buffered for any unaligned DIOs. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-26Btrfs: drop verbose enospc printkChris Mason1-0/+2
Less printk is good printk. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-26Btrfs: Fix block generation verification raceYan, Zheng1-1/+1
After the path is released, the generation number got from block pointer is no long valid. The race may cause disk corruption, because verify_parent_transid() calls clear_extent_buffer_uptodate() when generation numbers mismatch. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-26Btrfs: fix preallocation and nodatacow checks in O_DIRECTChris Mason1-16/+140
The O_DIRECT code wasn't checking for multiple references on preallocated or nodatacow extents. This means it wasn't honoring snapshots properly. The fix here is to add an explicit check for multiple references This also fixes the math for selecting the correct disk block, making sure not to go past the end of the extent. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-26Btrfs: avoid ENOSPC errors in btrfs_dirty_inodeChris Mason1-3/+11
btrfs_dirty_inode tries to sneak in without much waiting or space reservation, mostly for performance reasons. This usually works well but can cause problems when there are many many writers. When btrfs_update_inode fails with ENOSPC, we fallback to a slower btrfs_start_transaction call that will reserve some space. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-26Btrfs: move O_DIRECT space reservation to btrfs_direct_IOChris Mason2-6/+8
This moves the delalloc space reservation done for O_DIRECT into btrfs_direct_IO. This way we don't leak reserved space if the generic O_DIRECT write code errors out before it calls into btrfs_direct_IO. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25Btrfs: rework O_DIRECT enospc handlingChris Mason4-30/+49
This changes O_DIRECT write code to mark extents as delalloc while it is processing them. Yan Zheng has reworked the enospc accounting based on tracking delalloc extents and this makes it much easier to track enospc in the O_DIRECT code. There are a few space cases with the O_DIRECT code though, it only sets the EXTENT_DELALLOC bits, instead of doing EXTENT_DELALLOC | EXTENT_DIRTY | EXTENT_UPTODATE, because we don't want to mess with clearing the dirty and uptodate bits when things go wrong. This is important because there are no pages in the page cache, so any extent state structs that we put in the tree won't get freed by releasepage. We have to clear them ourselves as the DIO ends. With this commit, we reserve space at in btrfs_file_aio_write, and then as each btrfs_direct_IO call progresses it sets EXTENT_DELALLOC on the range. btrfs_get_blocks_direct is responsible for clearing the delalloc at the same time it drops the extent lock. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25Btrfs: use async helpers for DIO write checksummingChris Mason5-17/+52
The async helper threads offload crc work onto all the CPUs, and make streaming writes much faster. This changes the O_DIRECT write code to use them. The only small complication was that we need to pass in the logical offset in the file for each bio, because we can't find it in the bio's pages. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25Btrfs: don't walk around with task->state != TASK_RUNNINGChris Mason2-3/+4
Yan Zheng noticed two places we were doing a lot of work without task->state set to TASK_RUNNING. This sets the state properly after we get ready to sleep but decide not to. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25Btrfs: do aio_write instead of writeJosef Bacik2-83/+104
In order for AIO to work, we need to implement aio_write. This patch converts our btrfs_file_write to btrfs_aio_write. I've tested this with xfstests and nothing broke, and the AIO stuff magically started working. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25Btrfs: add basic DIO read/write supportJosef Bacik6-36/+631
This provides basic DIO support for reading and writing. It does not do the work to recover from mismatching checksums, that will come later. A few design changes have been made from Jim's code (sorry Jim!) 1) Use the generic direct-io code. Jim originally re-wrote all the generic DIO code in order to account for all of BTRFS's oddities, but thanks to that work it seems like the best bet is to just ignore compression and such and just opt to fallback on buffered IO. 2) Fallback on buffered IO for compressed or inline extents. Jim's code did it's own buffering to make dio with compressed extents work. Now we just fallback onto normal buffered IO. 3) Use ordered extents for the writes so that all of the lock_extent() lookup_ordered() type checks continue to work. 4) Do the lock_extent() lookup_ordered() loop in readpage so we don't race with DIO writes. I've tested this with fsx and everything works great. This patch depends on my dio and filemap.c patches to work. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25direct-io: do not merge logically non-contiguous requestsJosef Bacik1-2/+18
Btrfs cannot handle having logically non-contiguous requests submitted. For example if you have Logical: [0-4095][HOLE][8192-12287] Physical: [0-4095] [4096-8191] Normally the DIO code would put these into the same BIO's. The problem is we need to know exactly what offset is associated with what BIO so we can do our checksumming and unlocking properly, so putting them in the same BIO doesn't work. So add another check where we submit the current BIO if the physical blocks are not contigous OR the logical blocks are not contiguous. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25direct-io: add a hook for the fs to provide its own submit_bio functionJosef Bacik2-8/+45
Because BTRFS can do RAID and such, we need our own submit hook so we can setup the bio's in the correct fashion, and handle checksum errors properly. So there are a few changes here 1) The submit_io hook. This is straightforward, just call this instead of submit_bio. 2) Allow the fs to return -ENOTBLK for reads. Usually this has only worked for writes, since writes can fallback onto buffered IO. But BTRFS needs the option of falling back on buffered IO if it encounters a compressed extent, since we need to read the entire extent in and decompress it. So if we get -ENOTBLK back from get_block we'll return back and fallback on buffered just like the write case. I've tested these changes with fsx and everything seems to work. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25fs: allow short direct-io reads to be completed via buffered IOJosef Bacik1-5/+31
This is similar to what already happens in the write case. If we have a short read while doing O_DIRECT, instead of just returning, fallthrough and try to read the rest via buffered IO. BTRFS needs this because if we encounter a compressed or inline extent during DIO, we need to fallback on buffered. If the extent is compressed we need to read the entire thing into memory and de-compress it into the users pages. I have tested this with fsx and everything works great. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25Btrfs: Metadata ENOSPC handling for balanceYan, Zheng5-747/+1163
This patch adds metadata ENOSPC handling for the balance code. It is consisted by following major changes: 1. Avoid COW tree leave in the phrase of merging tree. 2. Handle interaction with snapshot creation. 3. make the backref cache can live across transactions. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25Btrfs: Pre-allocate space for data relocationYan, Zheng3-45/+92
Pre-allocate space for data relocation. This can detect ENOPSC condition caused by fragmentation of free space. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25Btrfs: Metadata ENOSPC handling for tree logYan, Zheng5-128/+156
Previous patches make the allocater return -ENOSPC if there is no unreserved free metadata space. This patch updates tree log code and various other places to propagate/handle the ENOSPC error. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25Btrfs: Metadata reservation for orphan inodesYan, Zheng9-66/+365
reserve metadata space for handling orphan inodes Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25Btrfs: Introduce global metadata reservationYan, Zheng8-76/+241
Reserve metadata space for extent tree, checksum tree and root tree Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25Btrfs: Update metadata reservation for delayed allocationYan, Zheng9-415/+232
Introduce metadata reservation context for delayed allocation and update various related functions. This patch also introduces EXTENT_FIRST_DELALLOC control bit for set/clear_extent_bit. It tells set/clear_bit_hook whether they are processing the first extent_state with EXTENT_DELALLOC bit set. This change is important if set/clear_extent_bit involves multiple extent_state. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25Btrfs: Integrate metadata reservation with start_transactionYan, Zheng15-528/+678
Besides simplify the code, this change makes sure all metadata reservation for normal metadata operations are released after committing transaction. Changes since V1: Add code that check if unlink and rmdir will free space. Add ENOSPC handling for clone ioctl. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25Btrfs: Introduce contexts for metadata reservationYan, Zheng7-385/+853
Introducing metadata reseravtion contexts has two major advantages. First, it makes metadata reseravtion more traceable. Second, it can reclaim freed space and re-add them to the itself after transaction committed. Besides add btrfs_block_rsv structure and related helper functions, This patch contains following changes: Move code that decides if freed tree block should be pinned into btrfs_free_tree_block(). Make space accounting more accurate, mainly for handling read only block groups. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25Btrfs: Kill init_btrfs_i()Yan, Zheng1-36/+28
All code in init_btrfs_i can be moved into btrfs_alloc_inode() Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25Btrfs: Shrink delay allocated space in a synchronizedYan, Zheng4-121/+88
Shrink delayed allocation space in a synchronized manner is more controllable than flushing all delay allocated space in an async thread. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25Btrfs: Kill allocate_wait in space_infoYan, Zheng2-76/+58
We already have fs_info->chunk_mutex to avoid concurrent chunk creation. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25Btrfs: Link block groups of different raid typesYan, Zheng3-55/+121
The size of reserved space is stored in space_info. If block groups of different raid types are linked to separate space_info, changing allocation profile will corrupt reserved space accounting. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-16Linus 2.6.34v2.6.34Linus Torvalds1-1/+1
2010-05-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds5-81/+160
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: rtnetlink: make SR-IOV VF interface symmetric sctp: delete active ICMP proto unreachable timer when free transport tcp: fix MD5 (RFC2385) support
2010-05-16Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linusLinus Torvalds4-7/+21
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus: MIPS: Oprofile: Fix Loongson irq handler MIPS: N32: Use compat version for sys_ppoll. MIPS FPU emulator: allow Cause bits of FCSR to be writeable by ctc1
2010-05-16rtnetlink: make SR-IOV VF interface symmetricChris Wright2-53/+129
Now we have a set of nested attributes: IFLA_VFINFO_LIST (NESTED) IFLA_VF_INFO (NESTED) IFLA_VF_MAC IFLA_VF_VLAN IFLA_VF_TX_RATE This allows a single set to operate on multiple attributes if desired. Among other things, it means a dump can be replayed to set state. The current interface has yet to be released, so this seems like something to consider for 2.6.34. Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-16sctp: delete active ICMP proto unreachable timer when free transportWei Yongjun1-0/+4
transport may be free before ICMP proto unreachable timer expire, so we should delete active ICMP proto unreachable timer when transport is going away. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-16tcp: fix MD5 (RFC2385) supportEric Dumazet2-28/+27
TCP MD5 support uses percpu data for temporary storage. It currently disables preemption so that same storage cannot be reclaimed by another thread on same cpu. We also have to make sure a softirq handler wont try to use also same context. Various bug reports demonstrated corruptions. Fix is to disable preemption and BH. Reported-by: Bhaskar Dutta <bhaskie@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-15 MIPS: Oprofile: Fix Loongson irq handlerWu Zhangjin1-1/+1
The interrupt enable bit for the performance counters is in the Control Register $24, not in the counter register. loongson2_perfcount_handler(), we need to use Reported-by: Xu Hengyang <hengyang@mail.ustc.edu.cn> Signed-off-by: Wu Zhangjin <wuzhangjin@gmail.com> Cc: linux-mips@linux-mips.org Patchwork: http://patchwork.linux-mips.org/patch/1198/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> ---
2010-05-15 MIPS: N32: Use compat version for sys_ppoll.Chandrakala Chavva1-1/+1
The sys_ppoll() takes struct 'struct timespec'. This is different for the N32 and N64 ABIs. Use the compat version to do the proper conversions. Signed-off-by: David Daney <ddaney@caviumnetworks.com> To: linux-mips@linux-mips.org Patchwork: http://patchwork.linux-mips.org/patch/1210/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> ---
2010-05-15 MIPS FPU emulator: allow Cause bits of FCSR to be writeable by ctc1Shane McDonald2-5/+19
In the FPU emulator code of the MIPS, the Cause bits of the FCSR register are not currently writeable by the ctc1 instruction. In odd corner cases, this can cause problems. For example, a case existed where a divide-by-zero exception was generated by the FPU, and the signal handler attempted to restore the FPU registers to their state before the exception occurred. In this particular setup, writing the old value to the FCSR register would cause another divide-by-zero exception to occur immediately. The solution is to change the ctc1 instruction emulator code to allow the Cause bits of the FCSR register to be writeable. This is the behaviour of the hardware that the code is emulating. This problem was found by Shane McDonald, but the credit for the fix goes to Kevin Kissell. In Kevin's words: I submit that the bug is indeed in that ctc_op: case of the emulator. The Cause bits (17:12) are supposed to be writable by that instruction, but the CTC1 emulation won't let them be updated by the instruction. I think that actually if you just completely removed lines 387-388 [...] things would work a good deal better. At least, it would be a more accurate emulation of the architecturally defined FPU. If I wanted to be really, really pedantic (which I sometimes do), I'd also protect the reserved bits that aren't necessarily writable. Signed-off-by: Shane McDonald <mcdonald.shane@gmail.com> To: anemo@mba.ocn.ne.jp To: kevink@paralogos.com To: sshtylyov@mvista.com Patchwork: http://patchwork.linux-mips.org/patch/1205/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> ---
2010-05-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstableLinus Torvalds1-0/+5
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: check for read permission on src file in the clone ioctl
2010-05-15lib/btree: fix possible NULL pointer dereferencekirjanov@gmail.com1-1/+2
mempool_alloc() can return null in atomic case. Signed-off-by: Denis Kirjanov <kirjanov@gmail.com> Cc: Joern Engel <joern@logfs.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-15mmc: at91_mci: modify cache flush routinesNicolas Ferre1-1/+1
As we were using an internal dma flushing routine, this patch changes to the DMA API flush_kernel_dcache_page(). Driver is able to compile now. [akpm@linux-foundation.org: flush_kernel_dcache_page() comes before kunmap_atomic()] Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-15Btrfs: check for read permission on src file in the clone ioctlDan Rosenberg1-0/+5
The existing code would have allowed you to clone a file that was only open for writing Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-15Merge branch 'for-linus' of ↵Linus Torvalds8-26/+49
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: JFS: Free sbi memory in error path fs/sysv: dereferencing ERR_PTR() Fix double-free in logfs Fix the regression created by "set S_DEAD on unlink()..." commit
2010-05-15Merge branch 'perf-fixes-for-linus' of ↵Linus Torvalds1-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf record: Add a fallback to the reference relocation symbol
2010-05-15JFS: Free sbi memory in error pathJan Blunck1-7/+6
I spotted the missing kfree() while removing the BKL. [akpm@linux-foundation.org: avoid multiple returns so it doesn't happen again] Signed-off-by: Jan Blunck <jblunck@suse.de> Cc: Dave Kleikamp <shaggy@austin.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-15fs/sysv: dereferencing ERR_PTR()Dan Carpenter1-1/+1
I moved the dir_put_page() inside the if condition so we don't dereference "page", if it's an ERR_PTR(). Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-15Fix double-free in logfsAl Viro1-7/+7
iput() is needed *until* we'd done successful d_alloc_root() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-15Fix the regression created by "set S_DEAD on unlink()..." commitAl Viro5-11/+35
1) i_flags simply doesn't work for mount/unlink race prevention; we may have many links to file and rm on one of those obviously shouldn't prevent bind on top of another later on. To fix it right way we need to mark _dentry_ as unsuitable for mounting upon; new flag (DCACHE_CANT_MOUNT) is protected by d_flags and i_mutex on the inode in question. Set it (with dont_mount(dentry)) in unlink/rmdir/etc., check (with cant_mount(dentry)) in places in namespace.c that used to check for S_DEAD. Setting S_DEAD is still needed in places where we used to set it (for directories getting killed), since we rely on it for readdir/rmdir race prevention. 2) rename()/mount() protection has another bogosity - we unhash the target before we'd checked that it's not a mountpoint. Fixed. 3) ancient bogosity in pivot_root() - we locked i_mutex on the right directory, but checked S_DEAD on the different (and wrong) one. Noticed and fixed. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-14Merge master.kernel.org:/home/rmk/linux-2.6-armLinus Torvalds12-35/+99
* master.kernel.org:/home/rmk/linux-2.6-arm: ARM: 6126/1: ARM mpcore_wdt: fix build failure and other fixes ARM: 6125/1: ARM TWD: move TWD registers to common header ARM: 6110/1: Fix Thumb-2 kernel builds when UACCESS_WITH_MEMCPY is enabled ARM: 6112/1: Use the Inner Shareable I-cache and BTB ops on ARMv7 SMP ARM: 6111/1: Implement read/write for ownership in the ARMv6 DMA cache ops ARM: 6106/1: Implement copy_to_user_page() for noMMU ARM: 6105/1: Fix the __arm_ioremap_caller() definition in nommu.c