path: root/fs
Age | Commit message | Author | Files | Lines
2008-10-16 | befs: annotate fs32 on tests for superblock endianness | Harvey Harrison | 3 | -5/+9
Does compile-time byteswapping rather than runtime. Noticed by sparse: fs/befs/super.c:29:6: warning: cast to restricted __le32 fs/befs/super.c:29:6: warning: cast from restricted fs32 fs/befs/super.c:31:11: warning: cast to restricted __be32 fs/befs/super.c:31:11: warning: cast from restricted fs32 fs/befs/super.c:31:11: warning: cast to restricted __be32 fs/befs/super.c:31:11: warning: cast from restricted fs32 fs/befs/super.c:31:11: warning: cast to restricted __be32 fs/befs/super.c:31:11: warning: cast from restricted fs32 fs/befs/super.c:31:11: warning: cast to restricted __be32 fs/befs/super.c:31:11: warning: cast from restricted fs32 fs/befs/super.c:31:11: warning: cast to restricted __be32 fs/befs/super.c:31:11: warning: cast from restricted fs32 fs/befs/super.c:31:11: warning: cast to restricted __be32 fs/befs/super.c:31:11: warning: cast from restricted fs32 fs/befs/linuxvfs.c:811:7: warning: cast to restricted __le32 fs/befs/linuxvfs.c:811:7: warning: cast from restricted fs32 fs/befs/linuxvfs.c:812:7: warning: cast to restricted __be32 fs/befs/linuxvfs.c:812:7: warning: cast from restricted fs32 fs/befs/linuxvfs.c:812:7: warning: cast to restricted __be32 fs/befs/linuxvfs.c:812:7: warning: cast from restricted fs32 fs/befs/linuxvfs.c:812:7: warning: cast to restricted __be32 fs/befs/linuxvfs.c:812:7: warning: cast from restricted fs32 fs/befs/linuxvfs.c:812:7: warning: cast to restricted __be32 fs/befs/linuxvfs.c:812:7: warning: cast from restricted fs32 fs/befs/linuxvfs.c:812:7: warning: cast to restricted __be32 fs/befs/linuxvfs.c:812:7: warning: cast from restricted fs32 fs/befs/linuxvfs.c:812:7: warning: cast to restricted __be32 fs/befs/linuxvfs.c:812:7: warning: cast from restricted fs32 Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Cc: "Sergey S. Kostyliov" <rathamahata@php4.ru> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
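The idea of comparing against a byteswapped constant, rather than byteswapping the value read from disk, can be sketched in plain userspace C. This is only an illustration: `ORDER_MARK`, `SWAB32_CONST`, and `detect_order` are hypothetical names, not the identifiers used in fs/befs, and the sparse `fs32` annotations are not modelled.

```c
#include <stdint.h>

/* Hypothetical stand-in for the on-disk byte-order marker. */
#define ORDER_MARK 0x42494745u

/* Byteswap a 32-bit constant; the compiler can fold this for constants. */
#define SWAB32_CONST(x) ((uint32_t)(                              \
	(((x) & 0x000000ffu) << 24) | (((x) & 0x0000ff00u) << 8) | \
	(((x) & 0x00ff0000u) >> 8)  | (((x) & 0xff000000u) >> 24)))

enum sb_order { SB_NATIVE, SB_SWAPPED, SB_UNKNOWN };

/* raw is the marker field exactly as read from disk, with no conversion. */
static enum sb_order detect_order(uint32_t raw)
{
	if (raw == ORDER_MARK)
		return SB_NATIVE;
	if (raw == SWAB32_CONST(ORDER_MARK))	/* swap done on the constant */
		return SB_SWAPPED;
	return SB_UNKNOWN;
}
```

The raw field is never converted, so no runtime swap is needed and the comparison stays between values of the same kind.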
2008-10-16 | ext2: avoid printk floods in the face of directory corruption | Eric Sandeen | 1 | -25/+35
A very large directory with many read failures (either due to storage problems, or due to invalid size & blocks from corruption) will generate a printk storm as the filesystem continues to try to read all the blocks. This flood of messages can tie up the box until it is complete - which may be a very long time, especially for very large corrupted values. This is fixed by only reporting the corruption once each time we try to read the directory. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Eugene Teo <eugeneteo@kernel.sg> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
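A minimal sketch of the "report once per directory read" idea follows. It is not the ext2 patch itself: `read_dir_blocks` and its `block_ok` array are hypothetical and only model a directory whose blocks may fail to read.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical model: block_ok[i] says whether block i could be read.
 * Returns the number of readable blocks and stores in *reported how many
 * error messages were emitted.
 */
static int read_dir_blocks(const bool *block_ok, int nblocks, int *reported)
{
	bool warned = false;	/* reset on every directory read */
	int good = 0;

	*reported = 0;
	for (int i = 0; i < nblocks; i++) {
		if (!block_ok[i]) {
			if (!warned) {
				/* Only the first failure is logged. */
				fprintf(stderr, "dir: error reading block %d\n", i);
				warned = true;
				(*reported)++;
			}
			continue;
		}
		good++;
	}
	return good;
}
```

However many blocks fail, a single read of the directory produces at most one message.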
2008-10-16 | ext2: fix ext2 block reservation early ENOSPC issue | Mingming Cao | 1 | -1/+2
We could run into an ENOSPC error on ext2 even when there are free blocks on the filesystem. The problem is triggered when the goal block group has 0 free blocks and the remaining block groups are skipped due to the check of "free_blocks < windowsz/2". The current code could fall back to non-reservation allocation to prevent early ENOSPC after examining all the block groups with reservation on, but this code was bypassed if the reservation window was already turned off, which is true in this case. This patch fixes two issues: 1) We don't need to turn off block reservation if the goal block group has 0 free blocks left; we should continue searching the rest of the block groups. The intention of the current code is to turn off block reservation if the goal allocation group has a few free blocks left (not enough to make the desired reservation window) and to try to allocate in the goal block group, to get better locality. But if the goal block group has 0 free blocks, it should leave block reservation on and continue searching the next block groups, rather than turn off block reservation completely. 2) We don't need to check the window size if block reservation is off. The problem was originally found and fixed in ext4. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-16 | autofs4: add miscellaneous device for ioctls | Ian Kent | 5 | -12/+915
Add a miscellaneous device to the autofs4 module for routing ioctls. This provides the ability to obtain an ioctl file handle for an autofs mount point that is possibly covered by another mount. The actual problem with autofs is that it can't reconnect to existing mounts. One immediately thinks that just adding the ability to remount autofs file systems would solve it, but alas, that can't work. This is because autofs direct mounts and the implementation of "on demand mount and expire" of nested mount trees have the file system mounted on top of the mount trigger dentry. To resolve this, a miscellaneous device node for routing ioctl commands to these mount points has been implemented in the autofs4 kernel module and a library added to autofs. This provides the ability to open a file descriptor for these over-mounted autofs mount points. Please refer to Documentation/filesystems/autofs4-mount-control.txt for a discussion of the problem, implementation alternatives considered and a description of the interface. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: build fix] Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-16 | autofs4: track uid and gid of last mount requester | Ian Kent | 3 | -0/+39
Track the uid and gid of the last process to request a mount on an autofs dentry. [akpm@linux-foundation.org: fix tpyo in comment] Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-16 | autofs4: cleanup autofs mount type usage | Ian Kent | 4 | -12/+10
Usage of the AUTOFS_TYPE_* defines is a little confusing and appears inconsistent. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-16 | eCryptfs: remove netlink transport | Tyler Hicks | 6 | -388/+60
The netlink transport code has not worked for a while and the miscdev transport is a simpler solution. This patch removes the netlink code and makes the miscdev transport the only eCryptfs kernel to userspace transport. Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: Dustin Kirkland <kirkland@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-16 | ecryptfs: convert to use new aops | Badari Pulavarty | 1 | -31/+50
Convert ecryptfs to use write_begin/write_end Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Acked-by: Michael Halcrow <mhalcrow@us.ibm.com> Cc: Dave Kleikamp <shaggy@austin.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-16 | eCryptfs: remove retry loop in ecryptfs_readdir() | Michael Halcrow | 1 | -9/+8
The retry block in ecryptfs_readdir() has been in the eCryptfs code base for a while, apparently for no good reason. This loop could potentially run without terminating. This patch removes the loop, instead erroring out if vfs_readdir() on the lower file fails. Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-16 | Allow recursion in binfmt_script and binfmt_misc | Kirill A. Shutemov | 3 | -5/+6
binfmt_script and binfmt_misc disallow recursion to avoid stack overflow using sh_bang and misc_bang. This causes problems in some cases:

  $ echo '#!/bin/ls' > /tmp/t0
  $ echo '#!/tmp/t0' > /tmp/t1
  $ echo '#!/tmp/t1' > /tmp/t2
  $ chmod +x /tmp/t*
  $ /tmp/t2
  zsh: exec format error: /tmp/t2

There is a similar problem with binfmt_misc. This patch introduces the field 'recursion_depth' into struct linux_binprm to track the recursion level in binfmt_misc and binfmt_script. If the recursion level is more than BINPRM_MAX_RECURSION, it generates -ENOEXEC. [akpm@linux-foundation.org: make linux_binprm.recursion_depth a uint] Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
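The depth-counter approach described above can be sketched in userspace C. Everything here is a hypothetical model: `struct binprm_model`, `load_script`, and the value of `MAX_RECURSION` are assumptions for illustration, and only the "more than the limit gives -ENOEXEC" rule comes from the commit message.

```c
#include <errno.h>

/* Hypothetical limit; the commit's constant is BINPRM_MAX_RECURSION. */
#define MAX_RECURSION 4

struct binprm_model {
	unsigned int recursion_depth;
};

/*
 * hops is the number of interpreter indirections left before a directly
 * executable file is reached. Returns 0 on success or -ENOEXEC when the
 * chain is deeper than the limit allows.
 */
static int load_script(struct binprm_model *bprm, int hops)
{
	int ret;

	if (hops == 0)
		return 0;			/* reached a real binary */
	if (bprm->recursion_depth > MAX_RECURSION)
		return -ENOEXEC;		/* chain too deep */

	bprm->recursion_depth++;
	ret = load_script(bprm, hops - 1);	/* follow the next interpreter */
	bprm->recursion_depth--;		/* restore for the caller (sketch only) */
	return ret;
}
```

A bounded counter lets short chains such as t2 -> t1 -> t0 -> /bin/ls succeed while still rejecting unbounded or cyclic interpreter chains.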
2008-10-16 | alpha: introduce field 'taso' into struct linux_binprm | Kirill A. Shutemov | 1 | -1/+1
This change is Alpha-specific. It adds field 'taso' into struct linux_binprm to remember if the application is TASO. Previously, field sh_bang was used for this purpose. Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-16 | binfmt_som.c: add MODULE_LICENSE | Adrian Bunk | 1 | -0/+2
Add the missing MODULE_LICENSE("GPL"). Reported-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Adrian Bunk <bunk@kernel.org> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Grant Grundler <grundler@parisc-linux.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-16 | compat: move cp_compat_stat to common code | Christoph Hellwig | 1 | -0/+39
struct stat / compat_stat is the same on all architectures, so cp_compat_stat should be, too. Turns out it is, except that various architectures differ slightly, and some use high2lowuid/high2lowgid or direct assignment instead of the SET_UID/SET_GID that expands to the correct one anyway. This patch replaces the arch-specific cp_compat_stat implementations with a common one based on the x86-64 one. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: David S. Miller <davem@davemloft.net> [ sparc bits ] Acked-by: Kyle McMartin <kyle@mcmartin.ca> [ parisc bits ] Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-16 | Remove Andrew Morton's old email accounts | Francois Cami | 3 | -4/+4
People can use the real name as an index into MAINTAINERS to find the current email address. Signed-off-by: Francois Cami <francois.cami@free.fr> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-16 | epoll: drop unnecessary test | Davide Libenzi | 1 | -6/+3
Thomas found that there is an unnecessary (always true) test in ep_send_events(). The callback never inserts into ->rdllink while the send loop is performed, and also does the ~EP_PRIVATE_BITS test. Given we're holding the mutex during this time, the conditions tested inside the loop are always true. This patch drops the test done inside the re-insertion loop. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-16 | exec.c, compat.c: fix count(), compat_count() bounds checking | Jason Baron | 2 | -2/+2
With MAX_ARG_STRINGS set to 0x7FFFFFFF and passed to count() and compat_count(), it would appear that the current max bounds check at fs/exec.c:394: if(++i > max) return -E2BIG; would never trigger. Since 'i' is of type int, its value would wrap and the function would continue looping. The simple fix seems to be changing ++i to i++ and checking for '>='. Signed-off-by: Jason Baron <jbaron@redhat.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: "Ollie Wild" <aaw@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
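The corrected form of the check can be shown with a small userspace analogue. `count_args` is a hypothetical stand-in for count(): it walks an ordinary NULL-terminated array instead of a user-space pointer, and nothing else about fs/exec.c is modelled.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Count the entries of a NULL-terminated argv-style array, refusing
 * arrays that hold more than max entries.
 */
static int count_args(const char *const *argv, int max)
{
	int i = 0;

	if (argv != NULL) {
		while (argv[i] != NULL) {
			/*
			 * Compare before incrementing. With max == INT_MAX
			 * the old form "++i > max" could never be true,
			 * because no int is greater than INT_MAX.
			 */
			if (i++ >= max)
				return -E2BIG;
		}
	}
	return i;
}
```

With this form the comparison uses the value of i before the increment, so the limit is reached while i is still representable.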
2008-10-16 | uclinux: fix gzip header parsing in binfmt_flat.c | Volodymyr G. Lukiianyk | 1 | -3/+3
There are off-by-one errors in decompress_exec() when calculating the length of the optional "original file name" and "comment" fields: the "ret" index is not incremented when the terminating '\0' character is reached. The check for buffer overflow (after an "extra-field" length was taken into account) is also fixed. I encountered this off-by-one error when I tried to reuse the gzip-header-parsing part of the decompress_exec() function. There was an "original file name" field in the payload (with miscalculated length) and zlib_inflate() returned Z_DATA_ERROR. But after a fix similar to this one, all worked fine. Signed-off-by: Volodymyr G Lukiianyk <volodymyrgl@gmail.com> Acked-by: Greg Ungerer <gerg@snapgear.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
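In a gzip header the optional "original file name" and "comment" fields are zero-terminated strings, so a parser has to step past the terminator as well as the text. A minimal sketch of such a skip helper, with a bounds check, might look like this. `skip_zstring` is a hypothetical name and this is not the code in decompress_exec().

```c
#include <stddef.h>

/*
 * Return the offset just past a NUL-terminated field that starts at
 * buf[pos], or -1 if no terminator is found within len bytes.
 */
static long skip_zstring(const unsigned char *buf, size_t len, size_t pos)
{
	while (pos < len) {
		if (buf[pos++] == 0)
			return (long)pos;	/* pos has already stepped past the '\0' */
	}
	return -1;				/* ran off the end of the buffer */
}
```

Because the increment happens on every byte examined, including the terminator, the returned offset points at the first byte of whatever follows the field.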
2008-10-15 | xfs: fix remount rw with unrecognized options | Christoph Hellwig | 1 | -1/+1
When we skip unrecognized options in xfs_fs_remount we should just break out of the switch and not return because otherwise we may skip clearing the xfs-internal read-only flag. This will only show up on some operations like touch because most read-only checks are done by the VFS which thinks this filesystem is r/w. Eventually we should replace the XFS read-only flag with a helper that always checks the VFS flag to make sure they can never get out of sync. Bug reported and fix verified by Marcel Beister on #xfs. Bug fix verified by updated xfstests/189. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Timothy Shimmin <tes@sgi.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
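The difference between breaking out of the switch and returning can be illustrated with a small model. The option names, `struct mount_model`, and `remount_rw` are hypothetical and do not reflect the real xfs_fs_remount() code. The point is only that code placed after the option loop must still run when an unknown option is skipped.

```c
#include <stdbool.h>

enum opt { OPT_BARRIER, OPT_NOBARRIER, OPT_UNKNOWN };

struct mount_model {
	bool barrier;
	bool internal_rdonly;	/* filesystem-private read-only flag */
};

/* Apply remount options, then clear the private read-only flag. */
static int remount_rw(struct mount_model *mp, const enum opt *opts, int n)
{
	for (int i = 0; i < n; i++) {
		switch (opts[i]) {
		case OPT_BARRIER:
			mp->barrier = true;
			break;
		case OPT_NOBARRIER:
			mp->barrier = false;
			break;
		default:
			/*
			 * Skip options we do not recognize. A "return 0"
			 * here would bypass the flag update below.
			 */
			break;
		}
	}

	mp->internal_rdonly = false;	/* must happen on every rw remount */
	return 0;
}
```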
2008-10-14 | ocfs2: fix build error | Mark Fasheh | 1 | -8/+6
I merged the latest ocfs2_read_blocks() changes in xattr.c wrong. This makes Ocfs2 compile again. Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-14 | Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2 | Linus Torvalds | 41 | -946/+7353
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: (56 commits) ocfs2: Make cached block reads the common case. ocfs2: Kill the last naked wait_on_buffer() for cached reads. ocfs2: Move ocfs2_bread() into dir.c ocfs2: Simplify ocfs2_read_block() ocfs2: Require an inode for ocfs2_read_block(s)(). ocfs2: Separate out sync reads from ocfs2_read_blocks() ocfs2: Refactor xattr list and remove ocfs2_xattr_handler(). ocfs2: Calculate EA hash only by its suffix. ocfs2: Move trusted and user attribute support into xattr.c ocfs2: Uninline ocfs2_xattr_name_hash() ocfs2: Don't check for NULL before brelse() ocfs2: use smaller counters in ocfs2_remove_xattr_clusters_from_cache ocfs2: Documentation update for user_xattr / nouser_xattr mount options ocfs2: make la_debug_mutex static ocfs2: Remove pointless !! ocfs2: Add empty bucket support in xattr. ocfs2/xattr.c: Fix a bug when inserting xattr. ocfs2: Add xattr mount option in ocfs2_show_options() ocfs2: Switch over to JBD2. ocfs2: Add the 'inode64' mount option. ...
2008-10-14 | Merge branch 'for-2.6.28' of git://linux-nfs.org/~bfields/linux | Linus Torvalds | 27 | -372/+621
* 'for-2.6.28' of git://linux-nfs.org/~bfields/linux: (59 commits) svcrdma: Fix IRD/ORD polarity svcrdma: Update svc_rdma_send_error to use DMA LKEY svcrdma: Modify the RPC reply path to use FRMR when available svcrdma: Modify the RPC recv path to use FRMR when available svcrdma: Add support to svc_rdma_send to handle chained WR svcrdma: Modify post recv path to use local dma key svcrdma: Add a service to register a Fast Reg MR with the device svcrdma: Query device for Fast Reg support during connection setup svcrdma: Add FRMR get/put services NLM: Remove unused argument from svc_addsock() function NLM: Remove "proto" argument from lockd_up() NLM: Always start both UDP and TCP listeners lockd: Remove unused fields in the nlm_reboot structure lockd: Add helper to sanity check incoming NOTIFY requests lockd: change nlmclnt_grant() to take a "struct sockaddr *" lockd: Adjust nlmsvc_lookup_host() to accomodate AF_INET6 addresses lockd: Adjust nlmclnt_lookup_host() signature to accomodate non-AF_INET lockd: Support non-AF_INET addresses in nlm_lookup_host() NLM: Convert nlm_lookup_host() to use a single argument svcrdma: Add Fast Reg MR Data Types ...
2008-10-14 | ocfs2: Make cached block reads the common case. | Joel Becker | 7 | -17/+24
ocfs2_read_blocks() currently requires the CACHED flag for cached I/O. However, that's the common case. Let's flip it around and provide an IGNORE_CACHE flag for the special users. This has the added benefit of cleaning up the code some (ignore_cache takes on its special meaning earlier in the loop). Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-14 | ocfs2: Kill the last naked wait_on_buffer() for cached reads. | Joel Becker | 1 | -4/+3
ocfs2's cached buffer I/O goes through ocfs2_read_block(s)(). dir.c had a naked wait_on_buffer() to wait for some readahead, but it should use ocfs2_read_block() instead. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-14 | ocfs2: Move ocfs2_bread() into dir.c | Joel Becker | 3 | -52/+43
dir.c is the only place using ocfs2_bread(), so let's make it static to that file. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-14 | ocfs2: Simplify ocfs2_read_block() | Joel Becker | 16 | -94/+55
More than 30 callers of ocfs2_read_block() pass exactly OCFS2_BH_CACHED. Only six pass a different flag set. Rather than have every caller care, let's make ocfs2_read_block() take no flags and always do a cached read. The remaining six places can call ocfs2_read_blocks() directly. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-14 | ocfs2: Require an inode for ocfs2_read_block(s)(). | Joel Becker | 17 | -148/+116
Now that synchronous readers are using ocfs2_read_blocks_sync(), all callers of ocfs2_read_blocks() are passing an inode. Use it unconditionally. Since it's there, we don't need to pass the ocfs2_super either. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-14 | ocfs2: Separate out sync reads from ocfs2_read_blocks() | Joel Becker | 5 | -10/+96
The ocfs2_read_blocks() function currently handles sync reads, cached reads, and sometimes-cached reads. We're going to add some functionality to it, so first we should simplify it. The uncached, synchronous reads are much easier to handle as a separate function, so we introduce ocfs2_read_blocks_sync(). Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 | ocfs2: Refactor xattr list and remove ocfs2_xattr_handler(). | Tao Ma | 1 | -35/+60
According to Christoph Hellwig's advice, we really don't need a ->list to handle one xattr's list. Just a map from index to xattr prefix is enough. And I also refactor the old list method with the reference from fs/xfs/linux-2.6/xfs_xattr.c and the xattr list method in btrfs. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 | ocfs2: Calculate EA hash only by its suffix. | Tao Ma | 1 | -30/+5
According to Christoph Hellwig's advice, the hash value of EA is only calculated by its suffix. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 | ocfs2: Move trusted and user attribute support into xattr.c | Mark Fasheh | 4 | -179/+111
Per Christoph Hellwig's suggestion - don't split these up. It's not like we gained much by having the two tiny files around. Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 | ocfs2: Uninline ocfs2_xattr_name_hash() | Mark Fasheh | 1 | -5/+5
This is too big to be inlined. Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 | ocfs2: Don't check for NULL before brelse() | Mark Fasheh | 12 | -147/+74
This is pointless as brelse() already does the check. Signed-off-by: Mark Fasheh
2008-10-13 | ocfs2: use smaller counters in ocfs2_remove_xattr_clusters_from_cache | Mark Fasheh | 1 | -1/+1
i and b_len don't really need to be u64's. Xattr extent lengths should be limited by the VFS, and then the size of our on-disk length field. Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 | ocfs2: make la_debug_mutex static | Mark Fasheh | 1 | -2/+1
It can also be moved into ocfs2_la_debug_read(). Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 | ocfs2: Remove pointless !! | Mark Fasheh | 1 | -1/+1
ocfs2_stack_supports_plocks() doesn't need this to properly return a zero or one value. Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 | ocfs2: Add empty bucket support in xattr. | Tao Ma | 1 | -154/+43
As Mark mentioned, removing an empty xattr bucket may be time-consuming, so this patch lets empty buckets exist during xattr operations. The modification includes: 1. Remove the function of bucket and extent record deletion during xattr delete. 2. In xattr set: 1) Don't clean the last entry, so that if the bucket is empty, the hash value of the bucket is the hash value of the entry which was deleted last. 2) During insert, if we meet an empty bucket, just use the 1st entry. 3. In the binary search of xattr buckets, use the bucket hash value (which is stored in the 1st xattr entry) to find the right place. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 | ocfs2/xattr.c: Fix a bug when inserting xattr. | Tao Ma | 1 | -1/+3
During the process of xattr insertion, we use binary search to find the right place and "low" is set to it. But when there is an xattr which has the same name hash as the inserted one, low holds the wrong value. So set it to the right position. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
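A sketch of a binary search that yields a usable insertion index even when an entry with the same hash already exists is given below. It is a generic illustration over a sorted array of hashes. `insert_pos` is a hypothetical helper and the actual ocfs2 bucket layout is not modelled.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return the index at which a new entry with hash `key` should be
 * inserted into the ascending array `hashes` of `count` elements.
 */
static size_t insert_pos(const uint32_t *hashes, size_t count, uint32_t key)
{
	size_t low = 0, high = count;	/* search window is [low, high) */

	while (low < high) {
		size_t mid = low + (high - low) / 2;

		if (hashes[mid] < key)
			low = mid + 1;
		else if (hashes[mid] > key)
			high = mid;
		else
			return mid;	/* equal hash: insert next to its twin */
	}
	return low;			/* no equal hash: first larger element */
}
```

Handling the equal-hash case explicitly is what keeps the result meaningful: without it, the loop would exit early with low still pointing at an unrelated position.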
2008-10-13 | ocfs2: Add xattr mount option in ocfs2_show_options() | Sunil Mushran | 1 | -0/+5
Patch adds check for [no]user_xattr in ocfs2_show_options() that completes the list of all mount options. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 | ocfs2: Switch over to JBD2. | Joel Becker | 12 | -81/+224
ocfs2 wants JBD2 for many reasons, not the least of which is that JBD is limiting our maximum filesystem size. It's a pretty trivial change. Most functions are just renamed. The only functional change is moving to Jan's inode-based ordered data mode. It's better, too. Because JBD2 reads and writes JBD journals, this is compatible with any existing filesystem. It can even interact with JBD-based ocfs2 as long as the journal is formatted for JBD. We provide a compatibility option so that paranoid people can still use JBD for the time being. This will go away shortly. [ Moved call of ocfs2_begin_ordered_truncate() from ocfs2_delete_inode() to ocfs2_truncate_for_delete(). --Mark ] Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 | ocfs2: Add the 'inode64' mount option. | Joel Becker | 3 | -2/+21
Now that ocfs2 limits inode numbers to 32bits, add a mount option to disable the limit. This parallels XFS. 64bit systems can handle the larger inode numbers. [ Added description of inode64 mount option in ocfs2.txt. --Mark ] Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 | ocfs2: Limit inode allocation to 32bits. | Joel Becker | 3 | -19/+130
ocfs2 inode numbers are block numbers. For any filesystem with less than 2^32 blocks, this is not a problem. However, when ocfs2 starts using JBD2, it will be able to support filesystems with more than 2^32 blocks. This would result in inode numbers higher than 2^32. The problem is that stat(2) can't handle those numbers on 32bit machines. The simple solution is to have ocfs2 allocate all inodes below that boundary. The suballoc code is changed to honor an optional block limit. Only the inode suballocator sets that limit - all other allocations stay unlimited. The biggest trick is to grow the inode suballocator beneath that limit. There's no point in allocating block groups that are above the limit, then rejecting their elements later on. We want to prevent the inode allocator from ever having block groups above the limit. This involves a little gyration with the local alloc code. If the local alloc window is above the limit, it signals the caller to try the global bitmap but does not disable the local alloc file (which can be used for other allocations). [ Minor cleanup - removed an ML_NOTICE comment. --Mark ] Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 | ocfs2: Resolve deadlock in ocfs2_xattr_free_block. | Tao Ma | 1 | -70/+82
In ocfs2_xattr_free_block, we take a cluster lock on xb_alloc_inode while we have a transaction open. This will deadlock the downconvert thread, so fix it. We can clean up how xattr blocks are removed while here - this patch also moves the mechanism of releasing xattr block (including both value, xattr tree and xattr block) into this function. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 | ocfs2: bug-fix for journal extend in xattr. | Tao Ma | 1 | -1/+14
In ocfs2_extend_trans, when we can't extend the current transaction, it will commit the current transaction and restart a new one. So if the credits we allocated previously aren't used (the block isn't dirtied before our extend), we will not have enough credits for any future operation (this will cause jbd to complain and bug out). So check this and re-extend it. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 | ocfs2: Change ocfs2_get_*_extent_tree() to ocfs2_init_*_extent_tree() | Joel Becker | 6 | -67/+44
The original get/put_extent_tree() functions held a reference on et_root_bh. However, every single caller already has a safe reference, making the get/put cycle irrelevant. We change ocfs2_get_*_extent_tree() to ocfs2_init_*_extent_tree(). It no longer gets a reference on et_root_bh. ocfs2_put_extent_tree() is removed. Callers now have a simpler init+use pattern. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 | ocfs2: Comment struct ocfs2_extent_tree_operations. | Joel Becker | 1 | -2/+43
struct ocfs2_extent_tree_operations provides methods for the different on-disk btrees in ocfs2. Describing what those methods do is probably a good idea. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 | ocfs2: Make ocfs2_extent_tree the first-class representation of a tree. | Joel Becker | 8 | -332/+240
We now have three different kinds of extent trees in ocfs2: inode data (dinode), extended attributes (xattr_tree), and extended attribute values (xattr_value). There is a nice abstraction for them, ocfs2_extent_tree, but it is hidden in alloc.c. All the calling functions have to pick amongst a varied API and pass in type bits and often extraneous pointers. A better way is to make ocfs2_extent_tree a first-class object. Everyone converts their object to an ocfs2_extent_tree() via the ocfs2_get_*_extent_tree() calls, then uses the ocfs2_extent_tree for all tree calls to alloc.c. This simplifies a lot of callers, making for readability. It also provides an easy way to add additional extent tree types, as they only need to be defined in alloc.c with a ocfs2_get_<new>_extent_tree() function. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 | ocfs2: Add an insertion check to ocfs2_extent_tree_operations. | Joel Becker | 1 | -25/+44
A couple places check an extent_tree for a valid inode. We move that out to add an eo_insert_check() operation. It can be called from ocfs2_insert_extent() and elsewhere. We also have the wrapper calls ocfs2_et_insert_check() and ocfs2_et_sanity_check() ignore NULL ops. That way we don't have to provide useless operations for xattr types. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 | ocfs2: Create specific get_extent_tree functions. | Joel Becker | 2 | -22/+56
A caller knows what kind of extent tree they have. There's no reason they have to call ocfs2_get_extent_tree() with a NULL when they could just as easily call a specific function to their type of extent tree. Introduce ocfs2_dinode_get_extent_tree(), ocfs2_xattr_tree_get_extent_tree(), and ocfs2_xattr_value_get_extent_tree(). They only take the necessary arguments, calling into the underlying __ocfs2_get_extent_tree() to do the real work. __ocfs2_get_extent_tree() is the old ocfs2_get_extent_tree(), but without needing any switch-by-type logic. ocfs2_get_extent_tree() is now a wrapper around the specific calls. It exists because a couple alloc.c functions can take et_type. This will go later. Another benefit is that ocfs2_xattr_value_get_extent_tree() can take a struct ocfs2_xattr_value_root* instead of void*. This gives us typechecking where we didn't have it before. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 | ocfs2: Determine an extent tree's max_leaf_clusters in an et_op. | Joel Becker | 1 | -3/+15
Provide an optional extent_tree_operation to specify the max_leaf_clusters of an ocfs2_extent_tree. If not provided, the value is 0 (unlimited). Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
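An optional operation with a default result is commonly expressed as a function pointer that may be NULL, with a wrapper supplying the default. The sketch below assumes hypothetical names (`struct tree_model`, `struct tree_ops_model`, `tree_max_leaf_clusters`) and is not the ocfs2 structure layout. Only the "0 means unlimited when the op is absent" rule comes from the commit message.

```c
#include <stddef.h>

struct tree_model;

struct tree_ops_model {
	/* Optional: may be NULL when the tree type has no leaf limit. */
	unsigned int (*get_max_leaf_clusters)(const struct tree_model *t);
};

struct tree_model {
	const struct tree_ops_model *ops;
	unsigned int limit;	/* consulted only by tree types that cap leaves */
};

/* Example op for a tree type that caps its leaf size. */
static unsigned int capped_max_leaf_clusters(const struct tree_model *t)
{
	return t->limit;
}

/* Wrapper: fall back to 0 (unlimited) when the op is not provided. */
static unsigned int tree_max_leaf_clusters(const struct tree_model *t)
{
	if (t->ops->get_max_leaf_clusters == NULL)
		return 0;
	return t->ops->get_max_leaf_clusters(t);
}
```

Callers always go through the wrapper, so tree types without a limit do not have to supply a dummy operation.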
2008-10-13 | ocfs2: Use struct ocfs2_extent_tree in ocfs2_num_free_extents(). | Joel Becker | 1 | -25/+5
ocfs2_num_free_extents() re-implements the logic of ocfs2_get_extent_tree(). Now that ocfs2_get_extent_tree() does not allocate, let's use it in ocfs2_num_free_extents() to simplify the code. The inode validation code in ocfs2_num_free_extents() is not needed. All callers are passing in pre-validated inodes. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>