summaryrefslogtreecommitdiffstats
path: root/fs/dcache.c
AgeCommit message (Collapse)AuthorFilesLines
2010-08-18fs: brlock vfsmount_lockNick Piggin1-5/+6
fs: brlock vfsmount_lock Use a brlock for the vfsmount lock. It must be taken for write whenever modifying the mount hash or associated fields, and may be taken for read when performing mount hash lookups. A new lock is added for the mnt-id allocator, so it doesn't need to take the heavy vfsmount write-lock. The number of atomics should remain the same for fastpath rlock cases, though code would be slightly slower due to per-cpu access. Scalability is not not be much improved in common cases yet, due to other locks (ie. dcache_lock) getting in the way. However path lookups crossing mountpoints should be one case where scalability is improved (currently requiring the global lock). The slowpath is slower due to use of brlock. On a 64 core, 64 socket, 32 node Altix system (high latency to remote nodes), a simple umount microbenchmark (mount --bind mnt mnt2 ; umount mnt2 loop 1000 times), before this patch it took 6.8s, afterwards took 7.1s, about 5% slower. Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Nick Piggin <npiggin@kernel.dk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-18fs: remove extra lookup in __lookup_hashNick Piggin1-25/+35
fs: remove extra lookup in __lookup_hash Optimize lookup for create operations, where no dentry should often be common-case. In cases where it is not, such as unlink, the added overhead is much smaller than the removed. Also, move comments about __d_lookup racyness to the __d_lookup call site. d_lookup is intuitive; __d_lookup is what needs commenting. So in that same vein, add kerneldoc comments to __d_lookup and clean up some of the comments: - We are interested in how the RCU lookup works here, particularly with renames. Make that explicit, and point to the document where it is explained in more detail. - RCU is pretty standard now, and macros make implementations pretty mindless. If we want to know about RCU barrier details, we look in RCU code. - Delete some boring legacy comments because we don't care much about how the code used to work, more about the interesting parts of how it works now. So comments about lazy LRU may be interesting, but would better be done in the LRU or refcount management code. Signed-off-by: Nick Piggin <npiggin@kernel.dk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-14fs/dcache: fix function param name in kernel-docRandy Dunlap1-1/+1
Fix parameter name in kernel-doc notation (causes a warning). Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-11vfs: show unreachable paths in getcwd and procMiklos Szeredi1-4/+50
Prepend "(unreachable)" to path strings if the path is not reachable from the current root. Two places updated are - the return string from getcwd() - and symlinks under /proc/$PID. Other uses of d_path() are left unchanged (we know that some old software crashes if /proc/mounts is changed). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-11vfs: only add " (deleted)" where necessaryMiklos Szeredi1-10/+22
__d_path() has 4 callers: d_path() sys_getcwd() seq_path_root() tomoyo_realpath_from_path2() Of these the only one which needs the " (deleted)" ending is d_path(). sys_getcwd() checks for existence before calling __d_path(). seq_path_root() is used to show the mountpoint path in /proc/PID/mountinfo, which is always a positive. And tomoyo doesn't want the deleted ending. Create a helper "path_with_deleted()" as subsequent patches will need this in multiple places. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-11vfs: add prepend_path() helperMiklos Szeredi1-36/+58
Split off prepend_path() from __d_path(). This new helper takes an end-of-buffer pointer and buffer-length pointer just like the other prepend_* functions. Move the " (deleted)" postfix out to __d_path(). This patch doesn't change any functionality but paves the way for the following patches. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-11vfs: __d_path: dont prepend the name of the root dentryMiklos Szeredi1-3/+9
In the old times pseudo-filesystems set the name of theroot dentry to some prefix like "pipe:" and the name of the child dentry to "[123]" and relied on a hack in __d_path() to replace the preceding slash with the root's name to get "pipe:[123]". Then the d_dname() dentry operation was introduced which solved the same problem without having to pre-fill the name in each dentry. Currently the following pseudo filesystems exist in the kernel: perfmon mtd anon_inode bdev pipe socket Of these only perfmon, anon_inode, pipe and socket create sub-dentries, all of which have now been switched to using d_dname(). bdev and mtd only create inodes. This means that now the hack to overwrite the slash can be removed, so for unreachable paths (e.g. within a detached mount) the path string won't be polluted with garbage. For these cases a subsequent patch will add a prefix, indicating that the path is unreachable. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-11vfs: add helpers to get root and pwdMiklos Szeredi1-10/+2
Add three helpers that retrieve a refcounted copy of the root and cwd from the supplied fs_struct. get_fs_root() get_fs_pwd() get_fs_root_and_pwd() Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09no need for list_for_each_entry_safe()/resetting with superblock listAl Viro1-5/+7
just delay __put_super() a bit Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09new helper: __dentry_path()Al Viro1-5/+22
builds path relative to fs root, called under dcache_lock, doesn't append any nonsense to unlinked ones. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-07-19mm: add context argument to shrinker callbackDave Chinner1-1/+1
The current shrinker implementation requires the registered callback to have global state to work from. This makes it difficult to shrink caches that are not global (e.g. per-filesystem caches). Pass the shrinker structure to the callback so that users can embed the shrinker structure in the context the shrinker needs to operate on and get back to it in the callback via container_of(). Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2010-06-29fs: fix superblock iteration racenpiggin@suse.de1-0/+2
list_for_each_entry_safe is not suitable to protect against concurrent modification of the list. 6754af6 introduced a race in sb walking. list_for_each_entry can use the trick of pinning the current entry in the list before we drop and retake the lock because it subsequently follows cur->next. However list_for_each_entry_safe saves n=cur->next for following before entering the loop body, so when the lock is dropped, n may be deleted. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Christoph Hellwig <hch@infradead.org> Cc: John Stultz <johnstul@us.ibm.com> Cc: Frank Mayhar <fmayhar@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-21fix prune_dcache()/umount() raceAl Viro1-11/+6
... and get rid of the last __put_super_and_need_restart() caller Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-21Leave superblocks on s_list until the endAl Viro1-0/+2
We used to remove from s_list and s_instances at the same time. So let's *not* do the former and skip superblocks that have empty s_instances in the loops over s_list. The next step, of course, will be to get rid of rescan logics in those loops. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-21clean DCACHE_CANT_MOUNT in d_delete()Al Viro1-0/+1
We set the "it's dead, don't mount on it" flag _and_ do not remove it if we turn the damn thing negative and leave it around. And if it goes positive afterwards, well... Fortunately, there's only one place where that needs to be caught: only d_delete() can turn the sucker negative without immediately freeing it; all other places that can lead to ->d_iput() call are followed by unconditionally freeing struct dentry in question. So the fix is obvious: Addresses https://bugzilla.kernel.org/show_bug.cgi?id=16014 Reported-by: Adam Tkac <vonsch@gmail.com> Tested-by: Adam Tkac <vonsch@gmail.com> Cc: <stable@kernel.org> [2.6.34.x] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-03-03fix race in d_splice_alias()Al Viro1-1/+0
rehashing the negative placeholder opens a race with d_lookup(); we unhash it almost immediately (by d_move()), but the race window is there. Since d_move() doesn't rely on target being hashed, we don't need that d_rehash() at all. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-03-03New helper: path_is_under(path1, path2)Al Viro1-0/+24
Analog of is_subdir for vfsmount,dentry pairs, moved from audit_tree.c Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-03-03fs/dcache.c: CodingStyle cleanupH Hartley Sweeten1-23/+22
Cleanup EXPORT* macros according to Documantation/CodingStyle. Move EXPORT* macros to the line immediately after the closing function brace. Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-12-16libfs: move EXPORT_SYMBOL for d_alloc_nameH Hartley Sweeten1-0/+1
The EXPORT_SYMBOL for d_alloc_name is in fs/libfs.c but the function is in fs/dcache.c. Move the EXPORT_SYMBOL to the line immediately after the closing function brace line in fs/dcache.c as mentioned in Documentation/CodingStyle. Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-07-18sched: Pull up the might_sleep() check into cond_resched()Frederic Weisbecker1-0/+1
might_sleep() is called late-ish in cond_resched(), after the need_resched()/preempt enabled/system running tests are checked. It's better to check the sleeps while atomic earlier and not depend on some environment datas that reduce the chances to detect a problem. Also define cond_resched_*() helpers as macros, so that the FILE/LINE reported in the sleeping while atomic warning displays the real origin and not sched.h Changes in v2: - Call __might_sleep() directly instead of might_sleep() which may call cond_resched() - Turn cond_resched() into a macro so that the file:line couple reported refers to the caller of cond_resched() and not __cond_resched() itself. Changes in v3: - Also propagate this __might_sleep() pull up to cond_resched_lock() and cond_resched_softirq() Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1247725694-6082-6-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-11dcache: extrace and use d_unlinked()Alexey Dobriyan1-4/+3
d_unlinked() will be used in middle-term to ban checkpointing when opened but unlinked file is detected, and in long term, to detect such situation and special case on it. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-05-09fs: dcache fix LRU orderingnpiggin@suse.de1-1/+1
Fix ordering of LRU when moving referenced dentries to the head of the list (they should go to the head of the list in the same order as they were found from the tail, rather than reverse order). Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-04-20No need for crossing to mountpoint in audit_tag_tree()Al Viro1-1/+0
is_under() will DTRT anyway. And yes, is_subdir() behaviour is intentional. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-03-31Trim includes of fdtable.hAl Viro1-1/+0
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-03-31Get rid of indirect include of fs_struct.hAl Viro1-0/+1
Don't pull it in sched.h; very few files actually need it and those can include directly. sched.h itself only needs forward declaration of struct fs_struct; Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-03-27cleanup d_add_ciChristoph Hellwig1-30/+18
Make sure that comments describe what's going on and not how, and always use __d_instantiate instead of two separate branches, one with d_instantiate and one with __d_instantiate. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-02-27EXPORT_SYMBOL(d_obtain_alias) rather than EXPORT_SYMBOL_GPLBenny Halevy1-1/+1
Commit 4ea3ada2955e4519befa98ff55dd62d6dfbd1705 declares d_obtain_alias() as EXPORT_SYMBOL_GPL where it's supposed to replace d_alloc_anon which was previously declared as EXPORT_SYMBOL and thus available to any loadable module. This patch reverts that. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-14[CVE-2009-0029] System call wrappers part 20Heiko Carstens1-1/+1
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2009-01-08generic swap(): dcache: use swap() instead of private do_switch()Wu Fengguang1-9/+5
Use the new generic implementation. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-31filp_cachep can be static in fs/file_table.cEric Dumazet1-6/+0
Instead of creating the "filp" kmem_cache in vfs_caches_init(), we can do it a litle be later in files_init(), so that filp_cachep is static to fs/file_table.c Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-12-31expand some comments (d_path / seq_path)Arjan van de Ven1-2/+6
Explain that you really need to use the return value of d_path rather than the buffer you passed into it. Also fix the comment for seq_path(), the function arguments changed recently but the comment hadn't been updated in sync. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-12-31correct wrong function name of d_put in kernel document and source commentZhaolei1-1/+1
no function named d_put(), it should be dput(). Impact: fix document and comment, no functionality changed Signed-off-by: Zhao Lei <zhaolei@cn.fuijtsu.com> Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-12-31fix switch_names() breakage in short-to-short caseAl Viro1-2/+3
We want ->name.len to match the resulting name on *both* source and target Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-12-31shrink struct dentryNick Piggin1-4/+0
struct dentry is one of the most critical structures in the kernel. So it's sad to see it going neglected. With CONFIG_PROFILING turned on (which is probably the common case at least for distros and kernel developers), sizeof(struct dcache) == 208 here (64-bit). This gives 19 objects per slab. I packed d_mounted into a hole, and took another 4 bytes off the inline name length to take the padding out from the end of the structure. This shinks it to 200 bytes. I could have gone the other way and increased the length to 40, but I'm aiming for a magic number, read on... I then got rid of the d_cookie pointer. This shrinks it to 192 bytes. Rant: why was this ever a good idea? The cookie system should increase its hash size or use a tree or something if lookups are a problem. Also the "fast dcookie lookups" in oprofile should be moved into the dcookie code -- how can oprofile possibly care about the dcookie_mutex? It gets dropped after get_dcookie() returns so it can't be providing any sort of protection. At 192 bytes, 21 objects fit into a 4K page, saving about 3MB on my system with ~140 000 entries allocated. 192 is also a multiple of 64, so we get nice cacheline alignment on 64 and 32 byte line systems -- any given dentry will now require 3 cachelines to touch all fields wheras previously it would require 4. I know the inline name size was chosen quite carefully, however with the reduction in cacheline footprint, it should actually be just about as fast to do a name lookup for a 36 character name as it was before the patch (and faster for other sizes). The memory footprint savings for names which are <= 32 or > 36 bytes long should more than make up for the memory cost for 33-36 byte names. Performance is a feature... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-10-23[PATCH] fs: add a sanity check in d_freeArjan van de Ven1-0/+1
Hi Al, remember that debug session we did at KS? You suggested this patch back then.... From 7751eaf30474b8cbfaea64795805a17eab05ac53 Mon Sep 17 00:00:00 2001 From: Arjan van de Ven <arjan@linux.intel.com> Date: Tue, 16 Sep 2008 16:51:17 -0700 Subject: [PATCH] fs: add a sanity check in d_free we're seeing some corruption in the dentry->d_alias list that appears like a free of an entry still on the list; this patch adds a WARN_ON() to catch this scenario, as suggested by Al Viro Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
2008-10-23[PATCH] fs/dcache.c: update comment of d_validate()Qinghuang Feng1-2/+0
Parameters @hash and @len have been removed since 2.4.3, now just to delete them. Signed-off-by: Qinghuang Feng <qhfeng.kernel@gmail.com>
2008-10-23[PATCH vfs-2.6 4/6] vfs: remove unnecessary fsnotify_d_instantiate()OGAWA Hirofumi1-1/+0
This calls d_move(), so fsnotify_d_instantiate() is unnecessary like rename path. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
2008-10-23[PATCH vfs-2.6 3/6] vfs: add __d_instantiate() helperOGAWA Hirofumi1-15/+16
This adds __d_instantiate() for users which is already taking dcache_lock, and replace with it. The part of d_add_ci() isn't equivalent. But it should be needed fsnotify_d_instantiate() actually, because the path is to add the inode to negative dentry. fsnotify_d_instantiate() should be called after change from negative to positive. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
2008-10-23[PATCH vfs-2.6 2/6] vfs: add d_ancestor()OGAWA Hirofumi1-22/+23
This adds d_ancestor() instead of d_isparent(), then use it. If new_dentry == old_dentry, is_subdir() returns 1, looks strange. "new_dentry == old_dentry" is not subdir obviously. But I'm not checking callers for now, so this keeps current behavior. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
2008-10-23[PATCH vfs-2.6 1/6] vfs: replace parent == dentry->d_parent by IS_ROOT()OGAWA Hirofumi1-9/+12
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
2008-10-23[PATCH] kill d_alloc_anonChristoph Hellwig1-71/+37
Remove d_alloc_anon now that no users are left. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-10-23[PATCH] switch all filesystems over to d_obtain_aliasChristoph Hellwig1-5/+5
Switch all users of d_alloc_anon to d_obtain_alias. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-10-23[PATCH] new helper: d_obtain_aliasChristoph Hellwig1-0/+35
The calling conventions of d_alloc_anon are rather unfortunate for all users, and it's name is not very descriptive either. Add d_obtain_alias as a new exported helper that drops the inode reference in the failure case, too and allows to pass-through NULL pointers and inodes to allow for tail-calls in the export operations. Incidentally this helper already existed as a private function in libfs.c as exportfs_d_alloc so kill that one and switch the callers to d_obtain_alias. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-09-29Fix NULL pointer dereference in proc_sys_compareLinus Torvalds1-4/+6
The VFS interface for the 'd_compare()' is a bit special (read: 'odd'), because it really just essentially replaces a memcmp(). The filesystem is supposed to just compare the two names with whatever case-independent or other function. And when I say 'is supposed to', I obviously mean that 'procfs does odd things, and actually looks at the dentry that we don't even pass down, rather than just the name'. Which results in problems, because we actually call d_compare before we have even verified that the dentry is still hashed at all. And that causes a problm since the inode that procfs looks at may have been free'd and the d_inode pointer is NULL. procfs just assumes that all dentries are positive, since procfs itself never generates a negative one. But memory pressure will still result in the dentry getting torn down, and as it is removed by RCU, it still remains visible on some lists - and to d_compare. If the filesystem just did a name comparison, we wouldn't care. And we could just fix procfs to know about negative dentries too. But rather than have the low-level filesystems know about internal VFS details, just move the check for a unhashed dentry up a bit, so that we will only call d_compare on dentries that are still active. The actual oops this caused didn't look like a NULL pointer dereference because procfs did a 'container_of(inode, struct proc_inode, vfs_inode)' to get at its internal proc_inode information from the inode pointer, and accessed a field below the inode. So the oops would look something like BUG: unable to handle kernel paging request at fffffffffffffff0 IP: [<ffffffff802bc6c6>] proc_sys_compare+0x36/0x50 and was seen on both x86-64 (Alexey Dobriyan and Hugh Dickins) and ppc64 (Hugh Dickins). Reported-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Hugh Dickins <hugh@veritas.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-of-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-25[PATCH] change d_add_ci argument orderingChristoph Hellwig1-1/+1
As pointed out during review d_add_ci argument order should match d_add, so switch the dentry and inode arguments. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-07-28dcache: Add case-insensitive support d_ci_add() routineBarry Naujok1-0/+102
This add a dcache entry to the dcache for lookup, but changing the name that is associated with the entry rather than the one passed in to the lookup routine. First, it sees if the case-exact match already exists in the dcache and uses it if one exists. Otherwise, it allocates a new node with the new name and splices it into the dcache. Original code from ntfs_lookup in fs/ntfs/namei.c by Anton Altaparmakov. Signed-off-by: Barry Naujok <bnaujok@sgi.com> Signed-off-by: Anton Altaparmakov <aia21@cantab.net> Acked-by: Christoph Hellwig <hch@infradead.org>
2008-07-26vfs: add cond_resched_lock while scanning dentry LRU listsKentaro Makita1-0/+1
Add cond_resched_lock(&dcache_lock) while scanning LRU lists on superblocks in __shrink_dcache_sb() Signed-off-by: Kentaro Makita <k-makita@np.css.fujitsu.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24fix soft lock up at NFS mount via per-SB LRU-list of unused dentriesKentaro Makita1-155/+180
[Summary] Split LRU-list of unused dentries to one per superblock to avoid soft lock up during NFS mounts and remounting of any filesystem. Previously I posted here: http://lkml.org/lkml/2008/3/5/590 [Descriptions] - background dentry_unused is a list of dentries which are not referenced. dentry_unused grows up when references on directories or files are released. This list can be very long if there is huge free memory. - the problem When shrink_dcache_sb() is called, it scans all dentry_unused linearly under spin_lock(), and if dentry->d_sb is differnt from given superblock, scan next dentry. This scan costs very much if there are many entries, and very ineffective if there are many superblocks. IOW, When we need to shrink unused dentries on one dentry, but scans unused dentries on all superblocks in the system. For example, we scan 500 dentries to unmount a filesystem, but scans 1,000,000 or more unused dentries on other superblocks. In our case , At mounting NFS*, shrink_dcache_sb() is called to shrink unused dentries on NFS, but scans 100,000,000 unused dentries on superblocks in the system such as local ext3 filesystems. I hear NFS mounting took 1 min on some system in use. * : NFS uses virtual filesystem in rpc layer, so NFS is affected by this problem. 100,000,000 is possible number on large systems. Per-superblock LRU of unused dentried can reduce the cost in reasonable manner. - How to fix I found this problem is solved by David Chinner's "Per-superblock unused dentry LRU lists V3"(1), so I rebase it and add some fix to reclaim with fairness, which is in Andrew Morton's comments(2). 1) http://lkml.org/lkml/2006/5/25/318 2) http://lkml.org/lkml/2006/5/25/320 Split LRU-list of unused dentries to each superblocks. Then, NFS mounting will check dentries under a superblock instead of all. But this spliting will break LRU of dentry-unused. So, I've attempted to make reclaim unused dentrins with fairness by calculate number of dentries to scan on this sb based on following way number of dentries to scan on this sb = count * (number of dentries on this sb / number of dentries in the machine) - ToDo - I have to measuring performance number and do stress tests. - When unmount occurs during prune_dcache(), scanning on same superblock, It is unable to reach next superblock because it is gone away. We restart scannig superblock from first one, it causes unfairness of reclaim unused dentries on first superblock. But I think this happens very rarely. - Test Results Result on 6GB boxes with excessive unused dentries. Without patch: $ cat /proc/sys/fs/dentry-state 10181835 10180203 45 0 0 0 # mount -t nfs 10.124.60.70:/work/kernel-src nfs real 0m1.830s user 0m0.001s sys 0m1.653s With this patch: $ cat /proc/sys/fs/dentry-state 10236610 10234751 45 0 0 0 # mount -t nfs 10.124.60.70:/work/kernel-src nfs real 0m0.106s user 0m0.002s sys 0m0.032s [akpm@linux-foundation.org: fix comments] Signed-off-by: Kentaro Makita <k-makita@np.css.fujitsu.com> Cc: Neil Brown <neilb@suse.de> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: David Chinner <dgc@sgi.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-06-23[patch 2/3] vfs: dcache cleanupsMiklos Szeredi1-17/+14
Comment from Al Viro: add prepend_name() wrapper. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-06-23[patch 1/3] vfs: dcache sparse fixesMiklos Szeredi1-11/+12
Fix the following sparse warnings: fs/dcache.c:2183:19: warning: symbol 'filp_cachep' was not declared. Should it be static? fs/dcache.c:115:3: warning: context imbalance in 'dentry_iput' - unexpected unlock fs/dcache.c:188:2: warning: context imbalance in 'dput' - different lock contexts for basic block fs/dcache.c:400:2: warning: context imbalance in 'prune_one_dentry' - different lock contexts for basic block fs/dcache.c:431:22: warning: context imbalance in 'prune_dcache' - different lock contexts for basic block fs/dcache.c:563:2: warning: context imbalance in 'shrink_dcache_sb' - different lock contexts for basic block fs/dcache.c:1385:6: warning: context imbalance in 'd_delete' - wrong count at exit fs/dcache.c:1636:2: warning: context imbalance in '__d_unalias' - unexpected unlock fs/dcache.c:1735:2: warning: context imbalance in 'd_materialise_unique' - different lock contexts for basic block Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: Matthew Wilcox <willy@linux.intel.com> Acked-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>