summaryrefslogtreecommitdiffstats
path: root/fs/namei.c
AgeCommit message (Collapse)AuthorFilesLines
2012-04-06Make the "word-at-a-time" helper functions more commonly usableLinus Torvalds1-32/+3
I have a new optimized x86 "strncpy_from_user()" that will use these same helper functions for all the same reasons the name lookup code uses them. This is preparation for that. This moves them into an architecture-specific header file. It's architecture-specific for two reasons: - some of the functions are likely to want architecture-specific implementations. Even if the current code happens to be "generic" in the sense that it should work on any little-endian machine, it's likely that the "multiply by a big constant and shift" implementation is less than optimal for an architecture that has a guaranteed fast bit count instruction, for example. - I expect that if architectures like sparc want to start playing around with this, we'll need to abstract out a few more details (in particular the actual unaligned accesses). So we're likely to have more architecture-specific stuff if non-x86 architectures start using this. (and if it turns out that non-x86 architectures don't start using this, then having it in an architecture-specific header is still the right thing to do, of course) Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-31vfs: fix out-of-date dentry_unhash() commentJ. Bruce Fields1-1/+1
64252c75a2196a0cf1e0d3777143ecfe0e3ae650 "vfs: remove dget() from dentry_unhash()" changed the implementation but not the comment. Cc: Sage Weil <sage@newdream.net> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-31vfs: split __lookup_hashMiklos Szeredi1-64/+44
Split __lookup_hash into two component functions: lookup_dcache - tries cached lookup, returns whether real lookup is needed lookup_real - calls i_op->lookup This eliminates code duplication between d_alloc_and_lookup() and d_inode_lookup(). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-31untangling do_lookup() - take __lookup_hash()-calling case out of line.Al Viro1-15/+16
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-31untangling do_lookup() - switch to calling __lookup_hash()Al Viro1-67/+46
now we have __lookup_hash() open-coded if !dentry case; just call the damn thing instead... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-31untangling do_lookup() - merge d_alloc_and_lookup() callersAl Viro1-3/+3
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-31untangling do_lookup() - merge failure exits in !dentry caseAl Viro1-15/+8
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-31untangling do_lookup() - massage !dentry case towards __lookup_hash()Al Viro1-25/+20
Reorder if-else cases for starters... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-31untangling do_lookup() - get rid of need_reval in !dentry caseAl Viro1-6/+1
Everything arriving into if (!dentry) will have need_reval = 1. Indeed, the only way to get there with need_reval reset to 0 would be via if (unlikely(d_need_lookup(dentry))) goto unlazy; if (unlikely(dentry->d_flags & DCACHE_OP_REVALIDATE)) { status = d_revalidate(dentry, nd); if (unlikely(status <= 0)) { if (status != -ECHILD) need_reval = 0; goto unlazy; ... unlazy: /* no assignments to dentry */ if (dentry && unlikely(d_need_lookup(dentry))) { dput(dentry); dentry = NULL; } and if d_need_lookup() had already been false the first time around, it will remain false on the second call as well. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-31untangling do_lookup() - eliminate a loop.Al Viro1-4/+8
d_lookup() *will* fail after successful d_invalidate(), if we are holding i_mutex all along. IOW, we don't need to jump back to l: - we know what path will be taken there and can do that (i.e. d_alloc_and_lookup()) directly. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-31untangling do_lookup() - expand the area under ->i_mutexAl Viro1-2/+4
keep holding ->i_mutex over revalidation parts Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-31untangling do_lookup() - isolate !dentry stuff from the rest of it.Al Viro1-1/+16
Duplicate the revalidation-related parts into if (!dentry) branch. Next step will be to pull them under i_mutex. This and the next 8 commits are more or less a splitup of patch by Miklos; folks, when you are working with something that convoluted, carve your patches up into easily reviewed steps, especially when a lot of codepaths involved are rarely hit... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-31vfs: move MAY_EXEC check from __lookup_hash()Miklos Szeredi1-6/+5
The only caller of __lookup_hash() that needs the exec permission check on parent is lookup_one_len(). All lookup_hash() callers already checked permission in LOOKUP_PARENT walk. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-31vfs: don't revalidate just looked up dentryMiklos Szeredi1-3/+1
__lookup_hash() calls ->lookup() if the dentry needs lookup and on success revalidates the dentry (all under dir->i_mutex). While this is harmless it doesn't make a lot of sense. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-31vfs: fix d_need_lookup/d_revalidate order in do_lookupMiklos Szeredi1-2/+2
Doing revalidate on a dentry which has not yet been looked up makes no sense. Move the d_need_lookup() check before d_revalidate(). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-24Merge tag 'module-for-3.4' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux Pull cleanup of fs/ and lib/ users of module.h from Paul Gortmaker: "Fix up files in fs/ and lib/ dirs to only use module.h if they really need it. These are trivial in scope vs the work done previously. We now have things where any few remaining cleanups can be farmed out to arch or subsystem maintainers, and I have done so when possible. What is remaining here represents the bits that don't clearly lie within a single arch/subsystem boundary, like the fs dir and the lib dir. Some duplicate includes arising from overlapping fixes from independent subsystem maintainer submissions are also quashed." Fix up trivial conflicts due to clashes with other include file cleanups (including some due to the previous bug.h cleanup pull). * tag 'module-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: lib: reduce the use of module.h wherever possible fs: reduce the use of module.h wherever possible includecheck: delete any duplicate instances of module.h
2012-03-22vfs: tidy up sparse warnings in fs/namei.cLinus Torvalds1-3/+3
While doing the fs/namei.c cleanups, I ran sparse on it, and it pointed out other large integers and a couple of cases of us using '0' instead of the proper 'NULL'. Sparse still doesn't understand some of the conditional locking going on, but that's no excuse for not fixing up the trivial stuff. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-22vfs: tidy up fs/namei.c byte-repeat word constantsLinus Torvalds1-9/+4
In commit commit 1de5b41cd3b2 ("fs/namei.c: fix warnings on 32-bit") Andrew said that there must be a tidier way of doing this. This is that tidier way. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-22Fix full_name_hash() behaviour when length is a multiple of 8Al Viro1-1/+1
We want it to match what hash_name() is doing, which means extra multiply by 9 in this case... Reported-and-Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-22Merge branch 'akpm' (Andrew's patch-bomb)Linus Torvalds1-0/+6
Merge first batch of patches from Andrew Morton: "A few misc things and all the MM queue" * emailed from Andrew Morton <akpm@linux-foundation.org>: (92 commits) memcg: avoid THP split in task migration thp: add HPAGE_PMD_* definitions for !CONFIG_TRANSPARENT_HUGEPAGE memcg: clean up existing move charge code mm/memcontrol.c: remove unnecessary 'break' in mem_cgroup_read() mm/memcontrol.c: remove redundant BUG_ON() in mem_cgroup_usage_unregister_event() mm/memcontrol.c: s/stealed/stolen/ memcg: fix performance of mem_cgroup_begin_update_page_stat() memcg: remove PCG_FILE_MAPPED memcg: use new logic for page stat accounting memcg: remove PCG_MOVE_LOCK flag from page_cgroup memcg: simplify move_account() check memcg: remove EXPORT_SYMBOL(mem_cgroup_update_page_stat) memcg: kill dead prev_priority stubs memcg: remove PCG_CACHE page_cgroup flag memcg: let css_get_next() rely upon rcu_read_lock() cgroup: revert ss_id_lock to spinlock idr: make idr_get_next() good for rcu_read_lock() memcg: remove unnecessary thp check in page stat accounting memcg: remove redundant returns memcg: enum lru_list lru ...
2012-03-21fs/namei.c: fix warnings on 32-bitAndrew Morton1-0/+6
i386 allnoconfig: fs/namei.c: In function 'has_zero': fs/namei.c:1617: warning: integer constant is too large for 'unsigned long' type fs/namei.c:1617: warning: integer constant is too large for 'unsigned long' type fs/namei.c: In function 'hash_name': fs/namei.c:1635: warning: integer constant is too large for 'unsigned long' type There must be a tidier way of doing this. Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-21Merge branch 'for-linus' of ↵Linus Torvalds1-1/+14
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs pile 1 from Al Viro: "This is _not_ all; in particular, Miklos' and Jan's stuff is not there yet." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (64 commits) ext4: initialization of ext4_li_mtx needs to be done earlier debugfs-related mode_t whack-a-mole hfsplus: add an ioctl to bless files hfsplus: change finder_info to u32 hfsplus: initialise userflags qnx4: new helper - try_extent() qnx4: get rid of qnx4_bread/qnx4_getblk take removal of PF_FORKNOEXEC to flush_old_exec() trim includes in inode.c um: uml_dup_mmap() relies on ->mmap_sem being held, but activate_mm() doesn't hold it um: embed ->stub_pages[] into mmu_context gadgetfs: list_for_each_safe() misuse ocfs2: fix leaks on failure exits in module_init ecryptfs: make register_filesystem() the last potential failure exit ntfs: forgets to unregister sysctls on register_filesystem() failure logfs: missing cleanup on register_filesystem() failure jfs: mising cleanup on register_filesystem() failure make configfs_pin_fs() return root dentry on success configfs: configfs_create_dir() has parent dentry in dentry->d_parent configfs: sanitize configfs_create() ...
2012-03-21Merge branch 'kmap_atomic' of git://github.com/congwang/linuxLinus Torvalds1-2/+2
Pull kmap_atomic cleanup from Cong Wang. It's been in -next for a long time, and it gets rid of the (no longer used) second argument to k[un]map_atomic(). Fix up a few trivial conflicts in various drivers, and do an "evil merge" to catch some new uses that have come in since Cong's tree. * 'kmap_atomic' of git://github.com/congwang/linux: (59 commits) feature-removal-schedule.txt: schedule the deprecated form of kmap_atomic() for removal highmem: kill all __kmap_atomic() [swarren@nvidia.com: highmem: Fix ARM build break due to __kmap_atomic rename] drbd: remove the second argument of k[un]map_atomic() zcache: remove the second argument of k[un]map_atomic() gma500: remove the second argument of k[un]map_atomic() dm: remove the second argument of k[un]map_atomic() tomoyo: remove the second argument of k[un]map_atomic() sunrpc: remove the second argument of k[un]map_atomic() rds: remove the second argument of k[un]map_atomic() net: remove the second argument of k[un]map_atomic() mm: remove the second argument of k[un]map_atomic() lib: remove the second argument of k[un]map_atomic() power: remove the second argument of k[un]map_atomic() kdb: remove the second argument of k[un]map_atomic() udf: remove the second argument of k[un]map_atomic() ubifs: remove the second argument of k[un]map_atomic() squashfs: remove the second argument of k[un]map_atomic() reiserfs: remove the second argument of k[un]map_atomic() ocfs2: remove the second argument of k[un]map_atomic() ntfs: remove the second argument of k[un]map_atomic() ...
2012-03-20switch touch_atime to struct pathAl Viro1-1/+1
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-20vfs: check i_nlink limits in vfs_{mkdir,rename_dir,link}Al Viro1-0/+13
New field of struct super_block - ->s_max_links. Maximal allowed value of ->i_nlink or 0; in the latter case all checks still need to be done in ->link/->mkdir/->rename instances. Note that this limit applies both to directoris and to non-directories. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-20fs: remove the second argument of k[un]map_atomic()Cong Wang1-2/+2
Acked-by: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: Cong Wang <amwang@redhat.com>
2012-03-19Merge branch 'dcache-word-accesses'Linus Torvalds1-0/+122
* branch 'dcache-word-accesses': vfs: use 'unsigned long' accesses for dcache name comparison and hashing This does the name hashing and lookup using word-sized accesses when that is efficient, namely on x86 (although any little-endian machine with good unaligned accesses would do). It does very much depend on little-endian logic, but it's a very hot couple of functions under some real loads, and this patch improves the performance of __d_lookup_rcu() and link_path_walk() by up to about 30%. Giving a 10% improvement on some very pathname-heavy benchmarks. Because we do make unaligned accesses past the filename, the optimization is disabled when CONFIG_DEBUG_PAGEALLOC is active, and we effectively depend on the fact that on x86 we don't really ever have the last page of usable RAM followed immediately by any IO memory (due to ACPI tables, BIOS buffer areas etc). Some of the bit operations we do are a bit "subtle". It's commented, but you do need to really think about the code. Or just consider it black magic. Thanks to people on G+ for some of the optimized bit tricks.
2012-03-10vfs: fix return value from do_last()Miklos Szeredi1-1/+1
complete_walk() returns either ECHILD or ESTALE. do_last() turns this into ECHILD unconditionally. If not in RCU mode, this error will reach userspace which is complete nonsense. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-10vfs: fix double put after complete_walk()Miklos Szeredi1-1/+1
complete_walk() already puts nd->path, no need to do it again at cleanup time. This would result in Oopses if triggered, apparently the codepath is not too well exercised. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-08vfs: use 'unsigned long' accesses for dcache name comparison and hashingLinus Torvalds1-0/+122
Ok, this is hacky, and only works on little-endian machines with goo unaligned handling. And even then only with CONFIG_DEBUG_PAGEALLOC disabled, since it can access up to 7 bytes after the pathname. But it runs like a bat out of hell. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-02vfs: export full_name_hash() function to modulesLinus Torvalds1-0/+1
Commit 5707c87f "vfs: uninline full_name_hash()" broke the modular build, because it needs exporting now that it isn't inlined any more. Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-02vfs: split up name hashing in link_path_walk() into helper functionLinus Torvalds1-18/+34
The code in link_path_walk() that finds out the length and the hash of the next path component is some of the hottest code in the kernel. And I have a version of it that does things at the full width of the CPU wordsize at a time, but that means that we *really* want to split it up into a separate helper function. So this re-organizes the code a bit and splits the hashing part into a helper function called "hash_name()". It returns the length of the pathname component, while at the same time computing and writing the hash to the appropriate location. The code generation is slightly changed by this patch, but generally for the better - and the added abstraction actually makes the code easier to read too. And the new interface is well suited for replacing just the "hash_name()" function with alternative implementations. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-02vfs: uninline full_name_hash()Linus Torvalds1-4/+9
.. and also use it in lookup_one_len() rather than open-coding it. There aren't any performance-critical users, so inlining it is silly. But it wouldn't matter if it wasn't for the fact that the word-at-a-time dentry name patches want to conditionally replace the function, and uninlining it sets the stage for that. So again, this is a preparatory patch that doesn't change any semantics, and only prepares for a much cleaner and testable word-at-a-time dentry name accessor patch. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-02-28fs: reduce the use of module.h wherever possiblePaul Gortmaker1-1/+1
For files only using THIS_MODULE and/or EXPORT_SYMBOL, map them onto including export.h -- or if the file isn't even using those, then just delete the include. Fix up any implicit include dependencies that were being masked by module.h along the way. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-02-13vfs: fix d_inode_lookup() dentry ref leakMiklos Szeredi1-1/+3
d_inode_lookup() leaks a dentry reference on IS_DEADDIR(). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-17audit: do not call audit_getname on errorEric Paris1-15/+13
Just a code cleanup really. We don't need to make a function call just for it to return on error. This also makes the VFS function even easier to follow and removes a conditional on a hot path. Signed-off-by: Eric Paris <eparis@redhat.com>
2012-01-06Merge branches 'vfsmount-guts', 'umode_t' and 'partitions' into ZAl Viro1-9/+9
2012-01-03vfs: move mnt_mountpoint to struct mountAl Viro1-2/+2
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03vfs: now it can be done - make mnt_parent point to struct mountAl Viro1-11/+13
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03vfs: mnt_parent moved to struct mountAl Viro1-2/+2
the second victim... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03vfs: spread struct mount - __lookup_mnt() resultAl Viro1-6/+7
switch __lookup_mnt() to returning struct mount *; callers adjusted. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03switch open and mkdir syscalls to umode_tAl Viro1-3/+3
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03switch may_mknod() to umode_tAl Viro1-1/+1
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03switch ->mknod() to umode_tAl Viro1-1/+1
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03switch ->create() to umode_tAl Viro1-1/+1
vfs_create() ignores everything outside of 16bit subset of its mode argument; switching it to umode_t is obviously equivalent and it's the only caller of the method Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03switch vfs_mkdir() and ->mkdir() to umode_tAl Viro1-1/+1
vfs_mkdir() gets int, but immediately drops everything that might not fit into umode_t and that's the only caller of ->mkdir()... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03switch sys_mknodat(2) to umode_tAl Viro1-2/+2
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-11-07VFS: we need to set LOOKUP_JUMPED on mountpoint crossingAl Viro1-1/+15
Mountpoint crossing is similar to following procfs symlinks - we do not get ->d_revalidate() called for dentry we have arrived at, with unpleasant consequences for NFS4. Simple way to reproduce the problem in mainline: cat >/tmp/a.c <<'EOF' #include <unistd.h> #include <fcntl.h> #include <stdio.h> main() { struct flock fl = {.l_type = F_RDLCK, .l_whence = SEEK_SET, .l_len = 1}; if (fcntl(0, F_SETLK, &fl)) perror("setlk"); } EOF cc /tmp/a.c -o /tmp/test then on nfs4: mount --bind file1 file2 /tmp/test < file1 # ok /tmp/test < file2 # spews "setlk: No locks available"... What happens is the missing call of ->d_revalidate() after mountpoint crossing and that's where NFS4 would issue OPEN request to server. The fix is simple - treat mountpoint crossing the same way we deal with following procfs-style symlinks. I.e. set LOOKUP_JUMPED... Cc: stable@kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-02readlinkat: ensure we return ENOENT for the empty pathname for normal lookupsAndy Whitcroft1-5/+13
Since the commit below which added O_PATH support to the *at() calls, the error return for readlink/readlinkat for the empty pathname has switched from ENOENT to EINVAL: commit 65cfc6722361570bfe255698d9cd4dccaf47570d Author: Al Viro <viro@zeniv.linux.org.uk> Date: Sun Mar 13 15:56:26 2011 -0400 readlinkat(), fchownat() and fstatat() with empty relative pathnames This is both unexpected for userspace and makes readlink/readlinkat inconsistant with all other interfaces; and inconsistant with our stated return for these pathnames. As the readlinkat call does not have a flags parameter we cannot use the AT_EMPTY_PATH approach used in the other calls. Therefore expose whether the original path is infact entry via a new user_path_at_empty() path lookup function. Use this to determine whether to default to EINVAL or ENOENT for failures. Addresses http://bugs.launchpad.net/bugs/817187 [akpm@linux-foundation.org: remove unused getname_flags()] Signed-off-by: Andy Whitcroft <apw@canonical.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2011-10-28leases: fix write-open/read-lease raceJ. Bruce Fields1-4/+1
In setlease, we use i_writecount to decide whether we can give out a read lease. In open, we break leases before incrementing i_writecount. There is therefore a window between the break lease and the i_writecount increment when setlease could add a new read lease. This would leave us with a simultaneous write open and read lease, which shouldn't happen. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de>