path: root/fs/namei.c
Age | Commit message | Author | Files | Lines
2020-04-02 | link_path_walk(): sample parent's i_uid and i_mode for the last component | Al Viro | 1 | -10/+7
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-04-02 | __nd_alloc_stack(): make it return bool | Al Viro | 1 | -27/+18
... and adjust the caller (reserve_stack()). Rename to nd_alloc_stack(), while we are at it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-04-02 | reserve_stack(): switch to __nd_alloc_stack() | Al Viro | 1 | -11/+8
expand the call of nd_alloc_stack() into it (and don't recheck the depth on the second call) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-04-02 | pick_link(): take reserving space on stack into a new helper | Al Viro | 1 | -21/+25
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-04-02 | pick_link(): more straightforward handling of allocation failures | Al Viro | 1 | -8/+7
pick_link() needs to push onto stack; we start with using the two-element array embedded into struct nameidata and the first time we need more than that we switch to a separately allocated array. Allocation can fail, of course, and handling of that would be simple enough - we need to drop 'link' and bugger off. However, things get more complicated in RCU mode. There we must do a GFP_ATOMIC allocation. If that fails, we try to switch to non-RCU mode and repeat the allocation. To switch to non-RCU mode we need to grab references to 'link' and to everything in nameidata. The latter is done by unlazy_walk(); the former by legitimize_path(). 'link' must go first - after unlazy_walk() we are out of the RCU-critical period and it's too late to call legitimize_path(), since the references in link->mnt and link->dentry might be pointing to freed and reused memory. So we do legitimize_path(), then unlazy_walk().
And that's where it gets too subtle: what to do if the former fails? We MUST do path_put(link) to avoid leaks. And we can't do that under rcu_read_lock(). The solution in mainline was to empty the nameidata manually, drop out of RCU mode and then do path_put(). In effect, we open-code the things an eventual terminate_walk() would've done on error in RCU mode. That looks badly out of place and confusing. We could add a comment along the lines of the explanation above, but... there's a simpler solution. Call unlazy_walk() even if legitimize_path() fails. It will take us out of RCU mode, so we'll be able to do path_put(link). Yes, it will do unnecessary work - attempt to grab references on the stuff in nameidata, only to have them dropped as soon as we return the error to the upper layer and get terminate_walk() called there. So what? We are thoroughly off the fast path by that point - we had a GFP_ATOMIC allocation fail, we had a ->d_seq or mount_lock mismatch and we are about to try walking the same path from scratch in non-RCU mode, which will need to do the same allocation, this time with GFP_KERNEL, so it will be able to apply memory pressure for blocking stuff. Compared to that, the cost of several lockref_get_not_dead() calls is noise. And the logic becomes much easier to understand that way. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
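A minimal userspace sketch of the stack scheme described above, assuming hypothetical names throughout (struct walk, walk_reserve(), walk_alloc_stack(), MAX_DEPTH): two entries live inside the structure, the first push beyond them moves everything to a heap array, and the allocation helper returns bool, as nd_alloc_stack() does after the rename. calloc() stands in for the GFP_ATOMIC/GFP_KERNEL allocations; the RCU-mode fallback is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define EMBEDDED_SLOTS 2   /* like the two-element array in nameidata */
#define MAX_DEPTH 40       /* hypothetical limit for this sketch */

struct saved {             /* hypothetical stack entry */
	const char *name;
};

struct walk {              /* hypothetical stand-in for struct nameidata */
	struct saved *stack;   /* points at internal[] until the first overflow */
	struct saved internal[EMBEDDED_SLOTS];
	int depth;
};

static void walk_init(struct walk *w)
{
	w->stack = w->internal;
	w->depth = 0;
}

/* Move from the embedded slots to a heap array; false if allocation fails. */
static bool walk_alloc_stack(struct walk *w)
{
	struct saved *p = calloc(MAX_DEPTH, sizeof(*p));

	if (!p)
		return false;
	memcpy(p, w->internal, sizeof(w->internal));
	w->stack = p;
	return true;
}

/* Make room for one more entry before pushing. */
static bool walk_reserve(struct walk *w)
{
	if (w->depth >= MAX_DEPTH)
		return false;                  /* nested too deeply */
	if (w->depth == EMBEDDED_SLOTS && w->stack == w->internal)
		return walk_alloc_stack(w);    /* first overflow only */
	return true;                       /* room in the current array */
}

static bool walk_push(struct walk *w, const char *name)
{
	if (!walk_reserve(w))
		return false;
	w->stack[w->depth++].name = name;
	return true;
}

static void walk_free(struct walk *w)
{
	if (w->stack != w->internal)
		free(w->stack);
	walk_init(w);
}
```

Splitting "reserve" from "push" keeps the failure handling in one place: a caller that cannot get room has not modified the stack yet.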
2020-04-02 | fold path_to_nameidata() into its only remaining caller | Al Viro | 1 | -13/+6
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-04-02 | pick_link(): pass it struct path already with normal refcounting rules | Al Viro | 1 | -6/+6
step_into() tries to avoid grabbing and dropping mount references on the steps that do not involve crossing mountpoints (which is obviously the majority of cases). So it uses a local struct path with unusual refcounting rules - path.mnt is pinned if and only if it's not equal to nd->path.mnt. We used to have similar beasts all over the place and we had quite a few bugs crop up in their handling - it's easy to get confused when changing e.g. cleanup on failure exits (or adding a new check, etc.) Now that's mostly gone - the step_into() instance (which is what we need them for) is the only one left. It is exposed to mount traversal and it's (briefly) seen by pick_link(). Since pick_link() needs to store it in the link stack, where the normal rules apply, it has to make sure that the mount is pinned regardless of the nd->path.mnt value. That's done on all calls of pick_link(), very early in each. Let's do that in the caller (step_into()) instead - that way fewer places need to be aware of such struct path instances. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
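A sketch of the refcounting rule, using hypothetical toy types (struct mount, struct dentry and struct path here are plain counters, not the kernel's): in the "borrowed" form the mount carries a reference only when it differs from the one nd->path already pins, and path_make_normal() (a hypothetical name) converts it to the normal form, in which releasing the path always drops the mount reference.

```c
#include <assert.h>

/* Hypothetical toy types: only the reference counts matter here. */
struct mount  { int refs; };
struct dentry { int refs; };
struct path   { struct mount *mnt; struct dentry *dentry; };

/*
 * "Borrowed" form: p->mnt holds a reference if and only if it differs
 * from base_mnt, the mount the walk's current position already pins.
 * Convert it to the normal form, where p->mnt is always pinned.
 */
static void path_make_normal(struct path *p, const struct mount *base_mnt)
{
	if (p->mnt == base_mnt)
		p->mnt->refs++;    /* not pinned yet: take our own reference */
	/* otherwise mount traversal has already pinned it */
}

/* Normal form: releasing the path always drops both references. */
static void path_release(struct path *p)
{
	p->dentry->refs--;
	p->mnt->refs--;
}
```

Doing the conversion once, in the caller, means path_release() and everything downstream never need to know which form they were handed.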
2020-04-02 | fs/namei.c: kill follow_mount() | Al Viro | 1 | -20/+2
The only remaining caller (path_pts()) should be using follow_down() anyway. And clean path_pts() a bit. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-04-02 | non-RCU analogue of the previous commit | Al Viro | 1 | -17/+39
new helper: choose_mountpoint(). Wrapper around choose_mountpoint_rcu(), similar to lookup_mnt() vs. __lookup_mnt(). follow_dotdot() switched to it. Now we don't grab mount_lock exclusive anymore; note that the primitive used for non-RCU mount traversals in the other direction (lookup_mnt()) doesn't bother with that either - it uses the mount_lock seqcount instead. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-04-02 | helper for mount rootwards traversal | Al Viro | 1 | -16/+24
The loops in follow_dotdot{,_rcu}() are doing the same thing: we have a mount and we want to find out how far up the chain of mounts we need to go. We follow the chain of mounts until we find one that is not directly overmounting the root of another mount. If such a mount is found, we want the location it's mounted upon. If we run out of chain (i.e. get to a mount that is not mounted on anything else) or run into the process' root, we report failure. On success, we want (in the RCU case) the d_seq of the resulting location sampled or (in the non-RCU case) references to that location acquired. This commit introduces such a primitive for the RCU case and switches follow_dotdot_rcu() to it; the non-RCU case will go in the next commit. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
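The rootwards search can be sketched over a toy mount tree. The types and choose_mountpoint_sketch() are hypothetical userspace stand-ins that loosely follow the description above; d_seq sampling and reference acquisition are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct dentry { const char *name; };  /* hypothetical */
struct mount {                        /* hypothetical */
	struct mount *parent;             /* NULL: not mounted on anything */
	struct dentry *mountpoint;        /* where we sit in the parent */
	struct dentry *root;              /* root dentry of this mount */
};
struct location { struct mount *mnt; struct dentry *dentry; };

/*
 * Starting from mount m, climb until we find a mount that is not sitting
 * directly on its parent's root; report where it is mounted. Fail if we
 * run out of chain or run into the process' root.
 */
static bool choose_mountpoint_sketch(struct mount *m,
				     const struct location *proc_root,
				     struct location *out)
{
	while (m->parent) {
		struct dentry *mp = m->mountpoint;

		m = m->parent;
		if (proc_root->mnt == m && proc_root->dentry == mp)
			break;                /* ran into the process' root */
		if (mp != m->root) {      /* not overmounting the parent's root */
			out->mnt = m;
			out->dentry = mp;
			return true;
		}
		/* m's own root was overmounted: keep climbing */
	}
	return false;                 /* out of chain, or hit the process' root */
}
```

For example, with C mounted on the root of B and B mounted on directory "mnt" of A, starting at C skips B (C overmounts its root) and reports A's "mnt".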
2020-04-02 | follow_dotdot(): be lazy about changing nd->path | Al Viro | 1 | -5/+13
Change nd->path only after the loop is done and only in case we hadn't ended up finding ourselves in root. Same for the NO_XDEV check. That separates the "check how far back we need to go through the mount stack" logic from the rest of .. traversal. NOTE: path_get/path_put introduced here are temporary. They will go away later in the series. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-04-02 | follow_dotdot_rcu(): be lazy about changing nd->path | Al Viro | 1 | -15/+20
Change nd->path only after the loop is done and only in case we hadn't ended up finding ourselves in root. Same for the NO_XDEV check. Don't recheck mount_lock on each step either. That separates the "check how far back we need to go through the mount stack" logic from the rest of .. traversal. Note that the sequence for d_seq/d_inode here is
    * sample mount_lock seqcount
    ...
    * sample d_seq
    * fetch d_inode
    * verify mount_lock seqcount
The last step makes sure that the d_inode value we'd got matches d_seq - the dentry is guaranteed to have been a mountpoint through the entire thing, so its d_inode must have been stable. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-04-02 | follow_dotdot{,_rcu}(): massage loops | Al Viro | 1 | -32/+45
The logic in both of them is the same:
    while true
        if in process' root            // uncommon
            break
        if *not* in mount root         // normal case
            find the parent
            return
        if at absolute root            // very uncommon
            break
        move to underlying mountpoint
    report that we are in root
Pull the common path out of the loop:
    if in process' root                // uncommon
        goto in_root
    if unlikely(in mount root)
        while true
            if at absolute root
                goto in_root
            move to underlying mountpoint
            if in process' root
                goto in_root
            if *not* in mount root
                break;
    find the parent                    // we are not in mount root
    return
in_root:
    report that we are in root
The reason for that transformation is that we get to keep the common path straight *and* get a separate block for "move through underlying mountpoints", which will allow us to sanitize NO_XDEV handling there. What's more, the pared-down loops will be easier to deal with - in particular, the non-RCU case has no need to grab mount_lock and rewriting it to a form that wouldn't do that is a non-trivial change. Better do that with less stuff getting in the way... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
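A userspace sketch of the restructured ".." step over a toy mount tree, assuming hypothetical types and names (struct location, dotdot_sketch()); root dentries are their own parents. As in the later "be lazy about changing nd->path" commits, the caller's position is only updated when we do not end up in root.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct dentry { struct dentry *parent; };  /* hypothetical; a root is its own parent */
struct mount {                             /* hypothetical */
	struct mount *parent;                  /* NULL at the absolute root */
	struct dentry *mountpoint;             /* where we sit in the parent */
	struct dentry *root;                   /* root dentry of this mount */
};
struct location { struct mount *mnt; struct dentry *dentry; };

static bool same_loc(const struct location *a, const struct location *b)
{
	return a->mnt == b->mnt && a->dentry == b->dentry;
}

/* Returns true if *cur moved to the parent, false if ".." stays in root. */
static bool dotdot_sketch(struct location *cur, const struct location *root)
{
	struct location pos = *cur;

	if (same_loc(&pos, root))                  /* uncommon */
		goto in_root;
	if (pos.dentry == pos.mnt->root) {         /* unlikely: at a mount root */
		for (;;) {
			if (!pos.mnt->parent)              /* absolute root */
				goto in_root;
			pos.dentry = pos.mnt->mountpoint;  /* move to underlying mountpoint */
			pos.mnt = pos.mnt->parent;
			if (same_loc(&pos, root))
				goto in_root;
			if (pos.dentry != pos.mnt->root)   /* no longer in a mount root */
				break;
		}
	}
	pos.dentry = pos.dentry->parent;           /* common case: find the parent */
	*cur = pos;
	return true;
in_root:
	return false;                              /* stay where we are */
}
```

The common case (not at a mount root) touches neither the loop nor the goto targets, which is the point of pulling it out.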
2020-04-02 | lift all calls of step_into() out of follow_dotdot/follow_dotdot_rcu | Al Viro | 1 | -34/+37
lift step_into() into handle_dots() (where they merge with each other); have follow_... return dentry and pass inode/seq to the caller. [braino fix folded; kudos to Qian Cai <cai@lca.pw> for reporting it] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 | follow_dotdot{,_rcu}(): switch to use of step_into() | Al Viro | 1 | -24/+7
gets the regular mount crossing on result of .. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 | handle_dots(), follow_dotdot{,_rcu}(): preparation to switch to step_into() | Al Viro | 1 | -27/+25
Right now the tail ends of follow_dotdot{,_rcu}() are pretty much the open-coded analogues of step_into(). The differences:
    * the lack of proper LOOKUP_NO_XDEV handling in the non-RCU case (arguably a bug)
    * the lack of ->d_manage() handling (again, arguably a bug)
Adjust the calling conventions so that on the next step we could just switch those functions to returning step_into(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 | move handle_dots(), follow_dotdot() and follow_dotdot_rcu() past step_into() | Al Viro | 1 | -130/+130
pure move; we are going to have step_into() called by that bunch. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 | follow_dotdot{,_rcu}(): lift LOOKUP_BENEATH checks out of loop | Al Viro | 1 | -10/+10
Behaviour change: a LOOKUP_BENEATH lookup of .. in the absolute root yields an error even if it's not the process' root. That's possible only if you'd managed to escape a chroot jail by way of procfs symlinks, but IMO the resulting behaviour is not worse - more consistent and easier to describe: ".." in root is "stay where you are", unless LOOKUP_BENEATH has been given, in which case it's "fail with EXDEV". Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
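A sketch contrasting the old and new outcomes for ".." once the walk has found itself in a root, with a hypothetical SKETCH_BENEATH flag standing in for LOOKUP_BENEATH. The old variant reflects the description above, where only the process' root turned the lookup into an error.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define SKETCH_BENEATH 0x1u   /* hypothetical stand-in for LOOKUP_BENEATH */

/* Before: only the process' root turned ".." into -EXDEV. */
static int dotdot_in_root_old(unsigned int flags, bool at_process_root)
{
	if (at_process_root && (flags & SKETCH_BENEATH))
		return -EXDEV;
	return 0;                 /* stay where you are */
}

/* After: any root, the process' or the absolute one, does. */
static int dotdot_in_root_new(unsigned int flags)
{
	if (flags & SKETCH_BENEATH)
		return -EXDEV;
	return 0;                 /* stay where you are */
}
```

The new rule no longer needs to know which kind of root was hit, which is what makes it easier to describe.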
2020-03-13 | follow_dotdot{,_rcu}(): lift switching nd->path to parent out of loop | Al Viro | 1 | -8/+12
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 | expand path_parent_directory() in its callers | Al Viro | 1 | -18/+11
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 | path_parent_directory(): leave changing path->dentry to callers | Al Viro | 1 | -15/+19
Instead of returning 0, return new dentry; instead of returning -ENOENT, return NULL. Adjust the callers accordingly. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 | path_connected(): pass mount and dentry separately | Al Viro | 1 | -7/+5
eventually we'll want to do that check *before* mangling nd->path.dentry... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 | split the lookup-related parts of do_last() into a separate helper | Al Viro | 1 | -22/+29
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 | do_last(): rejoin the common path even earlier in FMODE_{OPENED,CREATED} case | Al Viro | 1 | -10/+4
... getting may_create_in_sticky() checks in FMODE_OPENED case as well. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 | do_last(): simplify the liveness analysis past finish_open_created | Al Viro | 1 | -17/+11
Don't mess with got_write there - it is guaranteed to be false on entry and it will be set true if and only if we decide to go for truncation and manage to get write access for that. Don't carry acc_mode through the entire thing - it's only used in that part. And don't bother with gotos in there - compiler is quite capable of optimizing that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 | do_last(): rejoin the common path earlier in FMODE_{OPENED,CREATED} case | Al Viro | 1 | -13/+8
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 | do_last(): don't bother with keeping got_write in FMODE_OPENED case | Al Viro | 1 | -20/+11
it's easier to drop it right after lookup_open() and regain if needed (i.e. if we will need to truncate). On the non-FMODE_OPENED path we do that anyway. In case of FMODE_CREATED we won't be needing it. And it's easier to prove correctness that way, especially since the initial failure to get write access is not always fatal; proving that we'll never end up truncating in that case is rather convoluted. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 | do_last(): merge the may_open() calls | Al Viro | 1 | -7/+3
have FMODE_OPENED case rejoin the main path at earlier point Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 | atomic_open(): lift the call of may_open() into do_last() | Al Viro | 1 | -15/+11
there we'll be able to merge it with its counterparts in other cases, and there's no reason to do it before the parent has been unlocked Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 | atomic_open(): return the right dentry in FMODE_OPENED case | Al Viro | 1 | -1/+5
->atomic_open() might have used a different alias than the one we'd passed to it; in "not opened" case we take care of that, in "opened" one we don't. Currently we don't care downstream of "opened" case which alias to return; however, that will change shortly when we get to unifying may_open() calls. It's not hard to get right in all cases, anyway. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 | new helper: traverse_mounts() | Al Viro | 1 | -105/+72
common guts of follow_down() and follow_managed() taken to a new helper - traverse_mounts(). The remnants of follow_managed() are folded into its sole remaining caller (handle_mounts()). Calling conventions of handle_mounts() slightly sanitized - instead of the weird "1 for success, -E... for failure" that used to be imposed by the calling conventions of walk_component() et al. we can use the normal "0 for success, -E... for failure". Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 | massage __follow_mount_rcu() a bit | Al Viro | 1 | -35/+35
make the loop more similar to that in follow_managed(), with explicit tracking of flags, etc. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 | namei: have link_path_walk() maintain LOOKUP_PARENT | Al Viro | 1 | -11/+6
set on entry, clear when we get to the last component. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 | link_path_walk(): simplify stack handling | Al Viro | 1 | -9/+5
We use nd->stack to store two things: pinning down the symlinks we are resolving and resuming the name traversal when a nested symlink is finished. Currently, nd->depth is used to keep track of both. It's 0 when we call link_path_walk() for the first time (for the pathname itself) and 1 on all subsequent calls (for trailing symlinks, if any). That's fine, as far as pinning symlinks goes - when handling a trailing symlink, the string we are interpreting is the body of the symlink pinned down in nd->stack[0]. It's rather inconvenient with respect to handling nested symlinks, though - when we run out of a string we are currently interpreting, we need to decide whether it's a nested symlink (in which case we need to pick the string saved back when we started to interpret that nested symlink and resume its traversal) or not (in which case we are done with link_path_walk()). The current solution is a bit of a kludge - in handling of a trailing symlink (in lookup_last() and open_last_lookups()) we clear nd->stack[0].name. That allows link_path_walk() to use the following rules when running out of a string to interpret:
    * if nd->depth is zero, we are at the end of the pathname itself.
    * if nd->depth is positive, check the saved string; for a nested symlink it will be non-NULL, for a trailing symlink - NULL.
It works, but it's rather non-obvious. Note that we have two sets: the set of symlinks currently being traversed and the set of postponed pathname tails. The former is stored in nd->stack[0..nd->depth-1].link and it's valid throughout the pathname resolution; the latter is valid only during an individual call of link_path_walk() and it occupies nd->stack[0..nd->depth-1].name for the first call of link_path_walk() and nd->stack[1..nd->depth-1].name for subsequent ones. The kludge is basically a way to recognize the second set becoming empty. Things get simpler if we keep track of the second set's size explicitly and always store it in nd->stack[0..depth-1].name. We access the second set only inside link_path_walk(), so its size can live in a local variable; that way the check becomes trivial without the need for that kludge. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
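A toy string-level walker illustrating the idea, assuming a hypothetical in-memory symlink table and hypothetical names (toy_walk(), struct toy_link, MAX_NEST): postponed tails go into an array whose size is a local variable, so running out of the current string only needs a depth == 0 check. Absolute symlinks, pure jumps and the separate set of pinned links are not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_NEST 8   /* hypothetical nesting limit for this sketch */

/* Hypothetical symlink table entry: component name -> link body. */
struct toy_link { const char *name; const char *body; };

static const char *toy_lookup_link(const struct toy_link *links, size_t n,
				   const char *comp, size_t len)
{
	for (size_t i = 0; i < n; i++)
		if (strlen(links[i].name) == len && !strncmp(links[i].name, comp, len))
			return links[i].body;
	return NULL;
}

/*
 * Walk 'path', expanding symlink components, and write the non-link
 * components into out[] joined with '/'. Returns 0 on success, -1 on
 * too-deep nesting or output overflow.
 */
static int toy_walk(const char *path, const struct toy_link *links, size_t n,
		    char *out, size_t outsz)
{
	const char *tails[MAX_NEST];   /* postponed remainders */
	int depth = 0;                 /* size of tails[], kept in a local */
	size_t used = 0;
	const char *s = path;

	out[0] = '\0';
	for (;;) {
		while (*s == '/')
			s++;
		if (!*s) {                 /* ran out of the current string */
			if (depth == 0)
				return 0;          /* end of the pathname itself */
			s = tails[--depth];    /* resume the postponed tail */
			continue;
		}
		size_t len = strcspn(s, "/");
		const char *body = toy_lookup_link(links, n, s, len);
		if (body) {
			if (depth == MAX_NEST)
				return -1;
			tails[depth++] = s + len;  /* where to come back to */
			s = body;
			continue;
		}
		if (used + len + 2 > outsz)
			return -1;
		if (used)
			out[used++] = '/';
		memcpy(out + used, s, len);
		used += len;
		out[used] = '\0';
		s += len;
	}
}
```

With "m" linking to "l/z" and "l" linking to "x/y", walking "m/c" resumes "/z" and then "/c" in turn, producing "x/y/z/c".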
2020-03-13 | pick_link(): check for WALK_TRAILING, not LOOKUP_PARENT | Al Viro | 1 | -5/+5
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 | namei: invert the meaning of WALK_FOLLOW | Al Viro | 1 | -6/+6
old flags & WALK_FOLLOW <=> new !(flags & WALK_TRAILING) That's what that flag had really been used for. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 | sanitize handling of nd->last_type, kill LAST_BIND | Al Viro | 1 | -2/+1
->last_type values are set in 3 places: path_init() (sets to LAST_ROOT), link_path_walk (LAST_NORM/DOT/DOTDOT) and pick_link (LAST_BIND). They are checked in walk_component(), lookup_last() and do_last(). They also get copied to the caller by filename_parentat(). In the last 3 cases the value is what we had at the return from link_path_walk(). In the case of walk_component() it's either directly downstream from the assignment in link_path_walk() or, when called by lookup_last(), the value we have at the return from link_path_walk(). The value at the entry into link_path_walk() can survive to return only if the pathname contains nothing but slashes. Note that pick_link() never returns such - pure jumps are handled directly. So for the calls of link_path_walk() for trailing symlinks it does not matter what value had been there at the entry; the value at the return won't depend upon it. There are 3 call chains that might have pick_link() storing LAST_BIND:
    1) pick_link() from step_into() from walk_component() from link_path_walk(). In that case we will either be parsing the next component immediately after return into link_path_walk(), which will overwrite the ->last_type before anyone has a chance to look at it, or we'll fail, in which case nobody will be looking at ->last_type at all.
    2) pick_link() from step_into() from walk_component() from lookup_last(). The value is never looked at due to the above; it won't affect the value seen at return from any link_path_walk().
    3) pick_link() from step_into() from do_last(). Ditto.
In other words, the assignment in pick_link() is pointless, and so is LAST_BIND itself; nothing ever looks at that value. Kill it off. And make link_path_walk() _always_ assign ->last_type - in the only case when the value at the entry might survive to the return, that value is always LAST_ROOT, inherited from path_init(). Move that assignment from path_init() into the beginning of link_path_walk(), to consolidate things.
Historical note: LAST_BIND used to be used for the kludge with trailing pure jump symlinks (extra iteration through the top-level loop). No point keeping it anymore... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
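A sketch of the classification link_path_walk() performs on the last component, assuming hypothetical names (classify_last(), and an LT_* enum mirroring LAST_ROOT/LAST_NORM/LAST_DOT/LAST_DOTDOT): the type is set to LT_ROOT on entry, and that initial value survives only when the string contains nothing but slashes.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical mirror of LAST_ROOT/LAST_NORM/LAST_DOT/LAST_DOTDOT. */
enum last_kind { LT_ROOT, LT_NORM, LT_DOT, LT_DOTDOT };

/* Type of the last component of path, as seen after walking all of it. */
static enum last_kind classify_last(const char *path)
{
	enum last_kind t = LT_ROOT;   /* always assigned on entry */
	const char *s = path;

	for (;;) {
		while (*s == '/')
			s++;                  /* skip slashes */
		if (!*s)
			return t;             /* LT_ROOT survives only if no component was seen */
		size_t len = strcspn(s, "/");
		if (len == 1 && s[0] == '.')
			t = LT_DOT;
		else if (len == 2 && s[0] == '.' && s[1] == '.')
			t = LT_DOTDOT;
		else
			t = LT_NORM;
		s += len;
	}
}
```

Since every component overwrites t, whatever value was present before the walk can only be observed for an all-slashes (or empty) string.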
2020-03-13 | finally fold get_link() into pick_link() | Al Viro | 1 | -74/+61
kill nd->link_inode, while we are at it Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 | merging pick_link() with get_link(), part 6 | Al Viro | 1 | -8/+5
move the only remaining call of get_link() into pick_link() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 | merging pick_link() with get_link(), part 5 | Al Viro | 1 | -25/+18
move get_link() call into step_into(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 | merging pick_link() with get_link(), part 4 | Al Viro | 1 | -33/+26
Move the call of get_link() into walk_component(). Change the calling conventions for walk_component() to returning the link body to follow (if any). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 | merging pick_link() with get_link(), part 3 | Al Viro | 1 | -9/+9
After a pure jump ("/" or procfs-style symlink) we don't need to hold the link anymore. link_path_walk() dropped it if such a case had been detected, lookup_last/do_last() (i.e. old trailing_symlink()) left it on the stack - it ended up calling terminate_walk() shortly anyway, which would've purged the entire stack. Do it in get_link() itself instead. Simpler logic that way... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 | merging pick_link() with get_link(), part 2 | Al Viro | 1 | -28/+40
Fold trailing_symlink() into lookup_last() and do_last(), change the calling conventions of those two. Rules change:
    success, we are done => NULL instead of 0
    error => ERR_PTR(-E...) instead of -E...
    got a symlink to follow => return the path to be followed instead of 1
The loops calling those (in path_lookupat() and path_openat()) adjusted. A subtle change of control flow here: originally a pure-jump trailing symlink ("/" or procfs one) would've passed through the upper level loop once more, with "" for path to traverse. That would've brought us back to the lookup_last/do_last entry and we would've hit the LAST_BIND case (LAST_BIND left from get_link() called by trailing_symlink()) and pretty much skipped to the point right after where we'd left the sucker back when we picked that trailing symlink. Now we don't bother with that extra pass through the upper level loop - if get_link() says "I've just done a pure jump, nothing else to do", we just treat that as the non-symlink case. Boilerplate added on that step will go away shortly - it'll migrate into walk_component() and then to step_into(), collapsing into the change of calling conventions for those. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
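A sketch of the new three-way convention and the loop driving it, assuming userspace imitations of ERR_PTR()/IS_ERR()/PTR_ERR() (err_ptr(), is_err(), ptr_err() are hypothetical) and a hypothetical toy_last() in place of lookup_last(): NULL means done, an error pointer means failure, anything else is the link body to follow next. The limit of 40 mirrors MAXSYMLINKS; the rest of the lookup is not modeled.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Userspace imitations of the kernel's ERR_PTR(), PTR_ERR(), IS_ERR(). */
#define SKETCH_MAX_ERRNO 4095
static void *err_ptr(long err) { return (void *)err; }
static long ptr_err(const void *p) { return (long)p; }
static int is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/*
 * Hypothetical stand-in for lookup_last(): NULL if we are done,
 * an error pointer on failure, otherwise the symlink body to follow.
 */
static const char *toy_last(const char *name)
{
	if (strcmp(name, "missing") == 0)
		return err_ptr(-ENOENT);
	if (strcmp(name, "link") == 0)
		return "target";        /* symlink whose body is "target" */
	if (strcmp(name, "loop") == 0)
		return "loop";          /* symlink pointing at itself */
	return NULL;                /* ordinary file: done */
}

/* Driver loop in the spirit of path_lookupat(): keep following bodies. */
static int toy_lookupat(const char *name)
{
	const char *s = name;
	int followed = 0;

	while ((s = toy_last(s)) != NULL) {
		if (is_err(s))
			return (int)ptr_err(s);
		if (++followed > 40)    /* mirrors MAXSYMLINKS */
			return -ELOOP;
	}
	return 0;
}
```

Returning the body itself, rather than 1, lets the caller feed it straight back into the next iteration without a separate out-parameter.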
2020-03-13 | merging pick_link() with get_link(), part 1 | Al Viro | 1 | -5/+7
Move restoring LOOKUP_PARENT and zeroing nd->stack.name[0] past the call of get_link() (nothing _currently_ uses them in there). That allows moving the call of may_follow_link() into get_link() as well, since now the presence of LOOKUP_PARENT distinguishes the callers from each other (link_path_walk() has it, trailing_symlink() doesn't). Preparations for folding trailing_symlink() into callers (lookup_last() and do_last()) and changing the calling conventions of those. Next stage after that will have the get_link() call migrate into walk_component(), then into step_into(). It's tricky enough to warrant doing that in stages, unfortunately... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 | expand the only remaining call of path_lookup_conditional() | Al Viro | 1 | -9/+5
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 | LOOKUP_MOUNTPOINT: fold path_mountpointat() into path_lookupat() | Al Viro | 1 | -83/+6
New LOOKUP flag, telling path_lookupat() to act as path_mountpointat(). IOW, traverse mounts at the final point and skip revalidation of the location where it ends up. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 | fold handle_mounts() into step_into() | Al Viro | 1 | -26/+15
The following is true:
    * calls of handle_mounts() and step_into() are always paired in sequences like
            err = handle_mounts(nd, dentry, &path, &inode, &seq);
            if (unlikely(err < 0))
                    return err;
            err = step_into(nd, &path, flags, inode, seq);
    * in all such sequences path is uninitialized before and unused after this pair of calls
    * in all such sequences inode and seq are unused afterwards.
So the call of handle_mounts() can be shifted inside step_into(), turning 'path' into a local variable in the combined function. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 | new step_into() flag: WALK_NOFOLLOW | Al Viro | 1 | -6/+4
Tells step_into() not to follow symlinks, regardless of LOOKUP_FOLLOW. Allows switching handle_lookup_down() to step_into(), getting all follow_managed() and step_into() calls paired. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 | step_into() callers: dismiss the symlink earlier | Al Viro | 1 | -3/+7
We need to dismiss a symlink when we are done traversing it; currently that's done when we call step_into() for its last component. For the cases when we do not call step_into() for that component (i.e. when it's . or ..) we do the same symlink dismissal after the call of handle_dots(). What we need to guarantee is that the symlink won't be dismissed while we are still using nd->last.name - it's pointing into the body of said symlink. step_into() is sufficiently late - by the time it's called we'd already obtained the dentry, so the name we'd been looking up is no longer needed. However, it turns out to be cleaner to have that ("we are done with that component now, can dismiss the link") done explicitly - in the callers of step_into(). In handle_dots() case we won't be using the component string at all, so for . and .. the corresponding point is actually _before_ the call of handle_dots(), not after it. Fix a minor irregularity in do_last(), while we are at it - if trailing symlink ended with . or .. we forgot to dismiss it. Not a problem, since nameidata is about to be done with (neither . nor .. can be a trailing symlink, so this is the last iteration through the loop) and terminate_walk() will clean the stack anyway, but let's keep it more regular. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 | lookup_fast(): take mount traversal into callers | Al Viro | 1 | -26/+24
Current calling conventions: -E... on error, 0 on cache miss, result of handle_mounts(nd, dentry, path, inode, seqp) on success. Turn that into returning ERR_PTR(-E...), NULL and dentry resp.; deal with handle_mounts() in the callers. The thing is, they already do that in cache miss handling case, so we just need to supply dentry to them and unify the mount traversal in those cases. Fewer arguments that way, and we get closer to merging handle_mounts() and step_into(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>