summaryrefslogtreecommitdiffstats
path: root/fs/nsfs.c
AgeCommit message (Collapse)AuthorFilesLines
2019-04-09nsfs: unobfuscateAl Viro1-13/+10
1) IS_ERR(p) && PTR_ERR(p) == -E... is spelled p == ERR_PTR(-E...) 2) yes, you can open-code do-while and sometimes there's even a good reason to do so. Not in this case, though. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-04-09dcache: sort the freeing-without-RCU-delay mess for good.Al Viro1-2/+1
For lockless accesses to dentries we don't have pinned we rely (among other things) upon having an RCU delay between dropping the last reference and actually freeing the memory. On the other hand, for things like pipes and sockets we neither do that kind of lockless access, nor want to deal with the overhead of an RCU delay every time a socket gets closed. So delay was made optional - setting DCACHE_RCUACCESS in ->d_flags made sure it would happen. We tried to avoid setting it unless we knew we need it. Unfortunately, that had led to recurring class of bugs, in which we missed the need to set it. We only really need it for dentries that are created by d_alloc_pseudo(), so let's not bother with trying to be smart - just make having an RCU delay the default. The ones that do *not* get it set the replacement flag (DCACHE_NORCU) and we'd better use that sparingly. d_alloc_pseudo() is the only such user right now. FWIW, the race that finally prompted that switch had been between __lock_parent() of immediate subdirectory of what's currently the root of a disconnected tree (e.g. from open-by-handle in progress) racing with d_splice_alias() elsewhere picking another alias for the same inode, either on outright corrupted fs image, or (in case of open-by-handle on NFS) that subdirectory having been just moved on server. It's not easy to hit, so the sky is not falling, but that's not the first race on similar missed cases and the logics for settinf DCACHE_RCUACCESS has gotten ridiculously convoluted. Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-02-15net: Export open_related_ns()Kirill Tkhai1-0/+1
This function will be used to obtain net of tun device. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-31nsfs: generalize ns_get_path() for path resolution with a taskJakub Kicinski1-3/+26
ns_get_path() takes struct task_struct and proc_ns_ops as its parameters. For path resolution directly from a namespace, e.g. based on a networking device's net name space, we need more flexibility. Add a ns_get_path_cb() helper which will allow callers to use any method of obtaining the name space reference. Convert ns_get_path() to use ns_get_path_cb(). Following patches will bring a networking user. CC: Eric W. Biederman <ebiederm@xmission.com> Suggested-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-27Rename superblock flags (MS_xyz -> SB_xyz)Linus Torvalds1-1/+1
This is a pure automated search-and-replace of the internal kernel superblock flags. The s_flags are now called SB_*, with the names and the values for the moment mirroring the MS_* flags that they're equivalent to. Note how the MS_xyz flags are the ones passed to the mount system call, while the SB_xyz flags are what we then use in sb->s_flags. The script to do this was: # places to look in; re security/*: it generally should *not* be # touched (that stuff parses mount(2) arguments directly), but # there are two places where we really deal with superblock flags. FILES="drivers/mtd drivers/staging/lustre fs ipc mm \ include/linux/fs.h include/uapi/linux/bfs_fs.h \ security/apparmor/apparmorfs.c security/apparmor/include/lib.h" # the list of MS_... constants SYMS="RDONLY NOSUID NODEV NOEXEC SYNCHRONOUS REMOUNT MANDLOCK \ DIRSYNC NOATIME NODIRATIME BIND MOVE REC VERBOSE SILENT \ POSIXACL UNBINDABLE PRIVATE SLAVE SHARED RELATIME KERNMOUNT \ I_VERSION STRICTATIME LAZYTIME SUBMOUNT NOREMOTELOCK NOSEC BORN \ ACTIVE NOUSER" SED_PROG= for i in $SYMS; do SED_PROG="$SED_PROG -e s/MS_$i/SB_$i/g"; done # we want files that contain at least one of MS_..., # with fs/namespace.c and fs/pnode.c excluded. L=$(for i in $SYMS; do git grep -w -l MS_$i $FILES; done| sort|uniq|grep -v '^fs/namespace.c'|grep -v '^fs/pnode.c') for f in $L; do sed -i $f $SED_PROG; done Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-02License cleanup: add SPDX GPL-2.0 license identifier to files with no licenseGreg Kroah-Hartman1-0/+1
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-06VFS: Provide empty name qstrDavid Howells1-2/+1
Provide an empty name (ie. "") qstr for general use. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-05-08ns: allow ns_entries to have custom symlink contentKirill Tkhai1-1/+3
Patch series "Expose task pid_ns_for_children to userspace". pid_ns_for_children set by a task is known only to the task itself, and it's impossible to identify it from outside. It's a big problem for checkpoint/restore software like CRIU, because it can't correctly handle tasks, that do setns(CLONE_NEWPID) in proccess of their work. If they have a custom pid_ns_for_children before dump, they must have the same ns after restore. Otherwise, restored task bumped into enviroment it does not expect. This patchset solves the problem. It exposes pid_ns_for_children to ns directory in standard way with the name "pid_for_children": ~# ls /proc/5531/ns -l | grep pid lrwxrwxrwx 1 root root 0 Jan 14 16:38 pid -> pid:[4026531836] lrwxrwxrwx 1 root root 0 Jan 14 16:38 pid_for_children -> pid:[4026532286] This patch (of 2): Make possible to have link content prefix yyy different from the link name xxx: $ readlink /proc/[pid]/ns/xxx yyy:[4026531838] This will be used in next patch. Link: http://lkml.kernel.org/r/149201120318.6007.7362655181033883000.stgit@localhost.localdomain Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Andrei Vagin <avagin@virtuozzo.com> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul Moore <paul@paul-moore.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Ingo Molnar <mingo@kernel.org> Cc: Serge Hallyn <serge@hallyn.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-19nsfs: mark dentry with DCACHE_RCUACCESSCong Wang1-0/+1
Andrey reported a use-after-free in __ns_get_path(): spin_lock include/linux/spinlock.h:299 [inline] lockref_get_not_dead+0x19/0x80 lib/lockref.c:179 __ns_get_path+0x197/0x860 fs/nsfs.c:66 open_related_ns+0xda/0x200 fs/nsfs.c:143 sock_ioctl+0x39d/0x440 net/socket.c:1001 vfs_ioctl fs/ioctl.c:45 [inline] do_vfs_ioctl+0x1bf/0x1780 fs/ioctl.c:685 SYSC_ioctl fs/ioctl.c:700 [inline] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691 We are under rcu read lock protection at that point: rcu_read_lock(); d = atomic_long_read(&ns->stashed); if (!d) goto slow; dentry = (struct dentry *)d; if (!lockref_get_not_dead(&dentry->d_lockref)) goto slow; rcu_read_unlock(); but don't use a proper RCU API on the free path, therefore a parallel __d_free() could free it at the same time. We need to mark the stashed dentry with DCACHE_RCUACCESS so that __d_free() will be called after all readers leave RCU. Fixes: e149ed2b805f ("take the targets of /proc/*/ns/* symlinks to separate fs") Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-03nsfs: Add an ioctl() to return owner UID of a usernsMichael Kerrisk (man-pages)1-0/+11
I'd like to write code that discovers the user namespace hierarchy on a running system, and also shows who owns the various user namespaces. Currently, there is no way of getting the owner UID of a user namespace. Therefore, this patch adds a new NS_GET_CREATOR_UID ioctl() that fetches the UID (as seen in the user namespace of the caller) of the creator of the user namespace referred to by the specified file descriptor. If the supplied file descriptor does not refer to a user namespace, the operation fails with the error EINVAL. If the owner UID does not have a mapping in the caller's user namespace return the overflow UID as that appears easier to deal with in practice in user-space applications. -- EWB Changed the handling of unmapped UIDs from -EOVERFLOW back to the overflow uid. Per conversation with Michael Kerrisk after examining his test code. Acked-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Michael Kerrisk <mtk-manpages@gmail.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2017-01-25nsfs: Add an ioctl() to return the namespace typeMichael Kerrisk (man-pages)1-0/+2
Linux 4.9 added two ioctl() operations that can be used to discover: * the parental relationships for hierarchical namespaces (user and PID) [NS_GET_PARENT] * the user namespaces that owns a specified non-user-namespace [NS_GET_USERNS] For no good reason that I can glean, NS_GET_USERNS was made synonymous with NS_GET_PARENT for user namespaces. It might have been better if NS_GET_USERNS had returned an error if the supplied file descriptor referred to a user namespace, since it suggests that the caller may be confused. More particularly, if it had generated an error, then I wouldn't need the new ioctl() operation proposed here. (On the other hand, what I propose here may be more generally useful.) I would like to write code that discovers namespace relationships for the purpose of understanding the namespace setup on a running system. In particular, given a file descriptor (or pathname) for a namespace, N, I'd like to obtain the corresponding user namespace. Namespace N might be a user namespace (in which case my code would just use N) or a non-user namespace (in which case my code will use NS_GET_USERNS to get the user namespace associated with N). The problem is that there is no way to tell the difference by looking at the file descriptor (and if I try to use NS_GET_USERNS on an N that is a user namespace, I get the parent user namespace of N, which is not what I want). This patch therefore adds a new ioctl(), NS_GET_NSTYPE, which, given a file descriptor that refers to a user namespace, returns the namespace type (one of the CLONE_NEW* constants). Signed-off-by: Michael Kerrisk <mtk-manpages@gmail.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2016-10-31net: add an ioctl to get a socket network namespaceAndrey Vagin1-1/+1
Each socket operates in a network namespace where it has been created, so if we want to dump and restore a socket, we have to know its network namespace. We have a socket_diag to get information about sockets, it doesn't report sockets which are not bound or connected. This patch introduces a new socket ioctl, which is called SIOCGSKNS and used to get a file descriptor for a socket network namespace. A task must have CAP_NET_ADMIN in a target network namespace to use this ioctl. Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-10Merge branch 'for-linus' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull more vfs updates from Al Viro: ">rename2() work from Miklos + current_time() from Deepa" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: Replace current_fs_time() with current_time() fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps fs: Replace CURRENT_TIME with current_time() for inode timestamps fs: proc: Delete inode time initializations in proc_alloc_inode() vfs: Add current_time() api vfs: add note about i_op->rename changes to porting fs: rename "rename2" i_op to "rename" vfs: remove unused i_op->rename fs: make remaining filesystems use .rename2 libfs: support RENAME_NOREPLACE in simple_rename() fs: support RENAME_NOREPLACE for local filesystems ncpfs: fix unused variable warning
2016-09-27fs: Replace CURRENT_TIME with current_time() for inode timestampsDeepa Dinamani1-1/+1
CURRENT_TIME macro is not appropriate for filesystems as it doesn't use the right granularity for filesystem timestamps. Use current_time() instead. CURRENT_TIME is also not y2038 safe. This is also in preparation for the patch that transitions vfs timestamps to use 64 bit time and hence make them y2038 safe. As part of the effort current_time() will be extended to do range checks. Hence, it is necessary for all file system timestamps to use current_time(). Also, current_time() will be transitioned along with vfs to be y2038 safe. Note that whenever a single call to current_time() is used to change timestamps in different inodes, it is because they share the same time granularity. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Felipe Balbi <balbi@kernel.org> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Acked-by: David Sterba <dsterba@suse.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-09-22nsfs: Simplify __ns_get_pathEric W. Biederman1-5/+2
Move mntget from the very beginning of __ns_get_path to the success path of __ns_get_path, and remove the mntget calls. This removes the possibility that there will be a mntget/mntput pair of __ns_get_path has to retry, and generally simplifies the code. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2016-09-22nsfs: add ioctl to get a parent namespaceAndrey Vagin1-0/+4
Pid and user namepaces are hierarchical. There is no way to discover parent-child relationships. In a future we will use this interface to dump and restore nested namespaces. Acked-by: Serge Hallyn <serge@hallyn.com> Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2016-09-22nsfs: add ioctl to get an owning user namespace for ns file descriptorAndrey Vagin1-13/+83
Each namespace has an owning user namespace and now there is not way to discover these relationships. Understending namespaces relationships allows to answer the question: what capability does process X have to perform operations on a resource governed by namespace Y? After a long discussion, Eric W. Biederman proposed to use ioctl-s for this purpose. The NS_GET_USERNS ioctl returns a file descriptor to an owning user namespace. It returns EPERM if a target namespace is outside of a current user namespace. v2: rename parent to relative v3: Add a missing mntput when returning -EAGAIN --EWB Acked-by: Serge Hallyn <serge@hallyn.com> Link: https://lkml.org/lkml/2016/7/6/158 Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2015-09-11fs/seq_file: convert int seq_vprint/seq_printf/etc... returns to voidJoe Perches1-1/+2
The seq_<foo> function return values were frequently misused. See: commit 1f33c41c03da ("seq_file: Rename seq_overflow() to seq_has_overflowed() and make public") All uses of these return values have been removed, so convert the return types to void. Miscellanea: o Move seq_put_decimal_<type> and seq_escape prototypes closer the other seq_vprintf prototypes o Reorder seq_putc and seq_puts to return early on overflow o Add argument names to seq_vprintf and seq_printf o Update the seq_escape kernel-doc o Convert a couple of leading spaces to tabs in seq_escape Signed-off-by: Joe Perches <joe@perches.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mark Brown <broonie@kernel.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Joerg Roedel <jroedel@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-07-11nsfs: Add a show_path method to fix mountinfoEric W. Biederman1-0/+10
Today mountinfo displays a very unhelpful "/" for nsfs files. Add a show_path method returning the same string as ns_dname. This results in a bind mount of /proc/<pid>/ns/net showing up in /proc/<pid>/mountinfo as "net:[1234...]" instead of "/". Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2015-04-15VFS: assorted weird filesystems: d_inode() annotationsDavid Howells1-2/+2
Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-12-10take the targets of /proc/*/ns/* symlinks to separate fsAl Viro1-0/+161
New pseudo-filesystem: nsfs. Targets of /proc/*/ns/* live there now. It's not mountable (not even registered, so it's not in /proc/filesystems, etc.). Files on it *are* bindable - we explicitly permit that in do_loopback(). This stuff lives in fs/nsfs.c now; proc_ns_fget() moved there as well. get_proc_ns() is a macro now (it's simply returning ->i_private; would have been an inline, if not for header ordering headache). proc_ns_inode() is an ex-parrot. The interface used in procfs is ns_get_path(path, task, ops) and ns_get_name(buf, size, task, ops). Dentries and inodes are never hashed; a non-counting reference to dentry is stashed in ns_common (removed by ->d_prune()) and reused by ns_get_path() if present. See ns_get_path()/ns_prune_dentry/nsfs_evict() for details of that mechanism. As the result, proc_ns_follow_link() has stopped poking in nd->path.mnt; it does nd_jump_link() on a consistent <vfsmount,dentry> pair it gets from ns_get_path(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>