summaryrefslogtreecommitdiffstats
path: root/fs/sysfs
AgeCommit message (Collapse)AuthorFilesLines
2013-04-05sysfs: check if one entry has been removed before freeingMing Lei1-1/+8
It might be a kernel disaster if one sysfs entry is freed but still referenced by sysfs tree. Recently Dave and Sasha reported one use-after-free problem on sysfs entry, and the problem has been troubleshooted with help of debug message added in this patch. Given sysfs_get_dirent/sysfs_put are exported APIs, even inside sysfs they are called in many contexts(kobject/attribe add/delete, inode init/drop, dentry lookup/release, readdir, ...), it is healthful to check the removed flag before freeing one entry and dump message if it is freeing without being removed first. Cc: Dave Jones <davej@redhat.com> Cc: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-04-03sysfs: fix use after free in case of concurrent read/write and readdirMing Lei1-4/+11
The inode->i_mutex isn't hold when updating filp->f_pos in read()/write(), so the filp->f_pos might be read as 0 or 1 in readdir() when there is concurrent read()/write() on this same file, then may cause use after free in readdir(). The bug can be reproduced with Li Zefan's test code on the link: https://patchwork.kernel.org/patch/2160771/ This patch fixes the use after free under this situation. Cc: stable <stable@vger.kernel.org> Reported-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-04-01Merge v3.9-rc5 into driver-core-nextGreg Kroah-Hartman2-1/+20
We want the fixes in here. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-03-28Merge tag 'driver-core-3.9-rc4' of ↵Linus Torvalds1-1/+16
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull sysfs fixes from Greg Kroah-Hartman: "Here are two fixes for sysfs that resolve issues that have been found by the Trinity fuzz tool, causing oopses in sysfs. They both have been in linux-next for a while to ensure that they do not cause any other problems." * tag 'driver-core-3.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: sysfs: handle failure path correctly for readdir() sysfs: fix race between readdir and lseek
2013-03-27userns: Restrict when proc and sysfs can be mountedEric W. Biederman1-0/+4
Only allow unprivileged mounts of proc and sysfs if they are already mounted when the user namespace is created. proc and sysfs are interesting because they have content that is per namespace, and so fresh mounts are needed when new namespaces are created while at the same time proc and sysfs have content that is shared between every instance. Respect the policy of who may see the shared content of proc and sysfs by only allowing new mounts if there was an existing mount at the time the user namespace was created. In practice there are only two interesting cases: proc and sysfs are mounted at their usual places, proc and sysfs are not mounted at all (some form of mount namespace jail). Cc: stable@vger.kernel.org Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-03-25sysfs: use atomic_inc_unless_negative in sysfs_get_activeMaarten Lankhorst1-15/+2
It seems that sysfs has an interesting way of doing the same thing. This removes the cpu_relax unfortunately, but if it's really needed, it would be better to add this to include/linux/atomic.h to benefit all atomic ops users. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-03-20sysfs: handle failure path correctly for readdir()Ming Lei1-0/+4
In case of 'if (filp->f_pos == 0 or 1)' of sysfs_readdir(), the failure from filldir() isn't handled, and the reference counter of the sysfs_dirent object pointed by filp->private_data will be released without clearing filp->private_data, so use after free bug will be triggered later. This patch returns immeadiately under the situation for fixing the bug, and it is reasonable to return from readdir() when filldir() fails. Reported-by: Dave Jones <davej@redhat.com> Tested-by: Sasha Levin <levinsasha928@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-03-20sysfs: fix race between readdir and lseekMing Lei1-1/+12
While readdir() is running, lseek() may set filp->f_pos as zero, then may leave filp->private_data pointing to one sysfs_dirent object without holding its reference counter, so the sysfs_dirent object may be used after free in next readdir(). This patch holds inode->i_mutex to avoid the problem since the lock is always held in readdir path. Reported-by: Dave Jones <davej@redhat.com> Tested-by: Sasha Levin <levinsasha928@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-02-27hlist: drop the node parameter from iteratorsSasha Levin1-2/+1
I'm not sure why, but the hlist for each entry iterators were conceived list_for_each_entry(pos, head, member) The hlist ones were greedy and wanted an extra parameter: hlist_for_each_entry(tpos, pos, head, member) Why did they need an extra pos parameter? I'm not quite sure. Not only they don't really need it, it also prevents the iterator from looking exactly like the list iterator, which is unfortunate. Besides the semantic patch, there was some manual work required: - Fix up the actual hlist iterators in linux/list.h - Fix up the declaration of other iterators based on the hlist ones. - A very small amount of places were using the 'node' parameter, this was modified to use 'obj->member' instead. - Coccinelle didn't handle the hlist_for_each_entry_safe iterator properly, so those had to be fixed up manually. The semantic patch which is mostly the work of Peter Senna Tschudin is here: @@ iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host; type T; expression a,c,d,e; identifier b; statement S; @@ -T b; <+... when != b ( hlist_for_each_entry(a, - b, c, d) S | hlist_for_each_entry_continue(a, - b, c) S | hlist_for_each_entry_from(a, - b, c) S | hlist_for_each_entry_rcu(a, - b, c, d) S | hlist_for_each_entry_rcu_bh(a, - b, c, d) S | hlist_for_each_entry_continue_rcu_bh(a, - b, c) S | for_each_busy_worker(a, c, - b, d) S | ax25_uid_for_each(a, - b, c) S | ax25_for_each(a, - b, c) S | inet_bind_bucket_for_each(a, - b, c) S | sctp_for_each_hentry(a, - b, c) S | sk_for_each(a, - b, c) S | sk_for_each_rcu(a, - b, c) S | sk_for_each_from -(a, b) +(a) S + sk_for_each_from(a) S | sk_for_each_safe(a, - b, c, d) S | sk_for_each_bound(a, - b, c) S | hlist_for_each_entry_safe(a, - b, c, d, e) S | hlist_for_each_entry_continue_rcu(a, - b, c) S | nr_neigh_for_each(a, - b, c) S | nr_neigh_for_each_safe(a, - b, c, d) S | nr_node_for_each(a, - b, c) S | nr_node_for_each_safe(a, - b, c, d) S | - for_each_gfn_sp(a, c, d, b) S + for_each_gfn_sp(a, c, d) S | - for_each_gfn_indirect_valid_sp(a, c, d, b) S + for_each_gfn_indirect_valid_sp(a, c, d) S | for_each_host(a, - b, c) S | for_each_host_safe(a, - b, c, d) S | for_each_mesh_entry(a, - b, c, d) S ) ...+> [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c] [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c] [akpm@linux-foundation.org: checkpatch fixes] [akpm@linux-foundation.org: fix warnings] [akpm@linux-foudnation.org: redo intrusive kvm changes] Tested-by: Peter Senna Tschudin <peter.senna@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-26Merge branch 'for-linus' of ↵Linus Torvalds1-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs pile (part one) from Al Viro: "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent locking violations, etc. The most visible changes here are death of FS_REVAL_DOT (replaced with "has ->d_weak_revalidate()") and a new helper getting from struct file to inode. Some bits of preparation to xattr method interface changes. Misc patches by various people sent this cycle *and* ocfs2 fixes from several cycles ago that should've been upstream right then. PS: the next vfs pile will be xattr stuff." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits) saner proc_get_inode() calling conventions proc: avoid extra pde_put() in proc_fill_super() fs: change return values from -EACCES to -EPERM fs/exec.c: make bprm_mm_init() static ocfs2/dlm: use GFP_ATOMIC inside a spin_lock ocfs2: fix possible use-after-free with AIO ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero target: writev() on single-element vector is pointless export kernel_write(), convert open-coded instances fs: encode_fh: return FILEID_INVALID if invalid fid_type kill f_vfsmnt vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op nfsd: handle vfs_getattr errors in acl protocol switch vfs_getattr() to struct path default SET_PERSONALITY() in linux/elf.h ceph: prepopulate inodes only when request is aborted d_hash_and_lookup(): export, switch open-coded instances 9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate() 9p: split dropping the acls from v9fs_set_create_acl() ...
2013-02-22new helper: file_inode(file)Al Viro1-3/+3
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-21Merge tag 'driver-core-3.9-rc1' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core patches from Greg Kroah-Hartman: "Here is the big driver core merge for 3.9-rc1 There are two major series here, both of which touch lots of drivers all over the kernel, and will cause you some merge conflicts: - add a new function called devm_ioremap_resource() to properly be able to check return values. - remove CONFIG_EXPERIMENTAL Other than those patches, there's not much here, some minor fixes and updates" Fix up trivial conflicts * tag 'driver-core-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (221 commits) base: memory: fix soft/hard_offline_page permissions drivercore: Fix ordering between deferred_probe and exiting initcalls backlight: fix class_find_device() arguments TTY: mark tty_get_device call with the proper const values driver-core: constify data for class_find_device() firmware: Ignore abort check when no user-helper is used firmware: Reduce ifdef CONFIG_FW_LOADER_USER_HELPER firmware: Make user-mode helper optional firmware: Refactoring for splitting user-mode helper code Driver core: treat unregistered bus_types as having no devices watchdog: Convert to devm_ioremap_resource() thermal: Convert to devm_ioremap_resource() spi: Convert to devm_ioremap_resource() power: Convert to devm_ioremap_resource() mtd: Convert to devm_ioremap_resource() mmc: Convert to devm_ioremap_resource() mfd: Convert to devm_ioremap_resource() media: Convert to devm_ioremap_resource() iommu: Convert to devm_ioremap_resource() drm: Convert to devm_ioremap_resource() ...
2013-01-25sysfs: Functions for adding/removing symlinks to/from attribute groupsRafael J. Wysocki3-13/+76
The most convenient way to expose ACPI power resources lists of a device is to put symbolic links to sysfs directories representing those resources into special attribute groups in the device's sysfs directory. For this purpose, it is necessary to be able to add symbolic links to attribute groups. For this reason, add sysfs helper functions for adding/removing symbolic links to/from attribute groups, sysfs_add_link_to_group() and sysfs_remove_link_from_group(), respectively. This change set includes a build fix from David Rientjes. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17Revert "sysfs: Convert print_symbol to %pSR"Greg Kroah-Hartman1-2/+2
This reverts commit 6ad58fa82db897b4422a873c01fa41f84b652502 as %pSR isn't in the tree yet. Cc: Joe Perches <joe@perches.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17sysfs: Convert print_symbol to %pSRJoe Perches1-2/+2
Use the new vsprintf extension to avoid any possible message interleaving. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17sysfs: Fixed a trailing white space errorBin Wang1-1/+1
This patch removes the trailing white space in fs/sysfs/mount.c. Signed-off-by: Bin Wang <wbin00@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-12-17Merge branch 'for-linus' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull user namespace changes from Eric Biederman: "While small this set of changes is very significant with respect to containers in general and user namespaces in particular. The user space interface is now complete. This set of changes adds support for unprivileged users to create user namespaces and as a user namespace root to create other namespaces. The tyranny of supporting suid root preventing unprivileged users from using cool new kernel features is broken. This set of changes completes the work on setns, adding support for the pid, user, mount namespaces. This set of changes includes a bunch of basic pid namespace cleanups/simplifications. Of particular significance is the rework of the pid namespace cleanup so it no longer requires sending out tendrils into all kinds of unexpected cleanup paths for operation. At least one case of broken error handling is fixed by this cleanup. The files under /proc/<pid>/ns/ have been converted from regular files to magic symlinks which prevents incorrect caching by the VFS, ensuring the files always refer to the namespace the process is currently using and ensuring that the ptrace_mayaccess permission checks are always applied. The files under /proc/<pid>/ns/ have been given stable inode numbers so it is now possible to see if different processes share the same namespaces. Through the David Miller's net tree are changes to relax many of the permission checks in the networking stack to allowing the user namespace root to usefully use the networking stack. Similar changes for the mount namespace and the pid namespace are coming through my tree. Two small changes to add user namespace support were commited here adn in David Miller's -net tree so that I could complete the work on the /proc/<pid>/ns/ files in this tree. Work remains to make it safe to build user namespaces and 9p, afs, ceph, cifs, coda, gfs2, ncpfs, nfs, nfsd, ocfs2, and xfs so the Kconfig guard remains in place preventing that user namespaces from being built when any of those filesystems are enabled. Future design work remains to allow root users outside of the initial user namespace to mount more than just /proc and /sys." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (38 commits) proc: Usable inode numbers for the namespace file descriptors. proc: Fix the namespace inode permission checks. proc: Generalize proc inode allocation userns: Allow unprivilged mounts of proc and sysfs userns: For /proc/self/{uid,gid}_map derive the lower userns from the struct file procfs: Print task uids and gids in the userns that opened the proc file userns: Implement unshare of the user namespace userns: Implent proc namespace operations userns: Kill task_user_ns userns: Make create_new_namespaces take a user_ns parameter userns: Allow unprivileged use of setns. userns: Allow unprivileged users to create new namespaces userns: Allow setting a userns mapping to your current uid. userns: Allow chown and setgid preservation userns: Allow unprivileged users to create user namespaces. userns: Ignore suid and sgid on binaries if the uid or gid can not be mapped userns: fix return value on mntns_install() failure vfs: Allow unprivileged manipulation of the mount namespace. vfs: Only support slave subtrees across different user namespaces vfs: Add a user namespace reference from struct mnt_namespace ...
2012-11-26sysfs: Mark sysfs_attr_ns staticJosh Triplett1-2/+2
Nothing outside of fs/sysfs/file.c references this function, so mark it static. Signed-off-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-20userns: Allow unprivilged mounts of proc and sysfsEric W. Biederman1-0/+1
- The context in which proc and sysfs are mounted have no effect on the the uid/gid of their files so no conversion is needed except allowing the mount. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-10-24sysfs: sysfs_pathname/sysfs_add_one: Use strlcat() instead of strcat()Geert Uytterhoeven1-8/+8
The warning check for duplicate sysfs entries can cause a buffer overflow when printing the warning, as strcat() doesn't check buffer sizes. Use strlcat() instead. Since strlcat() doesn't return a pointer to the passed buffer, unlike strcat(), I had to convert the nested concatenation in sysfs_add_one() to an admittedly more obscure comma operator construct, to avoid emitting code for the concatenation if CONFIG_BUG is disabled. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-09-04sysfs: Fix comment typo "sysf_create_link".Robert P. J. Day1-1/+1
More pedantry. Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-08-01Merge branch 'for-linus' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull second vfs pile from Al Viro: "The stuff in there: fsfreeze deadlock fixes by Jan (essentially, the deadlock reproduced by xfstests 068), symlink and hardlink restriction patches, plus assorted cleanups and fixes. Note that another fsfreeze deadlock (emergency thaw one) is *not* dealt with - the series by Fernando conflicts a lot with Jan's, breaks userland ABI (FIFREEZE semantics gets changed) and trades the deadlock for massive vfsmount leak; this is going to be handled next cycle. There probably will be another pull request, but that stuff won't be in it." Fix up trivial conflicts due to unrelated changes next to each other in drivers/{staging/gdm72xx/usb_boot.c, usb/gadget/storage_common.c} * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits) delousing target_core_file a bit Documentation: Correct s_umount state for freeze_fs/unfreeze_fs fs: Remove old freezing mechanism ext2: Implement freezing btrfs: Convert to new freezing mechanism nilfs2: Convert to new freezing mechanism ntfs: Convert to new freezing mechanism fuse: Convert to new freezing mechanism gfs2: Convert to new freezing mechanism ocfs2: Convert to new freezing mechanism xfs: Convert to new freezing code ext4: Convert to new freezing mechanism fs: Protect write paths by sb_start_write - sb_end_write fs: Skip atime update on frozen filesystem fs: Add freezing handling to mnt_want_write() / mnt_drop_write() fs: Improve filesystem freezing handling switch the protection of percpu_counter list to spinlock nfsd: Push mnt_want_write() outside of i_mutex btrfs: Push mnt_want_write() outside of i_mutex fat: Push mnt_want_write() outside of i_mutex ...
2012-07-31sysfs: Push file_update_time() into bin_page_mkwrite()Jan Kara1-0/+2
CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-26Merge tag 'driver-core-3.6-rc1' of ↵Linus Torvalds1-0/+10
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core changes from Greg Kroah-Hartman: "Here's the big driver core pull request for 3.6-rc1. Unlike 3.5, this kernel should be a lot tamer, with the printk changes now settled down. All we have here is some extcon driver updates, w1 driver updates, a few printk cleanups that weren't needed for 3.5, but are good to have now, and some other minor fixes/changes in the driver core. All of these have been in the linux-next releases for a while now. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>" * tag 'driver-core-3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (38 commits) printk: Export struct log size and member offsets through vmcoreinfo Drivers: hv: Change the hex constant to a decimal constant driver core: don't trigger uevent after failure extcon: MAX77693: Add extcon-max77693 driver to support Maxim MAX77693 MUIC device sysfs: fail dentry revalidation after namespace change fix sysfs: fail dentry revalidation after namespace change extcon: spelling of detach in function doc extcon: arizona: Stop microphone detection if we give up on it extcon: arizona: Update cable reporting calls and split headset PM / Runtime: Do not increment device usage counts before probing kmsg - do not flush partial lines when the console is busy kmsg - export "continuation record" flag to /dev/kmsg kmsg - avoid warning for CONFIG_PRINTK=n compilations kmsg - properly print over-long continuation lines driver-core: Use kobj_to_dev instead of re-implementing it driver-core: Move kobj_to_dev from genhd.h to device.h driver core: Move deferred devices to the end of dpm_list before probing driver core: move uevent call to driver_register driver core: fix shutdown races with probe/remove(v3) Extcon: Arizona: Add driver for Wolfson Arizona class devices ...
2012-07-17sysfs: fail dentry revalidation after namespace change fixAndrew Morton1-3/+5
don't assume that KOBJ_NS_TYPE_NONE==0. Also save a test-n-branch. Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Glauber Costa <glommer@parallels.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-17sysfs: fail dentry revalidation after namespace changeGlauber Costa1-0/+8
When we change the namespace tag of a sysfs entry, the associated dentry is still kept around. readdir() will work correctly and not display the old entries, but open() will still succeed, so will reads and writes. This will no longer happen if sysfs is remounted, hinting that this is a cache-related problem. I am using the following sequence to demonstrate that: shell1: ip link add type veth unshare -nm shell2: ip link set veth1 <pid_of_shell_1> cat /sys/devices/virtual/net/veth1/ifindex Before that patch, this will succeed (fail to fail). After it, it will correctly return an error. Differently from a normal rename, which we handle fine, changing the object namespace will keep it's path intact. So this check seems necessary as well. [ v2: get type from parent, as suggested by Eric Biederman ] Signed-off-by: Glauber Costa <glommer@parallels.com> CC: Tejun Heo <tj@kernel.org> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-14VFS: Pass mount flags to sget()David Howells1-2/+1
Pass mount flags to sget() so that it can use them in initialising a new superblock before the set function is called. They could also be passed to the compare function. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14sysfs: just use d_materialise_unique()Al Viro1-8/+1
same as for nfs et.al. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14sysfs: switch to ->s_d_op and ->d_release()Al Viro3-10/+8
a) ->d_iput() is wrong here - what we do to inode is completely usual, it's dentry->d_fsdata that we want to drop. Just use ->d_release(). b) switch to ->s_d_op - no need to play with d_set_d_op() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14stop passing nameidata to ->lookup()Al Viro1-1/+1
Just the flags; only NFS cares even about that, but there are legitimate uses for such argument. And getting rid of that completely would require splitting ->lookup() into a couple of methods (at least), so let's leave that alone for now... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14stop passing nameidata * to ->d_revalidate()Al Viro1-2/+2
Just the lookup flags. Die, bastard, die... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-28Merge tag 'writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linuxLinus Torvalds1-1/+1
Pull writeback tree from Wu Fengguang: "Mainly from Jan Kara to avoid iput() in the flusher threads." * tag 'writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux: writeback: Avoid iput() from flusher thread vfs: Rename end_writeback() to clear_inode() vfs: Move waiting for inode writeback from end_writeback() to evict_inode() writeback: Refactor writeback_single_inode() writeback: Remove wb->list_lock from writeback_single_inode() writeback: Separate inode requeueing after writeback writeback: Move I_DIRTY_PAGES handling writeback: Move requeueing when I_SYNC set to writeback_sb_inodes() writeback: Move clearing of I_SYNC into inode_sync_complete() writeback: initialize global_dirty_limit fs: remove 8 bytes of padding from struct writeback_control on 64 bit builds mm: page-writeback.c: local functions should not be exposed globally
2012-05-23Merge branch 'for-linus' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull user namespace enhancements from Eric Biederman: "This is a course correction for the user namespace, so that we can reach an inexpensive, maintainable, and reasonably complete implementation. Highlights: - Config guards make it impossible to enable the user namespace and code that has not been converted to be user namespace safe. - Use of the new kuid_t type ensures the if you somehow get past the config guards the kernel will encounter type errors if you enable user namespaces and attempt to compile in code whose permission checks have not been updated to be user namespace safe. - All uids from child user namespaces are mapped into the initial user namespace before they are processed. Removing the need to add an additional check to see if the user namespace of the compared uids remains the same. - With the user namespaces compiled out the performance is as good or better than it is today. - For most operations absolutely nothing changes performance or operationally with the user namespace enabled. - The worst case performance I could come up with was timing 1 billion cache cold stat operations with the user namespace code enabled. This went from 156s to 164s on my laptop (or 156ns to 164ns per stat operation). - (uid_t)-1 and (gid_t)-1 are reserved as an internal error value. Most uid/gid setting system calls treat these value specially anyway so attempting to use -1 as a uid would likely cause entertaining failures in userspace. - If setuid is called with a uid that can not be mapped setuid fails. I have looked at sendmail, login, ssh and every other program I could think of that would call setuid and they all check for and handle the case where setuid fails. - If stat or a similar system call is called from a context in which we can not map a uid we lie and return overflowuid. The LFS experience suggests not lying and returning an error code might be better, but the historical precedent with uids is different and I can not think of anything that would break by lying about a uid we can't map. - Capabilities are localized to the current user namespace making it safe to give the initial user in a user namespace all capabilities. My git tree covers all of the modifications needed to convert the core kernel and enough changes to make a system bootable to runlevel 1." Fix up trivial conflicts due to nearby independent changes in fs/stat.c * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (46 commits) userns: Silence silly gcc warning. cred: use correct cred accessor with regards to rcu read lock userns: Convert the move_pages, and migrate_pages permission checks to use uid_eq userns: Convert cgroup permission checks to use uid_eq userns: Convert tmpfs to use kuid and kgid where appropriate userns: Convert sysfs to use kgid/kuid where appropriate userns: Convert sysctl permission checks to use kuid and kgids. userns: Convert proc to use kuid/kgid where appropriate userns: Convert ext4 to user kuid/kgid where appropriate userns: Convert ext3 to use kuid/kgid where appropriate userns: Convert ext2 to use kuid/kgid where appropriate. userns: Convert devpts to use kuid/kgid where appropriate userns: Convert binary formats to use kuid/kgid where appropriate userns: Add negative depends on entries to avoid building code that is userns unsafe userns: signal remove unnecessary map_cred_ns userns: Teach inode_capable to understand inodes whose uids map to other namespaces. userns: Fail exec for suid and sgid binaries with ids outside our user namespace. userns: Convert stat to return values mapped from kuids and kgids userns: Convert user specfied uids and gids in chown into kuids and kgid userns: Use uid_eq gid_eq helpers when comparing kuids and kgids in the vfs ...
2012-05-15userns: Convert sysfs to use kgid/kuid where appropriateEric W. Biederman1-2/+2
Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-05-14sysfs: get rid of some lockdep false positivesAlan Stern1-5/+26
This patch (as1554) fixes a lockdep false-positive report. The problem arises because lockdep is unable to deal with the tree-structured locks created by the device core and sysfs. This particular problem involves a sysfs attribute method that unregisters itself, not from the device it was called for, but from a descendant device. Lockdep doesn't understand the distinction and reports a possible deadlock, even though the operation is safe. This is the sort of thing that would normally be handled by using a nested lock annotation; unfortunately it's not feasible to do that here. There's no sensible way to tell sysfs when attribute removal occurs in the context of a parent attribute method. As a workaround, the patch adds a new flag to struct attribute telling sysfs not to inform lockdep when it acquires a readlock on a sysfs_dirent instance for the attribute. The readlock is still acquired, but lockdep doesn't know about it and hence does not complain about impossible deadlock scenarios. Also added are macros for static initialization of attribute structures with the ignore_lockdep flag set. The three offending attributes in the USB subsystem are converted to use the new macros. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Acked-by: Tejun Heo <tj@kernel.org> CC: Eric W. Biederman <ebiederm@xmission.com> CC: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-05-06vfs: Rename end_writeback() to clear_inode()Jan Kara1-1/+1
After we moved inode_sync_wait() from end_writeback() it doesn't make sense to call the function end_writeback() anymore. Rename it to clear_inode() which well says what the function really does - set I_CLEAR flag. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
2012-05-02sysfs: Removed dup_name entirely in sysfs_renameSasikantha babu1-4/+2
Since no one using "dup_name", removed it completely in sysfs_rename. Signed-off-by: Sasikantha babu <sasikanth.v19@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-10sysfs: handle 'parent deleted before child added'Dan Williams1-0/+3
In scsi at least two cases of the parent device being deleted before the child is added have been observed. 1/ scsi is performing async scans and the device is removed prior to the async can thread running (can happen with an in-opportune / unlikely unplug during initial scan). 2/ libsas discovery event running after the parent port has been torn down (this is a bug in libsas). Result in crash signatures like: BUG: unable to handle kernel NULL pointer dereference at 0000000000000098 IP: [<ffffffff8115e100>] sysfs_create_dir+0x32/0xb6 ... Process scsi_scan_8 (pid: 5417, threadinfo ffff88080bd16000, task ffff880801b8a0b0) Stack: 00000000fffffffe ffff880813470628 ffff88080bd17cd0 ffff88080614b7e8 ffff88080b45c108 00000000fffffffe ffff88080bd17d20 ffffffff8125e4a8 ffff88080bd17cf0 ffffffff81075149 ffff88080bd17d30 ffff88080614b7e8 Call Trace: [<ffffffff8125e4a8>] kobject_add_internal+0x120/0x1e3 [<ffffffff81075149>] ? trace_hardirqs_on+0xd/0xf [<ffffffff8125e641>] kobject_add_varg+0x41/0x50 [<ffffffff8125e70b>] kobject_add+0x64/0x66 [<ffffffff8131122b>] device_add+0x12d/0x63a In this scenario the parent is still valid (because we have a reference), but it has been device_del()'d which means its kobj->sd pointer is NULL'd via: device_del()->kobject_del()->sysfs_remove_dir() ...and then sysfs_create_dir() (without this fix) goes ahead and de-references parent_sd via sysfs_ns_type(): return (sd->s_flags & SYSFS_NS_TYPE_MASK) >> SYSFS_NS_TYPE_SHIFT; This scenario is being fixed in scsi/libsas, but if other subsystems present the same ordering the system need not immediately crash. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: James Bottomley <JBottomley@parallels.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-10sysfs: Prevent crash on unset sysfs group attributesBruno Prémont1-1/+5
Do not let the kernel crash when a device is registered with sysfs while group attributes are not set (aka NULL). Warn about the offender with some information about the offending device. This would warn instead of trying NULL pointer deref like: BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff81152673>] internal_create_group+0x83/0x1a0 PGD 0 Oops: 0000 [#1] SMP CPU 0 Modules linked in: Pid: 1, comm: swapper/0 Not tainted 3.4.0-rc1-x86_64 #3 HP ProLiant DL360 G4 RIP: 0010:[<ffffffff81152673>] [<ffffffff81152673>] internal_create_group+0x83/0x1a0 RSP: 0018:ffff88019485fd70 EFLAGS: 00010202 RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000001 RDX: ffff880192e99908 RSI: ffff880192e99630 RDI: ffffffff81a26c60 RBP: ffff88019485fdc0 R08: 0000000000000000 R09: 0000000000000000 R10: ffff880192e99908 R11: 0000000000000000 R12: ffffffff81a16a00 R13: ffff880192e99908 R14: ffffffff81a16900 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff88019bc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000000 CR3: 0000000001a0c000 CR4: 00000000000007f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process swapper/0 (pid: 1, threadinfo ffff88019485e000, task ffff880194878000) Stack: ffff88019485fdd0 ffff880192da9d60 0000000000000000 ffff880192e99908 ffff880192e995d8 0000000000000001 ffffffff81a16a00 ffff880192da9d60 0000000000000000 0000000000000000 ffff88019485fdd0 ffffffff811527be Call Trace: [<ffffffff811527be>] sysfs_create_group+0xe/0x10 [<ffffffff81376ca6>] device_add_groups+0x46/0x80 [<ffffffff81377d3d>] device_add+0x46d/0x6a0 ... Signed-off-by: Bruno Prémont <bonbons@linux-vserver.org> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-09sysfs: Update the name hash for an entry after changing the namespaceTom Goff1-1/+1
This is needed to allow renaming network devices that have been moved to another network namespace. Signed-off-by: Tom Goff <thomas.goff@boeing.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-21Merge branch 'for-linus' of ↵Linus Torvalds1-2/+1
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs pile 1 from Al Viro: "This is _not_ all; in particular, Miklos' and Jan's stuff is not there yet." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (64 commits) ext4: initialization of ext4_li_mtx needs to be done earlier debugfs-related mode_t whack-a-mole hfsplus: add an ioctl to bless files hfsplus: change finder_info to u32 hfsplus: initialise userflags qnx4: new helper - try_extent() qnx4: get rid of qnx4_bread/qnx4_getblk take removal of PF_FORKNOEXEC to flush_old_exec() trim includes in inode.c um: uml_dup_mmap() relies on ->mmap_sem being held, but activate_mm() doesn't hold it um: embed ->stub_pages[] into mmu_context gadgetfs: list_for_each_safe() misuse ocfs2: fix leaks on failure exits in module_init ecryptfs: make register_filesystem() the last potential failure exit ntfs: forgets to unregister sysctls on register_filesystem() failure logfs: missing cleanup on register_filesystem() failure jfs: mising cleanup on register_filesystem() failure make configfs_pin_fs() return root dentry on success configfs: configfs_create_dir() has parent dentry in dentry->d_parent configfs: sanitize configfs_create() ...
2012-03-20switch open-coded instances of d_make_root() to new helperAl Viro1-2/+1
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-08Revert "sysfs: Kill nlink counting."Greg Kroah-Hartman3-0/+10
This reverts commit 524b6c5b39b931311dfe5a2f5abae2f5c9731676. It has shown to break userspace tools, which is not acceptable. Reported-by: Jiri Slaby <jslaby@suse.cz> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-24sysfs: Fix memory leak in sysfs_sd_setsecdata().Masami Ichikawa1-5/+6
This patch fixies follwing two memory leak patterns that reported by kmemleak. sysfs_sd_setsecdata() is called during sys_lsetxattr() operation. It checks sd->s_iattr is NULL or not. Then if it is NULL, it calls sysfs_init_inode_attrs() to allocate memory. That code is this. iattrs = sd->s_iattr; if (!iattrs) iattrs = sysfs_init_inode_attrs(sd); The iattrs recieves sysfs_init_inode_attrs()'s result, but sd->s_iattr doesn't know the address. so it needs to set correct address to sd->s_iattr to free memory in other function. unreferenced object 0xffff880250b73e60 (size 32): comm "systemd", pid 1, jiffies 4294683888 (age 94.553s) hex dump (first 32 bytes): 73 79 73 74 65 6d 5f 75 3a 6f 62 6a 65 63 74 5f system_u:object_ 72 3a 73 79 73 66 73 5f 74 3a 73 30 00 00 00 00 r:sysfs_t:s0.... backtrace: [<ffffffff814cb1d0>] kmemleak_alloc+0x73/0x98 [<ffffffff811270ab>] __kmalloc+0x100/0x12c [<ffffffff8120775a>] context_struct_to_string+0x106/0x210 [<ffffffff81207cc1>] security_sid_to_context_core+0x10b/0x129 [<ffffffff812090ef>] security_sid_to_context+0x10/0x12 [<ffffffff811fb0da>] selinux_inode_getsecurity+0x7d/0xa8 [<ffffffff811fb127>] selinux_inode_getsecctx+0x22/0x2e [<ffffffff811f4d62>] security_inode_getsecctx+0x16/0x18 [<ffffffff81191dad>] sysfs_setxattr+0x96/0x117 [<ffffffff811542f0>] __vfs_setxattr_noperm+0x73/0xd9 [<ffffffff811543d9>] vfs_setxattr+0x83/0xa1 [<ffffffff811544c6>] setxattr+0xcf/0x101 [<ffffffff81154745>] sys_lsetxattr+0x6a/0x8f [<ffffffff814efda9>] system_call_fastpath+0x16/0x1b [<ffffffffffffffff>] 0xffffffffffffffff unreferenced object 0xffff88024163c5a0 (size 96): comm "systemd", pid 1, jiffies 4294683888 (age 94.553s) hex dump (first 32 bytes): 00 00 00 00 ed 41 00 00 00 00 00 00 00 00 00 00 .....A.......... 00 00 00 00 00 00 00 00 0c 64 42 4f 00 00 00 00 .........dBO.... backtrace: [<ffffffff814cb1d0>] kmemleak_alloc+0x73/0x98 [<ffffffff81127402>] kmem_cache_alloc_trace+0xc4/0xee [<ffffffff81191cbe>] sysfs_init_inode_attrs+0x2a/0x83 [<ffffffff81191dd6>] sysfs_setxattr+0xbf/0x117 [<ffffffff811542f0>] __vfs_setxattr_noperm+0x73/0xd9 [<ffffffff811543d9>] vfs_setxattr+0x83/0xa1 [<ffffffff811544c6>] setxattr+0xcf/0x101 [<ffffffff81154745>] sys_lsetxattr+0x6a/0x8f [<ffffffff814efda9>] system_call_fastpath+0x16/0x1b [<ffffffffffffffff>] 0xffffffffffffffff ` Signed-off-by: Masami Ichikawa <masami256@gmail.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-02Merge 3.3-rc2 into the driver-core-next branch.Greg Kroah-Hartman2-1/+10
This was done to resolve a merge and build problem with the drivers/acpi/processor_driver.c file. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-01-31sysfs: Update the name hash when renaming sysfs entriesEric W. Biederman1-0/+1
This fixes a bug introduced with sysfs name hashes where renaming a network device appears to succeed but silently makes the sysfs files for that network device inaccessible. In at least one configuration this bug has stopped networking from coming up during boot. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Tested-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-24sysfs: change permissions for /sys from 0755 to 0555Vitaly Kuznetsov1-1/+1
There is a misleading difference between /proc and /sys permissions, /proc is 0555 and /sys is 0755. But as it is impossible to create or unlink something in /sys it would be nice to have same permissions. Signed-off-by: Vitaly Kuznetsov <vitty@altlinux.ru> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-24sysfs: Kill nlink counting.Eric W. Biederman3-10/+0
Tracking the number of subdirectories requires an extra field that increases the size of sysfs_dirent. nlinks are not particularly interesting for sysfs and the nlink counts are wrong when network namespaces are involved so stop counting them, and always return nlink == 1. Userspace already knows that directories with nlink == 1 have an nlink count they can't use to count subdirectories. This reduces the size of sysfs_dirent by 8 bytes on 64bit platforms. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-24sysfs: Store the sysfs inode in an unsigned int.Eric W. Biederman2-3/+3
Store the sysfs inode number in an unsided int because ida inode allocator can return at most a 31 bit number, reducing the size of struct sysfs_dirent by 8 bytes on 64bit platforms. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-24sysfs: Reduce s_flags to an unsinged short so it packs well with s_modeEric W. Biederman1-3/+3
On 32bit this reduces sizeof(struct sysfs_dirent) by 2 bytes. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>