linux - Linux Kernel (branches are rebased on master from time to time)

Age	Commit message (Collapse)	Author	Files	Lines
2010-10-29	convert get_sb_single() users	Al Viro	3	-16/+13
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-28	Fix install_process_keyring error handling	Andi Kleen	1	-1/+1
	Fix an incorrect error check that returns 1 for error instead of the expected error code. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26	Merge branch 'for-linus' of ↵	Linus Torvalds	4	-4/+2
	git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (52 commits) split invalidate_inodes() fs: skip I_FREEING inodes in writeback_sb_inodes fs: fold invalidate_list into invalidate_inodes fs: do not drop inode_lock in dispose_list fs: inode split IO and LRU lists fs: switch bdev inode bdi's correctly fs: fix buffer invalidation in invalidate_list fsnotify: use dget_parent smbfs: use dget_parent exportfs: use dget_parent fs: use RCU read side protection in d_validate fs: clean up dentry lru modification fs: split __shrink_dcache_sb fs: improve DCACHE_REFERENCED usage fs: use percpu counter for nr_dentry and nr_dentry_unused fs: simplify __d_free fs: take dcache_lock inside __d_path fs: do not assign default i_ino in new_inode fs: introduce a per-cpu last_ino allocator new helper: ihold() ...
2010-10-26	Merge branch 'ima-memory-use-fixes'	Linus Torvalds	5	-177/+195
	* ima-memory-use-fixes: IMA: fix the ToMToU logic IMA: explicit IMA i_flag to remove global lock on inode_delete IMA: drop refcnt from ima_iint_cache since it isn't needed IMA: only allocate iint when needed IMA: move read counter into struct inode IMA: use i_writecount rather than a private counter IMA: use inode->i_lock to protect read and write counters IMA: convert internal flags from long to char IMA: use unsigned int instead of long for counters IMA: drop the inode opencount since it isn't needed for operation IMA: use rbtree instead of radix tree for inode information cache
2010-10-26	IMA: fix the ToMToU logic	Eric Paris	1	-5/+6
	Current logic looks like this: rc = ima_must_measure(NULL, inode, MAY_READ, FILE_CHECK); if (rc < 0) goto out; if (mode & FMODE_WRITE) { if (inode->i_readcount) send_tomtou = true; goto out; } if (atomic_read(&inode->i_writecount) > 0) send_writers = true; Lets assume we have a policy which states that all files opened for read by root must be measured. Lets assume the file has permissions 777. Lets assume that root has the given file open for read. Lets assume that a non-root process opens the file write. The non-root process will get to ima_counts_get() and will check the ima_must_measure(). Since it is not supposed to measure it will goto out. We should check the i_readcount no matter what since we might be causing a ToMToU voilation! This is close to correct, but still not quite perfect. The situation could have been that root, which was interested in the mesurement opened and closed the file and another process which is not interested in the measurement is the one holding the i_readcount ATM. This is just overly strict on ToMToU violations, which is better than not strict enough... Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26	IMA: explicit IMA i_flag to remove global lock on inode_delete	Eric Paris	2	-5/+12
	Currently for every removed inode IMA must take a global lock and search the IMA rbtree looking for an associated integrity structure. Instead we explicitly mark an inode when we add an integrity structure so we only have to take the global lock and do the removal if it exists. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26	IMA: drop refcnt from ima_iint_cache since it isn't needed	Eric Paris	3	-30/+19
	Since finding a struct ima_iint_cache requires a valid struct inode, and the struct ima_iint_cache is supposed to have the same lifetime as a struct inode (technically they die together but don't need to be created at the same time) we don't have to worry about the ima_iint_cache outliving or dieing before the inode. So the refcnt isn't useful. Just get rid of it and free the structure when the inode is freed. Signed-off-by: Eric Paris <eapris@redhat.com> Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26	IMA: only allocate iint when needed	Eric Paris	2	-39/+65
	IMA always allocates an integrity structure to hold information about every inode, but only needed this structure to track the number of readers and writers currently accessing a given inode. Since that information was moved into struct inode instead of the integrity struct this patch stops allocating the integrity stucture until it is needed. Thus greatly reducing memory usage. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26	IMA: move read counter into struct inode	Eric Paris	4	-34/+17
	IMA currently allocated an inode integrity structure for every inode in core. This stucture is about 120 bytes long. Most files however (especially on a system which doesn't make use of IMA) will never need any of this space. The problem is that if IMA is enabled we need to know information about the number of readers and the number of writers for every inode on the box. At the moment we collect that information in the per inode iint structure and waste the rest of the space. This patch moves those counters into the struct inode so we can eventually stop allocating an IMA integrity structure except when absolutely needed. This patch does the minimum needed to move the location of the data. Further cleanups, especially the location of counter updates, may still be possible. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26	IMA: use i_writecount rather than a private counter	Eric Paris	3	-17/+6
	IMA tracks the number of struct files which are holding a given inode readonly and the number which are holding the inode write or r/w. It needs this information so when a new reader or writer comes in it can tell if this new file will be able to invalidate results it already made about existing files. aka if a task is holding a struct file open RO, IMA measured the file and recorded those measurements and then a task opens the file RW IMA needs to note in the logs that the old measurement may not be correct. It's called a "Time of Measure Time of Use" (ToMToU) issue. The same is true is a RO file is opened to an inode which has an open writer. We cannot, with any validity, measure the file in question since it could be changing. This patch attempts to use the i_writecount field to track writers. The i_writecount field actually embeds more information in it's value than IMA needs but it should work for our purposes and allow us to shrink the struct inode even more. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26	IMA: use inode->i_lock to protect read and write counters	Eric Paris	2	-34/+24
	Currently IMA used the iint->mutex to protect the i_readcount and i_writecount. This patch uses the inode->i_lock since we are going to start using in inode objects and that is the most appropriate lock. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26	IMA: convert internal flags from long to char	Eric Paris	1	-2/+2
	The IMA flags is an unsigned long but there is only 1 flag defined. Lets save a little space and make it a char. This packs nicely next to the array of u8's. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26	IMA: use unsigned int instead of long for counters	Eric Paris	3	-9/+14
	Currently IMA uses 2 longs in struct inode. To save space (and as it seems impossible to overflow 32 bits) we switch these to unsigned int. The switch to unsigned does require slightly different checks for underflow, but it isn't complex. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26	IMA: drop the inode opencount since it isn't needed for operation	Eric Paris	3	-14/+3
	The opencount was used to help debugging to make sure that everything which created a struct file also correctly made the IMA calls. Since we moved all of that into the VFS this isn't as necessary. We should be able to get the same amount of debugging out of just the reader and write count. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26	IMA: use rbtree instead of radix tree for inode information cache	Eric Paris	2	-36/+75
	The IMA code needs to store the number of tasks which have an open fd granting permission to write a file even when IMA is not in use. It needs this information in order to be enabled at a later point in time without losing it's integrity garantees. At the moment that means we store a little bit of data about every inode in a cache. We use a radix tree key'd on the inode's memory address. Dave Chinner pointed out that a radix tree is a terrible data structure for such a sparse key space. This patch switches to using an rbtree which should be more efficient. Bug report from Dave: "I just noticed that slabtop was reporting an awfully high usage of radix tree nodes: OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME 4200331 2778082 66% 0.55K 144839 29 2317424K radix_tree_node 2321500 2060290 88% 1.00K 72581 32 2322592K xfs_inode 2235648 2069791 92% 0.12K 69864 32 279456K iint_cache That is, 2.7M radix tree nodes are allocated, and the cache itself is consuming 2.3GB of RAM. I know that the XFS inodei caches are indexed by radix tree node, but for 2 million cached inodes that would mean a density of 1 inode per radix tree node, which for a system with 16M inodes in the filsystems is an impossibly low density. The worst I've seen in a production system like kernel.org is about 20-25% density, which would mean about 150-200k radix tree nodes for that many inodes. So it's not the inode cache. So I looked up what the iint_cache was. It appears to used for storing per-inode IMA information, and uses a radix tree for indexing. It uses the address of the struct inode as the indexing key. That means the key space is extremely sparse - for XFS the struct inode addresses are approximately 1000 bytes apart, which means the closest the radix tree index keys get is ~1000. Which means that there is a single entry per radix tree leaf node, so the radix tree is using roughly 550 bytes for every 120byte structure being cached. For the above example, it's probably wasting close to 1GB of RAM...." Reported-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-25	fs: take dcache_lock inside __d_path	Christoph Hellwig	2	-4/+0
	All callers take dcache_lock just around the call to __d_path, so take the lock into it in preparation of getting rid of dcache_lock. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25	fs: do not assign default i_ino in new_inode	Christoph Hellwig	2	-0/+2
	Instead of always assigning an increasing inode number in new_inode move the call to assign it into those callers that actually need it. For now callers that need it is estimated conservatively, that is the call is added to all filesystems that do not assign an i_ino by themselves. For a few more filesystems we can avoid assigning any inode number given that they aren't user visible, and for others it could be done lazily when an inode number is actually needed, but that's left for later patches. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-22	Merge branch 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl	Linus Torvalds	3	-3/+12
	* 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl: vfs: make no_llseek the default vfs: don't use BKL in default_llseek llseek: automatically add .llseek fop libfs: use generic_file_llseek for simple_attr mac80211: disallow seeks in minstrel debug code lirc: make chardev nonseekable viotape: use noop_llseek raw: use explicit llseek file operations ibmasmfs: use generic_file_llseek spufs: use llseek in all file operations arm/omap: use generic_file_llseek in iommu_debug lkdtm: use generic_file_llseek in debugfs net/wireless: use generic_file_llseek in debugfs drm: use noop_llseek
2010-10-21	selinux: include vmalloc.h for vmalloc_user	Stephen Rothwell	1	-0/+1
	Include vmalloc.h for vmalloc_user (fixes ppc build warning). Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
2010-10-21	selinux: implement mmap on /selinux/policy	Eric Paris	2	-1/+45
	/selinux/policy allows a user to copy the policy back out of the kernel. This patch allows userspace to actually mmap that file and use it directly. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
2010-10-21	SELinux: allow userspace to read policy back out of the kernel	Eric Paris	12	-3/+1256
	There is interest in being able to see what the actual policy is that was loaded into the kernel. The patch creates a new selinuxfs file /selinux/policy which can be read by userspace. The actual policy that is loaded into the kernel will be written back out to userspace. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
2010-10-21	SELinux: drop useless (and incorrect) AVTAB_MAX_SIZE	Eric Paris	2	-3/+2
	AVTAB_MAX_SIZE was a define which was supposed to be used in userspace to define a maximally sized avtab when userspace wasn't sure how big of a table it needed. It doesn't make sense in the kernel since we always know our table sizes. The only place it is used we have a more appropiately named define called AVTAB_MAX_HASH_BUCKETS, use that instead. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
2010-10-21	SELinux: deterministic ordering of range transition rules	Eric Paris	1	-3/+13
	Range transition rules are placed in the hash table in an (almost) arbitrary order. This patch inserts them in a fixed order to make policy retrival more predictable. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
2010-10-21	security: secid_to_secctx returns len when data is NULL	Eric Paris	2	-3/+11
	With the (long ago) interface change to have the secid_to_secctx functions do the string allocation instead of having the caller do the allocation we lost the ability to query the security server for the length of the upcoming string. The SECMARK code would like to allocate a netlink skb with enough length to hold the string but it is just too unclean to do the string allocation twice or to do the allocation the first time and hold onto the string and slen. This patch adds the ability to call security_secid_to_secctx() with a NULL data pointer and it will just set the slen pointer. Signed-off-by: Eric Paris <eparis@redhat.com> Reviewed-by: Paul Moore <paul.moore@hp.com> Signed-off-by: James Morris <jmorris@namei.org>
2010-10-21	secmark: make secmark object handling generic	Eric Paris	5	-50/+59
	Right now secmark has lots of direct selinux calls. Use all LSM calls and remove all SELinux specific knowledge. The only SELinux specific knowledge we leave is the mode. The only point is to make sure that other LSMs at least test this generic code before they assume it works. (They may also have to make changes if they do not represent labels as strings) Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Paul Moore <paul.moore@hp.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: James Morris <jmorris@namei.org>
2010-10-21	AppArmor: Ensure the size of the copy is < the buffer allocated to hold it	John Johansen	1	-1/+3
	Actually I think in this case the appropriate thing to do is to BUG as there is currently a case (remove) where the alloc_size needs to be larger than the copy_size, and if copy_size is ever greater than alloc_size there is a mistake in the caller code. Signed-off-by: John Johansen <john.johansen@canonical.com> Acked-by: Kees Cook <kees.cook@canonical.com> Signed-off-by: James Morris <jmorris@namei.org>
2010-10-21	TOMOYO: Print URL information before panic().	Tetsuo Handa	1	-1/+10
	Configuration files for TOMOYO 2.3 are not compatible with TOMOYO 2.2. But current panic() message is too unfriendly and is confusing users. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: James Morris <jmorris@namei.org>
2010-10-21	security: remove unused parameter from security_task_setscheduler()	KOSAKI Motohiro	4	-12/+7
	All security modules shouldn't change sched_param parameter of security_task_setscheduler(). This is not only meaningless, but also make a harmful result if caller pass a static variable. This patch remove policy and sched_param parameter from security_task_setscheduler() becuase none of security module is using it. Cc: James Morris <jmorris@namei.org> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: James Morris <jmorris@namei.org>
2010-10-21	selinux: fix up style problem on /selinux/status	KaiGai Kohei	2	-11/+7
	This patch fixes up coding-style problem at this commit: 4f27a7d49789b04404eca26ccde5f527231d01d5 selinux: fast status update interface (/selinux/status) Signed-off-by: KaiGai Kohei <kaigai@ak.jp.nec.com> Signed-off-by: James Morris <jmorris@namei.org>
2010-10-21	selinux: change to new flag variable	matt mooney	1	-1/+1
	Replace EXTRA_CFLAGS with ccflags-y. Signed-off-by: matt mooney <mfm@muteddisk.com> Signed-off-by: James Morris <jmorris@namei.org>
2010-10-21	selinux: really fix dependency causing parallel compile failure.	Paul Gortmaker	2	-20/+6
	While the previous change to the selinux Makefile reduced the window significantly for this failure, it is still possible to see a compile failure where cpp starts processing selinux files before the auto generated flask.h file is completed. This is easily reproduced by adding the following temporary change to expose the issue everytime: - cmd_flask = scripts/selinux/genheaders/genheaders ... + cmd_flask = sleep 30 ; scripts/selinux/genheaders/genheaders ... This failure happens because the creation of the object files in the ss subdir also depends on flask.h. So simply incorporate them into the parent Makefile, as the ss/Makefile really doesn't do anything unique. With this change, compiling of all selinux files is dependent on completion of the header file generation, and this test case with the "sleep 30" now confirms it is functioning as expected. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: James Morris <jmorris@namei.org>
2010-10-21	selinux: fix parallel compile error	Paul Gortmaker	1	-1/+1
	Selinux has an autogenerated file, "flask.h" which is included by two other selinux files. The current makefile has a single dependency on the first object file in the selinux-y list, assuming that will get flask.h generated before anyone looks for it, but that assumption breaks down in a "make -jN" situation and you get: selinux/selinuxfs.c:35: fatal error: flask.h: No such file or directory compilation terminated. remake[9]: *** [security/selinux/selinuxfs.o] Error 1 Since flask.h is included by security.h which in turn is included nearly everywhere, make the dependency apply to all of the selinux-y list of objs. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: James Morris <jmorris@namei.org>
2010-10-21	selinux: fast status update interface (/selinux/status)	KaiGai Kohei	5	-1/+210
	This patch provides a new /selinux/status entry which allows applications read-only mmap(2). This region reflects selinux_kernel_status structure in kernel space. struct selinux_kernel_status { u32 length; /* length of this structure / u32 sequence; / sequence number of seqlock logic / u32 enforcing; / current setting of enforcing mode / u32 policyload; / times of policy reloaded / u32 deny_unknown; / current setting of deny_unknown */ }; When userspace object manager caches access control decisions provided by SELinux, it needs to invalidate the cache on policy reload and setenforce to keep consistency. However, the applications need to check the kernel state for each accesses on userspace avc, or launch a background worker process. In heuristic, frequency of invalidation is much less than frequency of making access control decision, so it is annoying to invoke a system call to check we don't need to invalidate the userspace cache. If we can use a background worker thread, it allows to receive invalidation messages from the kernel. But it requires us an invasive coding toward the base application in some cases; E.g, when we provide a feature performing with SELinux as a plugin module, it is unwelcome manner to launch its own worker thread from the module. If we could map /selinux/status to process memory space, application can know updates of selinux status; policy reload or setenforce. A typical application checks selinux_kernel_status::sequence when it tries to reference userspace avc. If it was changed from the last time when it checked userspace avc, it means something was updated in the kernel space. Then, the application can reset userspace avc or update current enforcing mode, without any system call invocations. This sequence number is updated according to the seqlock logic, so we need to wait for a while if it is odd number. Signed-off-by: KaiGai Kohei <kaigai@ak.jp.nec.com> Acked-by: Eric Paris <eparis@redhat.com> -- security/selinux/include/security.h \| 21 ++++++ security/selinux/selinuxfs.c \| 56 +++++++++++++++ security/selinux/ss/Makefile \| 2 +- security/selinux/ss/services.c \| 3 + security/selinux/ss/status.c \| 129 +++++++++++++++++++++++++++++++++++ 5 files changed, 210 insertions(+), 1 deletions(-) Signed-off-by: James Morris <jmorris@namei.org>
2010-10-21	.gitignore: ignore apparmor/rlim_names.h	Yong Zhang	1	-0/+1
	Signed-off-by: Yong Zhang <yong.zhang0@gmail.com> Signed-off-by: John Johansen <john.johansen@canonical.com> Signed-off-by: James Morris <jmorris@namei.org>
2010-10-21	LSM: Fix security_module_enable() error.	Tetsuo Handa	1	-10/+2
	We can set default LSM module to DAC (which means "enable no LSM module"). If default LSM module was set to DAC, security_module_enable() must return 0 unless overridden via boot time parameter. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Serge E. Hallyn <serge@hallyn.com> Signed-off-by: James Morris <jmorris@namei.org>
2010-10-21	selinux: type_bounds_sanity_check has a meaningless variable declaration	Eric Paris	1	-2/+2
	type is not used at all, stop declaring and assigning it. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>
2010-10-21	tomoyo: cleanup. don't store bogus pointer	Dan Carpenter	1	-2/+4
	If domain is NULL then &domain->list is a bogus address. Let's leave head->r.domain NULL instead of saving an unusable pointer. This is just a cleanup. The current code always checks head->r.eof before dereferencing head->r.domain. Signed-off-by: Dan Carpenter <error27@gmail.com> Acked-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
2010-10-15	llseek: automatically add .llseek fop	Arnd Bergmann	3	-3/+12
	All file_operations should get a .llseek operation so we can make nonseekable_open the default for future file operations without a .llseek pointer. The three cases that we can automatically detect are no_llseek, seq_lseek and default_llseek. For cases where we can we can automatically prove that the file offset is always ignored, we use noop_llseek, which maintains the current behavior of not returning an error from a seek. New drivers should normally not use noop_llseek but instead use no_llseek and call nonseekable_open at open time. Existing drivers can be converted to do the same when the maintainer knows for certain that no user code relies on calling seek on the device file. The generated code is often incorrectly indented and right now contains comments that clarify for each added line why a specific variant was chosen. In the version that gets submitted upstream, the comments will be gone and I will manually fix the indentation, because there does not seem to be a way to do that using coccinelle. Some amount of new code is currently sitting in linux-next that should get the same modifications, which I will do at the end of the merge window. Many thanks to Julia Lawall for helping me learn to write a semantic patch that does all this. ===== begin semantic patch ===== // This adds an llseek= method to all file operations, // as a preparation for making no_llseek the default. // // The rules are // - use no_llseek explicitly if we do nonseekable_open // - use seq_lseek for sequential files // - use default_llseek if we know we access f_pos // - use noop_llseek if we know we don't access f_pos, // but we still want to allow users to call lseek // @ open1 exists @ identifier nested_open; @@ nested_open(...) { <+... nonseekable_open(...) ...+> } @ open exists@ identifier open_f; identifier i, f; identifier open1.nested_open; @@ int open_f(struct inode i, struct file f) { <+... ( nonseekable_open(...) \| nested_open(...) ) ...+> } @ read disable optional_qualifier exists @ identifier read_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; expression E; identifier func; @@ ssize_t read_f(struct file f, char p, size_t s, loff_t off) { <+... ( off = E \| off += E \| func(..., off, ...) \| E = off ) ...+> } @ read_no_fpos disable optional_qualifier exists @ identifier read_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; @@ ssize_t read_f(struct file f, char p, size_t s, loff_t off) { ... when != off } @ write @ identifier write_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; expression E; identifier func; @@ ssize_t write_f(struct file f, const char p, size_t s, loff_t off) { <+... ( off = E \| off += E \| func(..., off, ...) \| E = off ) ...+> } @ write_no_fpos @ identifier write_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; @@ ssize_t write_f(struct file f, const char p, size_t s, loff_t off) { ... when != off } @ fops0 @ identifier fops; @@ struct file_operations fops = { ... }; @ has_llseek depends on fops0 @ identifier fops0.fops; identifier llseek_f; @@ struct file_operations fops = { ... .llseek = llseek_f, ... }; @ has_read depends on fops0 @ identifier fops0.fops; identifier read_f; @@ struct file_operations fops = { ... .read = read_f, ... }; @ has_write depends on fops0 @ identifier fops0.fops; identifier write_f; @@ struct file_operations fops = { ... .write = write_f, ... }; @ has_open depends on fops0 @ identifier fops0.fops; identifier open_f; @@ struct file_operations fops = { ... .open = open_f, ... }; // use no_llseek if we call nonseekable_open //////////////////////////////////////////// @ nonseekable1 depends on !has_llseek && has_open @ identifier fops0.fops; identifier nso ~= "nonseekable_open"; @@ struct file_operations fops = { ... .open = nso, ... +.llseek = no_llseek, /* nonseekable / }; @ nonseekable2 depends on !has_llseek @ identifier fops0.fops; identifier open.open_f; @@ struct file_operations fops = { ... .open = open_f, ... +.llseek = no_llseek, / open uses nonseekable / }; // use seq_lseek for sequential files ///////////////////////////////////// @ seq depends on !has_llseek @ identifier fops0.fops; identifier sr ~= "seq_read"; @@ struct file_operations fops = { ... .read = sr, ... +.llseek = seq_lseek, / we have seq_read / }; // use default_llseek if there is a readdir /////////////////////////////////////////// @ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier readdir_e; @@ // any other fop is used that changes pos struct file_operations fops = { ... .readdir = readdir_e, ... +.llseek = default_llseek, / readdir is present / }; // use default_llseek if at least one of read/write touches f_pos ///////////////////////////////////////////////////////////////// @ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read.read_f; @@ // read fops use offset struct file_operations fops = { ... .read = read_f, ... +.llseek = default_llseek, / read accesses f_pos / }; @ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier write.write_f; @@ // write fops use offset struct file_operations fops = { ... .write = write_f, ... + .llseek = default_llseek, / write accesses f_pos / }; // Use noop_llseek if neither read nor write accesses f_pos /////////////////////////////////////////////////////////// @ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read_no_fpos.read_f; identifier write_no_fpos.write_f; @@ // write fops use offset struct file_operations fops = { ... .write = write_f, .read = read_f, ... +.llseek = noop_llseek, / read and write both use no f_pos / }; @ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier write_no_fpos.write_f; @@ struct file_operations fops = { ... .write = write_f, ... +.llseek = noop_llseek, / write uses no f_pos / }; @ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read_no_fpos.read_f; @@ struct file_operations fops = { ... .read = read_f, ... +.llseek = noop_llseek, / read uses no f_pos / }; @ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; @@ struct file_operations fops = { ... +.llseek = noop_llseek, / no read or write fn */ }; ===== End semantic patch ===== Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Julia Lawall <julia@diku.dk> Cc: Christoph Hellwig <hch@infradead.org>
2010-09-27	TOMOYO: Don't abuse sys_getpid(), sys_getppid()	Ben Hutchings	2	-4/+5
	System call entry functions sys_*() are never to be called from general kernel code. The fact that they aren't declared in header files should have been a clue. These functions also don't exist on Alpha since it has sys_getxpid() instead. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Acked-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: James Morris <jmorris@namei.org>
2010-09-10	KEYS: Fix bug in keyctl_session_to_parent() if parent has no session keyring	David Howells	1	-1/+2
	Fix a bug in keyctl_session_to_parent() whereby it tries to check the ownership of the parent process's session keyring whether or not the parent has a session keyring [CVE-2010-2960]. This results in the following oops: BUG: unable to handle kernel NULL pointer dereference at 00000000000000a0 IP: [<ffffffff811ae4dd>] keyctl_session_to_parent+0x251/0x443 ... Call Trace: [<ffffffff811ae2f3>] ? keyctl_session_to_parent+0x67/0x443 [<ffffffff8109d286>] ? __do_fault+0x24b/0x3d0 [<ffffffff811af98c>] sys_keyctl+0xb4/0xb8 [<ffffffff81001eab>] system_call_fastpath+0x16/0x1b if the parent process has no session keyring. If the system is using pam_keyinit then it mostly protected against this as all processes derived from a login will have inherited the session keyring created by pam_keyinit during the log in procedure. To test this, pam_keyinit calls need to be commented out in /etc/pam.d/. Reported-by: Tavis Ormandy <taviso@cmpxchg8b.com> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Tavis Ormandy <taviso@cmpxchg8b.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-10	KEYS: Fix RCU no-lock warning in keyctl_session_to_parent()	David Howells	1	-0/+3
	There's an protected access to the parent process's credentials in the middle of keyctl_session_to_parent(). This results in the following RCU warning: =================================================== [ INFO: suspicious rcu_dereference_check() usage. ] --------------------------------------------------- security/keys/keyctl.c:1291 invoked rcu_dereference_check() without protection! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 1 lock held by keyctl-session-/2137: #0: (tasklist_lock){.+.+..}, at: [<ffffffff811ae2ec>] keyctl_session_to_parent+0x60/0x236 stack backtrace: Pid: 2137, comm: keyctl-session- Not tainted 2.6.36-rc2-cachefs+ #1 Call Trace: [<ffffffff8105606a>] lockdep_rcu_dereference+0xaa/0xb3 [<ffffffff811ae379>] keyctl_session_to_parent+0xed/0x236 [<ffffffff811af77e>] sys_keyctl+0xb4/0xb6 [<ffffffff81001eab>] system_call_fastpath+0x16/0x1b The code should take the RCU read lock to make sure the parents credentials don't go away, even though it's holding a spinlock and has IRQ disabled. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-08	ima: always maintain counters	Mimi Zohar	3	-4/+9
	commit 8262bb85da allocated the inode integrity struct (iint) before any inodes were created. Only after IMA was initialized in late_initcall were the counters updated. This patch updates the counters, whether or not IMA has been initialized, to resolve 'imbalance' messages. This patch fixes the bug as reported in bugzilla: 15673. When the i915 is builtin, the ring_buffer is initialized before IMA, causing the imbalance message on suspend. Reported-by: Thomas Meyer <thomas@m3y3r.de> Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Tested-by: Thomas Meyer <thomas@m3y3r.de> Tested-by: David Safford<safford@watson.ibm.com> Cc: Stable Kernel <stable@kernel.org> Signed-off-by: James Morris <jmorris@namei.org>
2010-09-08	AppArmor: Fix locking from removal of profile namespace	John Johansen	1	-2/+4
	The locking for profile namespace removal is wrong, when removing a profile namespace, it needs to be removed from its parent's list. Lock the parent of namespace list instead of the namespace being removed. Signed-off-by: John Johansen <john.johansen@canonical.com> Signed-off-by: James Morris <jmorris@namei.org>
2010-09-08	AppArmor: Fix splitting an fqname into separate namespace and profile names	John Johansen	1	-1/+1
	As per Dan Carpenter <error27@gmail.com> If we have a ns name without a following profile then in the original code it did "ns_name = &name[1];". "name" is NULL so "ns_name" is 0x1. That isn't useful and could cause an oops when this function is called from aa_remove_profiles(). Beyond this the assignment of the namespace name was wrong in the case where the profile name was provided as it was being set to &name[1] after name = skip_spaces(split + 1); Move the ns_name assignment before updating name for the split and also add skip_spaces, making the interface more robust. Signed-off-by: John Johansen <john.johansen@canonical.com> Signed-off-by: James Morris <jmorris@namei.org>
2010-09-08	AppArmor: Fix security_task_setrlimit logic for 2.6.36 changes	John Johansen	3	-11/+15
	2.6.36 introduced the abilitiy to specify the task that is having its rlimits set. Update mediation to ensure that confined tasks can only set their own group_leader as expected by current policy. Add TODO note about extending policy to support setting other tasks rlimits. Signed-off-by: John Johansen <john.johansen@canonical.com> Signed-off-by: James Morris <jmorris@namei.org>
2010-09-08	AppArmor: Drop hack to remove appended " (deleted)" string	John Johansen	1	-27/+11
	The 2.6.36 kernel has refactored __d_path() so that it no longer appends " (deleted)" to unlinked paths. So drop the hack that was used to detect and remove the appended string. Signed-off-by: John Johansen <john.johansen@canonical.com> Signed-off-by: James Morris <jmorris@namei.org>
2010-08-18	Merge branch 'for-linus' of ↵	Linus Torvalds	2	-10/+8
	git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: fs: brlock vfsmount_lock fs: scale files_lock lglock: introduce special lglock and brlock spin locks tty: fix fu_list abuse fs: cleanup files_lock locking fs: remove extra lookup in __lookup_hash fs: fs_struct rwlock to spinlock apparmor: use task path helpers fs: dentry allocation consolidation fs: fix do_lookup false negative mbcache: Limit the maximum number of cache entries hostfs ->follow_link() braino hostfs: dumb (and usually harmless) tpyo - strncpy instead of strlcpy remove SWRITE* I/O types kill BH_Ordered flag vfs: update ctime when changing the file's permission by setfacl cramfs: only unlock new inodes fix reiserfs_evict_inode end_writeback second call
2010-08-18	tty: fix fu_list abuse	Nick Piggin	1	-1/+4
	tty: fix fu_list abuse tty code abuses fu_list, which causes a bug in remount,ro handling. If a tty device node is opened on a filesystem, then the last link to the inode removed, the filesystem will be allowed to be remounted readonly. This is because fs_may_remount_ro does not find the 0 link tty inode on the file sb list (because the tty code incorrectly removed it to use for its own purpose). This can result in a filesystem with errors after it is marked "clean". Taking idea from Christoph's initial patch, allocate a tty private struct at file->private_data and put our required list fields in there, linking file and tty. This makes tty nodes behave the same way as other device nodes and avoid meddling with the vfs, and avoids this bug. The error handling is not trivial in the tty code, so for this bugfix, I take the simple approach of using __GFP_NOFAIL and don't worry about memory errors. This is not a problem because our allocator doesn't fail small allocs as a rule anyway. So proper error handling is left as an exercise for tty hackers. [ Arguably filesystem's device inode would ideally be divorced from the driver's pseudo inode when it is opened, but in practice it's not clear whether that will ever be worth implementing. ] Cc: linux-kernel@vger.kernel.org Cc: Christoph Hellwig <hch@infradead.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Nick Piggin <npiggin@kernel.dk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-18	fs: cleanup files_lock locking	Nick Piggin	1	-2/+2
	fs: cleanup files_lock locking Lock tty_files with a new spinlock, tty_files_lock; provide helpers to manipulate the per-sb files list; unexport the files_lock spinlock. Cc: linux-kernel@vger.kernel.org Cc: Christoph Hellwig <hch@infradead.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Acked-by: Andi Kleen <ak@linux.intel.com> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Nick Piggin <npiggin@kernel.dk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-18	apparmor: use task path helpers	Nick Piggin	1	-7/+2
	apparmor: use task path helpers Signed-off-by: Nick Piggin <npiggin@kernel.dk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>