summaryrefslogtreecommitdiffstats
path: root/fs/notify/fanotify/fanotify_user.c
AgeCommit message (Collapse)AuthorFilesLines
2014-06-04fanotify: check file flags passed in fanotify_initHeinrich Schuchardt1-0/+25
Without this patch fanotify_init does not validate the value passed in event_f_flags. When a fanotify event is read from the fanotify file descriptor a new file descriptor is created where file.f_flags = event_f_flags. Internal and external open flags are stored together in field f_flags of struct file. Hence, an application might create file descriptors with internal flags like FMODE_EXEC, FMODE_NOCMTIME set. Jan Kara and Eric Paris both aggreed that this is a bug and the value of event_f_flags should be checked: https://lkml.org/lkml/2014/4/29/522 https://lkml.org/lkml/2014/4/29/539 This updated patch version considers the comments by Michael Kerrisk in https://lkml.org/lkml/2014/5/4/10 With the patch the value of event_f_flags is checked. When specifying an invalid value error EINVAL is returned. Internal flags are disallowed. File creation flags are disallowed: O_CREAT, O_DIRECTORY, O_EXCL, O_NOCTTY, O_NOFOLLOW, O_TRUNC, and O_TTY_INIT. Flags which do not make sense with fanotify are disallowed: __O_TMPFILE, O_PATH, FASYNC, and O_DIRECT. This leaves us with the following allowed values: O_RDONLY, O_WRONLY, O_RDWR are basic functionality. The are stored in the bits given by O_ACCMODE. O_APPEND is working as expected. The value might be useful in a logging application which appends the current status each time the log is opened. O_LARGEFILE is needed for files exceeding 4GB on 32bit systems. O_NONBLOCK may be useful when monitoring slow devices like tapes. O_NDELAY is equal to O_NONBLOCK except for platform parisc. To avoid code breaking on parisc either both flags should be allowed or none. The patch allows both. __O_SYNC and O_DSYNC may be used to avoid data loss on power disruption. O_NOATIME may be useful to reduce disk activity. O_CLOEXEC may be useful, if separate processes shall be used to scan files. Once this patch is accepted, the fanotify_init.2 manpage has to be updated. Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04fs/notify/fanotify/fanotify_user.c: fix FAN_MARK_FLUSH flag checkingHeinrich Schuchardt1-0/+3
If fanotify_mark is called with illegal value of arguments flags and marks it usually returns EINVAL. When fanotify_mark is called with FAN_MARK_FLUSH the argument flags is not checked for irrelevant flags like FAN_MARK_IGNORED_MASK. The patch removes this inconsistency. If an irrelevant flag is set error EINVAL is returned. Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de> Acked-by: Michael Kerrisk <mtk.manpages@gmail.com> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04fanotify: FAN_MARK_FLUSH: avoid having to provide a fake/invalid fd and pathHeinrich Schuchardt1-7/+10
Originally from Tvrtko Ursulin (https://lkml.org/lkml/2011/1/12/112) Avoid having to provide a fake/invalid fd and path when flushing marks Currently for a group to flush marks it has set it needs to provide a fake or invalid (but resolvable) file descriptor and path when calling fanotify_mark. This patch pulls the flush handling a bit up so file descriptor and path are completely ignored when flushing. I reworked the patch to be applicable again (the signature of fanotify_mark has changed since Tvrtko's work). Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de> Cc: Tvrtko Ursulin <tvrtko.ursulin@onelan.co.uk> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-05-06fanotify: fix -EOVERFLOW with large files on 64-bitWill Woods1-0/+2
On 64-bit systems, O_LARGEFILE is automatically added to flags inside the open() syscall (also openat(), blkdev_open(), etc). Userspace therefore defines O_LARGEFILE to be 0 - you can use it, but it's a no-op. Everything should be O_LARGEFILE by default. But: when fanotify does create_fd() it uses dentry_open(), which skips all that. And userspace can't set O_LARGEFILE in fanotify_init() because it's defined to 0. So if fanotify gets an event regarding a large file, the read() will just fail with -EOVERFLOW. This patch adds O_LARGEFILE to fanotify_init()'s event_f_flags on 64-bit systems, using the same test as open()/openat()/etc. Addresses https://bugzilla.redhat.com/show_bug.cgi?id=696821 Signed-off-by: Will Woods <wwoods@redhat.com> Acked-by: Eric Paris <eparis@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03fanotify: move unrelated handling from copy_event_to_user()Jan Kara1-21/+19
Move code moving event structure to access_list from copy_event_to_user() to fanotify_read() where it is more logical (so that we can immediately see in the main loop that we either move the event to a different list or free it). Also move special error handling for permission events from copy_event_to_user() to the main loop to have it in one place with error handling for normal events. This makes copy_event_to_user() really only copy the event to user without any side effects. Signed-off-by: Jan Kara <jack@suse.cz> Cc: Eric Paris <eparis@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03fanotify: reorganize loop in fanotify_read()Jan Kara1-22/+24
Swap the error / "read ok" branches in the main loop of fanotify_read(). We will grow the "read ok" part in the next patch and this makes the indentation easier. Also it is more common to have error conditions inside an 'if' instead of the fast path. Signed-off-by: Jan Kara <jack@suse.cz> Cc: Eric Paris <eparis@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03fanotify: convert access_mutex to spinlockJan Kara1-7/+7
access_mutex is used only to guard operations on access_list. There's no need for sleeping within this lock so just make a spinlock out of it. Signed-off-by: Jan Kara <jack@suse.cz> Cc: Eric Paris <eparis@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03fanotify: use fanotify event structure for permission response processingJan Kara1-79/+44
Currently, fanotify creates new structure to track the fact that permission event has been reported to userspace and someone is waiting for a response to it. As event structures are now completely in the hands of each notification framework, we can use the event structure for this tracking instead of allocating a new structure. Since this makes the event structures for normal events and permission events even more different and the structures have different lifetime rules, we split them into two separate structures (where permission event structure contains the structure for a normal event). This makes normal events 8 bytes smaller and the code a tad bit cleaner. [akpm@linux-foundation.org: fix build] Signed-off-by: Jan Kara <jack@suse.cz> Cc: Eric Paris <eparis@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03fanotify: remove useless bypass_perm checkJan Kara1-8/+0
The prepare_for_access_response() function checks whether group->fanotify_data.bypass_perm is set. However this test can never be true because prepare_for_access_response() is called only from fanotify_read() which means fanotify group is alive with an active fd while bypass_perm is set from fanotify_release() when all file descriptors pointing to the group are closed and the group is going away. Signed-off-by: Jan Kara <jack@suse.cz> Cc: Eric Paris <eparis@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-25fsnotify: Allocate overflow events with proper typeJan Kara1-0/+13
Commit 7053aee26a35 "fsnotify: do not share events between notification groups" used overflow event statically allocated in a group with the size of the generic notification event. This causes problems because some code looks at type specific parts of event structure and gets confused by a random data it sees there and causes crashes. Fix the problem by allocating overflow event with type corresponding to the group type so code cannot get confused. Signed-off-by: Jan Kara <jack@suse.cz>
2014-01-29fanotify: Fix use after free for permission eventsJan Kara1-1/+6
Currently struct fanotify_event_info has been destroyed immediately after reporting its contents to userspace. However that is wrong for permission events because those need to stay around until userspace provides response which is filled back in fanotify_event_info. So change to code to free permission events only after we have got the response from userspace. Reported-and-tested-by: Jiri Kosina <jkosina@suse.cz> Reported-and-tested-by: Dave Jones <davej@fedoraproject.org> Signed-off-by: Jan Kara <jack@suse.cz>
2014-01-27compat: fix sys_fanotify_markHeiko Carstens1-2/+2
Commit 91c2e0bcae72 ("unify compat fanotify_mark(2), switch to COMPAT_SYSCALL_DEFINE") added a new unified compat fanotify_mark syscall to be used by all architectures. Unfortunately the unified version merges the split mask parameter in a wrong way: the lower and higher word got swapped. This was discovered with glibc's tst-fanotify test case. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Reported-by: Andreas Krebbel <krebbel@linux.vnet.ibm.com> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Acked-by: "David S. Miller" <davem@davemloft.net> Acked-by: Al Viro <viro@ZenIV.linux.org.uk> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: <stable@vger.kernel.org> [3.10+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21fsnotify: do not share events between notification groupsJan Kara1-21/+20
Currently fsnotify framework creates one event structure for each notification event and links this event into all interested notification groups. This is done so that we save memory when several notification groups are interested in the event. However the need for event structure shared between inotify & fanotify bloats the event structure so the result is often higher memory consumption. Another problem is that fsnotify framework keeps path references with outstanding events so that fanotify can return open file descriptors with its events. This has the undesirable effect that filesystem cannot be unmounted while there are outstanding events - a regression for inotify compared to a situation before it was converted to fsnotify framework. For fanotify this problem is hard to avoid and users of fanotify should kind of expect this behavior when they ask for file descriptors from notified files. This patch changes fsnotify and its users to create separate event structure for each group. This allows for much simpler code (~400 lines removed by this patch) and also smaller event structures. For example on 64-bit system original struct fsnotify_event consumes 120 bytes, plus additional space for file name, additional 24 bytes for second and each subsequent group linking the event, and additional 32 bytes for each inotify group for private data. After the conversion inotify event consumes 48 bytes plus space for file name which is considerably less memory unless file names are long and there are several groups interested in the events (both of which are uncommon). Fanotify event fits in 56 bytes after the conversion (fanotify doesn't care about file names so its events don't have to have it allocated). A win unless there are four or more fanotify groups interested in the event. The conversion also solves the problem with unmount when only inotify is used as we don't have to grab path references for inotify events. [hughd@google.com: fanotify: fix corruption preventing startup] Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Eric Paris <eparis@parisplace.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-09fanotify: put duplicate code for adding vfsmount/inode marks into an own ↵Lino Sanfilippo1-36/+35
function The code under the groups mark_mutex in fanotify_add_inode_mark() and fanotify_add_vfsmount_mark() is almost identical. So put it into a seperate function. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Cc: Eric Paris <eparis@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-09fanotify: fix races when adding/removing marksLino Sanfilippo1-12/+37
For both adding an event to an existing mark and destroying a mark we first have to find it via fsnotify_find_[inode|vfsmount]_mark(). But getting the mark and adding an event (or destroying it) is not done atomically. This opens a race where a thread is about to destroy a mark while another thread still finds the same mark and adds an event to its mask although it will be destroyed. Another race exists concerning the excess of a groups number of marks limit: When a mark is added the number of group marks is checked against the max number of marks per group and increased afterwards. Since check and increment is also not done atomically, this may result in 2 or more processes passing the check at the same time and increasing the number of group marks above the allowed limit. With this patch both races are avoided by doing the concerning operations with the groups mark mutex locked. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Cc: Eric Paris <eparis@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-09fanotify: info leak in copy_event_to_user()Dan Carpenter1-0/+1
The ->reserved field isn't cleared so we leak one byte of stack information to userspace. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Eric Paris <eparis@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-29fanotify: quit wanking with FASYNC in ->release()Al Viro1-3/+0
... especially since there's no way to get that sucker on the list fsnotify_fasync() works with - the only thing adding to it is fsnotify_fasync() itself and it's never called for fanotify files while they are opened. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-05-09unify compat fanotify_mark(2), switch to COMPAT_SYSCALL_DEFINEAl Viro1-0/+17
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-03-03teach SYSCALL_DEFINE<n> how to deal with long long/unsigned long longAl Viro1-14/+3
... and convert a bunch of SYSCALL_DEFINE ones to SYSCALL_DEFINE<n>, killing the boilerplate crap around them. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-22new helper: file_inode(file)Al Viro1-1/+1
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-12-20Merge branch 'for-next' of git://git.infradead.org/users/eparis/notifyLinus Torvalds1-12/+25
Pull filesystem notification updates from Eric Paris: "This pull mostly is about locking changes in the fsnotify system. By switching the group lock from a spin_lock() to a mutex() we can now hold the lock across things like iput(). This fixes a problem involving unmounting a fs and having inodes be busy, first pointed out by FAT, but reproducible with tmpfs. This also restores signal driven I/O for inotify, which has been broken since about 2.6.32." Ugh. I *hate* the timing of this. It was rebased after the merge window opened, and then left to sit with the pull request coming the day before the merge window closes. That's just crap. But apparently the patches themselves have been around for over a year, just gathering dust, so now it's suddenly critical. Fixed up semantic conflict in fs/notify/fdinfo.c as per Stephen Rothwell's fixes from -next. * 'for-next' of git://git.infradead.org/users/eparis/notify: inotify: automatically restart syscalls inotify: dont skip removal of watch descriptor if creation of ignored event failed fanotify: dont merge permission events fsnotify: make fasync generic for both inotify and fanotify fsnotify: change locking order fsnotify: dont put marks on temporary list when clearing marks by group fsnotify: introduce locked versions of fsnotify_add_mark() and fsnotify_remove_mark() fsnotify: pass group to fsnotify_destroy_mark() fsnotify: use a mutex instead of a spinlock to protect a groups mark list fanotify: add an extra flag to mark_remove_from_mask that indicates wheather a mark should be destroyed fsnotify: take groups mark_lock before mark lock fsnotify: use reference counting for groups fsnotify: introduce fsnotify_get_group() inotify, fanotify: replace fsnotify_put_group() with fsnotify_destroy_group()
2012-12-17fs, notify: add procfs fdinfo helperCyrill Gorcunov1-0/+2
This allow us to print out fsnotify details such as watchee inode, device, mask and optionally a file handle. For inotify objects if kernel compiled with exportfs support the output will be | pos: 0 | flags: 02000000 | inotify wd:3 ino:9e7e sdev:800013 mask:800afce ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:7e9e0000640d1b6d | inotify wd:2 ino:a111 sdev:800013 mask:800afce ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:11a1000020542153 | inotify wd:1 ino:6b149 sdev:800013 mask:800afce ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:49b1060023552153 If kernel compiled without exportfs support, the file handle won't be provided but inode and device only. | pos: 0 | flags: 02000000 | inotify wd:3 ino:9e7e sdev:800013 mask:800afce ignored_mask:0 | inotify wd:2 ino:a111 sdev:800013 mask:800afce ignored_mask:0 | inotify wd:1 ino:6b149 sdev:800013 mask:800afce ignored_mask:0 For fanotify the output is like | pos: 0 | flags: 04002 | fanotify flags:10 event-flags:0 | fanotify mnt_id:12 mask:3b ignored_mask:0 | fanotify ino:50205 sdev:800013 mask:3b ignored_mask:40000000 fhandle-bytes:8 fhandle-type:1 f_handle:05020500fb1d47e7 To minimize impact on general fsnotify code the new functionality is gathered in fs/notify/fdinfo.c file. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Pavel Emelyanov <xemul@parallels.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andrey Vagin <avagin@openvz.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: James Bottomley <jbottomley@parallels.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Matthew Helsley <matt.helsley@gmail.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@onelan.co.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11fsnotify: make fasync generic for both inotify and fanotifyEric Paris1-0/+4
inotify is supposed to support async signal notification when information is available on the inotify fd. This patch moves that support to generic fsnotify functions so it can be used by all notification mechanisms. Signed-off-by: Eric Paris <eparis@redhat.com>
2012-12-11fsnotify: pass group to fsnotify_destroy_mark()Lino Sanfilippo1-2/+2
In fsnotify_destroy_mark() dont get the group from the passed mark anymore, but pass the group itself as an additional parameter to the function. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
2012-12-11fanotify: add an extra flag to mark_remove_from_mask that indicates wheather ↵Lino Sanfilippo1-5/+14
a mark should be destroyed This patch adds an extra flag to mark_remove_from_mask() to inform the caller if the mark should be destroyed. With this we dont destroy the mark implicitly in the function itself any more but let the caller handle it. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
2012-12-11inotify, fanotify: replace fsnotify_put_group() with fsnotify_destroy_group()Lino Sanfilippo1-7/+7
Currently in fsnotify_put_group() the ref count of a group is decremented and if it becomes 0 fsnotify_destroy_group() is called. Since a groups ref count is only at group creation set to 1 and never increased after that a call to fsnotify_put_group() always results in a call to fsnotify_destroy_group(). With this patch fsnotify_destroy_group() is called directly. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
2012-11-18fanotify: fix FAN_Q_OVERFLOW case of fanotify_read()Al Viro1-1/+2
If the FAN_Q_OVERFLOW bit set in event->mask, the fanotify event metadata will not contain a valid file descriptor, but copy_event_to_user() didn't check for that, and unconditionally does a fd_install() on the file descriptor. Which in turn will cause a BUG_ON() in __fd_install(). Introduced by commit 352e3b249284 ("fanotify: sanitize failure exits in copy_event_to_user()") Mea culpa - missed that path ;-/ Reported-by: Alex Shi <lkml.alex@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-09-26switch simple cases of fget_light to fdgetAl Viro1-15/+13
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-09-26fanotify: sanitize failure exits in copy_event_to_user()Al Viro1-39/+20
* do copy_to_user() before prepare_for_access_response(); that kills the need in remove_access_response(). * don't do fd_install() until we are past the last possible failure exit. Don't use sys_close() on cleanup side - just put_unused_fd() and fput(). Less racy that way... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-23switch dentry_open() to struct path, make it grab references itselfAl Viro1-6/+2
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03vfs: move fsnotify junk to struct mountAl Viro1-2/+4
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-31Fix common misspellingsLucas De Marchi1-1/+1
Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2011-03-01Remove one to many n's in a wordJustin P. Mattock1-1/+1
Signed-off-by: Justin P. Mattock <justinmattock@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-12-15fanotify: fill in the metadata_len field on struct fanotify_event_metadataEric Paris1-2/+4
The fanotify_event_metadata now has a field which is supposed to indicate the length of the metadata portion of the event. Fill in that field as well. Based-in-part-on-patch-by: Alexey Zaytsev <alexey.zaytsev@gmail.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2010-12-07fanotify: Dont try to open a file descriptor for the overflow eventLino Sanfilippo1-4/+13
We should not try to open a file descriptor for the overflow event since this will always fail. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
2010-12-07fanotify: do not leak user reference on allocation failureEric Paris1-1/+3
If fanotify_init is unable to allocate a new fsnotify group it will return but will not drop its reference on the associated user struct. Drop that reference on error. Reported-by: Vegard Nossum <vegard.nossum@gmail.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2010-12-07fanotify: on group destroy allow all waiters to bypass permission checkLino Sanfilippo1-2/+3
When fanotify_release() is called, there may still be processes waiting for access permission. Currently only processes for which an event has already been queued into the groups access list will be woken up. Processes for which no event has been queued will continue to sleep and thus cause a deadlock when fsnotify_put_group() is called. Furthermore there is a race allowing further processes to be waiting on the access wait queue after wake_up (if they arrive before clear_marks_by_group() is called). This patch corrects this by setting a flag to inform processes that the group is about to be destroyed and thus not to wait for access permission. [additional changelog from eparis] Lets think about the 4 relevant code paths from the PoV of the 'operator' 'listener' 'responder' and 'closer'. Where operator is the process doing an action (like open/read) which could require permission. Listener is the task (or in this case thread) slated with reading from the fanotify file descriptor. The 'responder' is the thread responsible for responding to access requests. 'Closer' is the thread attempting to close the fanotify file descriptor. The 'operator' is going to end up in: fanotify_handle_event() get_response_from_access() (THIS BLOCKS WAITING ON USERSPACE) The 'listener' interesting code path fanotify_read() copy_event_to_user() prepare_for_access_response() (THIS CREATES AN fanotify_response_event) The 'responder' code path: fanotify_write() process_access_response() (REMOVE A fanotify_response_event, SET RESPONSE, WAKE UP 'operator') The 'closer': fanotify_release() (SUPPOSED TO CLEAN UP THE REST OF THIS MESS) What we have today is that in the closer we remove all of the fanotify_response_events and set a bit so no more response events are ever created in prepare_for_access_response(). The bug is that we never wake all of the operators up and tell them to move along. You fix that in fanotify_get_response_from_access(). You also fix other operators which haven't gotten there yet. So I agree that's a good fix. [/additional changelog from eparis] [remove additional changes to minimize patch size] [move initialization so it was inside CONFIG_FANOTIFY_PERMISSION] Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
2010-12-07fanotify: Dont allow a mask of 0 if setting or removing a markLino Sanfilippo1-1/+3
In mark_remove_from_mask() we destroy marks that have their event mask cleared. Thus we should not allow the creation of those marks in the first place. With this patch we check if the mask given from user is 0 in case of FAN_MARK_ADD. If so we return an error. Same for FAN_MARK_REMOVE since this does not have any effect. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
2010-12-07fanotify: correct broken ref counting in case adding a mark failedLino Sanfilippo1-17/+14
If adding a mount or inode mark failed fanotify_free_mark() is called explicitly. But at this time the mark has already been put into the destroy list of the fsnotify_mark kernel thread. If the thread is too slow it will try to decrease the reference of a mark, that has already been freed by fanotify_free_mark(). (If its fast enough it will only decrease the marks ref counter from 2 to 1 - note that the counter has been increased to 2 in add_mark() - which has practically no effect.) This patch fixes the ref counting by not calling free_mark() explicitly, but decreasing the ref counter and rely on the fsnotify_mark thread to cleanup in case adding the mark has failed. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
2010-12-07fanotify: deny permissions when no event was sentEric Paris1-4/+12
If no event was sent to userspace we cannot expect userspace to respond to permissions requests. Today such requests just hang forever. This patch will deny any permissions event which was unable to be sent to userspace. Reported-by: Tvrtko Ursulin <tvrtko.ursulin@sophos.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2010-10-30make fanotify_read() restartable across signalsLino Sanfilippo1-1/+1
In fanotify_read() return -ERESTARTSYS instead of -EINTR to make read() restartable across signals (BSD semantic). Signed-off-by: Eric Paris <eparis@redhat.com>
2010-10-28fs/notify/fanotify/fanotify_user.c: fix warningsAndrew Morton1-3/+2
fs/notify/fanotify/fanotify_user.c: In function 'fanotify_release': fs/notify/fanotify/fanotify_user.c:375: warning: unused variable 'lre' fs/notify/fanotify/fanotify_user.c:375: warning: unused variable 're' this is really ugly. Cc: Eric Paris <eparis@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Eric Paris <eparis@redhat.com>
2010-10-28fanotify: do not recalculate the mask if the ignored mask changedEric Paris1-3/+3
If fanotify sets a new bit in the ignored mask it will cause the generic fsnotify layer to recalculate the real mask. This is stupid since we didn't change that part. Signed-off-by: Eric Paris <eparis@redhat.com>
2010-10-28fanotify: ignore events on directories unless specifically requestedEric Paris1-0/+12
fanotify has a very limited number of events it sends on directories. The usefulness of these events is yet to be seen and still we send them. This is particularly painful for mount marks where one might receive many of these useless events. As such this patch will drop events on IS_DIR() inodes unless they were explictly requested with FAN_ON_DIR. This means that a mark on a directory without FAN_EVENT_ON_CHILD or FAN_ON_DIR is meaningless and will result in no events ever (although it will still be allowed since detecting it is hard) Signed-off-by: Eric Paris <eparis@redhat.com>
2010-10-28fanotify: limit number of listeners per userEric Paris1-0/+11
fanotify currently has no limit on the number of listeners a given user can have open. This patch limits the total number of listeners per user to 128. This is the same as the inotify default limit. Signed-off-by: Eric Paris <eparis@redhat.com>
2010-10-28fanotify: allow userspace to override max marksEric Paris1-1/+8
Some fanotify groups, especially those like AV scanners, will need to place lots of marks, particularly ignore marks. Since ignore marks do not pin inodes in cache and are cleared if the inode is removed from core (usually under memory pressure) we expose an interface for listeners, with CAP_SYS_ADMIN, to override the maximum number of marks and be allowed to set and 'unlimited' number of marks. Programs which make use of this feature will be able to OOM a machine. Signed-off-by: Eric Paris <eparis@redhat.com>
2010-10-28fanotify: limit the number of marks in a single fanotify groupEric Paris1-0/+9
There is currently no limit on the number of marks a given fanotify group can have. Since fanotify is gated on CAP_SYS_ADMIN this was not seen as a serious DoS threat. This patch implements a default of 8192, the same as inotify to work towards removing the CAP_SYS_ADMIN gating and eliminating the default DoS'able status. Signed-off-by: Eric Paris <eparis@redhat.com>
2010-10-28fanotify: allow userspace to override max queue depthEric Paris1-1/+8
fanotify has a defualt max queue depth. This patch allows processes which explicitly request it to have an 'unlimited' queue depth. These processes need to be very careful to make sure they cannot fall far enough behind that they OOM the box. Thus this flag is gated on CAP_SYS_ADMIN. Signed-off-by: Eric Paris <eparis@redhat.com>
2010-10-28fsnotify: implement a default maximum queue depthEric Paris1-0/+4
Currently fanotify has no maximum queue depth. Since fanotify is CAP_SYS_ADMIN only this does not pose a normal user DoS issue, but it certianly is possible that an fanotify listener which can't keep up could OOM the box. This patch implements a default 16k depth. This is the same default depth used by inotify, but given fanotify's better queue merging in many situations this queue will contain many additional useful events by comparison. Signed-off-by: Eric Paris <eparis@redhat.com>
2010-10-28fanotify: ignore fanotify ignore marks if open writersEric Paris1-0/+10
fanotify will clear ignore marks if a task changes the contents of an inode. The problem is with the races around when userspace finishes checking a file and when that result is actually attached to the inode. This race was described as such: Consider the following scenario with hostile processes A and B, and victim process C: 1. Process A opens new file for writing. File check request is generated. 2. File check is performed in userspace. Check result is "file has no malware". 3. The "permit" response is delivered to kernel space. 4. File ignored mark set. 5. Process A writes dummy bytes to the file. File ignored flags are cleared. 6. Process B opens the same file for reading. File check request is generated. 7. File check is performed in userspace. Check result is "file has no malware". 8. Process A writes malware bytes to the file. There is no cached response yet. 9. The "permit" response is delivered to kernel space and is cached in fanotify. 10. File ignored mark set. 11. Now any process C will be permitted to open the malware file. There is a race between steps 8 and 10 While fanotify makes no strong guarantees about systems with hostile processes there is no reason we cannot harden against this race. We do that by simply ignoring any ignore marks if the inode has open writers (aka i_writecount > 0). (We actually do not ignore ignore marks if the FAN_MARK_SURV_MODIFY flag is set) Reported-by: Vasily Novikov <vasily.novikov@kaspersky.com> Signed-off-by: Eric Paris <eparis@redhat.com>