summaryrefslogtreecommitdiffstats
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2016-07-15x86/syscalls: Add compat_sys_preadv64v2/compat_sys_pwritev64v2H.J. Lu1-0/+18
Don't use the same syscall numbers for 2 different syscalls: 534 x32 preadv compat_sys_preadv64 535 x32 pwritev compat_sys_pwritev64 534 x32 preadv2 compat_sys_preadv2 535 x32 pwritev2 compat_sys_pwritev2 Add compat_sys_preadv64v2() and compat_sys_pwritev64v2() so that 64-bit offset is passed in one 64-bit register on x32, similar to compat_sys_preadv64() and compat_sys_pwritev64(). Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/CAMe9rOovCMf-RQfx_n1U_Tu_DX1BYkjtFr%3DQ4-_PFVSj9BCzUA@mail.gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-08Merge tag 'ecryptfs-4.7-rc7-fixes' of ↵Linus Torvalds4-20/+23
git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs Pull eCryptfs fixes from Tyler Hicks: "Provide a more concise fix for CVE-2016-1583: - Additionally fixes linux-stable regressions caused by the cherry-picking of the original fix Some very minor changes that have queued up: - Fix typos in code comments - Remove unnecessary check for NULL before destroying kmem_cache" * tag 'ecryptfs-4.7-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs: ecryptfs: don't allow mmap when the lower fs doesn't support it Revert "ecryptfs: forbid opening files without mmap handler" ecryptfs: fix spelling mistakes eCryptfs: fix typos in comment ecryptfs: drop null test before destroy functions
2016-07-08ecryptfs: don't allow mmap when the lower fs doesn't support itJeff Mahoney1-1/+14
There are legitimate reasons to disallow mmap on certain files, notably in sysfs or procfs. We shouldn't emulate mmap support on file systems that don't offer support natively. CVE-2016-1583 Signed-off-by: Jeff Mahoney <jeffm@suse.com> Cc: stable@vger.kernel.org [tyhicks: clean up f_op check by using ecryptfs_file_to_lower()] Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
2016-07-07Revert "ecryptfs: forbid opening files without mmap handler"Jeff Mahoney1-11/+2
This reverts commit 2f36db71009304b3f0b95afacd8eba1f9f046b87. It fixed a local root exploit but also introduced a dependency on the lower file system implementing an mmap operation just to open a file, which is a bit of a heavy hammer. The right fix is to have mmap depend on the existence of the mmap handler instead. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Cc: stable@vger.kernel.org Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
2016-07-07Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds1-1/+1
Pull block IO fixes from Jens Axboe: "Three small fixes that have been queued up and tested for this series: - A bug fix for xen-blkfront from Bob Liu, fixing an issue with incomplete requests during migration. - A fix for an ancient issue in retrieving the IO priority of a different PID than self, preventing that task from going away while we access it. From Omar. - A writeback fix from Tahsin, fixing a case where we'd call ihold() with a zero ref count inode" * 'for-linus' of git://git.kernel.dk/linux-block: block: fix use-after-free in sys_ioprio_get() writeback: inode cgroup wb switch should not call ihold() xen-blkfront: save uncompleted reqs in blkfront_resume()
2016-07-07Merge tag 'configfs-for-4.7' of git://git.infradead.org/users/hch/configfsLinus Torvalds1-2/+0
Pull configfs fix from Christoph Hellwig: "A fix from Marek for ppos handling in configfs_write_bin_file, which was introduced in Linux 4.5, but didn't have any users until recently" * tag 'configfs-for-4.7' of git://git.infradead.org/users/hch/configfs: configfs: Remove ppos increment in configfs_write_bin_file
2016-07-03Merge branch 'for-linus' of ↵Linus Torvalds3-1/+31
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse fix from Miklos Szeredi: "This makes sure userspace filesystems are not broken by the parallel lookups and readdir feature" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: serialize dirops by default
2016-07-03Merge branch 'overlayfs-linus' of ↵Linus Torvalds2-8/+33
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs fixes from Miklos Szeredi: "This contains fixes for a dentry leak, a regression in 4.6 noticed by Docker users and missing write access checking in truncate" * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: warn instead of error if d_type is not supported ovl: get_write_access() in truncate ovl: fix dentry leak for default_permissions
2016-07-03ovl: warn instead of error if d_type is not supportedVivek Goyal1-5/+7
overlay needs underlying fs to support d_type. Recently I put in a patch in to detect this condition and started failing mount if underlying fs did not support d_type. But this breaks existing configurations over kernel upgrade. Those who are running docker (partially broken configuration) with xfs not supporting d_type, are surprised that after kernel upgrade docker does not run anymore. https://github.com/docker/docker/issues/22937#issuecomment-229881315 So instead of erroring out, detect broken configuration and warn about it. This should allow existing docker setups to continue working after kernel upgrade. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 45aebeaf4f67 ("ovl: Ensure upper filesystem supports d_type") Cc: <stable@vger.kernel.org> 4.6
2016-07-01Merge branch 'for-linus' of ↵Linus Torvalds4-48/+78
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: "Tmpfs readdir throughput regression fix (this cycle) + some -stable fodder all over the place. One missing bit is Miklos' tonight locks.c fix - NFS folks had already grabbed that one by the time I woke up ;-)" [ The locks.c fix came through the nfsd tree just moments ago ] * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: namespace: update event counter when umounting a deleted dentry 9p: use file_dentry() ceph: fix d_obtain_alias() misuses lockless next_positive() libfs.c: new helper - next_positive() dcache_{readdir,dir_lseek}(): don't bother with nested ->d_lock
2016-07-01Merge tag 'nfsd-4.7-3' of git://linux-nfs.org/~bfields/linuxLinus Torvalds2-4/+11
Pull lockd/locks fixes from Bruce Fields: "One fix for lockd soft lookups in an error path, and one fix for file leases on overlayfs" * tag 'nfsd-4.7-3' of git://linux-nfs.org/~bfields/linux: locks: use file_inode() lockd: unregister notifier blocks if the service fails to come up completely
2016-07-01Merge branch 'libnvdimm-fixes' of ↵Linus Torvalds1-1/+6
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm fixes from Dan Williams: "1/ Two regression fixes since v4.6: one for the byte order of a sysfs attribute (bz121161) and another for QEMU 2.6's NVDIMM _DSM (ACPI Device Specific Method) implementation that gets tripped up by new auto-probing behavior in the NFIT driver. 2/ A fix tagged for -stable that stops the kernel from clobbering/ignoring changes to the configuration of a 'pfn' instance ("struct page" driver). For example changing the alignment from 2M to 1G may silently revert to 2M if that value is currently stored on media. 3/ A fix from Eric for an xfstests failure in dax. It is not currently tagged for -stable since it requires an 8-exabyte file system to trigger, and there appear to be no user visible side effects" * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: nfit: fix format interface code byte order dax: fix offset overflow in dax_io acpi, nfit: fix acpi_check_dsm() vs zero functions implemented libnvdimm, pfn, dax: fix initialization vs autodetect for mode + alignment
2016-07-01locks: use file_inode()Miklos Szeredi1-1/+1
(Another one for the f_path debacle.) ltp fcntl33 testcase caused an Oops in selinux_file_send_sigiotask. The reason is that generic_add_lease() used filp->f_path.dentry->inode while all the others use file_inode(). This makes a difference for files opened on overlayfs since the former will point to the overlay inode the latter to the underlying inode. So generic_add_lease() added the lease to the overlay inode and generic_delete_lease() removed it from the underlying inode. When the file was released the lease remained on the overlay inode's lock list, resulting in use after free. Reported-by: Eryu Guan <eguan@redhat.com> Fixes: 4bacc9c9234c ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay") Cc: <stable@vger.kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-06-30namespace: update event counter when umounting a deleted dentryAndrey Ulanov1-0/+1
- m_start() in fs/namespace.c expects that ns->event is incremented each time a mount added or removed from ns->list. - umount_tree() removes items from the list but does not increment event counter, expecting that it's done before the function is called. - There are some codepaths that call umount_tree() without updating "event" counter. e.g. from __detach_mounts(). - When this happens m_start may reuse a cached mount structure that no longer belongs to ns->list (i.e. use after free which usually leads to infinite loop). This change fixes the above problem by incrementing global event counter before invoking umount_tree(). Change-Id: I622c8e84dcb9fb63542372c5dbf0178ee86bb589 Cc: stable@vger.kernel.org Signed-off-by: Andrey Ulanov <andreyu@google.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-06-309p: use file_dentry()Miklos Szeredi1-3/+3
v9fs may be used as lower layer of overlayfs and accessing f_path.dentry can lead to a crash. In this case it's a NULL pointer dereference in p9_fid_create(). Fix by replacing direct access of file->f_path.dentry with the file_dentry() accessor, which will always return a native object. Reported-by: Alessio Igor Bogani <alessioigorbogani@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Tested-by: Alessio Igor Bogani <alessioigorbogani@gmail.com> Fixes: 4bacc9c9234c ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay") Cc: <stable@vger.kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-06-30lockd: unregister notifier blocks if the service fails to come up completelyScott Mayhew1-3/+10
If the lockd service fails to start up then we need to be sure that the notifier blocks are not registered, otherwise a subsequent start of the service could cause the same notifier to be registered twice, leading to soft lockups. Signed-off-by: Scott Mayhew <smayhew@redhat.com> Cc: stable@vger.kernel.org Fixes: 0751ddf77b6a "lockd: Register callbacks on the inetaddr_chain..." Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-06-30writeback: inode cgroup wb switch should not call ihold()Tahsin Erdogan1-1/+1
Asynchronous wb switching of inodes takes an additional ref count on an inode to make sure inode remains valid until switchover is completed. However, anyone calling ihold() must already have a ref count on inode, but in this case inode->i_count may already be zero: ------------[ cut here ]------------ WARNING: CPU: 1 PID: 917 at fs/inode.c:397 ihold+0x2b/0x30 CPU: 1 PID: 917 Comm: kworker/u4:5 Not tainted 4.7.0-rc2+ #49 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Workqueue: writeback wb_workfn (flush-8:16) 0000000000000000 ffff88007ca0fb58 ffffffff805990af 0000000000000000 0000000000000000 ffff88007ca0fb98 ffffffff80268702 0000018d000004e2 ffff88007cef40e8 ffff88007c9b89a8 ffff880079e3a740 0000000000000003 Call Trace: [<ffffffff805990af>] dump_stack+0x4d/0x6e [<ffffffff80268702>] __warn+0xc2/0xe0 [<ffffffff802687d8>] warn_slowpath_null+0x18/0x20 [<ffffffff8035b4ab>] ihold+0x2b/0x30 [<ffffffff80367ecc>] inode_switch_wbs+0x11c/0x180 [<ffffffff80369110>] wbc_detach_inode+0x170/0x1a0 [<ffffffff80369abc>] writeback_sb_inodes+0x21c/0x530 [<ffffffff80369f7e>] wb_writeback+0xee/0x1e0 [<ffffffff8036a147>] wb_workfn+0xd7/0x280 [<ffffffff80287531>] ? try_to_wake_up+0x1b1/0x2b0 [<ffffffff8027bb09>] process_one_work+0x129/0x300 [<ffffffff8027be06>] worker_thread+0x126/0x480 [<ffffffff8098cde7>] ? __schedule+0x1c7/0x561 [<ffffffff8027bce0>] ? process_one_work+0x300/0x300 [<ffffffff80280ff4>] kthread+0xc4/0xe0 [<ffffffff80335578>] ? kfree+0xc8/0x100 [<ffffffff809903cf>] ret_from_fork+0x1f/0x40 [<ffffffff80280f30>] ? __kthread_parkme+0x70/0x70 ---[ end trace aaefd2fd9f306bc4 ]--- Signed-off-by: Tahsin Erdogan <tahsin@google.com> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-30fuse: serialize dirops by defaultMiklos Szeredi3-1/+31
Negotiate with userspace filesystems whether they support parallel readdir and lookup. Disable parallelism by default for fear of breaking fuse filesystems. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 9902af79c01a ("parallel lookups: actual switch to rwsem") Fixes: d9b3dbdcfd62 ("fuse: switch to ->iterate_shared()")
2016-06-30configfs: Remove ppos increment in configfs_write_bin_fileMarek Vasut1-2/+0
The simple_write_to_buffer() already increments the @ppos on success, see fs/libfs.c simple_write_to_buffer() comment: " On success, the number of bytes written is returned and the offset @ppos advanced by this number, or negative value is returned on error. " If the configfs_write_bin_file() is invoked with @count smaller than the total length of the written binary file, it will be invoked multiple times. Since configfs_write_bin_file() increments @ppos on success, after calling simple_write_to_buffer(), the @ppos is incremented twice. Subsequent invocation of configfs_write_bin_file() will result in the next piece of data being written to the offset twice as long as the length of the previous write, thus creating buffer with "holes" in it. The simple testcase using DTO follows: $ mkdir /sys/kernel/config/device-tree/overlays/1 $ dd bs=1 if=foo.dtbo of=/sys/kernel/config/device-tree/overlays/1/dtbo Without this patch, the testcase will result in twice as big buffer in the kernel, which is then passed to the cfs_overlay_item_dtbo_write() . Signed-off-by: Marek Vasut <marex@denx.de> Cc: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Christoph Hellwig <hch@lst.de> Cc: Pantelis Antoniou <pantelis.antoniou@konsulko.com>
2016-06-29Merge tag 'nfs-for-4.7-2' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds8-23/+45
Pull NFS client bugfixes from Anna Schumaker: "Stable bugfixes: - Fix _cancel_empty_pagelist - Fix a double page unlock - Make nfs_atomic_open() call d_drop() on all ->open_context() errors. - Fix another OPEN_DOWNGRADE bug Other bugfixes: - Ensure we handle delegation errors in nfs4_proc_layoutget() - Layout stateids start out as being invalid - Add sparse lock annotations for pnfs_find_alloc_layout - Handle bad delegation stateids in nfs4_layoutget_handle_exception - Fix up O_DIRECT results - Fix potential use after free of state in nfs4_do_reclaim. - Mark the layout stateid invalid when all segments are removed - Don't let readdirplus revalidate an inode that was marked as stale - Fix potential race in nfs_fhget() - Fix an unused variable warning" * tag 'nfs-for-4.7-2' of git://git.linux-nfs.org/projects/anna/linux-nfs: NFS: Fix another OPEN_DOWNGRADE bug make nfs_atomic_open() call d_drop() on all ->open_context() errors. NFS: Fix an unused variable warning NFS: Fix potential race in nfs_fhget() NFS: Don't let readdirplus revalidate an inode that was marked as stale NFSv4.1/pnfs: Mark the layout stateid invalid when all segments are removed NFS: Fix a double page unlock pnfs_nfs: fix _cancel_empty_pagelist nfs4: Fix potential use after free of state in nfs4_do_reclaim. NFS: Fix up O_DIRECT results NFS/pnfs: handle bad delegation stateids in nfs4_layoutget_handle_exception NFSv4.1/pnfs: Add sparse lock annotations for pnfs_find_alloc_layout NFSv4.1/pnfs: Layout stateids start out as being invalid NFSv4.1/pnfs: Ensure we handle delegation errors in nfs4_proc_layoutget()
2016-06-29ovl: get_write_access() in truncateMiklos Szeredi1-0/+21
When truncating a file we should check write access on the underlying inode. And we should do so on the lower file as well (before copy-up) for consistency. Original patch and test case by Aihua Zhang. - - >o >o - - test.c - - >o >o - - #include <stdio.h> #include <errno.h> #include <unistd.h> int main(int argc, char *argv[]) { int ret; ret = truncate(argv[0], 4096); if (ret != -1) { fprintf(stderr, "truncate(argv[0]) should have failed\n"); return 1; } if (errno != ETXTBSY) { perror("truncate(argv[0])"); return 1; } return 0; } - - >o >o - - >o >o - - >o >o - - Reported-by: Aihua Zhang <zhangaihua1@huawei.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Cc: <stable@vger.kernel.org>
2016-06-29ovl: fix dentry leak for default_permissionsMiklos Szeredi1-3/+5
When using the 'default_permissions' mount option, ovl_permission() on non-directories was missing a dput(alias), resulting in "BUG Dentry still in use". Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 8d3095f4ad47 ("ovl: default permissions") Cc: <stable@vger.kernel.org> # v4.5+
2016-06-28NFS: Fix another OPEN_DOWNGRADE bugTrond Myklebust1-3/+2
Olga Kornievskaia reports that the following test fails to trigger an OPEN_DOWNGRADE on the wire, and only triggers the final CLOSE. fd0 = open(foo, RDRW) -- should be open on the wire for "both" fd1 = open(foo, RDONLY) -- should be open on the wire for "read" close(fd0) -- should trigger an open_downgrade read(fd1) close(fd1) The issue is that we're missing a check for whether or not the current state transitioned from an O_RDWR state as opposed to having transitioned from a combination of O_RDONLY and O_WRONLY. Reported-by: Olga Kornievskaia <aglo@umich.edu> Fixes: cd9288ffaea4 ("NFSv4: Fix another bug in the close/open_downgrade code") Cc: stable@vger.kernel.org # 2.6.33+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-06-27dax: fix offset overflow in dax_ioEric Sandeen1-1/+6
This isn't functionally apparent for some reason, but when we test io at extreme offsets at the end of the loff_t rang, such as in fstests xfs/071, the calculation of "max" in dax_io() can be wrong due to pos + size overflowing. For example, # xfs_io -c "pwrite 9223372036854771712 512" /mnt/test/file enters dax_io with: start 0x7ffffffffffff000 end 0x7ffffffffffff200 and the rounded up "size" variable is 0x1000. This yields: pos + size 0x8000000000000000 (overflows loff_t) end 0x7ffffffffffff200 Due to the overflow, the min() function picks the wrong value for the "max" variable, and when we send (max - pos) into i.e. copy_from_iter_pmem() it is also the wrong value. This somehow(tm) gets magically absorbed without incident, probably because iter->count is correct. But it seems best to fix it up properly by comparing the two values as unsigned. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-06-27Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds9-52/+124
Pull cifs fixes from Steve French: "Various small cifs/smb3 fixes, include some for stable, and some from the recent SMB3 test event" * 'for-next' of git://git.samba.org/sfrench/cifs-2.6: File names with trailing period or space need special case conversion Fix reconnect to not defer smb3 session reconnect long after socket reconnect cifs: check hash calculating succeeded cifs: dynamic allocation of ntlmssp blob cifs: use CIFS_MAX_DOMAINNAME_LEN when converting the domain name cifs: stuff the fl_owner into "pid" field in the lock request
2016-06-27make nfs_atomic_open() call d_drop() on all ->open_context() errors.Al Viro1-1/+1
In "NFSv4: Move dentry instantiation into the NFSv4-specific atomic open code" unconditional d_drop() after the ->open_context() had been removed. It had been correct for success cases (there ->open_context() itself had been doing dcache manipulations), but not for error ones. Only one of those (ENOENT) got a compensatory d_drop() added in that commit, but in fact it should've been done for all errors. As it is, the case of O_CREAT non-exclusive open on a hashed negative dentry racing with e.g. symlink creation from another client ended up with ->open_context() getting an error and proceeding to call nfs_lookup(). On a hashed dentry, which would've instantly triggered BUG_ON() in d_materialise_unique() (or, these days, its equivalent in d_splice_alias()). Cc: stable@vger.kernel.org # v3.10+ Tested-by: Oleg Drokin <green@linuxhacker.ru> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-06-25Merge branch 'for-linus-4.7-part2' of ↵Linus Torvalds3-13/+34
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes part 2 from Chris Mason: "This has one patch from Omar to bring iterate_shared back to btrfs. We have a tree of work we queue up for directory items and it doesn't lend itself well to shared access. While we're cleaning it up, Omar has changed things to use an exclusive lock when there are delayed items" * 'for-linus-4.7-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: fix ->iterate_shared() by upgrading i_rwsem for delayed nodes
2016-06-25Merge branch 'for-linus-4.7' of ↵Linus Torvalds11-36/+57
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "I have a two part pull this time because one of the patches Dave Sterba collected needed to be against v4.7-rc2 or higher (we used rc4). I try to make my for-linus-xx branch testable on top of the last major so we can hand fixes to people on the list more easily, so I've split this pull in two. This first part has some fixes and two performance improvements that we've been testing for some time. Josef's two performance fixes are most notable. The transid tracking patch makes a big improvement on pretty much every workload" * 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: Force stripesize to the value of sectorsize btrfs: fix disk_i_size update bug when fallocate() fails Btrfs: fix error handling in map_private_extent_buffer Btrfs: fix error return code in btrfs_init_test_fs() Btrfs: don't do nocow check unless we have to btrfs: fix deadlock in delayed_ref_async_start Btrfs: track transid for delayed ref flushing
2016-06-25Btrfs: fix ->iterate_shared() by upgrading i_rwsem for delayed nodesOmar Sandoval3-13/+34
Commit fe742fd4f90f ("Revert "btrfs: switch to ->iterate_shared()"") backed out the conversion to ->iterate_shared() for Btrfs because the delayed inode handling in btrfs_real_readdir() is racy. However, we can still do readdir in parallel if there are no delayed nodes. This is a temporary fix which upgrades the shared inode lock to an exclusive lock only when we have delayed items until we come up with a more complete solution. While we're here, rename the btrfs_{get,put}_delayed_items functions to make it very clear that they're just for readdir. Tested with xfstests and by doing a parallel kernel build: while make tinyconfig && make -j4 && git clean dqfx; do : done along with a bunch of parallel finds in another shell: while true; do for ((i=0; i<4; i++)); do find . >/dev/null & done wait done Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2016-06-24ceph: fix d_obtain_alias() misusesAl Viro1-7/+3
on failure d_obtain_alias() will have done iput() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-06-24Merge branch 'akpm' (patches from Andrew)Linus Torvalds5-31/+17
Merge misc fixes from Andrew Morton: "Two weeks worth of fixes here" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (41 commits) init/main.c: fix initcall_blacklisted on ia64, ppc64 and parisc64 autofs: don't get stuck in a loop if vfs_write() returns an error mm/page_owner: avoid null pointer dereference tools/vm/slabinfo: fix spelling mistake: "Ocurrences" -> "Occurrences" fs/nilfs2: fix potential underflow in call to crc32_le oom, suspend: fix oom_reaper vs. oom_killer_disable race ocfs2: disable BUG assertions in reading blocks mm, compaction: abort free scanner if split fails mm: prevent KASAN false positives in kmemleak mm/hugetlb: clear compound_mapcount when freeing gigantic pages mm/swap.c: flush lru pvecs on compound page arrival memcg: css_alloc should return an ERR_PTR value on error memcg: mem_cgroup_migrate() may be called with irq disabled hugetlb: fix nr_pmds accounting with shared page tables Revert "mm: disable fault around on emulated access bit architecture" Revert "mm: make faultaround produce old ptes" mailmap: add Boris Brezillon's email mailmap: add Antoine Tenart's email mm, sl[au]b: add __GFP_ATOMIC to the GFP reclaim mask mm: mempool: kasan: don't poot mempool objects in quarantine ...
2016-06-24autofs: don't get stuck in a loop if vfs_write() returns an errorAndrey Vagin1-3/+4
__vfs_write() returns a negative value in a error case. Link: http://lkml.kernel.org/r/20160616083108.6278.65815.stgit@pluto.themaw.net Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-24fs/nilfs2: fix potential underflow in call to crc32_leTorsten Hilbrich1-1/+1
The value `bytes' comes from the filesystem which is about to be mounted. We cannot trust that the value is always in the range we expect it to be. Check its value before using it to calculate the length for the crc32_le call. It value must be larger (or equal) sumoff + 4. This fixes a kernel bug when accidentially mounting an image file which had the nilfs2 magic value 0x3434 at the right offset 0x406 by chance. The bytes 0x01 0x00 were stored at 0x408 and were interpreted as a s_bytes value of 1. This caused an underflow when substracting sumoff + 4 (20) in the call to crc32_le. BUG: unable to handle kernel paging request at ffff88021e600000 IP: crc32_le+0x36/0x100 ... Call Trace: nilfs_valid_sb.part.5+0x52/0x60 [nilfs2] nilfs_load_super_block+0x142/0x300 [nilfs2] init_nilfs+0x60/0x390 [nilfs2] nilfs_mount+0x302/0x520 [nilfs2] mount_fs+0x38/0x160 vfs_kern_mount+0x67/0x110 do_mount+0x269/0xe00 SyS_mount+0x9f/0x100 entry_SYSCALL_64_fastpath+0x16/0x71 Link: http://lkml.kernel.org/r/1466778587-5184-2-git-send-email-konishi.ryusuke@lab.ntt.co.jp Signed-off-by: Torsten Hilbrich <torsten.hilbrich@secunet.com> Tested-by: Torsten Hilbrich <torsten.hilbrich@secunet.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-24ocfs2: disable BUG assertions in reading blocksGang He2-2/+5
According to some high-load testing, these two BUG assertions were encountered, this led system panic. Actually, there were some discussions about removing these two BUG() assertions, it would not bring any side effect. Then, I did the the following changes, 1) use the existing macro CATCH_BH_JBD_RACES to wrap BUG() in the ocfs2_read_blocks_sync function like before. 2) disable the macro CATCH_BH_JBD_RACES in Makefile by default. Link: http://lkml.kernel.org/r/1466574294-26863-1-git-send-email-ghe@suse.com Signed-off-by: Gang He <ghe@suse.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-24jbd2: get rid of superfluous __GFP_REPEATMichal Hocko1-25/+7
jbd2_alloc is explicit about its allocation preferences wrt. the allocation size. Sub page allocations go to the slab allocator and larger are using either the page allocator or vmalloc. This is all good but the logic is unnecessarily complex. 1) as per Ted, the vmalloc fallback is a left-over: : jbd2_alloc is only passed in the bh->b_size, which can't be PAGE_SIZE, so : the code path that calls vmalloc() should never get called. When we : conveted jbd2_alloc() to suppor sub-page size allocations in commit : d2eecb039368, there was an assumption that it could be called with a size : greater than PAGE_SIZE, but that's certaily not true today. Moreover vmalloc allocation might even lead to a deadlock because the callers expect GFP_NOFS context while vmalloc is GFP_KERNEL. 2) __GFP_REPEAT for requests <= PAGE_ALLOC_COSTLY_ORDER is ignored since the flag was introduced. Let's simplify the code flow and use the slab allocator for sub-page requests and the page allocator for others. Even though order > 0 is not currently used as per above leave that option open. Link: http://lkml.kernel.org/r/1464599699-30131-18-git-send-email-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-24Merge tag 'nfsd-4.7-2' of git://linux-nfs.org/~bfields/linuxLinus Torvalds4-46/+48
Pull nfsd bugfixes from Bruce Fields: "Fix missing server-side permission checks on setting NFS ACLs" * tag 'nfsd-4.7-2' of git://linux-nfs.org/~bfields/linux: nfsd: check permissions when setting ACLs posix_acl: Add set_posix_acl
2016-06-24File names with trailing period or space need special case conversionSteve French2-4/+31
POSIX allows files with trailing spaces or a trailing period but SMB3 does not, so convert these using the normal Services For Mac mapping as we do for other reserved characters such as : < > | ? * This is similar to what Macs do for the same problem over SMB3. CC: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <steve.french@primarydata.com> Acked-by: Pavel Shilovsky <pshilovsky@samba.org>
2016-06-24Fix reconnect to not defer smb3 session reconnect long after socket reconnectSteve French2-1/+30
Azure server blocks clients that open a socket and don't do anything on it. In our reconnect scenarios, we can reconnect the tcp session and detect the socket is available but we defer the negprot and SMB3 session setup and tree connect reconnection until the next i/o is requested, but this looks suspicous to some servers who expect SMB3 negprog and session setup soon after a socket is created. In the echo thread, reconnect SMB3 sessions and tree connections that are disconnected. A later patch will replay persistent (and resilient) handle opens. CC: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <steve.french@primarydata.com> Acked-by: Pavel Shilovsky <pshilovsky@samba.org>
2016-06-24nfsd: check permissions when setting ACLsBen Hutchings3-27/+25
Use set_posix_acl, which includes proper permission checks, instead of calling ->set_acl directly. Without this anyone may be able to grant themselves permissions to a file by setting the ACL. Lock the inode to make the new checks atomic with respect to set_acl. (Also, nfsd was the only caller of set_acl not locking the inode, so I suspect this may fix other races.) This also simplifies the code, and ensures our ACLs are checked by posix_acl_valid. The permission checks and the inode locking were lost with commit 4ac7249e, which changed nfsd to use the set_acl inode operation directly instead of going through xattr handlers. Reported-by: David Sinquin <david@sinquin.eu> [agreunba@redhat.com: use set_posix_acl] Fixes: 4ac7249e Cc: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-06-24posix_acl: Add set_posix_aclAndreas Gruenbacher1-19/+23
Factor out part of posix_acl_xattr_set into a common function that takes a posix_acl, which nfsd can also call. The prototype already exists in include/linux/posix_acl.h. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Cc: stable@vger.kernel.org Cc: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-06-24NFS: Fix an unused variable warningTrond Myklebust1-2/+0
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-06-24NFS: Fix potential race in nfs_fhget()Trond Myklebust1-0/+1
If we don't set the mode correctly in nfs_init_locked(), then there is potential for a race with a second call to nfs_fhget that will cause inode aliasing. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-06-24NFS: Don't let readdirplus revalidate an inode that was marked as staleTrond Myklebust1-1/+6
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-06-24NFSv4.1/pnfs: Mark the layout stateid invalid when all segments are removedTrond Myklebust1-1/+3
According to RFC5661, section 12.5.3. the layout stateid is no longer valid once the client no longer holds any layout segments. Ensure that we mark it invalid. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-06-24NFS: Fix a double page unlockTrond Myklebust1-2/+2
Since commit 0bcbf039f6b2, nfs_readpage_release() has been used to unlock the page in the read code. Fixes: 0bcbf039f6b2 ("nfs: handle request add failure properly") Cc: stable@vger.kernel.org # v4.5+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-06-24pnfs_nfs: fix _cancel_empty_pagelistWeston Andros Adamson1-2/+10
pnfs_generic_commit_cancel_empty_pagelist calls nfs_commitdata_release, but that is wrong: nfs_commitdata_release puts the open context, something that isn't valid until nfs_init_commit is called, which is never the case when pnfs_generic_commit_cancel_empty_pagelist is called. This was introduced in "nfs: avoid race that crashes nfs_init_commit". Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-06-24nfs4: Fix potential use after free of state in nfs4_do_reclaim.Oleg Drokin1-1/+1
Commit e8d975e73e5f ("fixing infinite OPEN loop in 4.0 stateid recovery") introduced access to state after it was just potentially freed by nfs4_put_open_state leading to a random data corruption somewhere. BUG: unable to handle kernel paging request at ffff88004941ee40 IP: [<ffffffff813baf01>] nfs4_do_reclaim+0x461/0x740 PGD 3501067 PUD 3504067 PMD 6ff37067 PTE 800000004941e060 Oops: 0002 [#1] SMP DEBUG_PAGEALLOC Modules linked in: loop rpcsec_gss_krb5 acpi_cpufreq tpm_tis joydev i2c_piix4 pcspkr tpm virtio_console nfsd ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops floppy serio_raw virtio_blk drm CPU: 6 PID: 2161 Comm: 192.168.10.253- Not tainted 4.7.0-rc1-vm-nfs+ #112 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 task: ffff8800463dcd00 ti: ffff88003ff48000 task.ti: ffff88003ff48000 RIP: 0010:[<ffffffff813baf01>] [<ffffffff813baf01>] nfs4_do_reclaim+0x461/0x740 RSP: 0018:ffff88003ff4bd68 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffffffff81a49900 RCX: 00000000000000e8 RDX: 00000000000000e8 RSI: ffff8800418b9930 RDI: ffff880040c96c88 RBP: ffff88003ff4bdf8 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff880040c96c98 R13: ffff88004941ee20 R14: ffff88004941ee40 R15: ffff88004941ee00 FS: 0000000000000000(0000) GS:ffff88006d000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff88004941ee40 CR3: 0000000060b0b000 CR4: 00000000000006e0 Stack: ffffffff813baad5 ffff8800463dcd00 ffff880000000001 ffffffff810e6b68 ffff880043ddbc88 ffff8800418b9800 ffff8800418b98c8 ffff88004941ee48 ffff880040c96c90 ffff880040c96c00 ffff880040c96c20 ffff880040c96c40 Call Trace: [<ffffffff813baad5>] ? nfs4_do_reclaim+0x35/0x740 [<ffffffff810e6b68>] ? trace_hardirqs_on_caller+0x128/0x1b0 [<ffffffff813bb7cd>] nfs4_run_state_manager+0x5ed/0xa40 [<ffffffff813bb1e0>] ? nfs4_do_reclaim+0x740/0x740 [<ffffffff813bb1e0>] ? nfs4_do_reclaim+0x740/0x740 [<ffffffff810af0d1>] kthread+0x101/0x120 [<ffffffff810e6b68>] ? trace_hardirqs_on_caller+0x128/0x1b0 [<ffffffff818843af>] ret_from_fork+0x1f/0x40 [<ffffffff810aefd0>] ? kthread_create_on_node+0x250/0x250 Code: 65 80 4c 8b b5 78 ff ff ff e8 fc 88 4c 00 48 8b 7d 88 e8 13 67 d2 ff 49 8b 47 40 a8 02 0f 84 d3 01 00 00 4c 89 ff e8 7f f9 ff ff <f0> 41 80 26 7f 48 8b 7d c8 e8 b1 84 4c 00 e9 39 fd ff ff 3d e6 RIP [<ffffffff813baf01>] nfs4_do_reclaim+0x461/0x740 RSP <ffff88003ff4bd68> CR2: ffff88004941ee40 Signed-off-by: Oleg Drokin <green@linuxhacker.ru> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-06-24NFS: Fix up O_DIRECT resultsTrond Myklebust1-3/+7
if we read or wrote something, we must report it Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Jeff Layton <jlayton@poochiereds.net> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-06-24NFS/pnfs: handle bad delegation stateids in nfs4_layoutget_handle_exceptionTrond Myklebust1-4/+4
We must call nfs4_handle_exception() on BAD_STATEID errors. The only exception is if the stateid argument turns out to be a layout stateid that is declared invalid. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Jeff Layton <jlayton@poochiereds.net> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-06-24NFSv4.1/pnfs: Add sparse lock annotations for pnfs_find_alloc_layoutTrond Myklebust1-0/+2
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Jeff Layton <jlayton@poochiereds.net> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>