summaryrefslogtreecommitdiffstats
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2015-11-27Merge branch 'for-linus-4.4' of ↵Linus Torvalds12-66/+219
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "This has Mark Fasheh's patches to fix quota accounting during subvol deletion, which we've been working on for a while now. The patch is pretty small but it's a key fix. Otherwise it's a random assortment" * 'for-linus-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: btrfs: fix balance range usage filters in 4.4-rc btrfs: qgroup: account shared subtree during snapshot delete Btrfs: use btrfs_get_fs_root in resolve_indirect_ref btrfs: qgroup: fix quota disable during rescan Btrfs: fix race between cleaner kthread and space cache writeout Btrfs: fix scrub preventing unused block groups from being deleted Btrfs: fix race between scrub and block group deletion btrfs: fix rcu warning during device replace btrfs: Continue replace when set_block_ro failed btrfs: fix clashing number of the enhanced balance usage filter Btrfs: fix the number of transaction units needed to remove a block group Btrfs: use global reserve when deleting unused block group after ENOSPC Btrfs: tests: checking for NULL instead of IS_ERR() btrfs: fix signed overflows in btrfs_sync_file
2015-11-25btrfs: fix balance range usage filters in 4.4-rcHolger Hoffstätte1-2/+2
There's a regression in 4.4-rc since commit bc3094673f22 (btrfs: extend balance filter usage to take minimum and maximum) in that existing (non-ranged) balance with -dusage=x no longer works; all chunks are skipped. After staring at the code for a while and wondering why a non-ranged balance would even need min and max thresholds (..which then were not set correctly, leading to the bug) I realized that the only problem was the fact that the filter functions were named wrong, thanks to patching copypasta. Simply renaming both functions lets the existing btrfs-progs call balance with -dusage=x and now the non-ranged filter function is invoked, properly using only a single chunk limit. Signed-off-by: Holger Hoffstätte <holger.hoffstaette@googlemail.com> Fixes: bc3094673f22 ("btrfs: extend balance filter usage to take minimum and maximum") Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-11-25btrfs: qgroup: account shared subtree during snapshot deleteMark Fasheh2-7/+42
Commit 0ed4792 ('btrfs: qgroup: Switch to new extent-oriented qgroup mechanism.') removed our qgroup accounting during btrfs_drop_snapshot(). Predictably, this results in qgroup numbers going bad shortly after a snapshot is removed. Fix this by adding a dirty extent record when we encounter extents during our shared subtree walk. This effectively restores the functionality we had with the original shared subtree walking code in 1152651 (btrfs: qgroup: account shared subtrees during snapshot delete). The idea with the original patch (and this one) is that shared subtrees can get skipped during drop_snapshot. The shared subtree walk then allows us a chance to visit those extents and add them to the qgroup work for later processing. This ultimately makes the accounting for drop snapshot work. The new qgroup code nicely handles all the other extents during the tree walk via the ref dec/inc functions so we don't have to add actions beyond what we had originally. Signed-off-by: Mark Fasheh <mfasheh@suse.de> Signed-off-by: Chris Mason <clm@fb.com>
2015-11-25Btrfs: use btrfs_get_fs_root in resolve_indirect_refJosef Bacik1-1/+1
The backref code will look up the fs_root we're trying to resolve our indirect refs for, unfortunately we use btrfs_read_fs_root_no_name, which returns -ENOENT if the ref is 0. This isn't helpful for the qgroup stuff with snapshot delete as it won't be able to search down the snapshot we are deleting, which will cause us to miss roots. So use btrfs_get_fs_root and send false for check_ref so we can always get the root we're looking for. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Mark Fasheh <mfasheh@suse.de> Signed-off-by: Chris Mason <clm@fb.com>
2015-11-25btrfs: qgroup: fix quota disable during rescanJustin Maggard1-1/+2
There's a race condition that leads to a NULL pointer dereference if you disable quotas while a quota rescan is running. To fix this, we just need to wait for the quota rescan worker to actually exit before tearing down the quota structures. Signed-off-by: Justin Maggard <jmaggard@netgear.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-11-25Btrfs: fix race between cleaner kthread and space cache writeoutFilipe Manana1-13/+16
When a block group becomes unused and the cleaner kthread is currently running, we can end up getting the current transaction aborted with error -ENOENT when we try to commit the transaction, leading to the following trace: [59779.258768] WARNING: CPU: 3 PID: 5990 at fs/btrfs/extent-tree.c:3740 btrfs_write_dirty_block_groups+0x17c/0x214 [btrfs]() [59779.272594] BTRFS: Transaction aborted (error -2) (...) [59779.291137] Call Trace: [59779.291621] [<ffffffff812566f4>] dump_stack+0x4e/0x79 [59779.292543] [<ffffffff8104d0a6>] warn_slowpath_common+0x9f/0xb8 [59779.293435] [<ffffffffa04cb81f>] ? btrfs_write_dirty_block_groups+0x17c/0x214 [btrfs] [59779.295000] [<ffffffff8104d107>] warn_slowpath_fmt+0x48/0x50 [59779.296138] [<ffffffffa04c2721>] ? write_one_cache_group.isra.32+0x77/0x82 [btrfs] [59779.297663] [<ffffffffa04cb81f>] btrfs_write_dirty_block_groups+0x17c/0x214 [btrfs] [59779.299141] [<ffffffffa0549b0d>] commit_cowonly_roots+0x1de/0x261 [btrfs] [59779.300359] [<ffffffffa04dd5b6>] btrfs_commit_transaction+0x4c4/0x99c [btrfs] [59779.301805] [<ffffffffa04b5df4>] btrfs_sync_fs+0x145/0x1ad [btrfs] [59779.302893] [<ffffffff81196634>] sync_filesystem+0x7f/0x93 (...) [59779.318186] ---[ end trace 577e2daff90da33a ]--- The following diagram illustrates a sequence of steps leading to this problem: CPU 1 CPU 2 <at transaction N> adds bg A to list fs_info->unused_bgs adds bg B to list fs_info->unused_bgs <transaction kthread commits transaction N and wakes up the cleaner kthread> cleaner kthread delete_unused_bgs() sees bg A in list fs_info->unused_bgs btrfs_start_transaction() <transaction N + 1 starts> deletes bg A update_block_group(bg C) --> adds bg C to list fs_info->unused_bgs deletes bg B sees bg C in the list fs_info->unused_bgs btrfs_remove_chunk(bg C) btrfs_remove_block_group(bg C) --> checks if the block group is in a dirty list, and because it isn't now, it does nothing --> the block group item is deleted from the extent tree --> adds bg C to list transaction->dirty_bgs some task calls btrfs_commit_transaction(t N + 1) commit_cowonly_roots() btrfs_write_dirty_block_groups() --> sees bg C in cur_trans->dirty_bgs --> calls write_one_cache_group() which returns -ENOENT because it did not find the block group item in the extent tree --> transaction aborte with -ENOENT because write_one_cache_group() returned that error So fix this by adding a block group to the list of dirty block groups before adding it to the list of unused block groups. This happened on a stress test using fsstress plus concurrent calls to fallocate 20G and truncate (releasing part of the space allocated with fallocate). Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-11-25Btrfs: fix scrub preventing unused block groups from being deletedFilipe Manana3-1/+24
Currently scrub can race with the cleaner kthread when the later attempts to delete an unused block group, and the result is preventing the cleaner kthread from ever deleting later the block group - unless the block group becomes used and unused again. The following diagram illustrates that race: CPU 1 CPU 2 cleaner kthread btrfs_delete_unused_bgs() gets block group X from fs_info->unused_bgs and removes it from that list scrub_enumerate_chunks() searches device tree using its commit root finds device extent for block group X gets block group X from the tree fs_info->block_group_cache_tree (via btrfs_lookup_block_group()) sets bg X to RO sees the block group is already RO and therefore doesn't delete it nor adds it back to unused list So fix this by making scrub add the block group again to the list of unused block groups if the block group is still unused when it finished scrubbing it and it hasn't been removed already. Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-11-25Btrfs: fix race between scrub and block group deletionFilipe Manana1-4/+16
Scrub can race with the cleaner kthread deleting block groups that are unused (and with relocation too) leading to a failure with error -EINVAL that gets returned to user space. The following diagram illustrates how it happens: CPU 1 CPU 2 cleaner kthread btrfs_delete_unused_bgs() gets block group X from fs_info->unused_bgs sets block group to RO btrfs_remove_chunk(bg X) deletes device extents scrub_enumerate_chunks() searches device tree using its commit root finds device extent for block group X gets block group X from the tree fs_info->block_group_cache_tree (via btrfs_lookup_block_group()) sets bg X to RO (again) btrfs_remove_block_group(bg X) deletes block group from fs_info->block_group_cache_tree removes extent map from fs_info->mapping_tree scrub_chunk(offset X) searches fs_info->mapping_tree for extent map starting at offset X --> doesn't find any such extent map --> returns -EINVAL and scrub errors out to userspace with -EINVAL Fix this by dealing with an extent map lookup failure as an indicator of block group deletion. Issue reproduced with fstest btrfs/071. Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-11-25btrfs: fix rcu warning during device replaceDavid Sterba1-4/+2
The test btrfs/011 triggers a rcu warning Reviewed-by: Anand Jain <anand.jain@oracle.com> =============================== [ INFO: suspicious RCU usage. ] 4.4.0-rc1-default+ #286 Tainted: G W ------------------------------- fs/btrfs/volumes.c:1977 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 4 locks held by btrfs/28786: 0: (&fs_info->dev_replace.lock_finishing_cancel_unmount){+.+...}, at: [<ffffffffa00bc785>] btrfs_dev_replace_finishing+0x45/0xa00 [btrfs] 1: (uuid_mutex){+.+.+.}, at: [<ffffffffa00bc84f>] btrfs_dev_replace_finishing+0x10f/0xa00 [btrfs] 2: (&fs_devs->device_list_mutex){+.+.+.}, at: [<ffffffffa00bc868>] btrfs_dev_replace_finishing+0x128/0xa00 [btrfs] 3: (&fs_info->chunk_mutex){+.+...}, at: [<ffffffffa00bc87d>] btrfs_dev_replace_finishing+0x13d/0xa00 [btrfs] stack backtrace: CPU: 0 PID: 28786 Comm: btrfs Tainted: G W 4.4.0-rc1-default+ #286 Hardware name: Intel Corporation SandyBridge Platform/To be filled by O.E.M., BIOS ASNBCPT1.86C.0031.B00.1006301607 06/30/2010 0000000000000001 ffff8800a07dfb48 ffffffff8141d47b 0000000000000001 0000000000000001 0000000000000000 ffff8801464a4f00 ffff8800a07dfb78 ffffffff810cd883 ffff880146eb9400 ffff8800a3698600 ffff8800a33fe220 Call Trace: [<ffffffff8141d47b>] dump_stack+0x4f/0x74 [<ffffffff810cd883>] lockdep_rcu_suspicious+0x103/0x140 [<ffffffffa0071261>] btrfs_rm_dev_replace_remove_srcdev+0x111/0x130 [btrfs] [<ffffffff810d354d>] ? trace_hardirqs_on+0xd/0x10 [<ffffffff81449536>] ? __percpu_counter_sum+0x66/0x80 [<ffffffffa00bcc15>] btrfs_dev_replace_finishing+0x4d5/0xa00 [btrfs] [<ffffffffa00bc96e>] ? btrfs_dev_replace_finishing+0x22e/0xa00 [btrfs] [<ffffffffa00a8795>] ? btrfs_scrub_dev+0x415/0x6d0 [btrfs] [<ffffffffa003ea69>] ? btrfs_start_transaction+0x9/0x20 [btrfs] [<ffffffffa00bda79>] btrfs_dev_replace_start+0x339/0x590 [btrfs] [<ffffffff81196aa5>] ? __might_fault+0x95/0xa0 [<ffffffffa0078638>] btrfs_ioctl_dev_replace+0x118/0x160 [btrfs] [<ffffffff811409c6>] ? stack_trace_call+0x46/0x70 [<ffffffffa007c914>] ? btrfs_ioctl+0x24/0x1770 [btrfs] [<ffffffffa007ce43>] btrfs_ioctl+0x553/0x1770 [btrfs] [<ffffffff811409c6>] ? stack_trace_call+0x46/0x70 [<ffffffff811d6eb1>] ? do_vfs_ioctl+0x21/0x5a0 [<ffffffff811d6f1c>] do_vfs_ioctl+0x8c/0x5a0 [<ffffffff811e3336>] ? __fget_light+0x86/0xb0 [<ffffffff811e3369>] ? __fdget+0x9/0x20 [<ffffffff811d7451>] ? SyS_ioctl+0x21/0x80 [<ffffffff811d7483>] SyS_ioctl+0x53/0x80 [<ffffffff81b1efd7>] entry_SYSCALL_64_fastpath+0x12/0x6f This is because of unprotected use of rcu_dereference in btrfs_scratch_superblocks. We can't add rcu locks around the whole function because we read the superblock. The fix will use the rcu string buffer directly without the rcu locking. Thi is safe as the device will not go away in the meantime. We're holding the device list mutexes. Restructuring the code to narrow down the rcu section turned out to be impossible, we need to call filp_open (through update_dev_time) on the buffer and this could call kmalloc/__might_sleep. We could call kstrdup with GFP_ATOMIC but it's not absolutely necessary. Fixes: 12b1c2637b6e (Btrfs: enhance btrfs_scratch_superblock to scratch all superblocks) Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-11-25btrfs: Continue replace when set_block_ro failedZhaolei1-2/+18
xfstests/011 failed in node with small_size filesystem. Can be reproduced by following script: DEV_LIST="/dev/vdd /dev/vde" DEV_REPLACE="/dev/vdf" do_test() { local mkfs_opt="$1" local size="$2" dmesg -c >/dev/null umount $SCRATCH_MNT &>/dev/null echo mkfs.btrfs -f $mkfs_opt "${DEV_LIST[*]}" mkfs.btrfs -f $mkfs_opt "${DEV_LIST[@]}" || return 1 mount "${DEV_LIST[0]}" $SCRATCH_MNT echo -n "Writing big files" dd if=/dev/urandom of=$SCRATCH_MNT/t0 bs=1M count=1 >/dev/null 2>&1 for ((i = 1; i <= size; i++)); do echo -n . /bin/cp $SCRATCH_MNT/t0 $SCRATCH_MNT/t$i || return 1 done echo echo Start replace btrfs replace start -Bf "${DEV_LIST[0]}" "$DEV_REPLACE" $SCRATCH_MNT || { dmesg return 1 } return 0 } # Set size to value near fs size # for example, 1897 can trigger this bug in 2.6G device. # ./do_test "-d raid1 -m raid1" 1897 System will report replace fail with following warning in dmesg: [ 134.710853] BTRFS: dev_replace from /dev/vdd (devid 1) to /dev/vdf started [ 135.542390] BTRFS: btrfs_scrub_dev(/dev/vdd, 1, /dev/vdf) failed -28 [ 135.543505] ------------[ cut here ]------------ [ 135.544127] WARNING: CPU: 0 PID: 4080 at fs/btrfs/dev-replace.c:428 btrfs_dev_replace_start+0x398/0x440() [ 135.545276] Modules linked in: [ 135.545681] CPU: 0 PID: 4080 Comm: btrfs Not tainted 4.3.0 #256 [ 135.546439] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014 [ 135.547798] ffffffff81c5bfcf ffff88003cbb3d28 ffffffff817fe7b5 0000000000000000 [ 135.548774] ffff88003cbb3d60 ffffffff810a88f1 ffff88002b030000 00000000ffffffe4 [ 135.549774] ffff88003c080000 ffff88003c082588 ffff88003c28ab60 ffff88003cbb3d70 [ 135.550758] Call Trace: [ 135.551086] [<ffffffff817fe7b5>] dump_stack+0x44/0x55 [ 135.551737] [<ffffffff810a88f1>] warn_slowpath_common+0x81/0xc0 [ 135.552487] [<ffffffff810a89e5>] warn_slowpath_null+0x15/0x20 [ 135.553211] [<ffffffff81448c88>] btrfs_dev_replace_start+0x398/0x440 [ 135.554051] [<ffffffff81412c3e>] btrfs_ioctl+0x1d2e/0x25c0 [ 135.554722] [<ffffffff8114c7ba>] ? __audit_syscall_entry+0xaa/0xf0 [ 135.555506] [<ffffffff8111ab36>] ? current_kernel_time64+0x56/0xa0 [ 135.556304] [<ffffffff81201e3d>] do_vfs_ioctl+0x30d/0x580 [ 135.557009] [<ffffffff8114c7ba>] ? __audit_syscall_entry+0xaa/0xf0 [ 135.557855] [<ffffffff810011d1>] ? do_audit_syscall_entry+0x61/0x70 [ 135.558669] [<ffffffff8120d1c1>] ? __fget_light+0x61/0x90 [ 135.559374] [<ffffffff81202124>] SyS_ioctl+0x74/0x80 [ 135.559987] [<ffffffff81809857>] entry_SYSCALL_64_fastpath+0x12/0x6f [ 135.560842] ---[ end trace 2a5c1fc3205abbdd ]--- Reason: When big data writen to fs, the whole free space will be allocated for data chunk. And operation as scrub need to set_block_ro(), and when there is only one metadata chunk in system(or other metadata chunks are all full), the function will try to allocate a new chunk, and failed because no space in device. Fix: When set_block_ro failed for metadata chunk, it is not a problem because scrub_lock paused commit_trancaction in same time, and metadata are always cowed, so the on-the-fly writepages will not write data into same place with scrub/replace. Let replace continue in this case is no problem. Tested by above script, and xfstests/011, plus 100 times xfstests/070. Changelog v1->v2: 1: Add detail comments in source and commit-message. 2: Add dmesg detail into commit-message. 3: Limit return value of -ENOSPC to be passed. All suggested by: Filipe Manana <fdmanana@gmail.com> Suggested-by: Filipe Manana <fdmanana@gmail.com> Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-11-25btrfs: fix clashing number of the enhanced balance usage filterDavid Sterba1-1/+1
I've accidentally picked an already used number for the enhanced usage filter represented by BTRFS_BALANCE_ARGS_USAGE_RANGE, clashing with BTRFS_BALANCE_ARGS_CONVERT. Introduced during the development phase, no backward compatibility issues. Reported-by: Holger Hoffstätte <holger.hoffstaette@googlemail.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: bc3094673f22 ("btrfs: extend balance filter usage to take minimum and maximum") Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-11-25Btrfs: fix the number of transaction units needed to remove a block groupFilipe Manana3-5/+38
We were using only 1 transaction unit when attempting to delete an unused block group but in reality we need 3 + N units, where N corresponds to the number of stripes. We were accounting only for the addition of the orphan item (for the block group's free space cache inode) but we were not accounting that we need to delete one block group item from the extent tree, one free space item from the tree of tree roots and N device extent items from the device tree. While one unit is not enough, it worked most of the time because for each single unit we are too pessimistic and assume an entire tree path, with the highest possible heigth (8), needs to be COWed with eventual node splits at every possible level in the tree, so there was usually enough reserved space for removing all the items and adding the orphan item. However after adding the orphan item, writepages() can by called by the VM subsystem against the btree inode when we are under memory pressure, which causes writeback to start for the nodes we COWed before, this forces the operation to remove the free space item to COW again some (or all of) the same nodes (in the tree of tree roots). Even without writepages() being called, we could fail with ENOSPC because these items are located in multiple trees and one of them might have a higher heigth and require node/leaf splits at many levels, exhausting all the reserved space before removing all the items and adding the orphan. In the kernel 4.0 release, commit 3d84be799194 ("Btrfs: fix BUG_ON in btrfs_orphan_add() when delete unused block group"), we attempted to fix a BUG_ON due to ENOSPC when trying to add the orphan item by making the cleaner kthread reserve one transaction unit before attempting to remove the block group, but this was not enough. We had a couple user reports still hitting the same BUG_ON after 4.0, like Stefan Priebe's report on a 4.2-rc6 kernel for example: http://www.spinics.net/lists/linux-btrfs/msg46070.html So fix this by reserving all the necessary units of metadata. Reported-by: Stefan Priebe <s.priebe@profihost.ag> Fixes: 3d84be799194 ("Btrfs: fix BUG_ON in btrfs_orphan_add() when delete unused block group") Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-11-25Btrfs: use global reserve when deleting unused block group after ENOSPCFilipe Manana6-26/+52
It's possible to reach a state where the cleaner kthread isn't able to start a transaction to delete an unused block group due to lack of enough free metadata space and due to lack of unallocated device space to allocate a new metadata block group as well. If this happens try to use space from the global block group reserve just like we do for unlink operations, so that we don't reach a permanent state where starting a transaction for filesystem operations (file creation, renames, etc) keeps failing with -ENOSPC. Such an unfortunate state was observed on a machine where over a dozen unused data block groups existed and the cleaner kthread was failing to delete them due to ENOSPC error when attempting to start a transaction, and even running balance with a -dusage=0 filter failed with ENOSPC as well. Also unmounting and mounting again the filesystem didn't help. Allowing the cleaner kthread to use the global block reserve to delete the unused data block groups fixed the problem. Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-11-25Btrfs: tests: checking for NULL instead of IS_ERR()Dan Carpenter1-1/+3
btrfs_alloc_dummy_root() return an error pointer on failure, it never returns NULL. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-11-25btrfs: fix signed overflows in btrfs_sync_fileDavid Sterba1-3/+7
The calculation of range length in btrfs_sync_file leads to signed overflow. This was caught by PaX gcc SIZE_OVERFLOW plugin. https://forums.grsecurity.net/viewtopic.php?f=1&t=4284 The fsync call passes 0 and LLONG_MAX, the range length does not fit to loff_t and overflows, but the value is converted to u64 so it silently works as expected. The minimal fix is a typecast to u64, switching functions to take (start, end) instead of (start, len) would be more intrusive. Coccinelle script found that there's one more opencoded calculation of the length. <smpl> @@ loff_t start, end; @@ * end - start </smpl> CC: stable@vger.kernel.org Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-11-23vfs: Avoid softlockups with sendfile(2)Jan Kara1-0/+1
The following test program from Dmitry can cause softlockups or RCU stalls as it copies 1GB from tmpfs into eventfd and we don't have any scheduling point at that path in sendfile(2) implementation: int r1 = eventfd(0, 0); int r2 = memfd_create("", 0); unsigned long n = 1<<30; fallocate(r2, 0, 0, n); sendfile(r1, r2, 0, n); Add cond_resched() into __splice_from_pipe() to fix the problem. CC: Dmitry Vyukov <dvyukov@google.com> CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-23vfs: Make sendfile(2) killable even betterJan Kara1-0/+7
Commit 296291cdd162 (mm: make sendfile(2) killable) fixed an issue where sendfile(2) was doing a lot of tiny writes into a filesystem and thus was unkillable for a long time. However sendfile(2) can be (mis)used to issue lots of writes into arbitrary file descriptor such as evenfd or similar special file descriptors which never hit the standard filesystem write path and thus are still unkillable. E.g. the following example from Dmitry burns CPU for ~16s on my test system without possibility to be killed: int r1 = eventfd(0, 0); int r2 = memfd_create("", 0); unsigned long n = 1<<30; fallocate(r2, 0, 0, n); sendfile(r1, r2, 0, n); There are actually quite a few tests for pending signals in sendfile code however we data to write is always available none of them seems to trigger. So fix the problem by adding a test for pending signal into splice_from_pipe_next() also before the loop waiting for pipe buffers to be available. This should fix all the lockup issues with sendfile of the do-ton-of-tiny-writes nature. CC: stable@vger.kernel.org Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-23fix sysvfs symlinksAl Viro1-9/+2
The thing got broken back in 2002 - sysvfs does *not* have inline symlinks; even short ones have bodies stored in the first block of file. sysv_symlink() handles that correctly; unfortunately, attempting to look an existing symlink up will end up confusing them for inline symlinks, and interpret the block number containing the body as the body itself. Nobody has noticed until now, which says something about the level of testing sysvfs gets ;-/ Cc: stable@vger.kernel.org # all of them, not that anyone cared Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-21Merge branch 'akpm' (patches from Andrew)Linus Torvalds5-38/+157
Merge misc fixes from Andrew Morton: "A bunch of fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: slub: mark the dangling ifdef #else of CONFIG_SLUB_DEBUG slub: avoid irqoff/on in bulk allocation slub: create new ___slab_alloc function that can be called with irqs disabled mm: fix up sparse warning in gfpflags_allow_blocking ocfs2: fix umask ignored issue PM/OPP: add entry in MAINTAINERS kernel/panic.c: turn off locks debug before releasing console lock kernel/signal.c: unexport sigsuspend() kasan: fix kmemleak false-positive in kasan_module_alloc() fat: fix fake_offset handling on error path mm/hugetlbfs: fix bugs in fallocate hole punch of areas with holes mm/page-writeback.c: initialize m_dirty to avoid compile warning various: fix pci_set_dma_mask return value checking mm: loosen MADV_NOHUGEPAGE to enable Qemu postcopy on s390 mm: vmalloc: don't remove inexistent guard hole in remove_vm_area() tools/vm/page-types.c: support KPF_IDLE ncpfs: don't allow negative timeouts configfs: allow dynamic group creation MAINTAINERS: add Moritz as reviewer for FPGA Manager Framework slab.h: sprinkle __assume_aligned attributes
2015-11-20ocfs2: fix umask ignored issueJunxiao Bi1-0/+2
New created file's mode is not masked with umask, and this makes umask not work for ocfs2 volume. Fixes: 702e5bc ("ocfs2: use generic posix ACL infrastructure") Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Gang He <ghe@suse.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-20fat: fix fake_offset handling on error pathOGAWA Hirofumi1-5/+11
For the root directory, . and .. are faked (using dir_emit_dots()) and ctx->pos is reset from 2 to 0. A corrupted root directory could cause fat_get_entry() to fail, but ->iterate() (fat_readdir()) reports progress to the VFS (with ctx->pos rewound to 0), so any following calls to ->iterate() continue to return the same entries again and again. The result is that userspace will never see the end of the directory, causing e.g. 'ls' to hang in a getdents() loop. [hirofumi@mail.parknet.co.jp: cleanup and make sure to correct fake_offset] Reported-by: Vegard Nossum <vegard.nossum@oracle.com> Tested-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Richard Weinberger <richard.weinberger@gmail.com> Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-20mm/hugetlbfs: fix bugs in fallocate hole punch of areas with holesMike Kravetz1-33/+32
Hugh Dickins pointed out problems with the new hugetlbfs fallocate hole punch code. These problems are in the routine remove_inode_hugepages and mostly occur in the case where there are holes in the range of pages to be removed. These holes could be the result of a previous hole punch or simply sparse allocation. The current code could access pages outside the specified range. remove_inode_hugepages handles both hole punch and truncate operations. Page index handling was fixed/cleaned up so that the loop index always matches the page being processed. The code now only makes a single pass through the range of pages as it was determined page faults could not race with truncate. A cond_resched() was added after removing up to PAGEVEC_SIZE pages. Some totally unnecessary code in hugetlbfs_fallocate() that remained from early development was also removed. Tested with fallocate tests submitted here: http://librelist.com/browser//libhugetlbfs/2015/6/25/patch-tests-add-tests-for-fallocate-system-call/ And, some ftruncate tests under development Fixes: b5cec28d36f5 ("hugetlbfs: truncate_hugepages() takes a range of pages") Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: "Hillf Danton" <hillf.zj@alibaba-inc.com> Cc: <stable@vger.kernel.org> [4.3] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-20ncpfs: don't allow negative timeoutsDan Carpenter1-0/+2
This code causes a static checker warning because it's a user controlled variable where we cap the upper bound but not the lower bound. Let's return an -EINVAL for negative timeouts. [akpm@linux-foundation.org: remove unneeded `else'] Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Jan Kara <jack@suse.com> Cc: Petr Vandrovec <petr@vandrovec.name> Cc: David Howells <dhowells@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-20configfs: allow dynamic group creationDaniel Baluta1-0/+110
This patchset introduces IIO software triggers, offers a way of configuring them via configfs and adds the IIO hrtimer based interrupt source to be used with software triggers. The architecture is now split in 3 parts, to remove all IIO trigger specific parts from IIO configfs core: (1) IIO configfs - creates the root of the IIO configfs subsys. (2) IIO software triggers - software trigger implementation, dynamically creating /config/iio/triggers group. (3) IIO hrtimer trigger - is the first interrupt source for software triggers (with syfs to follow). Each trigger type can implement its own set of attributes. Lockdep seems to be happy with the locking in configfs patch. This patch (of 5): We don't want to hardcode default groups at subsystem creation time. We export: * configfs_register_group * configfs_unregister_group to allow drivers to programatically create/destroy groups later, after module init time. This is needed for IIO configfs support. (akpm: the other 4 patches to be merged via the IIO tree) Signed-off-by: Daniel Baluta <daniel.baluta@intel.com> Suggested-by: Lars-Peter Clausen <lars@metafoo.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Joel Becker <jlbec@evilplan.org> Cc: Hartmut Knaack <knaack.h@gmx.de> Cc: Octavian Purdila <octavian.purdila@intel.com> Cc: Paul Bolle <pebolle@tiscali.nl> Cc: Adriana Reus <adriana.reus@intel.com> Cc: Cristina Opriceana <cristina.opriceana@gmail.com> Cc: Peter Meerwald <pmeerw@pmeerw.net> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-20Merge branch 'libnvdimm-fixes' of ↵Linus Torvalds5-3/+33
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm fixes from Dan Williams: - A collection of crash and deadlock fixes for DAX that are also tagged for -stable. We will look to re-enable DAX pmd mappings in 4.5, but for now 4.4 and -stable should disable it by default. - A fixup to ext2 and ext4 to mirror the same warning emitted by XFS when mounting with "-o dax" * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: block: protect rw_page against device teardown mm, dax: fix DAX deadlocks (COW fault) dax: disable pmd mappings ext2, ext4: warn when mounting with dax enabled
2015-11-19block: protect rw_page against device teardownDan Williams1-2/+16
Fix use after free crashes like the following: general protection fault: 0000 [#1] SMP Call Trace: [<ffffffffa0050216>] ? pmem_do_bvec.isra.12+0xa6/0xf0 [nd_pmem] [<ffffffffa0050ba2>] pmem_rw_page+0x42/0x80 [nd_pmem] [<ffffffff8128fd90>] bdev_read_page+0x50/0x60 [<ffffffff812972f0>] do_mpage_readpage+0x510/0x770 [<ffffffff8128fd20>] ? I_BDEV+0x20/0x20 [<ffffffff811d86dc>] ? lru_cache_add+0x1c/0x50 [<ffffffff81297657>] mpage_readpages+0x107/0x170 [<ffffffff8128fd20>] ? I_BDEV+0x20/0x20 [<ffffffff8128fd20>] ? I_BDEV+0x20/0x20 [<ffffffff8129058d>] blkdev_readpages+0x1d/0x20 [<ffffffff811d615f>] __do_page_cache_readahead+0x28f/0x310 [<ffffffff811d6039>] ? __do_page_cache_readahead+0x169/0x310 [<ffffffff811c5abd>] ? pagecache_get_page+0x2d/0x1d0 [<ffffffff811c76f6>] filemap_fault+0x396/0x530 [<ffffffff811f816e>] __do_fault+0x4e/0xf0 [<ffffffff811fce7d>] handle_mm_fault+0x11bd/0x1b50 Cc: <stable@vger.kernel.org> Cc: Jens Axboe <axboe@fb.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Reported-by: kbuild test robot <lkp@intel.com> Acked-by: Matthew Wilcox <willy@linux.intel.com> [willy: symmetry fixups] Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-11-16dax: disable pmd mappingsDan Williams2-0/+10
While dax pmd mappings are functional in the nominal path they trigger kernel crashes in the following paths: BUG: unable to handle kernel paging request at ffffea0004098000 IP: [<ffffffff812362f7>] follow_trans_huge_pmd+0x117/0x3b0 [..] Call Trace: [<ffffffff811f6573>] follow_page_mask+0x2d3/0x380 [<ffffffff811f6708>] __get_user_pages+0xe8/0x6f0 [<ffffffff811f7045>] get_user_pages_unlocked+0x165/0x1e0 [<ffffffff8106f5b1>] get_user_pages_fast+0xa1/0x1b0 kernel BUG at arch/x86/mm/gup.c:131! [..] Call Trace: [<ffffffff8106f34c>] gup_pud_range+0x1bc/0x220 [<ffffffff8106f634>] get_user_pages_fast+0x124/0x1b0 BUG: unable to handle kernel paging request at ffffea0004088000 IP: [<ffffffff81235f49>] copy_huge_pmd+0x159/0x350 [..] Call Trace: [<ffffffff811fad3c>] copy_page_range+0x34c/0x9f0 [<ffffffff810a0daf>] copy_process+0x1b7f/0x1e10 [<ffffffff810a11c1>] _do_fork+0x91/0x590 All of these paths are interpreting a dax pmd mapping as a transparent huge page and making the assumption that the pfn is covered by the memmap, i.e. that the pfn has an associated struct page. PTE mappings do not suffer the same fate since they have the _PAGE_SPECIAL flag to cause the gup path to fault. We can do something similar for the PMD path, or otherwise defer pmd support for cases where a struct page is available. For now, 4.4-rc and -stable need to disable dax pmd support by default. For development the "depends on BROKEN" line can be removed from CONFIG_FS_DAX_PMD. Cc: <stable@vger.kernel.org> Cc: Jan Kara <jack@suse.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-11-16FS-Cache: Add missing initialization of ret in cachefiles_write_page()Geert Uytterhoeven1-1/+1
fs/cachefiles/rdwr.c: In function ‘cachefiles_write_page’: fs/cachefiles/rdwr.c:882: warning: ‘ret’ may be used uninitialized in this function If the jump to label "error" is taken, "ret" will indeed be uninitialized, and random stack data may be printed by the debug code. Fixes: 102f4d900c9c8f5e ("FS-Cache: Handle a write to the page immediately beyond the EOF marker") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-16ext2, ext4: warn when mounting with dax enabledDan Williams2-1/+7
Similar to XFS warn when mounting DAX while it is still considered under development. Also, aspects of the DAX implementation, for example synchronization against multiple faults and faults causing block allocation, depend on the correct implementation in the filesystem. The maturity of a given DAX implementation is filesystem specific. Cc: <stable@vger.kernel.org> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: linux-ext4@vger.kernel.org Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: Dave Chinner <david@fromorbit.com> Acked-by: Jan Kara <jack@suse.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-11-13Merge tag 'chrome-platform-4.4' of ↵Linus Torvalds1-2/+15
git://git.kernel.org/pub/scm/linux/kernel/git/olof/chrome-platform Pull chrome platform updates from Olof Johansson: "Here's the branch of chrome platform changes for v4.4. Some have been queued up for the full 4.3 release cycle since I forgot to send them in for that round (rebased early on to deal with fixes conflicts). Most of these enable EC communication stuff -- Pixel 2015 support, enabling building for ARM64 platforms, and a few fixes for memory leaks. There's also a patch in here to allow reading/writing the verified boot context, which depends on a sysfs patch acked by Greg" * tag 'chrome-platform-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/olof/chrome-platform: platform/chrome: Fix i2c-designware adapter name platform/chrome: Support reading/writing the vboot context sysfs: Support is_visible() on binary attributes platform/chrome: cros_ec: Fix possible leak in led_rgb_store() platform/chrome: cros_ec: Fix leak in sequence_store() platform/chrome: Enable Chrome platforms on 64-bit ARM platform/chrome: cros_ec_dev - Add a platform device ID table platform/chrome: cros_ec_lpc - Add support for Google Pixel 2 platform/chrome: cros_ec_lpc - Use existing function to check EC result platform/chrome: Make depends on MFD_CROS_EC instead CROS_EC_PROTO Revert "platform/chrome: Don't make CHROME_PLATFORMS depends on X86 || ARM"
2015-11-13Merge branch 'for-next' of ↵Linus Torvalds4-595/+191
git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending Pull SCSI target updates from Nicholas Bellinger: "This series contains HCH's changes to absorb configfs attribute ->show() + ->store() function pointer usage from it's original tree-wide consumers, into common configfs code. It includes usb-gadget, target w/ drivers, netconsole and ocfs2 changes to realize the improved simplicity, that now renders the original include/target/configfs_macros.h CPP magic for fabric drivers and others, unnecessary and obsolete. And with common code in place, new configfs attributes can be added easier than ever before. Note, there are further improvements in-flight from other folks for v4.5 code in configfs land, plus number of target fixes for post -rc1 code" In the meantime, a new user of the now-removed old configfs API came in through the char/misc tree in commit 7bd1d4093c2f ("stm class: Introduce an abstraction for System Trace Module devices"). This merge resolution comes from Alexander Shishkin, who updated his stm class tracing abstraction to account for the removal of the old show_attribute and store_attribute methods in commit 517982229f78 ("configfs: remove old API") from this pull. As Alexander says about that patch: "There's no need to keep an extra wrapper structure per item and the awkward show_attribute/store_attribute item ops are no longer needed. This patch converts policy code to the new api, all the while making the code quite a bit smaller and easier on the eyes. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>" That patch was folded into the merge so that the tree should be fully bisectable. * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (23 commits) configfs: remove old API ocfs2/cluster: use per-attribute show and store methods ocfs2/cluster: move locking into attribute store methods netconsole: use per-attribute show and store methods target: use per-attribute show and store methods spear13xx_pcie_gadget: use per-attribute show and store methods dlm: use per-attribute show and store methods usb-gadget/f_serial: use per-attribute show and store methods usb-gadget/f_phonet: use per-attribute show and store methods usb-gadget/f_obex: use per-attribute show and store methods usb-gadget/f_uac2: use per-attribute show and store methods usb-gadget/f_uac1: use per-attribute show and store methods usb-gadget/f_mass_storage: use per-attribute show and store methods usb-gadget/f_sourcesink: use per-attribute show and store methods usb-gadget/f_printer: use per-attribute show and store methods usb-gadget/f_midi: use per-attribute show and store methods usb-gadget/f_loopback: use per-attribute show and store methods usb-gadget/ether: use per-attribute show and store methods usb-gadget/f_acm: use per-attribute show and store methods usb-gadget/f_hid: use per-attribute show and store methods ...
2015-11-13Merge branch 'for-linus-3' of ↵Linus Torvalds38-644/+379
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs xattr cleanups from Al Viro. * 'for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: f2fs: xattr simplifications squashfs: xattr simplifications 9p: xattr simplifications xattr handlers: Pass handler to operations instead of flags jffs2: Add missing capability check for listing trusted xattrs hfsplus: Remove unused xattr handler list operations ubifs: Remove unused security xattr handler vfs: Fix the posix_acl_xattr_list return value vfs: Check attribute names in posix acl xattr handers
2015-11-13Merge branch 'libnvdimm-fixes' of ↵Linus Torvalds1-0/+7
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm fixes from Dan Williams: - three fixes tagged for -stable including a crash fix, simple performance tweak, and an invalid i/o error. - build regression fix for the nvdimm unit tests - nvdimm documentation update * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: dax: fix __dax_pmd_fault crash libnvdimm: documentation clarifications libnvdimm, pmem: fix size trim in pmem_direct_access() libnvdimm, e820: fix numa node for e820-type-12 pmem ranges tools/testing/nvdimm, acpica: fix flag rename build breakage
2015-11-13f2fs: xattr simplificationsAndreas Gruenbacher1-12/+3
Now that the xattr handler is passed to the xattr handler operations, we have access to the attribute name prefix, so simplify f2fs_xattr_generic_list. Also, f2fs_xattr_advise_list is only ever called for f2fs_xattr_advise_handler; there is no need to double check for that. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Changman Lee <cm224.lee@samsung.com> Cc: Chao Yu <chao2.yu@samsung.com> Cc: linux-f2fs-devel@lists.sourceforge.net Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-13squashfs: xattr simplificationsAndreas Gruenbacher1-59/+31
Now that the xattr handler is passed to the xattr handler operations, we have access to the attribute name prefix, so simplify the squashfs xattr handlers a bit. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Cc: Phillip Lougher <phillip@squashfs.org.uk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-139p: xattr simplificationsAndreas Gruenbacher8-299/+72
Now that the xattr handler is passed to the xattr handler operations, we can use the same get and set operations for the user, trusted, and security xattr namespaces. In those namespaces, we can access the full attribute name by "reattaching" the name prefix the vfs has skipped for us. Add a xattr_full_name helper to make this obvious in the code. For the "system.posix_acl_access" and "system.posix_acl_default" attributes, handler->prefix is the full attribute name; the suffix is the empty string. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Ron Minnich <rminnich@sandia.gov> Cc: Latchesar Ionkov <lucho@ionkov.net> Cc: v9fs-developer@lists.sourceforge.net Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-13xattr handlers: Pass handler to operations instead of flagsAndreas Gruenbacher32-226/+306
The xattr_handler operations are currently all passed a file system specific flags value which the operations can use to disambiguate between different handlers; some file systems use that to distinguish the xattr namespace, for example. In some oprations, it would be useful to also have access to the handler prefix. To allow that, pass a pointer to the handler to operations instead of the flags value alone. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-13jffs2: Add missing capability check for listing trusted xattrsAndreas Gruenbacher1-0/+3
The vfs checks if a task has the appropriate access for get and set operations, but it cannot do that for the list operation; the file system must check for that itself. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: David Woodhouse <dwmw2@infradead.org> Cc: linux-mtd@lists.infradead.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-13hfsplus: Remove unused xattr handler list operationsAndreas Gruenbacher4-44/+0
The list operations can never be called; they are even documented to be unused. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-13ubifs: Remove unused security xattr handlerAndreas Gruenbacher3-42/+0
Ubifs installs a security xattr handler in sb->s_xattr but doesn't use the generic_{get,set,list,remove}xattr inode operations needed for processing this list of attribute handlers; the handler is never called. Instead, ubifs uses its own xattr handlers which also process security xattrs. Remove the dead code. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Richard Weinberger <richard@nod.at> Cc: Artem Bityutskiy <dedekind1@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: linux-mtd@lists.infradead.org Cc: Subodh Nijsure <snijsure@grid-net.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-13vfs: Fix the posix_acl_xattr_list return valueAndreas Gruenbacher1-3/+1
When a filesystem that contains POSIX ACLs is mounted without ACL support (-o noacl), the appropriate behavior is not to list any existing POSIX ACL xattrs. The return value for list xattr handlers in this case is 0, not an error code: several filesystems that use the POSIX ACL xattr handlers do not expect the list operation to fail. Symlinks cannot have ACLs, so posix_acl_xattr_list will never be called for symlinks in the first place. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-13vfs: Check attribute names in posix acl xattr handersAndreas Gruenbacher1-0/+4
The get and set operations of the POSIX ACL xattr handlers failed to check the attribute names, so all names with "system.posix_acl_access" or "system.posix_acl_default" as a prefix were accepted. Reject invalid names from now on. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-13Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds9-25/+287
Pull SMB3 updates from Steve French: "A collection of SMB3 patches adding some reliability features (persistent and resilient handles) and improving SMB3 copy offload. I will have some additional patches for SMB3 encryption and SMB3.1.1 signing (important security features), and also for improving SMB3 persistent handle reconnection (setting ChannelSequence number e.g.) that I am still working on but wanted to get this set in since they can stand alone" * 'for-next' of git://git.samba.org/sfrench/cifs-2.6: Allow copy offload (CopyChunk) across shares Add resilienthandles mount parm [SMB3] Send durable handle v2 contexts when use of persistent handles required [SMB3] Display persistenthandles in /proc/mounts for SMB3 shares if enabled [SMB3] Enable checking for continuous availability and persistent handle support [SMB3] Add parsing for new mount option controlling persistent handles Allow duplicate extents in SMB3 not just SMB3.1.1
2015-11-13Merge branch 'for-linus-4.4' of ↵Linus Torvalds8-146/+163
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes and cleanups from Chris Mason: "Some of this got cherry-picked from a github repo this week, but I verified the patches. We have three small scrub cleanups and a collection of fixes" * 'for-linus-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: btrfs: Use fs_info directly in btrfs_delete_unused_bgs btrfs: Fix lost-data-profile caused by balance bg btrfs: Fix lost-data-profile caused by auto removing bg btrfs: Remove len argument from scrub_find_csum btrfs: Reduce unnecessary arguments in scrub_recheck_block btrfs: Use scrub_checksum_data and scrub_checksum_tree_block for scrub_recheck_block_checksum btrfs: Reset sblock->xxx_error stats before calling scrub_recheck_block_checksum btrfs: scrub: setup all fields for sblock_to_check btrfs: scrub: set error stats when tree block spanning stripes Btrfs: fix race when listing an inode's xattrs Btrfs: fix race leading to BUG_ON when running delalloc for nodatacow Btrfs: fix race leading to incorrect item deletion when dropping extents Btrfs: fix sleeping inside atomic context in qgroup rescan worker Btrfs: fix race waiting for qgroup rescan worker btrfs: qgroup: exit the rescan worker during umount Btrfs: fix extent accounting for partial direct IO writes
2015-11-13Merge branch 'for-linus' of ↵Linus Torvalds7-66/+161
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph updates from Sage Weil: "There are several patches from Ilya fixing RBD allocation lifecycle issues, a series adding a nocephx_sign_messages option (and associated bug fixes/cleanups), several patches from Zheng improving the (directory) fsync behavior, a big improvement in IO for direct-io requests when striping is enabled from Caifeng, and several other small fixes and cleanups" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: libceph: clear msg->con in ceph_msg_release() only libceph: add nocephx_sign_messages option libceph: stop duplicating client fields in messenger libceph: drop authorizer check from cephx msg signing routines libceph: msg signing callouts don't need con argument libceph: evaluate osd_req_op_data() arguments only once ceph: make fsync() wait unsafe requests that created/modified inode ceph: add request to i_unsafe_dirops when getting unsafe reply libceph: introduce ceph_x_authorizer_cleanup() ceph: don't invalidate page cache when inode is no longer used rbd: remove duplicate calls to rbd_dev_mapping_clear() rbd: set device_type::release instead of device::release rbd: don't free rbd_dev outside of the release callback rbd: return -ENOMEM instead of pool id if rbd_dev_create() fails libceph: use local variable cursor instead of &msg->cursor libceph: remove con argument in handle_reply() ceph: combine as many iovec as possile into one OSD request ceph: fix message length computation ceph: fix a comment typo rbd: drop null test before destroy functions
2015-11-12dax: fix __dax_pmd_fault crashDan Williams1-0/+7
Since 4.3 introduced devm_memremap_pages() the pfns handled by DAX may optionally have a struct page backing. When a mapped pfn reaches vmf_insert_pfn_pmd() it fails with a crash signature like the following: kernel BUG at mm/huge_memory.c:905! [..] Call Trace: [<ffffffff812a73ba>] __dax_pmd_fault+0x2ea/0x5b0 [<ffffffffa01a4182>] xfs_filemap_pmd_fault+0x92/0x150 [xfs] [<ffffffff811fbe02>] handle_mm_fault+0x312/0x1b50 Fix this by falling back to 4K mappings in the pfn_valid() case. Longer term, vmf_insert_pfn_pmd() needs to grow support for architectures that can provide a 'pmd_special' capability. Cc: <stable@vger.kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-11-12Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds2-4/+15
Pull misc block fixes from Jens Axboe: "Stuff that got collected after the merge window opened. This contains: - NVMe: - Fix for non-striped transfer size setting for NVMe from Sathyavathi. - (Some) support for the weird Apple nvme controller in the macbooks. From Stephan Günther. - The error value leak for dax from Al. - A few minor blk-mq tweaks from me. - Add the new linux-block@vger.kernel.org mailing list to the MAINTAINERS file. - Discard fix for brd, from Jan. - A kerneldoc warning for block core from Randy. - An older fix from Vivek, converting a WARN_ON() to a rate limited printk when a device is hot removed with dirty inodes" * 'for-linus' of git://git.kernel.dk/linux-block: block: don't hardcode blk_qc_t -> tag mask dax_io(): don't let non-error value escape via retval instead of EFAULT block: fix blk-core.c kernel-doc warning fs/block_dev.c: Remove WARN_ON() when inode writeback fails NVMe: add support for Apple NVMe controller NVMe: use split lo_hi_{read,write}q blk-mq: mark __blk_mq_complete_request() static MAINTAINERS: add reference to new linux-block list NVMe: Increase the max transfer size when mdts is 0 brd: Refuse improperly aligned discard requests
2015-11-11Merge tag 'xfs-for-linus-4.4' of ↵Linus Torvalds62-411/+994
git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs Pull xfs updates from Dave Chinner: "There is nothing really major here - the only significant addition is the per-mount operation statistics infrastructure. Otherwises there's various ACL, xattr, DAX, AIO and logging fixes, and a smattering of small cleanups and fixes elsewhere. Summary: - per-mount operational statistics in sysfs - fixes for concurrent aio append write submission - various logging fixes - detection of zeroed logs and invalid log sequence numbers on v5 filesystems - memory allocation failure message improvements - a bunch of xattr/ACL fixes - fdatasync optimisation - miscellaneous other fixes and cleanups" * tag 'xfs-for-linus-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (39 commits) xfs: give all workqueues rescuer threads xfs: fix log recovery op header validation assert xfs: Fix error path in xfs_get_acl xfs: optimise away log forces on timestamp updates for fdatasync xfs: don't leak uuid table on rmmod xfs: invalidate cached acl if set via ioctl xfs: Plug memory leak in xfs_attrmulti_attr_set xfs: Validate the length of on-disk ACLs xfs: invalidate cached acl if set directly via xattr xfs: xfs_filemap_pmd_fault treats read faults as write faults xfs: add ->pfn_mkwrite support for DAX xfs: DAX does not use IO completion callbacks xfs: Don't use unwritten extents for DAX xfs: introduce BMAPI_ZERO for allocating zeroed extents xfs: fix inode size update overflow in xfs_map_direct() xfs: clear PF_NOFREEZE for xfsaild kthread xfs: fix an error code in xfs_fs_fill_super() xfs: stats are no longer dependent on CONFIG_PROC_FS xfs: simplify /proc teardown & error handling xfs: per-filesystem stats counter implementation ...
2015-11-11Merge tag 'nfsd-4.4' of git://linux-nfs.org/~bfields/linuxLinus Torvalds19-248/+316
Pull nfsd updates from Bruce Fields: "Apologies for coming a little late in the merge window. Fortunately this is another fairly quiet one: Mainly smaller bugfixes and cleanup. We're still finding some bugs from the breakup of the big NFSv4 state lock in 3.17 -- thanks especially to Andrew Elble and Jeff Layton for tracking down some of the remaining races" * tag 'nfsd-4.4' of git://linux-nfs.org/~bfields/linux: svcrpc: document lack of some memory barriers nfsd: fix race with open / open upgrade stateids nfsd: eliminate sending duplicate and repeated delegations nfsd: remove recurring workqueue job to clean DRC SUNRPC: drop stale comment in svc_setup_socket() nfsd: ensure that seqid morphing operations are atomic wrt to copies nfsd: serialize layout stateid morphing operations nfsd: improve client_has_state to check for unused openowners nfsd: fix clid_inuse on mount with security change sunrpc/cache: make cache flushing more reliable. nfsd: move include of state.h from trace.c to trace.h sunrpc: avoid warning in gss_key_timeout lockd: get rid of reference-counted NSM RPC clients SUNRPC: Use MSG_SENDPAGE_NOTLAST when calling sendpage() lockd: create NSM handles per net namespace nfsd: switch unsigned char flags in svc_fh to bools nfsd: move svc_fh->fh_maxsize to just after fh_handle nfsd: drop null test before destroy functions nfsd: serialize state seqid morphing operations
2015-11-11Merge branch 'for-linus-2' of ↵Linus Torvalds12-98/+86
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs update from Al Viro: - misc stable fixes - trivial kernel-doc and comment fixups - remove never-used block_page_mkwrite() wrapper function, and rename the function that is _actually_ used to not have double underscores. * 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: 9p: cache.h: Add #define of include guard vfs: remove stale comment in inode_operations vfs: remove unused wrapper block_page_mkwrite() binfmt_elf: Correct `arch_check_elf's description fs: fix writeback.c kernel-doc warnings fs: fix inode.c kernel-doc warning fs/pipe.c: return error code rather than 0 in pipe_write() fs/pipe.c: preserve alloc_file() error code binfmt_elf: Don't clobber passed executable's file header FS-Cache: Handle a write to the page immediately beyond the EOF marker cachefiles: perform test on s_blocksize when opening cache file. FS-Cache: Don't override netfs's primary_index if registering failed FS-Cache: Increase reference of parent after registering, netfs success debugfs: fix refcount imbalance in start_creating