path: root/fs/nfsd
Age | Commit message | Author | Files | Lines
2017-02-27 | lib/vsprintf.c: remove %Z support | Alexey Dobriyan | 1 | -1/+1
Now that %z is standardised in C99 there is no reason to support %Z. Unlike %L it doesn't even make format strings smaller. Use BUILD_BUG_ON in a couple of ATM drivers. In case anyone didn't notice, lib/vsprintf.o is about half of SLUB, which in my opinion is quite an achievement. Hopefully this patch inspires someone else to trim vsprintf.c more. Link: http://lkml.kernel.org/r/20170103230126.GA30170@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
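A minimal user-space sketch of the C99 size_t conversion that replaces %Z. It uses snprintf() from the C library rather than the kernel's own vsprintf code, and format_size() is a hypothetical helper introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format a size_t with the C99 "%zu" conversion.
 * Returns the number of characters written, excluding the trailing NUL.
 * format_size() is a hypothetical helper, not a kernel function.
 */
static int format_size(char *buf, size_t buflen, size_t value)
{
	assert(buf != NULL && buflen > 0);
	return snprintf(buf, buflen, "%zu", value);
}
```

Callers that previously passed a size_t to a %Zu-style format would use %zu in the same position; the argument type does not change.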
2017-02-27 | fs: add i_blocksize() | Fabian Frederick | 1 | -3/+3
Replace all 1 << inode->i_blkbits and (1 << inode->i_blkbits) in fs branch. This patch also fixes multiple checkpatch warnings: WARNING: Prefer 'unsigned int' to bare use of 'unsigned' Thanks to Andrew Morton for suggesting more appropriate function instead of macro. [geliangtang@gmail.com: truncate: use i_blocksize()] Link: http://lkml.kernel.org/r/9c8b2cd83c8f5653805d43debde9fa8817e02fc4.1484895804.git.geliangtang@gmail.com Link: http://lkml.kernel.org/r/1481319905-10126-1-git-send-email-fabf@skynet.be Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
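The replacement described above can be sketched in user space as follows. struct inode_sketch is a stand-in that carries only the i_blkbits field, and i_blocksize_sketch() is an illustrative name; the real helper operates on the kernel's struct inode.

```c
#include <assert.h>

/* Stand-in for the kernel's struct inode, reduced to the one field used here. */
struct inode_sketch {
	unsigned char i_blkbits;	/* log2 of the block size in bytes */
};

/* Wrap the open-coded "1 << inode->i_blkbits" in one typed helper. */
static unsigned int i_blocksize_sketch(const struct inode_sketch *node)
{
	assert(node->i_blkbits < 32);	/* keep the shift well defined */
	return 1U << node->i_blkbits;
}
```

A function rather than a macro gives the result a fixed unsigned int type, which is the point Andrew Morton's suggestion addresses.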
2017-02-22 | Merge tag 'driver-core-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core | Linus Torvalds | 1 | -2/+4
Pull driver core updates from Greg KH: "Here is the "small" driver core patches for 4.11-rc1. Not much here, some firmware documentation and self-test updates, a debugfs code formatting issue, and a new feature for call_usermodehelper to make it more robust on systems that want to lock it down in a more secure way. All of these have been in linux-next for a while now with no reported issues"
* tag 'driver-core-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  kernfs: handle null pointers while printing node name and path
  Introduce STATIC_USERMODEHELPER to mediate call_usermodehelper()
  Make static usermode helper binaries constant
  kmod: make usermodehelper path a const string
  firmware: revamp firmware documentation
  selftests: firmware: send expected errors to /dev/null
  selftests: firmware: only modprobe if driver is missing
  platform: Print the resource range if device failed to claim
  kref: prefer atomic_inc_not_zero to atomic_add_unless
  debugfs: improve formatting of debugfs_real_fops()
2017-02-17 | Merge branch 'for-4.11/next' into for-4.11/linus-merge | Jens Axboe | 2 | -8/+12
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-09 | nfsd: Revert "nfsd: special case truncates some more" | J. Bruce Fields | 1 | -37/+60
This patch incorrectly attempted nested mnt_want_write, and incorrectly disabled nfsd's owner override for truncate. We'll fix those problems and make another attempt soon, for the moment I think the safest is to revert. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-01-31 | block: fold cmd_type into the REQ_OP_ space | Christoph Hellwig | 1 | -1/+1
Instead of keeping two levels of indirection for request types, fold it all into the operations. The little caveat here is that previously cmd_type only applied to struct request, while the request and bio op fields were set to plain REQ_OP_READ/WRITE even for passthrough operations. Instead this patch adds new REQ_OP_* for SCSI passthrough and driver private requests, although it has to add two for each so that we can communicate the data in/out nature of the request. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-31 | block: make scsi_request and scsi ioctl support optional | Christoph Hellwig | 1 | -0/+1
We only need this code to support scsi, ide, cciss and virtio. And at least for virtio it's a deprecated feature to start with. This should shrink the kernel size a bit for embedded devices that only use, say, eMMC. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-31 | nfsd: special case truncates some more | Christoph Hellwig | 1 | -60/+37
Both the NFS protocols and the Linux VFS use a setattr operation with a bitmap of attributes to set various file attributes including the file size and the uid/gid. The Linux syscalls never mix size updates with unrelated updates like the uid/gid, and some file systems like XFS and GFS2 rely on the fact that truncates might not update random other attributes, and many other file systems handle the case but do not update the different attributes in the same transaction. NFSD on the other hand passes the attributes it gets on the wire more or less directly through to the VFS, leading to updates the file systems don't expect. XFS at least has an assert on the allowed attributes, which caught an unusual NFS client setting the size and group at the same time. To handle this issue properly this switches nfsd to call vfs_truncate for size changes, and then handle all other attributes through notify_change. As a side effect this also means less boilerplate code around the size change as we can now reuse the VFS code. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-01-31 | NFSD: Fix a null reference case in find_or_create_lock_stateid() | Kinglong Mee | 3 | -15/+13
nfsd assigns the nfs4_free_lock_stateid to .sc_free in init_lock_stateid(). If nfsd doesn't go through init_lock_stateid() and puts the stateid at the end, there is a NULL reference to .sc_free when calling nfs4_put_stid(ns). This patch moves the nfs4_stid.sc_free assignment into nfs4_alloc_stid(). Cc: stable@vger.kernel.org Fixes: 356a95ece7aa "nfsd: clean up races in lock stateid searching..." Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-01-27 | block: split scsi_request out of struct request | Christoph Hellwig | 1 | -7/+10
And require all drivers that want to support BLOCK_PC to allocate it as the first thing of their private data. To support this the legacy IDE and BSG code is switched to set cmd_size on their queues to let the block layer allocate the additional space. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-19 | Make static usermode helper binaries constant | Greg Kroah-Hartman | 1 | -2/+4
There are a number of usermode helper binaries that are "hard coded" in the kernel today, so mark them as "const" to make it harder for someone to change where the variables point to. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Thomas Sailer <t.sailer@alumni.ethz.ch> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Johan Hovold <johan@kernel.org> Cc: Alex Elder <elder@kernel.org> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Jeff Layton <jlayton@poochiereds.net> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-12 | nfsd: fix supported attributes for acl & labels | J. Bruce Fields | 1 | -1/+3
Oops--in 916d2d844afd I moved some constants into an array for convenience, but here I'm accidentally writing to that array. The effect is that if you ever encounter a filesystem lacking support for ACLs or security labels, then all queries of supported attributes will report that attribute as unsupported from then on. Fixes: 916d2d844afd "nfsd: clean up supported attribute handling" Signed-off-by: J. Bruce Fields <bfields@redhat.com>
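The bug class described here, masking bits directly in a shared table so that later callers see the narrowed result, can be sketched as below. All names and bit values are hypothetical; the sketch only illustrates copying the shared bitmap into a caller-owned buffer before clearing bits, and is not the nfsd code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ATTR_WORDS_SKETCH 3
#define ATTR_ACL_SKETCH   (1u << 12)	/* hypothetical "ACL supported" bit */

/* Shared table of supported attributes; const so stray writes fail to compile. */
static const uint32_t supported_attrs_sketch[ATTR_WORDS_SKETCH] = {
	0x0000ffffu, 0x00ff00ffu, 0x00000007u,
};

/* Report the attributes one filesystem supports without touching the table. */
static void supported_for_fs(uint32_t out[ATTR_WORDS_SKETCH], int fs_has_acl)
{
	assert(out != NULL);
	/* Work on a private copy, never on the shared array itself. */
	memcpy(out, supported_attrs_sketch, sizeof(supported_attrs_sketch));
	if (!fs_has_acl)
		out[0] &= ~ATTR_ACL_SKETCH;
}
```

With this shape, a query against a filesystem without ACL support cannot change what a later query against a different filesystem reports.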
2016-12-24 | Replace <asm/uaccess.h> with <linux/uaccess.h> globally | Linus Torvalds | 2 | -2/+2
This was entirely automated, using the script by Al: PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h) to do the replacement at the end of the merge window. Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-17 | Merge uncontroversial parts of branch 'readlink' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs | Linus Torvalds | 2 | -8/+6
Pull partial readlink cleanups from Miklos Szeredi. This is the uncontroversial part of the readlink cleanup patch-set that simplifies the default readlink handling. Miklos and Al are still discussing the rest of the series.
* git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  vfs: make generic_readlink() static
  vfs: remove ".readlink = generic_readlink" assignments
  vfs: default to generic_readlink()
  vfs: replace calling i_op->readlink with vfs_readlink()
  proc/self: use generic_readlink
  ecryptfs: use vfs_get_link()
  bad_inode: add missing i_op initializers
2016-12-16 | Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs | Linus Torvalds | 1 | -2/+1
Pull overlayfs updates from Miklos Szeredi: "This update contains:
- try to clone on copy-up
- allow renaming a directory
- split source into manageable chunks
- misc cleanups and fixes
It does not contain the read-only fd data inconsistency fix, which Al didn't like. I'll leave that to the next year..."
* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: (36 commits)
  ovl: fix reStructuredText syntax errors in documentation
  ovl: fix return value of ovl_fill_super
  ovl: clean up kstat usage
  ovl: fold ovl_copy_up_truncate() into ovl_copy_up()
  ovl: create directories inside merged parent opaque
  ovl: opaque cleanup
  ovl: show redirect_dir mount option
  ovl: allow setting max size of redirect
  ovl: allow redirect_dir to default to "on"
  ovl: check for emptiness of redirect dir
  ovl: redirect on rename-dir
  ovl: lookup redirects
  ovl: consolidate lookup for underlying layers
  ovl: fix nested overlayfs mount
  ovl: check namelen
  ovl: split super.c
  ovl: use d_is_dir()
  ovl: simplify lookup
  ovl: check lower existence of rename target
  ovl: rename: simplify handling of lower/merged directory
  ...
2016-12-16 | Merge tag 'nfsd-4.10' of git://linux-nfs.org/~bfields/linux | Linus Torvalds | 8 | -85/+103
Pull nfsd updates from Bruce Fields: "The one new feature is support for a new NFSv4.2 mode_umask attribute that makes ACL inheritance a little more useful in environments that default to restrictive umasks. Requires client-side support, also on its way for 4.10. Other than that, miscellaneous smaller fixes and cleanup, especially to the server rdma code"
[ The client side of the umask attribute was merged yesterday ]
* tag 'nfsd-4.10' of git://linux-nfs.org/~bfields/linux:
  nfsd: add support for the umask attribute
  sunrpc: use DEFINE_SPINLOCK()
  svcrdma: Further clean-up of svc_rdma_get_inv_rkey()
  svcrdma: Break up dprintk format in svc_rdma_accept()
  svcrdma: Remove unused variable in rdma_copy_tail()
  svcrdma: Remove unused variables in xprt_rdma_bc_allocate()
  svcrdma: Remove svc_rdma_op_ctxt::wc_status
  svcrdma: Remove DMA map accounting
  svcrdma: Remove BH-disabled spin locking in svc_rdma_send()
  svcrdma: Renovate sendto chunk list parsing
  svcauth_gss: Close connection when dropping an incoming message
  svcrdma: Clear xpt_bc_xps in xprt_setup_rdma_bc() error exit arm
  nfsd: constify reply_cache_stats_operations structure
  nfsd: update workqueue creation
  sunrpc: GFP_KERNEL should be GFP_NOFS in crypto code
  nfsd: catch errors in decode_fattr earlier
  nfsd: clean up supported attribute handling
  nfsd: fix error handling for clients that fail to return the layout
  nfsd: more robust allocation failure handling in nfsd_reply_cache_init
2016-12-16 | vfs: call vfs_clone_file_range() under freeze protection | Amir Goldstein | 1 | -2/+1
Move sb_start_write()/sb_end_write() out of the vfs helper and up into the ioctl handler. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-12-15 | nfsd: add support for the umask attribute | Andreas Gruenbacher | 4 | -9/+33
Clients can set the umask attribute when creating files to cause the server to apply it always except when inheriting permissions from the parent directory. That way, the new files will end up with the same permissions as files created locally. See https://tools.ietf.org/html/draft-ietf-nfsv4-umask-02 for more details. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-12-09 | vfs: replace calling i_op->readlink with vfs_readlink() | Miklos Szeredi | 2 | -8/+6
Also check d_is_symlink() in callers instead of inode->i_op->readlink because following patches will allow NULL ->readlink for symlinks. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-11-18 | netns: make struct pernet_operations::id unsigned int | Alexey Dobriyan | 2 | -2/+2
Make struct pernet_operations::id unsigned.

There are 2 reasons to do so:

1) This field is really an index into a zero-based array and thus is an unsigned entity. Using a negative value is an out-of-bounds access by definition.

2) On x86_64 unsigned 32-bit data which are mixed with pointers via array indexing or offsets added or subtracted to pointers are preferred to signed 32-bit data.

"int" being used as an array index needs to be sign-extended to 64-bit before being used.

	void f(long *p, int i)
	{
		g(p[i]);
	}

roughly translates to

	movsx	rsi, esi
	mov	rdi, [rsi+...]
	call	g

MOVSX is a 3 byte instruction which isn't necessary if the variable is unsigned because x86_64 is zero extending by default.

Now, there is the net_generic() function which, you guessed it right, uses "int" as an array index:

	static inline void *net_generic(const struct net *net, int id)
	{
		...
		ptr = ng->ptr[id - 1];
		...
	}

And this function is used a lot, so those sign extensions add up.

Patch snipes ~1730 bytes on allyesconfig kernel (without all junk messing with code generation):

	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)

Unfortunately some functions actually grow bigger. This is a seemingly random artefact of code generation with the register allocator being used differently. gcc decides that some variable needs to live in new r8+ registers and every access now requires a REX prefix. Or it is shifted into r12, so [r12+0] addressing mode has to be used which is longer than [r8].

However, overall balance is in negative direction:

	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
	function                       old     new   delta
	nfsd4_lock                    3886    3959     +73
	tipc_link_build_proto_msg     1096    1140     +44
	mac80211_hwsim_new_radio      2776    2808     +32
	tipc_mon_rcv                  1032    1058     +26
	svcauth_gss_legacy_init       1413    1429     +16
	tipc_bcbase_select_primary     379     392     +13
	nfsd4_exchange_id             1247    1260     +13
	nfsd4_setclientid_confirm      782     793     +11
	...
	put_client_renew_locked        494     480     -14
	ip_set_sockfn_get              730     716     -14
	geneve_sock_add                829     813     -16
	nfsd4_sequence_done            721     703     -18
	nlmclnt_lookup_host            708     686     -22
	nfsd4_lockt                   1085    1063     -22
	nfs_get_client                1077    1050     -27
	tcf_bpf_init                  1106    1076     -30
	nfsd4_encode_fattr            5997    5930     -67
	Total: Before=154856051, After=154854321, chg -0.00%

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
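A user-space sketch of the lookup shape this commit discusses, with the index declared unsigned. struct generic_sketch and generic_lookup() are hypothetical stand-ins for the kernel's per-namespace pointer array and net_generic(); whether a given compiler actually drops the sign extension depends on the compiler and its flags.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the per-namespace pointer array; ids start at 1. */
struct generic_sketch {
	unsigned int len;	/* number of valid slots in ptr[] */
	void *ptr[8];
};

/* The id is an array index, so it is declared unsigned. */
static void *generic_lookup(const struct generic_sketch *ng, unsigned int id)
{
	assert(id >= 1 && id <= ng->len);	/* id 0 is never valid here */
	return ng->ptr[id - 1];
}
```

Declaring the parameter unsigned also documents that a negative id has no meaning, which is the first of the two reasons the commit gives.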
2016-11-14 | nfsd: constify reply_cache_stats_operations structure | Julia Lawall | 1 | -1/+1
reply_cache_stats_operations, of type struct file_operations, is never modified, so declare it as const. Done with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Reviewed-by: Jeff Layton <jlayton@poochiereds.net> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-11-14 | nfsd: update workqueue creation | J. Bruce Fields | 1 | -1/+1
No real change in functionality, but the old interface seems to be deprecated. We don't actually care about ordering necessarily, but we do depend on running at most one work item at a time: nfsd4_process_cb_update() assumes that no other thread is running it, and that no new callbacks are starting while it's running. Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-11-01 | nfsd: catch errors in decode_fattr earlier | J. Bruce Fields | 2 | -7/+14
3c8e03166ae2 "NFSv4: do exact check about attribute specified" fixed some handling of unsupported-attribute errors, but it also delayed checking for unwriteable attributes till after we decode them. This could lead to odd behavior in the case a client attempts to set an attribute we don't know about followed by one we try to parse. In that case the parser for the known attribute will attempt to parse the unknown attribute. It should fail in some safe way, but the error might at least be incorrect (probably bad_xdr instead of inval). So, it's better to do that check at the start. As far as I know this doesn't cause any problems with current clients but it might be a minor issue e.g. if we encounter a future client that supports a new attribute that we currently don't. Cc: Yu Zhiguo <yuzg@cn.fujitsu.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
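The ordering argued for here, rejecting a request bitmap before any per-attribute parsing starts, can be sketched as a plain bitmap test. attrs_all_writable() and the bitmap layout are assumptions made for illustration and are not the nfsd decoder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return 1 if every requested attribute bit is also in the writable set,
 * 0 otherwise. Meant to run before any attribute value is decoded, so an
 * unknown attribute is never handed to the parser of a known one.
 */
static int attrs_all_writable(const uint32_t *requested,
			      const uint32_t *writable, size_t nwords)
{
	size_t i;

	assert(requested != NULL && writable != NULL);
	for (i = 0; i < nwords; i++)
		if (requested[i] & ~writable[i])
			return 0;
	return 1;
}
```

A decoder built this way would return its "invalid attribute" error as soon as the check fails, instead of a parse error from whichever attribute happened to come next.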
2016-11-01 | nfsd: clean up supported attribute handling | J. Bruce Fields | 3 | -60/+42
Minor cleanup, no change in behavior. Provide helpers for some common attribute bitmap operations. Drop some comments that just echo the code. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-11-01 | nfsd: fix error handling for clients that fail to return the layout | Jeff Layton | 1 | -4/+4
Currently, when the client continually returns NFS4ERR_DELAY on a CB_LAYOUTRECALL, we'll give up trying to retransmit after two lease periods, but leave the layout in place. What we really need to do here is fence the client in this case. Have it fall through to that code in that case instead of into the NFS4ERR_NOMATCHING_LAYOUT case. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-11-01 | nfsd: more robust allocation failure handling in nfsd_reply_cache_init | Jeff Layton | 1 | -3/+8
Currently, we try to allocate the cache as a single, large chunk, which can fail if no big chunks of memory are available. We _do_ try to size it according to the amount of memory in the box, but if the server is started well after boot time, then the allocation can fail due to memory fragmentation. Fall back to doing a vzalloc if the kcalloc fails, and switch the shutdown code to do a kvfree to handle freeing correctly. Reported-by: Olaf Hering <olaf@aepfle.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-11-01 | nfsd: Fix general protection fault in release_lock_stateid() | Chuck Lever | 1 | -6/+4
When I push NFSv4.1 / RDMA hard, (xfstests generic/089, for example), I get this crash on the server:

Oct 28 22:04:30 klimt kernel: general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
Oct 28 22:04:30 klimt kernel: Modules linked in: cts rpcsec_gss_krb5 iTCO_wdt iTCO_vendor_support sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm btrfs irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd xor pcspkr raid6_pq i2c_i801 i2c_smbus lpc_ich mfd_core sg mei_me mei ioatdma shpchp wmi ipmi_si ipmi_msghandler rpcrdma ib_ipoib rdma_ucm acpi_power_meter acpi_pad ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c mlx4_ib mlx4_en ib_core sr_mod cdrom sd_mod ast drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel igb ahci libahci ptp mlx4_core pps_core dca libata i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod
Oct 28 22:04:30 klimt kernel: CPU: 7 PID: 1558 Comm: nfsd Not tainted 4.9.0-rc2-00005-g82cd754 #8
Oct 28 22:04:30 klimt kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 1.0c 09/09/2015
Oct 28 22:04:30 klimt kernel: task: ffff880835c3a100 task.stack: ffff8808420d8000
Oct 28 22:04:30 klimt kernel: RIP: 0010:[<ffffffffa05a759f>] [<ffffffffa05a759f>] release_lock_stateid+0x1f/0x60 [nfsd]
Oct 28 22:04:30 klimt kernel: RSP: 0018:ffff8808420dbce0 EFLAGS: 00010246
Oct 28 22:04:30 klimt kernel: RAX: ffff88084e6660f0 RBX: ffff88084e667020 RCX: 0000000000000000
Oct 28 22:04:30 klimt kernel: RDX: 0000000000000007 RSI: 0000000000000000 RDI: ffff88084e667020
Oct 28 22:04:30 klimt kernel: RBP: ffff8808420dbcf8 R08: 0000000000000001 R09: 0000000000000000
Oct 28 22:04:30 klimt kernel: R10: ffff880835c3a100 R11: ffff880835c3aca8 R12: 6b6b6b6b6b6b6b6b
Oct 28 22:04:30 klimt kernel: R13: ffff88084e6670d8 R14: ffff880835f546f0 R15: ffff880835f1c548
Oct 28 22:04:30 klimt kernel: FS: 0000000000000000(0000) GS:ffff88087bdc0000(0000) knlGS:0000000000000000
Oct 28 22:04:30 klimt kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 28 22:04:30 klimt kernel: CR2: 00007ff020389000 CR3: 0000000001c06000 CR4: 00000000001406e0
Oct 28 22:04:30 klimt kernel: Stack:
Oct 28 22:04:30 klimt kernel: ffff88084e667020 0000000000000000 ffff88084e6670d8 ffff8808420dbd20
Oct 28 22:04:30 klimt kernel: ffffffffa05ac80d ffff880835f54548 ffff88084e640008 ffff880835f545b0
Oct 28 22:04:30 klimt kernel: ffff8808420dbd70 ffffffffa059803d ffff880835f1c768 0000000000000870
Oct 28 22:04:30 klimt kernel: Call Trace:
Oct 28 22:04:30 klimt kernel: [<ffffffffa05ac80d>] nfsd4_free_stateid+0xfd/0x1b0 [nfsd]
Oct 28 22:04:30 klimt kernel: [<ffffffffa059803d>] nfsd4_proc_compound+0x40d/0x690 [nfsd]
Oct 28 22:04:30 klimt kernel: [<ffffffffa0583114>] nfsd_dispatch+0xd4/0x1d0 [nfsd]
Oct 28 22:04:30 klimt kernel: [<ffffffffa047bbf9>] svc_process_common+0x3d9/0x700 [sunrpc]
Oct 28 22:04:30 klimt kernel: [<ffffffffa047ca64>] svc_process+0xf4/0x330 [sunrpc]
Oct 28 22:04:30 klimt kernel: [<ffffffffa05827ca>] nfsd+0xfa/0x160 [nfsd]
Oct 28 22:04:30 klimt kernel: [<ffffffffa05826d0>] ? nfsd_destroy+0x170/0x170 [nfsd]
Oct 28 22:04:30 klimt kernel: [<ffffffff810b367b>] kthread+0x10b/0x120
Oct 28 22:04:30 klimt kernel: [<ffffffff810b3570>] ? kthread_stop+0x280/0x280
Oct 28 22:04:30 klimt kernel: [<ffffffff8174e8ba>] ret_from_fork+0x2a/0x40
Oct 28 22:04:30 klimt kernel: Code: c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 48 8b 87 b0 00 00 00 48 89 fb 4c 8b a0 98 00 00 00 <49> 8b 44 24 20 48 8d b8 80 03 00 00 e8 10 66 1a e1 48 89 df e8
Oct 28 22:04:30 klimt kernel: RIP [<ffffffffa05a759f>] release_lock_stateid+0x1f/0x60 [nfsd]
Oct 28 22:04:30 klimt kernel: RSP <ffff8808420dbce0>
Oct 28 22:04:30 klimt kernel: ---[ end trace cf5d0b371973e167 ]---

Jeff Layton says:
> Hm...now that I look though, this is a little suspicious:
>
>    struct nfs4_openowner *oo = openowner(stp->st_openstp->st_stateowner);
>
> I wonder if it's possible for the openstateid to have already been
> destroyed at this point.
>
> We might be better off doing something like this to get the client pointer:
>
>    stp->st_stid.sc_client;
>
> ...which should be more direct and less dependent on other stateids
> staying valid.

With the suggested change, I am no longer able to reproduce the above oops.

v2: Fix unhash_lock_stateid() as well

Fix-suggested-by: Jeff Layton <jlayton@redhat.com>
Fixes: 42691398be08 ('nfsd: Fix race between FREE_STATEID and LOCK')
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-10-24 | nfsd: move blocked lock handling under a dedicated spinlock | Jeff Layton | 2 | -13/+20
Bruce was hitting some lockdep warnings in testing, showing that we could hit a deadlock with the new CB_NOTIFY_LOCK handling, involving a rather complex situation involving four different spinlocks. The crux of the matter is that we end up taking the nn->client_lock in the lm_notify handler. The simplest fix is to just declare a new per-nfsd_net spinlock to protect the new CB_NOTIFY_LOCK structures. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-10-13 | Merge tag 'nfsd-4.9' of git://linux-nfs.org/~bfields/linux | Linus Torvalds | 16 | -49/+511
Pull nfsd updates from Bruce Fields: "Some RDMA work and some good bugfixes, and two new features that could benefit from user testing:
- Anna Schumaker contributed a simple NFSv4.2 COPY implementation. COPY is already supported on the client side, so a call to copy_file_range() on a recent client should now result in a server-side copy that doesn't require all the data to make a round trip to the client and back.
- Jeff Layton implemented callbacks to notify clients when contended locks become available, which should reduce latency on workloads with contended locks"
* tag 'nfsd-4.9' of git://linux-nfs.org/~bfields/linux:
  NFSD: Implement the COPY call
  nfsd: handle EUCLEAN
  nfsd: only WARN once on unmapped errors
  exportfs: be careful to only return expected errors.
  nfsd4: setclientid_confirm with unmatched verifier should fail
  nfsd: randomize SETCLIENTID reply to help distinguish servers
  nfsd: set the MAY_NOTIFY_LOCK flag in OPEN replies
  nfs: add a new NFS4_OPEN_RESULT_MAY_NOTIFY_LOCK constant
  nfsd: add a LRU list for blocked locks
  nfsd: have nfsd4_lock use blocking locks for v4.1+ locks
  nfsd: plumb in a CB_NOTIFY_LOCK operation
  NFSD: fix corruption in notifier registration
  svcrdma: support Remote Invalidation
  svcrdma: Server-side support for rpcrdma_connect_private
  rpcrdma: RDMA/CM private message data structure
  svcrdma: Skip put_page() when send_reply() fails
  svcrdma: Tail iovec leaves an orphaned DMA mapping
  nfsd: fix dprintk in nfsd4_encode_getdeviceinfo
  nfsd: eliminate cb_minorversion field
  nfsd: don't set a FL_LAYOUT lease for flexfiles layouts
2016-10-10 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs | Linus Torvalds | 1 | -1/+1
Pull more vfs updates from Al Viro: ">rename2() work from Miklos + current_time() from Deepa"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fs: Replace current_fs_time() with current_time()
  fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps
  fs: Replace CURRENT_TIME with current_time() for inode timestamps
  fs: proc: Delete inode time initializations in proc_alloc_inode()
  vfs: Add current_time() api
  vfs: add note about i_op->rename changes to porting
  fs: rename "rename2" i_op to "rename"
  vfs: remove unused i_op->rename
  fs: make remaining filesystems use .rename2
  libfs: support RENAME_NOREPLACE in simple_rename()
  fs: support RENAME_NOREPLACE for local filesystems
  ncpfs: fix unused variable warning
2016-10-10 | Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs | Linus Torvalds | 1 | -5/+3
Pull misc vfs updates from Al Viro: "Assorted misc bits and pieces. There are several single-topic branches left after this (rename2 series from Miklos, current_time series from Deepa Dinamani, xattr series from Andreas, uaccess stuff from me) and I'd prefer to send those separately"
* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (39 commits)
  proc: switch auxv to use of __mem_open()
  hpfs: support FIEMAP
  cifs: get rid of unused arguments of CIFSSMBWrite()
  posix_acl: uapi header split
  posix_acl: xattr representation cleanups
  fs/aio.c: eliminate redundant loads in put_aio_ring_file
  fs/internal.h: add const to ns_dentry_operations declaration
  compat: remove compat_printk()
  fs/buffer.c: make __getblk_slow() static
  proc: unsigned file descriptors
  fs/file: more unsigned file descriptors
  fs: compat: remove redundant check of nr_segs
  cachefiles: Fix attempt to read i_blocks after deleting file [ver #2]
  cifs: don't use memcpy() to copy struct iov_iter
  get rid of separate multipage fault-in primitives
  fs: Avoid premature clearing of capabilities
  fs: Give dentry to inode_change_ok() instead of inode
  fuse: Propagate dentry down to inode_change_ok()
  ceph: Propagate dentry down to inode_change_ok()
  xfs: Propagate dentry down to inode_change_ok()
  ...
2016-10-08 | Merge remote-tracking branch 'jk/vfs' into work.misc | Al Viro | 1 | -5/+3
2016-10-07 | cred: simpler, 1D supplementary groups | Alexey Dobriyan | 2 | -4/+4
Current supplementary groups code can massively overallocate memory and is implemented in a way so that access to individual gid is done via 2D array. If number of gids is <= 32, memory allocation is more or less tolerable (140/148 bytes). But if it is not, code allocates full page (!) regardless and, what's even more fun, doesn't reuse small 32-entry array. 2D array means dependent shifts, loads and LEAs without possibility to optimize them (gid is never known at compile time). All of the above is unnecessary. Switch to the usual trailing-zero-len-array scheme. Memory is allocated with kmalloc/vmalloc() and only as much as needed. Accesses become simpler (LEA 8(gi,idx,4) or even without displacement). Maximum number of gids is 65536 which translates to 256KB+8 bytes. I think kernel can handle such allocation. On my usual desktop system with whole 9 (nine) aux groups, struct group_info shrinks from 148 bytes to 44 bytes, yay! Nice side effects: - "gi->gid[i]" is shorter than "GROUP_AT(gi, i)", less typing, - fix little mess in net/ipv4/ping.c should have been using GROUP_AT macro but this point becomes moot, - aux group allocation is persistent and should be accounted as such. Link: http://lkml.kernel.org/r/20160817201927.GA2096@p183.telecom.by Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Vasily Kulikov <segoon@openwall.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
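The trailing-array layout described above can be sketched in user space with a C99 flexible array member. The struct and function names are hypothetical, uint32_t stands in for the kernel's gid type, and malloc() stands in for the kernel allocators; the size arithmetic assumes a 4-byte int.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* One allocation: a small header followed directly by the gids. */
struct group_info_sketch {
	int usage;		/* reference count placeholder */
	int ngroups;
	uint32_t gid[];		/* flexible array member, sized at allocation */
};

/* Bytes needed for a group_info holding ngroups entries. */
static size_t group_info_size(unsigned int ngroups)
{
	return sizeof(struct group_info_sketch) +
	       (size_t)ngroups * sizeof(uint32_t);
}

/* Allocate exactly as much as needed; the caller frees with free(). */
static struct group_info_sketch *groups_alloc_sketch(unsigned int ngroups)
{
	struct group_info_sketch *gi;

	assert(ngroups <= 65536);	/* maximum quoted in the commit message */
	gi = malloc(group_info_size(ngroups));
	if (!gi)
		return NULL;
	gi->usage = 1;
	gi->ngroups = (int)ngroups;
	return gi;
}
```

Under those assumptions the sketch reproduces the figures in the message: 9 groups take 8 + 9 * 4 = 44 bytes, and 65536 groups take 256KB + 8 bytes. Element access is a plain gi->gid[i].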
2016-10-07 | NFSD: Implement the COPY call | Anna Schumaker | 5 | -13/+181
I only implemented the sync version of this call, since it's the easiest. I can simply call vfs_copy_file_range() and have the vfs do the right thing for the filesystem being exported. Signed-off-by: Anna Schumaker <bjschuma@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-10-07 | nfsd: handle EUCLEAN | J. Bruce Fields | 1 | -0/+1
Eric Sandeen reports that xfs can return this if filesystem corruption prevented completing the operation. Reported-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-10-07 | nfsd: only WARN once on unmapped errors | J. Bruce Fields | 1 | -1/+1
No need to spam the logs here. The only drawback is losing information if we ever encounter two different unmapped errors, but in practice we've rarely seen even one. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-09-27 | fs: Replace current_fs_time() with current_time() | Deepa Dinamani | 1 | -1/+1
current_fs_time() uses struct super_block* as an argument. As per Linus's suggestion, this is changed to take struct inode* as a parameter instead. This is because the function is primarily meant for vfs inode timestamps. Also the function was renamed as per Arnd's suggestion. Change all calls to current_fs_time() to use the new current_time() function instead. current_fs_time() will be deleted. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-09-26 | nfsd4: setclientid_confirm with unmatched verifier should fail | J. Bruce Fields | 1 | -2/+3
A setclientid_confirm with (clientid, verifier) both matching an existing confirmed record is assumed to be a replay, but if the verifier doesn't match, it shouldn't be. This would be a very rare case, except that clients following https://tools.ietf.org/html/rfc7931#section-5.8 may depend on the failure. Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-09-26 | nfsd: randomize SETCLIENTID reply to help distinguish servers | J. Bruce Fields | 1 | -0/+2
NFSv4.1 has built-in trunking support that allows a client to determine whether two connections to two different IP addresses are actually to the same server. NFSv4.0 does not, but RFC 7931 attempts to provide clients a means to do this, basically by performing a SETCLIENTID to one address and confirming it with a SETCLIENTID_CONFIRM to the other. Linux clients since 05f4c350ee02 "NFS: Discover NFSv4 server trunking when mounting" implement a variation on this suggestion. It is possible that other clients do too. This depends on the clientid and verifier not being accepted by an unrelated server. Since both are 64-bit values, that would be very unlikely if they were random numbers. But they aren't: knfsd generates the 64-bit clientid by concatenating the 32-bit boot time (in seconds) and a counter. This makes collisions between clientids generated by the same server extremely unlikely. But collisions are very likely between clientids generated by servers that boot at the same time, and it's quite common for multiple servers to boot at the same time. The verifier is a concatenation of the SETCLIENTID time (in seconds) and a counter, so again collisions between different servers are likely if multiple SETCLIENTIDs are done at the same time, which is a common case. Therefore recent NFSv4.0 clients may decide two different servers are really the same, and mount a filesystem from the wrong server. Fortunately the Linux client, since 55b9df93ddd6 "nfsv4/v4.1: Verify the client owner id during trunking detection", only does this when given the non-default "migration" mount option. The fault is really with RFC 7931, and needs a client fix, but in the meantime we can mitigate the chance of these collisions by randomizing the starting value of the counters used to generate clientids and verifiers. Reported-by: Frank Sorenson <fsorenso@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-09-26nfsd: set the MAY_NOTIFY_LOCK flag in OPEN repliesJeff Layton1-2/+4
If we are using v4.1+, then we can send notification when contended locks become free. Inform the client of that fact. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-09-26nfsd: add a LRU list for blocked locksJeff Layton3-0/+65
It's possible for a client to call in on a lock that is blocked for a long time, but discontinue polling for it. A malicious client could even set a lock on a file, and then spam the server with failing lock requests from different lockowners that pile up in a DoS attack. Add the blocked lock structures to a per-net namespace LRU when hashing them, and timestamp them. If the lock request is not revisited after a lease period, we'll drop it under the assumption that the client is no longer interested. This also gives us a mechanism to clean up these objects at server shutdown time as well. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
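A rough sketch of the timestamp-and-expire idea, written as plain user-space C rather than with the kernel's list primitives. All names here (`struct blocked_lock`, `struct blocked_lru`, `lru_add()`, `lru_expire()`) are hypothetical, and the caller is assumed to own and free the unlinked entries.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for a blocked lock request. */
struct blocked_lock {
    time_t queued_at;           /* when the request was last (re)queued */
    struct blocked_lock *next;
};

/* Singly linked queue ordered oldest-first. */
struct blocked_lru {
    struct blocked_lock *head;  /* oldest entry */
    struct blocked_lock *tail;  /* newest entry */
};

/* Timestamp the entry and append it at the newest end. */
void lru_add(struct blocked_lru *lru, struct blocked_lock *nbl, time_t now)
{
    assert(lru != NULL && nbl != NULL);
    nbl->queued_at = now;
    nbl->next = NULL;
    if (lru->tail)
        lru->tail->next = nbl;
    else
        lru->head = nbl;
    lru->tail = nbl;
}

/* Unlink every entry older than one lease period; return how many were dropped. */
size_t lru_expire(struct blocked_lru *lru, time_t now, time_t lease)
{
    size_t dropped = 0;

    assert(lru != NULL);
    while (lru->head && now - lru->head->queued_at > lease) {
        struct blocked_lock *old = lru->head;

        lru->head = old->next;
        if (lru->head == NULL)
            lru->tail = NULL;
        old->next = NULL;
        dropped++;
    }
    return dropped;
}
```

Because entries are appended in timestamp order, the scan can stop at the first entry that is still within its lease period.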
2016-09-26nfsd: have nfsd4_lock use blocking locks for v4.1+ locksJeff Layton2-20/+156
Create a new per-lockowner+per-inode structure that contains a file_lock. Have nfsd4_lock add this structure to the lockowner's list prior to setting the lock. Then call the vfs and request a blocking lock (by setting FL_SLEEP). If we get anything besides FILE_LOCK_DEFERRED back, then we dequeue the block structure and free it. When the next lock request comes in, we'll look for an existing block for the same filehandle and dequeue and reuse it if there is one. When the lock comes free (a'la an lm_notify call), we dequeue it from the lockowner's list and kick off a CB_NOTIFY_LOCK callback to inform the client that it should retry the lock request. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-09-26nfsd: plumb in a CB_NOTIFY_LOCK operationJeff Layton3-0/+73
Add the encoding/decoding for CB_NOTIFY_LOCK operations. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-09-26NFSD: fix corruption in notifier registrationVasily Averin1-4/+14
By design, a notifier can be registered only once; however, nfsd registers the same inetaddr notifiers once per net namespace. When this happens, it corrupts the list of notifiers: as a result, some notifiers may not be called on the proper event, traversal of the list can loop forever, and a second unregister can access already-freed memory. Cc: stable@vger.kernel.org Fixes: 36684996 ("nfsd: Register callbacks on the inetaddr_chain and inet6addr_chain") Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-09-23nfsd: fix dprintk in nfsd4_encode_getdeviceinfoJeff Layton1-1/+1
nfserr is big-endian, so we should convert it to host-endian before printing it. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
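As a user-space analogue of this fix, the sketch below converts a big-endian status word to host order before formatting it. It uses POSIX `ntohl()` in place of the kernel's byte-order helpers, and `format_status()` is a hypothetical name, not a function from nfsd.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a status value that is stored in network (big-endian) byte order.
 * Printing status_be directly would show a byte-swapped number on
 * little-endian hosts, so convert it to host order first.
 */
int format_status(char *buf, size_t len, uint32_t status_be)
{
    return snprintf(buf, len, "status %u", (unsigned int)ntohl(status_be));
}
```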
2016-09-22fs: Give dentry to inode_change_ok() instead of inodeJan Kara1-5/+3
inode_change_ok() will be responsible for clearing capabilities and IMA extended attributes and as such will need a dentry. Give it as an argument to inode_change_ok() instead of an inode. Also rename inode_change_ok() to setattr_prepare() to better reflect that it also makes some modifications in addition to checks. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
2016-09-16nfsd: eliminate cb_minorversion fieldJeff Layton2-5/+3
We already have that info in the client pointer. No need to pass around a copy. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-09-16nfsd: don't set a FL_LAYOUT lease for flexfiles layoutsJeff Layton3-1/+7
We currently can hit a deadlock (of sorts) when trying to use flexfiles layouts with XFS. XFS will call break_layout when something wants to write to the file. In the case of the (super-simple) flexfiles layout driver in knfsd, the MDS and DS are the same machine. The client can get a layout and then issue a v3 write to do its I/O. XFS will then call xfs_break_layouts, which will cause a CB_LAYOUTRECALL to be issued to the client. The client however can't return the layout until the v3 WRITE completes, but XFS won't allow the write to proceed until the layout is returned. Christoph says: XFS only cares about block-like layouts where the client has direct access to the file blocks. I'd need to look how to propagate the flag into break_layout, but in principle we don't need to do any recalls on truncate ever for file and flexfile layouts. If we're never going to recall the layout, then we don't even need to set the lease at all. Just skip doing so on flexfiles layouts by adding a new flag to struct nfsd4_layout_ops and skipping the lease setting and removal when that flag is true. Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
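The shape of that change can be sketched as a per-driver flag that gates both lease setup and teardown. This is an illustrative model only: `struct layout_ops`, its `skip_lease` field, `struct layout_file` and the two helpers are hypothetical names, not the fields of struct nfsd4_layout_ops.

```c
#include <stdbool.h>

/* Hypothetical per-layout-type operations table. */
struct layout_ops {
    bool skip_lease;  /* true for layout types that are never recalled */
};

/* Hypothetical per-file state; leases counts outstanding layout leases. */
struct layout_file {
    int leases;
};

/* Set a lease only for layout types that can actually be recalled. */
void layout_setlease(const struct layout_ops *ops, struct layout_file *f)
{
    if (ops->skip_lease)
        return;
    f->leases++;
}

/* Mirror the same check on removal so set and remove stay balanced. */
void layout_removelease(const struct layout_ops *ops, struct layout_file *f)
{
    if (ops->skip_lease)
        return;
    f->leases--;
}
```

Checking the flag on both paths matters: skipping only the set would leave the removal path trying to drop a lease that was never taken.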
2016-08-12nfsd: don't return an unhashed lock stateid after taking mutexJeff Layton1-5/+20
nfsd4_lock will take the st_mutex before working with the stateid it gets, but between the time when we drop the cl_lock and take the mutex, the stateid could become unhashed (a'la FREE_STATEID). If that happens the lock stateid returned to the client will be forgotten. Fix this by first moving the st_mutex acquisition into lookup_or_create_lock_state. Then, have it check to see if the lock stateid is still hashed after taking the mutex. If it's not, then put the stateid and try the find/create again. Signed-off-by: Jeff Layton <jlayton@redhat.com> Tested-by: Alexey Kodanev <alexey.kodanev@oracle.com> Cc: stable@vger.kernel.org # feb9dad5 nfsd: Always lock state exclusively. Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
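The lock-then-recheck loop described above can be modelled in a single-threaded sketch. Everything here is hypothetical: `struct stateid`, `lookup_locked_stateid()` and the `next_candidate()` helper are illustrative names, and a boolean `locked` field stands in for the real mutex.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stateid model; 'locked' stands in for holding st_mutex. */
struct stateid {
    bool hashed;
    bool locked;
    int refs;
};

typedef struct stateid *(*lookup_fn)(void *ctx);

/* Find or create a stateid, lock it, and retry if it was unhashed meanwhile. */
struct stateid *lookup_locked_stateid(lookup_fn find_or_create, void *ctx)
{
    for (;;) {
        struct stateid *st = find_or_create(ctx);  /* returns with a reference */

        if (st == NULL)
            return NULL;
        st->locked = true;        /* models taking the mutex */
        if (st->hashed)
            return st;            /* still valid: return it locked */
        st->locked = false;       /* lost the race: unlock ... */
        st->refs--;               /* ... put the reference, then retry */
    }
}

/* Test helper: hands out candidates from an array, taking a reference on each. */
struct candidates {
    struct stateid *items;
    size_t count;
    size_t next;
};

struct stateid *next_candidate(void *ctx)
{
    struct candidates *c = ctx;

    if (c->next >= c->count)
        return NULL;
    c->items[c->next].refs++;
    return &c->items[c->next++];
}
```

If the first candidate was unhashed before the lock was taken, the loop drops it and returns the second one, locked and still holding its reference.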
2016-08-11nfsd: Fix race between FREE_STATEID and LOCKChuck Lever1-12/+28
When running LTP's nfslock01 test, the Linux client can send a LOCK and a FREE_STATEID request at the same time. The outcome is:

Frame 324 R OPEN stateid [2,O]
Frame 115004 C LOCK lockowner_is_new stateid [2,O] offset 672000 len 64
Frame 115008 R LOCK stateid [1,L]
Frame 115012 C WRITE stateid [0,L] offset 672000 len 64
Frame 115016 R WRITE NFS4_OK
Frame 115019 C LOCKU stateid [1,L] offset 672000 len 64
Frame 115022 R LOCKU NFS4_OK
Frame 115025 C FREE_STATEID stateid [2,L]
Frame 115026 C LOCK lockowner_is_new stateid [2,O] offset 672128 len 64
Frame 115029 R FREE_STATEID NFS4_OK
Frame 115030 R LOCK stateid [3,L]
Frame 115034 C WRITE stateid [0,L] offset 672128 len 64
Frame 115038 R WRITE NFS4ERR_BAD_STATEID

In other words, the server returns stateid L in a successful LOCK reply, but it has already released it. Subsequent uses of stateid L fail.

To address this, protect the generation check in nfsd4_free_stateid with the st_mutex. This should guarantee that only one of two outcomes occurs: either LOCK returns a fresh valid stateid, or FREE_STATEID returns NFS4ERR_LOCKS_HELD.

Reported-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Fix-suggested-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>