summaryrefslogtreecommitdiffstats
path: root/net/sunrpc
AgeCommit message (Collapse)AuthorFilesLines
2019-09-05sunrpc: Use kzfree rather than its implementation.zhong jiang1-6/+3
Use kzfree instead of memset() + kfree(). Signed-off-by: zhong jiang <zhongjiang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-26xprtrdma: Send Queue size grows after a reconnectChuck Lever1-12/+14
Eli Dorfman reports that after a series of idle disconnects, an RPC/RDMA transport becomes unusable (rdma_create_qp returns -ENOMEM). Problem was tracked down to increasing Send Queue size after each reconnect. The rdma_create_qp() API does not promise to leave its @qp_init_attr parameter unaltered. In fact, some drivers do modify one or more of its fields. Thus our calls to rdma_create_qp must use a fresh copy of ib_qp_init_attr each time. This fix is appropriate for kernels dating back to late 2007, though it will have to be adapted, as the connect code has changed over the years. Reported-by: Eli Dorfman <eli@vastdata.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-08-26xprtrdma: Clear xprt->reestablish_timeout on closeChuck Lever3-4/+9
Ensure that the re-establishment delay does not grow exponentially on each good reconnect. This probably should have been part of commit 675dd90ad093 ("xprtrdma: Modernize ops->connect"). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-08-26SUNRPC: Handle connection breakages correctly in call_status()Trond Myklebust1-1/+1
If the connection breaks while we're waiting for a reply from the server, then we want to immediately try to reconnect. Fixes: ec6017d90359 ("SUNRPC fix regression in umount of a secure mount") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-08-26Revert "NFSv4/flexfiles: Abort I/O early if the layout segment was invalidated"Trond Myklebust1-7/+0
This reverts commit a79f194aa4879e9baad118c3f8bb2ca24dbef765. The mechanism for aborting I/O is racy, since we are not guaranteed that the request is asleep while we're changing both task->tk_status and task->tk_action. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: stable@vger.kernel.org # v5.1
2019-08-26SUNRPC: Handle EADDRINUSE and ENOBUFS correctlyTrond Myklebust1-3/+7
If a connect or bind attempt returns EADDRINUSE, that means we want to retry with a different port. It is not a fatal connection error. Similarly, ENOBUFS is not fatal, but just indicates a memory allocation issue. Retry after a short delay. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-08-26SUNRPC: Don't handle errors if the bind/connect succeededTrond Myklebust1-11/+24
Don't handle errors in call_bind_status()/call_connect_status() if it turns out that a previous call caused it to succeed. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: stable@vger.kernel.org # v5.1+
2019-08-26xprtrdma: Recycle MRs after disconnectChuck Lever3-9/+29
The optimization done in "xprtrdma: Simplify rpcrdma_mr_pop" was a bit too optimistic. MRs left over after a reconnect still need to be recycled, not added back to the free list, since they could be in flight or actually fully registered. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-08-21xprtrdma: Optimize rpcrdma_post_recvs()Chuck Lever1-1/+1
Micro-optimization: In rpcrdma_post_recvs, since commit e340c2d6ef2a ("xprtrdma: Reduce the doorbell rate (Receive)"), the common case is to return without doing anything. Found with perf. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-08-21xprtrdma: Inline XDR chunk encoder functionsChuck Lever1-9/+12
Micro-optimization: Save the cost of three function calls during transport header encoding. These were "noinline" before to generate more meaningful call stacks during debugging, but this code is now pretty stable. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-08-21xprtrdma: Fix bc_max_slots return valueChuck Lever1-3/+1
For the moment the returned value just happens to be correct because the current backchannel server implementation does not vary the number of credits it offers. The spec does permit this value to change during the lifetime of a connection, however. The actual maximum is fixed for all RPC/RDMA transports, because each transport instance has to pre-allocate the resources for processing BC requests. That's the value that should be returned. Fixes: 7402a4fedc2b ("SUNRPC: Fix up backchannel slot table ... ") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-08-21xprtrdma: Clean up xprt_rdma_set_connect_timeout()Chuck Lever1-4/+4
Clean up: The function name should match the documenting comment. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-08-21xprtrdma: Use an llist to manage free rpcrdma_repsChuck Lever2-59/+53
rpcrdma_rep objects are removed from their free list by only a single thread: the Receive completion handler. Thus that free list can be converted to an llist, where a single-threaded consumer and a multi-threaded producer (rpcrdma_buffer_put) can both access the llist without the need for any serialization. This eliminates spin lock contention between the Receive completion handler and rpcrdma_buffer_get, and makes the rep consumer wait- free. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-08-21xprtrdma: Remove rpcrdma_buffer::rb_mrlockChuck Lever3-18/+16
Clean up: Now that the free list is used sparingly, get rid of the separate spin lock protecting it. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-08-21xprtrdma: Cache free MRs in each rpcrdma_reqChuck Lever4-12/+33
Instead of a globally-contended MR free list, cache MRs in each rpcrdma_req as they are released. This means acquiring and releasing an MR will be lock-free in the common case, even outside the transport send lock. The original idea of per-rpcrdma_req MR free lists was suggested by Shirley Ma <shirley.ma@oracle.com> several years ago. I just now figured out how to make that idea work with on-demand MR allocation. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-08-20xprtrdma: Ensure creating an MR does not trigger FS writebackChuck Lever2-4/+5
Probably would be good to also pass GFP flags to ib_alloc_mr. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-08-20xprtrdma: Move rpcrdma_mr_get out of frwr_mapChuck Lever4-43/+35
Refactor: Retrieve an MR and handle error recovery entirely in rpc_rdma.c, as this is not a device-specific function. Note that since commit 89f90fe1ad8b ("SUNRPC: Allow calls to xprt_transmit() to drain the entire transmit queue"), the xprt_transmit function handles the cond_resched. The transport no longer has to do this itself. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-08-20xprtrdma: Combine rpcrdma_mr_put and rpcrdma_mr_unmap_and_putChuck Lever3-28/+11
Clean up. There is only one remaining rpcrdma_mr_put call site, and it can be directly replaced with unmap_and_put because mr->mr_dir is set to DMA_NONE just before the call. Now all the call sites do a DMA unmap, and we can just rename mr_unmap_and_put to mr_put, which nicely matches mr_get. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-08-20xprtrdma: Simplify rpcrdma_mr_popChuck Lever4-21/+11
Clean up: rpcrdma_mr_pop call sites check if the list is empty first. Let's replace the list_empty with less costly logic. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-08-20xprtrdma: Toggle XPRT_CONGESTED in xprtrdma's slot methodsChuck Lever1-1/+3
Commit 48be539dd44a ("xprtrdma: Introduce ->alloc_slot call-out for xprtrdma") added a separate alloc_slot and free_slot to the RPC/RDMA transport. Later, commit 75891f502f5f ("SUNRPC: Support for congestion control when queuing is enabled") modified the generic alloc/free_slot methods, but neglected the methods in xprtrdma. Found via code review. Fixes: 75891f502f5f ("SUNRPC: Support for congestion control ... ") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-08-20xprtrdma: Rename rpcrdma_buffer::rb_allChuck Lever2-19/+9
Clean up: There are other "all" list heads. For code clarity distinguish this one as for use only for MRs by renaming it. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-08-20xprtrdma: Rename CQE field in Receive trace pointsChuck Lever1-1/+1
Make the field name the same for all trace points that handle pointers to struct rpcrdma_rep. That makes it easy to grep for matching rep points in trace output. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-08-20xprtrdma: Boost maximum transport header sizeChuck Lever2-14/+18
Although I haven't seen any performance results that justify it, I've received several complaints that NFS/RDMA no longer supports a maximum rsize and wsize of 1MB. These days it is somewhat smaller. To simplify the logic that determines whether a chunk list is necessary, the implementation uses a fixed maximum size of the transport header. Currently that maximum size is 256 bytes, one quarter of the default inline threshold size for RPC/RDMA v1. Since commit a78868497c2e ("xprtrdma: Reduce max_frwr_depth"), the size of chunks is also smaller to take advantage of inline page lists in device internal MR data structures. The combination of these two design choices has reduced the maximum NFS rsize and wsize that can be used for most RNIC/HCAs. Increasing the maximum transport header size and the maximum number of RDMA segments it can contain increases the negotiated maximum rsize/wsize on common RNIC/HCAs. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-08-20xprtrdma: Fix calculation of ri_max_segs againChuck Lever1-2/+2
Commit 302d3deb206 ("xprtrdma: Prevent inline overflow") added this calculation back in 2016, but got it wrong. I tested only the lower bound, which is why there is a max_t there. The upper bound should be rounded up too. Now, when using DIV_ROUND_UP, that takes care of the lower bound as well. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-08-20xprtrdma: Update obsolete commentChuck Lever1-3/+0
Comment was made obsolete by commit 8cec3dba76a4 ("xprtrdma: rpcrdma_regbuf alignment"). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-08-20xprtrdma: Refresh the documenting comment in frwr_ops.cChuck Lever1-48/+18
Things have changed since this comment was written. In particular, the reworking of connection closing, on-demand creation of MRs, and the removal of fr_state all mean that deferring MR recovery to frwr_map is no longer needed. The description is obsolete. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-08-20SUNRPC: Inline xdr_commit_encodeChuck Lever1-1/+1
Micro-optimization: For xdr_commit_encode call sites in net/sunrpc/xdr.c, eliminate the extra calling sequence. On my client, this change saves about a microsecond for every 30 calls to xdr_reserve_space(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-08-20SUNRPC: Remove rpc_wake_up_queued_task_on_wq()Chuck Lever1-23/+4
Clean up: commit c544577daddb ("SUNRPC: Clean up transport write space handling") appears to have removed the last caller of rpc_wake_up_queued_task_on_wq(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-08-19sunrpc: add a new cache_detail operation for when a cache is flushedJeff Layton1-0/+3
When the exports table is changed, exportfs will usually write a new time to the "flush" file in the nfsd.export cache procfile. This tells the kernel to flush any entries that are older than that value. This gives us a mechanism to tell whether an unexport might have occurred. Add a new ->flush cache_detail operation that is called after flushing the cache whenever someone writes to a "flush" file. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-08-19svcrdma: Use llist for managing cache of recv_ctxtsChuck Lever2-16/+11
Use a wait-free mechanism for managing the svc_rdma_recv_ctxts free list. Subsequently, sc_recv_lock can be eliminated. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-08-19svcrdma: Remove svc_rdma_wqChuck Lever2-8/+2
Clean up: the system workqueue will work just as well. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-08-05rdma: Enable ib_alloc_cq to spread work over a device's comp_vectorsChuck Lever2-11/+10
Send and Receive completion is handled on a single CPU selected at the time each Completion Queue is allocated. Typically this is when an initiator instantiates an RDMA transport, or when a target accepts an RDMA connection. Some ULPs cannot open a connection per CPU to spread completion workload across available CPUs and MSI vectors. For such ULPs, provide an API that allows the RDMA core to select a completion vector based on the device's complement of available comp_vecs. ULPs that invoke ib_alloc_cq() with only comp_vector 0 are converted to use the new API so that their completion workloads interfere less with each other. Suggested-by: Håkon Bugge <haakon.bugge@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Cc: <linux-cifs@vger.kernel.org> Cc: <v9fs-developer@lists.sourceforge.net> Link: https://lore.kernel.org/r/20190729171923.13428.52555.stgit@manet.1015granger.net Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-07-30SUNRPC: Track writers of the 'channel' file to improve cache_listeners_existDave Wysochanski1-4/+8
The sunrpc cache interface is susceptible to being fooled by a rogue process just reading a 'channel' file. If this happens the kernel may think a valid daemon exists to service the cache when it does not. For example, the following may fool the kernel: cat /proc/net/rpc/auth.unix.gid/channel Change the tracking of readers to writers when considering whether a listener exists as all valid daemon processes either open a channel file O_RDWR or O_WRONLY. While this does not prevent a rogue process from "stealing" a message from the kernel, it does at least improve the kernels perception of whether a valid process servicing the cache exists. Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-07-19Merge branch 'work.mount0' of ↵Linus Torvalds1-8/+26
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs mount updates from Al Viro: "The first part of mount updates. Convert filesystems to use the new mount API" * 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits) mnt_init(): call shmem_init() unconditionally constify ksys_mount() string arguments don't bother with registering rootfs init_rootfs(): don't bother with init_ramfs_fs() vfs: Convert smackfs to use the new mount API vfs: Convert selinuxfs to use the new mount API vfs: Convert securityfs to use the new mount API vfs: Convert apparmorfs to use the new mount API vfs: Convert openpromfs to use the new mount API vfs: Convert xenfs to use the new mount API vfs: Convert gadgetfs to use the new mount API vfs: Convert oprofilefs to use the new mount API vfs: Convert ibmasmfs to use the new mount API vfs: Convert qib_fs/ipathfs to use the new mount API vfs: Convert efivarfs to use the new mount API vfs: Convert configfs to use the new mount API vfs: Convert binfmt_misc to use the new mount API convenience helper: get_tree_single() convenience helper get_tree_nodev() vfs: Kill sget_userns() ...
2019-07-19Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-2/+1
Merge yet more updates from Andrew Morton: "The rest of MM and a kernel-wide procfs cleanup. Summary of the more significant patches: - Patch series "mm/memory_hotplug: Factor out memory block devicehandling", v3. David Hildenbrand. Some spring-cleaning of the memory hotplug code, notably in drivers/base/memory.c - "mm: thp: fix false negative of shmem vma's THP eligibility". Yang Shi. Fix /proc/pid/smaps output for THP pages used in shmem. - "resource: fix locking in find_next_iomem_res()" + 1. Nadav Amit. Bugfix and speedup for kernel/resource.c - Patch series "mm: Further memory block device cleanups", David Hildenbrand. More spring-cleaning of the memory hotplug code. - Patch series "mm: Sub-section memory hotplug support". Dan Williams. Generalise the memory hotplug code so that pmem can use it more completely. Then remove the hacks from the libnvdimm code which were there to work around the memory-hotplug code's constraints. - "proc/sysctl: add shared variables for range check", Matteo Croce. We have about 250 instances of int zero; ... .extra1 = &zero, in the tree. This is a tree-wide sweep to make all those private "zero"s and "one"s use global variables. Alas, it isn't practical to make those two global integers const" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (38 commits) proc/sysctl: add shared variables for range check mm: migrate: remove unused mode argument mm/sparsemem: cleanup 'section number' data types libnvdimm/pfn: stop padding pmem namespaces to section alignment libnvdimm/pfn: fix fsdax-mode namespace info-block zero-fields mm/devm_memremap_pages: enable sub-section remap mm: document ZONE_DEVICE memory-model implications mm/sparsemem: support sub-section hotplug mm/sparsemem: prepare for sub-section ranges mm: kill is_dev_zone() helper mm/hotplug: kill is_dev_zone() usage in __remove_pages() mm/sparsemem: convert kmalloc_section_memmap() to populate_section_memmap() mm/hotplug: prepare shrink_{zone, pgdat}_span for sub-section removal mm/sparsemem: add helpers track active portions of a section at boot mm/sparsemem: introduce a SECTION_IS_EARLY flag mm/sparsemem: introduce struct mem_section_usage drivers/base/memory.c: get rid of find_memory_block_hinted() mm/memory_hotplug: move and simplify walk_memory_blocks() mm/memory_hotplug: rename walk_memory_range() and pass start+size instead of pfns mm: make register_mem_sect_under_node() static ...
2019-07-18proc/sysctl: add shared variables for range checkMatteo Croce1-2/+1
In the sysctl code the proc_dointvec_minmax() function is often used to validate the user supplied value between an allowed range. This function uses the extra1 and extra2 members from struct ctl_table as minimum and maximum allowed value. On sysctl handler declaration, in every source file there are some readonly variables containing just an integer which address is assigned to the extra1 and extra2 members, so the sysctl range is enforced. The special values 0, 1 and INT_MAX are very often used as range boundary, leading duplication of variables like zero=0, one=1, int_max=INT_MAX in different source files: $ git grep -E '\.extra[12].*&(zero|one|int_max)' |wc -l 248 Add a const int array containing the most commonly used values, some macros to refer more easily to the correct array member, and use them instead of creating a local one for every object file. This is the bloat-o-meter output comparing the old and new binary compiled with the default Fedora config: # scripts/bloat-o-meter -d vmlinux.o.old vmlinux.o add/remove: 2/2 grow/shrink: 0/2 up/down: 24/-188 (-164) Data old new delta sysctl_vals - 12 +12 __kstrtab_sysctl_vals - 12 +12 max 14 10 -4 int_max 16 - -16 one 68 - -68 zero 128 28 -100 Total: Before=20583249, After=20583085, chg -0.00% [mcroce@redhat.com: tipc: remove two unused variables] Link: http://lkml.kernel.org/r/20190530091952.4108-1-mcroce@redhat.com [akpm@linux-foundation.org: fix net/ipv6/sysctl_net_ipv6.c] [arnd@arndb.de: proc/sysctl: make firmware loader table conditional] Link: http://lkml.kernel.org/r/20190617130014.1713870-1-arnd@arndb.de [akpm@linux-foundation.org: fix fs/eventpoll.c] Link: http://lkml.kernel.org/r/20190430180111.10688-1-mcroce@redhat.com Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Kees Cook <keescook@chromium.org> Reviewed-by: Aaron Tomlin <atomlin@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-18Merge tag 'nfs-for-5.3-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds18-508/+845
Pull NFS client updates from Trond Myklebust: "Highlights include: Stable fixes: - SUNRPC: Ensure bvecs are re-synced when we re-encode the RPC request - Fix an Oops in ff_layout_track_ds_error due to a PTR_ERR() dereference - Revert buggy NFS readdirplus optimisation - NFSv4: Handle the special Linux file open access mode - pnfs: Fix a problem where we gratuitously start doing I/O through the MDS Features: - Allow NFS client to set up multiple TCP connections to the server using a new 'nconnect=X' mount option. Queue length is used to balance load. - Enhance statistics reporting to report on all transports when using multiple connections. - Speed up SUNRPC by removing bh-safe spinlocks - Add a mechanism to allow NFSv4 to request that containers set a unique per-host identifier for when the hostname is not set. - Ensure NFSv4 updates the lease_time after a clientid update Bugfixes and cleanup: - Fix use-after-free in rpcrdma_post_recvs - Fix a memory leak when nfs_match_client() is interrupted - Fix buggy file access checking in NFSv4 open for execute - disable unsupported client side deduplication - Fix spurious client disconnections - Fix occasional RDMA transport deadlock - Various RDMA cleanups - Various tracepoint fixes - Fix the TCP callback channel to guarantee the server can actually send the number of callback requests that was negotiated at mount time" * tag 'nfs-for-5.3-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (68 commits) pnfs/flexfiles: Add tracepoints for detecting pnfs fallback to MDS pnfs: Fix a problem where we gratuitously start doing I/O through the MDS SUNRPC: Optimise transport balancing code SUNRPC: Ensure the bvecs are reset when we re-encode the RPC request pnfs/flexfiles: Fix PTR_ERR() dereferences in ff_layout_track_ds_error NFSv4: Don't use the zero stateid with layoutget SUNRPC: Fix up backchannel slot table accounting SUNRPC: Fix initialisation of struct rpc_xprt_switch SUNRPC: Skip zero-refcount transports SUNRPC: Replace division by multiplication in calculation of queue length NFSv4: Validate the stateid before applying it to state recovery nfs4.0: Refetch lease_time after clientid update nfs4: Rename nfs41_setup_state_renewal nfs4: Make nfs4_proc_get_lease_time available for nfs4.0 nfs: Fix copy-and-paste error in debug message NFS: Replace 16 seq_printf() calls by seq_puts() NFS: Use seq_putc() in nfs_show_stats() Revert "NFS: readdirplus optimization by cache mechanism" (memleak) SUNRPC: Fix transport accounting when caller specifies an rpc_xprt NFS: Record task, client ID, and XID in xdr_status trace points ...
2019-07-18SUNRPC: Optimise transport balancing codeTrond Myklebust1-29/+38
Moves the balancing code to avoid doing cursor changes on every search iteration. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-07-18SUNRPC: Ensure the bvecs are reset when we re-encode the RPC requestTrond Myklebust3-2/+4
The bvec tracks the list of pages, so if the number of pages changes due to a re-encode, we need to reset the bvec as well. Fixes: 277e4ab7d530 ("SUNRPC: Simplify TCP receive code by switching...") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: stable@vger.kernel.org # v4.20+
2019-07-18SUNRPC: Fix up backchannel slot table accountingTrond Myklebust7-19/+46
Add a per-transport maximum limit in the socket case, and add helpers to allow the NFSv4 code to discover that limit. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-07-18SUNRPC: Fix initialisation of struct rpc_xprt_switchTrond Myklebust1-1/+3
Ensure that we do initialise the fields xps_nactive, xps_queuelen and xps_net. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-07-16SUNRPC: Skip zero-refcount transportsTrond Myklebust1-1/+19
When looking for the next transport to use for an RPC call, skip those that are in the process of being destroyed and that have a zero refcount. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-07-16SUNRPC: Replace division by multiplication in calculation of queue lengthTrond Myklebust1-3/+2
When checking whether or not a particular xprt queue length is shorter than the average queue length for all xprts, prefer to use multiplication rather than division for performance reasons. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-07-12Merge tag 'driver-core-5.3-rc1' of ↵Linus Torvalds1-55/+11
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core and debugfs updates from Greg KH: "Here is the "big" driver core and debugfs changes for 5.3-rc1 It's a lot of different patches, all across the tree due to some api changes and lots of debugfs cleanups. Other than the debugfs cleanups, in this set of changes we have: - bus iteration function cleanups - scripts/get_abi.pl tool to display and parse Documentation/ABI entries in a simple way - cleanups to Documenatation/ABI/ entries to make them parse easier due to typos and other minor things - default_attrs use for some ktype users - driver model documentation file conversions to .rst - compressed firmware file loading - deferred probe fixes All of these have been in linux-next for a while, with a bunch of merge issues that Stephen has been patient with me for" * tag 'driver-core-5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (102 commits) debugfs: make error message a bit more verbose orangefs: fix build warning from debugfs cleanup patch ubifs: fix build warning after debugfs cleanup patch driver: core: Allow subsystems to continue deferring probe drivers: base: cacheinfo: Ensure cpu hotplug work is done before Intel RDT arch_topology: Remove error messages on out-of-memory conditions lib: notifier-error-inject: no need to check return value of debugfs_create functions swiotlb: no need to check return value of debugfs_create functions ceph: no need to check return value of debugfs_create functions sunrpc: no need to check return value of debugfs_create functions ubifs: no need to check return value of debugfs_create functions orangefs: no need to check return value of debugfs_create functions nfsd: no need to check return value of debugfs_create functions lib: 842: no need to check return value of debugfs_create functions debugfs: provide pr_fmt() macro debugfs: log errors when something goes wrong drivers: s390/cio: Fix compilation warning about const qualifiers drivers: Add generic helper to match by of_node driver_find_device: Unify the match function with class_find_device() bus_find_device: Unify the match callback with class_find_device ...
2019-07-12SUNRPC: Fix transport accounting when caller specifies an rpc_xprtTrond Myklebust2-23/+22
Ensure that we do the required accounting for the round robin queue when the caller to rpc_init_task() has passed in a transport to be used. Reported-by: Olga Kornievskaia <aglo@umich.edu> Reported-by: Neil Brown <neilb@suse.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-07-12Merge tag 'nfs-rdma-for-5.3-1' of ↵Trond Myklebust8-332/+441
git://git.linux-nfs.org/projects/anna/linux-nfs NFSoRDMA client updates for 5.3 New features: - Add a way to place MRs back on the free list - Reduce context switching - Add new trace events Bugfixes and cleanups: - Fix a BUG when tracing is enabled with NFSv4.1 - Fix a use-after-free in rpcrdma_post_recvs - Replace use of xdr_stream_pos in rpcrdma_marshal_req - Fix occasional transport deadlock - Fix show_nfs_errors macros, other tracing improvements - Remove RPCRDMA_REQ_F_PENDING and fr_state - Various simplifications and refactors
2019-07-11Merge tag 'scsi-sg' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds1-2/+3
Pull SCSI scatter-gather list updates from James Bottomley: "This topic branch covers a fundamental change in how our sg lists are allocated to make mq more efficient by reducing the size of the preallocated sg list. This necessitates a large number of driver changes because the previous guarantee that if a driver specified SG_ALL as the size of its scatter list, it would get a non-chained list and didn't need to bother with scatterlist iterators is now broken and every driver *must* use scatterlist iterators. This was broken out as a separate topic because we need to convert all the drivers before pulling the trigger and unconverted drivers kept being found, necessitating a rebase" * tag 'scsi-sg' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (21 commits) scsi: core: don't preallocate small SGL in case of NO_SG_CHAIN scsi: lib/sg_pool.c: clear 'first_chunk' in case of no preallocation scsi: core: avoid preallocating big SGL for data scsi: core: avoid preallocating big SGL for protection information scsi: lib/sg_pool.c: improve APIs for allocating sg pool scsi: esp: use sg helper to iterate over scatterlist scsi: NCR5380: use sg helper to iterate over scatterlist scsi: wd33c93: use sg helper to iterate over scatterlist scsi: ppa: use sg helper to iterate over scatterlist scsi: pcmcia: nsp_cs: use sg helper to iterate over scatterlist scsi: imm: use sg helper to iterate over scatterlist scsi: aha152x: use sg helper to iterate over scatterlist scsi: s390: zfcp_fc: use sg helper to iterate over scatterlist scsi: staging: unisys: visorhba: use sg helper to iterate over scatterlist scsi: usb: image: microtek: use sg helper to iterate over scatterlist scsi: pmcraid: use sg helper to iterate over scatterlist scsi: ipr: use sg helper to iterate over scatterlist scsi: mvumi: use sg helper to iterate over scatterlist scsi: lpfc: use sg helper to iterate over scatterlist scsi: advansys: use sg helper to iterate over scatterlist ...
2019-07-10Merge tag 'nfsd-5.3' of git://linux-nfs.org/~bfields/linuxLinus Torvalds2-2/+1
Pull nfsd updates from Bruce Fields: "Highlights: - Add a new /proc/fs/nfsd/clients/ directory which exposes some long-requested information about NFSv4 clients (like open files) and allows forced revocation of client state. - Replace the global duplicate reply cache by a cache per network namespace; previously, a request in one network namespace could incorrectly match an entry from another, though we haven't seen this in production. This is the last remaining container bug that I'm aware of; at this point you should be able to run separate nfsd's in each network namespace, each with their own set of exports, and everything should work. - Cleanup and modify lock code to show the pid of lockd as the owner of NLM locks. This is the correct version of the bugfix originally attempted in b8eee0e90f97 ("lockd: Show pid of lockd for remote locks")" * tag 'nfsd-5.3' of git://linux-nfs.org/~bfields/linux: (34 commits) nfsd: Make __get_nfsdfs_client() static nfsd: Make two functions static nfsd: Fix misuse of strlcpy sunrpc/cache: remove the exporting of cache_seq_next nfsd: decode implementation id nfsd: create xdr_netobj_dup helper nfsd: allow forced expiration of NFSv4 clients nfsd: create get_nfsdfs_clp helper nfsd4: show layout stateids nfsd: show lock and deleg stateids nfsd4: add file to display list of client's opens nfsd: add more information to client info file nfsd: escape high characters in binary data nfsd: copy client's address including port number to cl_addr nfsd4: add a client info file nfsd: make client/ directory names small ints nfsd: add nfsd/clients directory nfsd4: use reference count to free client nfsd: rename cl_refcount nfsd: persist nfsd filesystem across mounts ...
2019-07-10Merge tag 'fsnotify_for_v5.3-rc1' of ↵Linus Torvalds1-0/+4
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify updates from Jan Kara: "This contains cleanups of the fsnotify name removal hook and also a patch to disable fanotify permission events for 'proc' filesystem" * tag 'fsnotify_for_v5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fsnotify: get rid of fsnotify_nameremove() fsnotify: move fsnotify_nameremove() hook out of d_delete() configfs: call fsnotify_rmdir() hook debugfs: call fsnotify_{unlink,rmdir}() hooks debugfs: simplify __debugfs_remove_file() devpts: call fsnotify_unlink() hook tracefs: call fsnotify_{unlink,rmdir}() hooks rpc_pipefs: call fsnotify_{unlink,rmdir}() hooks btrfs: call fsnotify_rmdir() hook fsnotify: add empty fsnotify_{unlink,rmdir}() hooks fanotify: Disallow permission events for proc filesystem
2019-07-09xprtrdma: Modernize ops->connectChuck Lever5-36/+87
Adapt and apply changes that were made to the TCP socket connect code. See the following commits for details on the purpose of these changes: Commit 7196dbb02ea0 ("SUNRPC: Allow changing of the TCP timeout parameters on the fly") Commit 3851f1cdb2b8 ("SUNRPC: Limit the reconnect backoff timer to the max RPC message timeout") Commit 02910177aede ("SUNRPC: Fix reconnection timeouts") Some common transport code is moved to xprt.c to satisfy the code duplication police. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>