summaryrefslogtreecommitdiffstats
path: root/net/sunrpc
AgeCommit message (Collapse)AuthorFilesLines
2011-03-27NFS: Ensure that rpc_release_resources_task() can be called twice.OGAWA Hirofumi1-1/+3
BUG: atomic_dec_and_test(): -1: atomic counter underflow at: Pid: 2827, comm: mount.nfs Not tainted 2.6.38 #1 Call Trace: [<ffffffffa02223a0>] ? put_rpccred+0x44/0x14e [sunrpc] [<ffffffffa021bbe9>] ? rpc_ping+0x4e/0x58 [sunrpc] [<ffffffffa021c4a5>] ? rpc_create+0x481/0x4fc [sunrpc] [<ffffffffa022298a>] ? rpcauth_lookup_credcache+0xab/0x22d [sunrpc] [<ffffffffa028be8c>] ? nfs_create_rpc_client+0xa6/0xeb [nfs] [<ffffffffa028c660>] ? nfs4_set_client+0xc2/0x1f9 [nfs] [<ffffffffa028cd3c>] ? nfs4_create_server+0xf2/0x2a6 [nfs] [<ffffffffa0295d07>] ? nfs4_remote_mount+0x4e/0x14a [nfs] [<ffffffff810dd570>] ? vfs_kern_mount+0x6e/0x133 [<ffffffffa029605a>] ? nfs_do_root_mount+0x76/0x95 [nfs] [<ffffffffa029643d>] ? nfs4_try_mount+0x56/0xaf [nfs] [<ffffffffa0297434>] ? nfs_get_sb+0x435/0x73c [nfs] [<ffffffff810dd59b>] ? vfs_kern_mount+0x99/0x133 [<ffffffff810dd693>] ? do_kern_mount+0x48/0xd8 [<ffffffff810f5b75>] ? do_mount+0x6da/0x741 [<ffffffff810f5c5f>] ? sys_mount+0x83/0xc0 [<ffffffff8100293b>] ? system_call_fastpath+0x16/0x1b Well, so, I think this is real bug of nfs codes somewhere. With some review, the code rpc_call_sync() rpc_run_task rpc_execute() __rpc_execute() rpc_release_task() rpc_release_resources_task() put_rpccred() <= release cred rpc_put_task rpc_do_put_task() rpc_release_resources_task() put_rpccred() <= release cred again seems to be release cred unintendedly. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-25Merge branch 'nfs-for-2.6.39' of ↵Linus Torvalds2-0/+40
git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * 'nfs-for-2.6.39' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (28 commits) Cleanup XDR parsing for LAYOUTGET, GETDEVICEINFO NFSv4.1 convert layoutcommit sync to boolean NFSv4.1 pnfs_layoutcommit_inode fixes NFS: Determine initial mount security NFS: use secinfo when crossing mountpoints NFS: Add secinfo procedure NFS: lookup supports alternate client NFS: convert call_sync() to a function NFSv4.1 remove temp code that prevented ds commits NFSv4.1: layoutcommit NFSv4.1: filelayout driver specific code for COMMIT NFSv4.1: remove GETATTR from ds commits NFSv4.1: add generic layer hooks for pnfs COMMIT NFSv4.1: alloc and free commit_buckets NFSv4.1: shift filelayout_free_lseg NFSv4.1: pull out code from nfs_commit_release NFSv4.1: pull error handling out of nfs_commit_list NFSv4.1: add callback to nfs4_commit_done NFSv4.1: rearrange nfs_commit_rpcsetup NFSv4.1: don't send COMMIT to ds for data sync writes ...
2011-03-24Merge branch 'nfs-for-2.6.39' into nfs-for-nextTrond Myklebust1-0/+2
2011-03-24NFS: Determine initial mount securityBryan Schumaker1-0/+16
When sec=<something> is not presented as a mount option, we should attempt to determine what security flavor the server is using. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-24NFS: use secinfo when crossing mountpointsBryan Schumaker1-0/+22
A submount may use different security than the parent mount does. We should figure out what sec flavor the submount uses at mount time. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-24Merge branch 'for-2.6.39' of git://linux-nfs.org/~bfields/linuxLinus Torvalds1-9/+9
* 'for-2.6.39' of git://linux-nfs.org/~bfields/linux: SUNRPC: Remove resource leak in svc_rdma_send_error() nfsd: wrong index used in inner loop nfsd4: fix comment and remove unused nfsd4_file fields nfs41: make sure nfs server return right ca_maxresponsesize_cached nfsd: fix compile error svcrpc: fix bad argument in unix_domain_find nfsd4: fix struct file leak nfsd4: minor nfs4state.c reshuffling svcrpc: fix rare race on unix_domain creation nfsd41: modify the members value of nfsd4_op_flags nfsd: add proc file listing kernel's gss_krb5 enctypes gss:krb5 only include enctype numbers in gm_upcall_enctypes NFSD, VFS: Remove dead code in nfsd_rename() nfsd: kill unused macro definition locks: use assign_type()
2011-03-22SUNRPC: Never reuse the socket port after an xs_close()Trond Myklebust1-0/+2
If we call xs_close(), we're in one of two situations: - Autoclose, which means we don't expect to resend a request - bind+connect failed, which probably means the port is in use Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org
2011-03-17Merge branch 'nfs-for-2.6.39' of ↵Linus Torvalds8-96/+120
git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * 'nfs-for-2.6.39' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (54 commits) RPC: killing RPC tasks races fixed xprt: remove redundant check SUNRPC: Convert struct rpc_xprt to use atomic_t counters SUNRPC: Ensure we always run the tk_callback before tk_action sunrpc: fix printk format warning xprt: remove redundant null check nfs: BKL is no longer needed, so remove the include NFS: Fix a warning in fs/nfs/idmap.c Cleanup: Factor out some cut-and-paste code. cleanup: save 60 lines/100 bytes by combining two mostly duplicate functions. NFS: account direct-io into task io accounting gss:krb5 only include enctype numbers in gm_upcall_enctypes RPCRDMA: Fix FRMR registration/invalidate handling. RPCRDMA: Fix to XDR page base interpretation in marshalling logic. NFSv4: Send unmapped uid/gids to the server when using auth_sys NFSv4: Propagate the error NFS4ERR_BADOWNER to nfs4_do_setattr NFSv4: cleanup idmapper functions to take an nfs_server argument NFSv4: Send unmapped uid/gids to the server if the idmapper fails NFSv4: If the server sends us a numeric uid/gid then accept it NFSv4.1: reject zero layout with zeroed stripe unit ...
2011-03-17SUNRPC: Remove resource leak in svc_rdma_send_error()Jesper Juhl1-0/+1
We leak the memory allocated to 'ctxt' when we return after 'ib_dma_mapping_error()' returns !=0. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-03-17RPC: killing RPC tasks races fixedStanislav Kinsbursky1-1/+3
RPC task RPC_TASK_QUEUED bit is set must be checked before trying to wake up task rpc_killall_tasks() because task->tk_waitqueue can not be set (equal to NULL). Also, as Trond Myklebust mentioned, such approach (instead of checking tk_waitqueue to NULL) allows us to "optimise away the call to rpc_wake_up_queued_task() altogether for those tasks that aren't queued". Here is an example of dereferencing of tk_waitqueue equal to NULL: CPU 0 CPU 1 CPU 2 -------------------- --------------------- -------------------------- nfs4_run_open_task rpc_run_task rpc_execute rpc_set_active rpc_make_runnable (waiting) rpc_async_schedule nfs4_open_prepare nfs_wait_on_sequence nfs_umount_begin rpc_killall_tasks rpc_wake_up_task rpc_wake_up_queued_task spin_lock(tk_waitqueue == NULL) BUG() rpc_sleep_on spin_lock(&q->lock) __rpc_sleep_on task->tk_waitqueue = q Signed-off-by: Stanislav Kinsbursky <skinsbursky@openvz.org> Cc: stable@kernel.org Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-17xprt: remove redundant checkj223yang@asset.uwaterloo.ca1-1/+1
remove redundant check. Signed-off-by: Jinqiu Yang <crindy646@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-17SUNRPC: Convert struct rpc_xprt to use atomic_t countersTrond Myklebust1-8/+8
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-17SUNRPC: Ensure we always run the tk_callback before tk_actionTrond Myklebust1-8/+6
This fixes a race in which the task->tk_callback() puts the rpc_task to sleep, setting a new callback. Under certain circumstances, the current code may end up executing the task->tk_action before it gets round to the callback. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org
2011-03-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6Linus Torvalds1-12/+20
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1480 commits) bonding: enable netpoll without checking link status xfrm: Refcount destination entry on xfrm_lookup net: introduce rx_handler results and logic around that bonding: get rid of IFF_SLAVE_INACTIVE netdev->priv_flag bonding: wrap slave state work net: get rid of multiple bond-related netdevice->priv_flags bonding: register slave pointer for rx_handler be2net: Bump up the version number be2net: Copyright notice change. Update to Emulex instead of ServerEngines e1000e: fix kconfig for crc32 dependency netfilter ebtables: fix xt_AUDIT to work with ebtables xen network backend driver bonding: Improve syslog message at device creation time bonding: Call netif_carrier_off after register_netdevice bonding: Incorrect TX queue offset net_sched: fix ip_tos2prio xfrm: fix __xfrm_route_forward() be2net: Fix UDP packet detected status in RX compl Phonet: fix aligned-mode pipe socket buffer header reserve netxen: support for GbE port settings ... Fix up conflicts in drivers/staging/brcm80211/brcmsmac/wl_mac80211.c with the staging updates.
2011-03-16Merge branch 'for-2.6.39' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds1-1/+1
* 'for-2.6.39' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: fix build failure introduced by s/freezeable/freezable/ workqueue: add system_freezeable_wq rds/ib: use system_wq instead of rds_ib_fmr_wq net/9p: replace p9_poll_task with a work net/9p: use system_wq instead of p9_mux_wq xfs: convert to alloc_workqueue() reiserfs: make commit_wq use the default concurrency level ocfs2: use system_wq instead of ocfs2_quota_wq ext4: convert to alloc_workqueue() scsi/scsi_tgt_lib: scsi_tgtd isn't used in memory reclaim path scsi/be2iscsi,qla2xxx: convert to alloc_workqueue() misc/iwmc3200top: use system_wq instead of dedicated workqueues i2o: use alloc_workqueue() instead of create_workqueue() acpi: kacpi*_wq don't need WQ_MEM_RECLAIM fs/aio: aio_wq isn't used in memory reclaim path input/tps6507x-ts: use system_wq instead of dedicated workqueue cpufreq: use system_wq instead of dedicated workqueues wireless/ipw2x00: use system_wq instead of dedicated workqueues arm/omap: use system_wq in mailbox workqueue: use WQ_MEM_RECLAIM instead of WQ_RESCUER
2011-03-15sunrpc: fix printk format warningRandy Dunlap1-1/+2
Fix printk format build warning: net/sunrpc/xprtrdma/verbs.c:1463: warning: format '%llx' expects type 'long long unsigned int', but argument 3 has type 'dma_addr_t' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-15xprt: remove redundant null checkj223yang@asset.uwaterloo.ca1-4/+3
'req' is dereferenced before checked for NULL. The patch simply removes the check. Signed-off-by: Jinqiu Yang<crindy646@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11gss:krb5 only include enctype numbers in gm_upcall_enctypesKevin Coffman2-2/+2
Make the value in gm_upcall_enctypes just the enctype values. This allows the values to be used more easily elsewhere. Signed-off-by: Kevin Coffman <kwc@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11RPCRDMA: Fix FRMR registration/invalidate handling.Tom Tucker2-8/+45
When the rpc_memreg_strategy is 5, FRMR are used to map RPC data. This mode uses an FRMR to map the RPC data, then invalidates (i.e. unregisers) the data in xprt_rdma_free. These FRMR are used across connections on the same mount, i.e. if the connection goes away on an idle timeout and reconnects later, the FRMR are not destroyed and recreated. This creates a problem for transport errors because the WR that invalidate an FRMR may be flushed (i.e. fail) leaving the FRMR valid. When the FRMR is later used to map an RPC it will fail, tearing down the transport and starting over. Over time, more and more of the FRMR pool end up in the wrong state resulting in seemingly random disconnects. This fix keeps track of the FRMR state explicitly by setting it's state based on the successful completion of a reg/inv WR. If the FRMR is ever used and found to be in the wrong state, an invalidate WR is prepended, re-syncing the FRMR state and avoiding the connection loss. Signed-off-by: Tom Tucker <tom@ogc.us> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11RPCRDMA: Fix to XDR page base interpretation in marshalling logic.Tom Tucker1-44/+42
The RPCRDMA marshalling logic assumed that xdr->page_base was an offset into the first page of xdr->page_list. It is in fact an offset into the xdr->page_list itself, that is, it selects the first page in the page_list and the offset into that page. The symptom depended in part on the rpc_memreg_strategy, if it was FRMR, or some other one-shot mapping mode, the connection would get torn down on a base and bounds error. When the badly marshalled RPC was retransmitted it would reconnect, get the error, and tear down the connection again in a loop forever. This resulted in a hung-mount. For the other modes, it would result in silent data corruption. This bug is most easily reproduced by writing more data than the filesystem has space for. This fix corrects the page_base assumption and otherwise simplifies the iov mapping logic. Signed-off-by: Tom Tucker <tom@ogc.us> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11NFSv4.1: filelayout async error handlerAndy Adamson1-0/+8
Use our own async error handler. Mark the layout as failed and retry i/o through the MDS on specified errors. Update the mds_offset in nfs_readpage_retry so that a failed short-read retry to a DS gets correctly resent through the MDS. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11RPC: clarify rpc_run_task error handlingFred Isaman2-12/+0
rpc_run_task can only fail if it is not passed in a preallocated task. However, that is not at all clear with the current code. So remove several impossible to occur failure checks. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11RPC: remove check for impossible condition in rpc_make_runnableFred Isaman1-8/+1
queue_work() only returns 0 or 1, never a negative value. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-10sunrpc: Propagate errors from xs_bind() through xs_create_sock()Ben Hutchings1-1/+2
xs_create_sock() is supposed to return a pointer or an ERR_PTR-encoded error, but it currently returns 0 if xs_bind() fails. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Cc: stable@kernel.org [v2.6.37] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-10SUNRPC: Remove resource leak in svc_rdma_send_error()Jesper Juhl1-0/+1
We leak the memory allocated to 'ctxt' when we return after 'ib_dma_mapping_error()' returns !=0. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-10SUNRPC: Close a race in __rpc_wait_for_completion_task()Trond Myklebust1-14/+61
Although they run as rpciod background tasks, under normal operation (i.e. no SIGKILL), functions like nfs_sillyrename(), nfs4_proc_unlck() and nfs4_do_close() want to be fully synchronous. This means that when we exit, we want all references to the rpc_task to be gone, and we want any dentry references etc. held by that task to be released. For this reason these functions call __rpc_wait_for_completion_task(), followed by rpc_put_task() in the expectation that the latter will be releasing the last reference to the rpc_task, and thus ensuring that the callback_ops->rpc_release() has been called synchronously. This patch fixes a race which exists due to the fact that rpciod calls rpc_complete_task() (in order to wake up the callers of __rpc_wait_for_completion_task()) and then subsequently calls rpc_put_task() without ensuring that these two steps are done atomically. In order to avoid adding new spin locks, the patch uses the existing waitqueue spin lock to order the rpc_task reference count releases between the waiting process and rpciod. The common case where nobody is waiting for completion is optimised for by checking if the RPC_TASK_ASYNC flag is cleared and/or if the rpc_task reference count is 1: in those cases we drop trying to grab the spin lock, and immediately free up the rpc_task. Those few processes that need to put the rpc_task from inside an asynchronous context and that do not care about ordering are given a new helper: rpc_put_task_async(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-09svcrpc: fix bad argument in unix_domain_findJ. Bruce Fields1-1/+1
"After merging the nfsd tree, today's linux-next build (powerpc ppc64_defconfig) produced this warning: net/sunrpc/svcauth_unix.c: In function 'unix_domain_find': net/sunrpc/svcauth_unix.c:58: warning: passing argument 1 of +'svcauth_unix_domain_release' from incompatible pointer type net/sunrpc/svcauth_unix.c:41: note: expected 'struct auth_domain *' but argument +is of type 'struct unix_domain *' Introduced by commit 8b3e07ac908d ("svcrpc: fix rare race on unix_domain creation")." Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-03-08svcrpc: fix rare race on unix_domain creationJ. Bruce Fields1-9/+9
Note that "new" here is not yet fully initialized; auth_domain_put should be called only on auth_domains that have actually been added to the hash. Before this fix, two attempts to add the same domain at once could cause the hlist_del in auth_domain_put to fail. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-03-07gss:krb5 only include enctype numbers in gm_upcall_enctypesKevin Coffman2-2/+2
Make the value in gm_upcall_enctypes just the enctype values. This allows the values to be used more easily elsewhere. Signed-off-by: Kevin Coffman <kwc@citi.umich.edu> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-02-22net: add __rcu annotations to sk_wq and wqEric Dumazet1-12/+20
Add proper RCU annotations/verbs to sk_wq and wq members Fix __sctp_write_space() sk_sleep() abuse (and sock->wq access) Fix sunrpc sk_sleep() abuse too Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-21Merge branch 'master' into for-2.6.39Tejun Heo1-3/+1
2011-01-25NFS do not find client in NFSv4 pg_authenticateAndy Adamson1-3/+1
The information required to find the nfs_client cooresponding to the incoming back channel request is contained in the NFS layer. Perform minimal checking in the RPC layer pg_authenticate method, and push more detailed checking into the NFS layer where the nfs_client can be found. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-25workqueue: use WQ_MEM_RECLAIM instead of WQ_RESCUERTejun Heo1-1/+1
WQ_RESCUER is now an internal flag and should only be used in the workqueue implementation proper. Use WQ_MEM_RECLAIM instead. This doesn't introduce any functional difference. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: dm-devel@redhat.com Cc: Neil Brown <neilb@suse.de>
2011-01-14Merge branch 'for-2.6.38' of git://linux-nfs.org/~bfields/linuxLinus Torvalds10-94/+141
* 'for-2.6.38' of git://linux-nfs.org/~bfields/linux: (62 commits) nfsd4: fix callback restarting nfsd: break lease on unlink, link, and rename nfsd4: break lease on nfsd setattr nfsd: don't support msnfs export option nfsd4: initialize cb_per_client nfsd4: allow restarting callbacks nfsd4: simplify nfsd4_cb_prepare nfsd4: give out delegations more quickly in 4.1 case nfsd4: add helper function to run callbacks nfsd4: make sure sequence flags are set after destroy_session nfsd4: re-probe callback on connection loss nfsd4: set sequence flag when backchannel is down nfsd4: keep finer-grained callback status rpc: allow xprt_class->setup to return a preexisting xprt rpc: keep backchannel xprt as long as server connection rpc: move sk_bc_xprt to svc_xprt nfsd4: allow backchannel recovery nfsd4: support BIND_CONN_TO_SESSION nfsd4: modify session list under cl_lock Documentation: fl_mylease no longer exists ... Fix up conflicts in fs/nfsd/vfs.c with the vfs-scale work. The vfs-scale work touched some msnfs cases, and this merge removes support for that entirely, so the conflict was trivial to resolve.
2011-01-11Merge branch 'nfs-for-2.6.38' of ↵Linus Torvalds9-196/+345
git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * 'nfs-for-2.6.38' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (89 commits) NFS fix the setting of exchange id flag NFS: Don't use vm_map_ram() in readdir NFSv4: Ensure continued open and lockowner name uniqueness NFS: Move cl_delegations to the nfs_server struct NFS: Introduce nfs_detach_delegations() NFS: Move cl_state_owners and related fields to the nfs_server struct NFS: Allow walking nfs_client.cl_superblocks list outside client.c pnfs: layout roc code pnfs: update nfs4_callback_recallany to handle layouts pnfs: add CB_LAYOUTRECALL handling pnfs: CB_LAYOUTRECALL xdr code pnfs: change lo refcounting to atomic_t pnfs: check that partial LAYOUTGET return is ignored pnfs: add layout to client list before sending rpc pnfs: serialize LAYOUTGET(openstateid) pnfs: layoutget rpc code cleanup pnfs: change how lsegs are removed from layout list pnfs: change layout state seqlock to a spinlock pnfs: add prefix to struct pnfs_layout_hdr fields pnfs: add prefix to struct pnfs_layout_segment fields ...
2011-01-11rpc: allow xprt_class->setup to return a preexisting xprtJ. Bruce Fields2-9/+12
This allows us to reuse the xprt associated with a server connection if one has already been set up. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-01-11rpc: keep backchannel xprt as long as server connectionJ. Bruce Fields3-11/+29
Multiple backchannels can share the same tcp connection; from rfc 5661 section 2.10.3.1: A connection's association with a session is not exclusive. A connection associated with the channel(s) of one session may be simultaneously associated with the channel(s) of other sessions including sessions associated with other client IDs. However, multiple backchannels share a connection, they must all share the same xid stream (hence the same rpc_xprt); the only way we have to match replies with calls at the rpc layer is using the xid. So, keep the rpc_xprt around as long as the connection lasts, in case we're asked to use the connection as a backchannel again. Requests to create new backchannel clients over a given server connection should results in creating new clients that reuse the existing rpc_xprt. But to start, just reject attempts to associate multiple rpc_xprt's with the same underlying bc_xprt. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-01-11rpc: move sk_bc_xprt to svc_xprtJ. Bruce Fields2-5/+7
This seems obviously transport-level information even if it's currently used only by the server socket code. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-01-10Merge branch 'bugfixes' into nfs-for-2.6.38Trond Myklebust1-31/+124
Conflicts: fs/nfs/nfs2xdr.c fs/nfs/nfs3xdr.c fs/nfs/nfs4xdr.c
2011-01-10NFS: Don't use vm_map_ram() in readdirTrond Myklebust1-31/+124
vm_map_ram() is not available on NOMMU platforms, and causes trouble on incoherrent architectures such as ARM when we access the page data through both the direct and the virtual mapping. The alternative is to use the direct mapping to access page data for the case when we are not crossing a page boundary, but to copy the data into a linear scratch buffer when we are accessing data that spans page boundaries. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Tested-by: Marc Kleine-Budde <mkl@pengutronix.de> Cc: stable@kernel.org [2.6.37]
2011-01-07Merge branch 'for-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds1-1/+1
* 'for-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (33 commits) usb: don't use flush_scheduled_work() speedtch: don't abuse struct delayed_work media/video: don't use flush_scheduled_work() media/video: explicitly flush request_module work ioc4: use static work_struct for ioc4_load_modules() init: don't call flush_scheduled_work() from do_initcalls() s390: don't use flush_scheduled_work() rtc: don't use flush_scheduled_work() mmc: update workqueue usages mfd: update workqueue usages dvb: don't use flush_scheduled_work() leds-wm8350: don't use flush_scheduled_work() mISDN: don't use flush_scheduled_work() macintosh/ams: don't use flush_scheduled_work() vmwgfx: don't use flush_scheduled_work() tpm: don't use flush_scheduled_work() sonypi: don't use flush_scheduled_work() hvsi: don't use flush_scheduled_work() xen: don't use flush_scheduled_work() gdrom: don't use flush_scheduled_work() ... Fixed up trivial conflict in drivers/media/video/bt8xx/bttv-input.c as per Tejun.
2011-01-07Merge branch 'vfs-scale-working' of ↵Linus Torvalds1-3/+11
git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin * 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin: (57 commits) fs: scale mntget/mntput fs: rename vfsmount counter helpers fs: implement faster dentry memcmp fs: prefetch inode data in dcache lookup fs: improve scalability of pseudo filesystems fs: dcache per-inode inode alias locking fs: dcache per-bucket dcache hash locking bit_spinlock: add required includes kernel: add bl_list xfs: provide simple rcu-walk ACL implementation btrfs: provide simple rcu-walk ACL implementation ext2,3,4: provide simple rcu-walk ACL implementation fs: provide simple rcu-walk generic_check_acl implementation fs: provide rcu-walk aware permission i_ops fs: rcu-walk aware d_revalidate method fs: cache optimise dentry and inode for rcu-walk fs: dcache reduce branches in lookup path fs: dcache remove d_mounted fs: fs_struct use seqlock fs: rcu-walk for path lookup ...
2011-01-07fs: dcache reduce branches in lookup pathNick Piggin1-1/+1
Reduce some branches and memory accesses in dcache lookup by adding dentry flags to indicate common d_ops are set, rather than having to check them. This saves a pointer memory access (dentry->d_op) in common path lookup situations, and saves another pointer load and branch in cases where we have d_op but not the particular operation. Patched with: git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07fs: icache RCU free inodesNick Piggin1-1/+9
RCU free the struct inode. This will allow: - Subsequent store-free path walking patch. The inode must be consulted for permissions when walking, so an RCU inode reference is a must. - sb_inode_list_lock to be moved inside i_lock because sb list walkers who want to take i_lock no longer need to take sb_inode_list_lock to walk the list in the first place. This will simplify and optimize locking. - Could remove some nested trylock loops in dcache code - Could potentially simplify things a bit in VM land. Do not need to take the page lock to follow page->mapping. The downsides of this is the performance cost of using RCU. In a simple creat/unlink microbenchmark, performance drops by about 10% due to inability to reuse cache-hot slab objects. As iterations increase and RCU freeing starts kicking over, this increases to about 20%. In cases where inode lifetimes are longer (ie. many inodes may be allocated during the average life span of a single inode), a lot of this cache reuse is not applicable, so the regression caused by this patch is smaller. The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU, however this adds some complexity to list walking and store-free path walking, so I prefer to implement this at a later date, if it is shown to be a win in real situations. I haven't found a regression in any non-micro benchmark so I doubt it will be a problem. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07fs: change d_delete semanticsNick Piggin1-1/+1
Change d_delete from a dentry deletion notification to a dentry caching advise, more like ->drop_inode. Require it to be constant and idempotent, and not take d_lock. This is how all existing filesystems use the callback anyway. This makes fine grained dentry locking of dput and dentry lru scanning much simpler. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-06NFS rename client back channel transport fieldAndy Adamson2-2/+2
Differentiate from server backchannel Signed-off-by: Andy Adamson <andros@netapp.com> Acked-by: Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06NFS associate sessionid with callback connectionAndy Adamson1-1/+3
The sessions based callback service is started prior to the CREATE_SESSION call so that it can handle CB_NULL requests which can be sent before the CREATE_SESSION call returns and the session ID is known. Set the callback sessionid after a sucessful CREATE_SESSION. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06SUNRPC register and unregister the back channel transportAndy Adamson1-0/+20
Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06SUNRPC new transport for the NFSv4.1 shared back channelAndy Adamson2-24/+62
Move the current sock create and destroy routines into the new transport ops. Back channel socket will be destroyed by the svc_closs_all call in svc_destroy. Added check: only TCP supported on shared back channel. Signed-off-by: Andy Adamson <andros@netapp.com> Acked-by: Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06SUNRPC fix bc_send printAndy Adamson1-1/+1
Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>