Age | Commit message (Collapse) | Author | Files | Lines |
|
Pull NFS client bugfixes from Trond Myklebust:
"Highlights include:
Stable patches:
- Fix a situation where the client uses the wrong (zero) stateid.
- Fix a memory leak in nfs_do_recoalesce
Bugfixes:
- Plug a memory leak when ->prepare_layoutcommit fails
- Fix an Oops in the NFSv4 open code
- Fix a backchannel deadlock
- Fix a livelock in sunrpc when sendmsg fails due to low memory
availability
- Don't revalidate the mapping if both size and change attr are up to
date
- Ensure we don't miss a file extension when doing pNFS
- Several fixes to handle NFSv4.1 sequence operation status bits
correctly
- Several pNFS layout return bugfixes"
* tag 'nfs-for-4.2-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (28 commits)
nfs: Fix an oops caused by using other thread's stack space in ASYNC mode
nfs: plug memory leak when ->prepare_layoutcommit fails
SUNRPC: Report TCP errors to the caller
sunrpc: translate -EAGAIN to -ENOBUFS when socket is writable.
NFSv4.2: handle NFS-specific llseek errors
NFS: Don't clear desc->pg_moreio in nfs_do_recoalesce()
NFS: Fix a memory leak in nfs_do_recoalesce
NFS: nfs_mark_for_revalidate should always set NFS_INO_REVAL_PAGECACHE
NFS: Remove the "NFS_CAP_CHANGE_ATTR" capability
NFS: Set NFS_INO_REVAL_PAGECACHE if the change attribute is uninitialised
NFS: Don't revalidate the mapping if both size and change attr are up to date
NFSv4/pnfs: Ensure we don't miss a file extension
NFSv4: We must set NFS_OPEN_STATE flag in nfs_resync_open_stateid_locked
SUNRPC: xprt_complete_bc_request must also decrement the free slot count
SUNRPC: Fix a backchannel deadlock
pNFS: Don't throw out valid layout segments
pNFS: pnfs_roc_drain() fix a race with open
pNFS: Fix races between return-on-close and layoutreturn.
pNFS: pnfs_roc_drain should return 'true' when sleeping
pNFS: Layoutreturn must invalidate all existing layout segments.
...
|
|
An oops caused by using other thread's stack space in sunrpc ASYNC sending thread.
[ 9839.007187] ------------[ cut here ]------------
[ 9839.007923] kernel BUG at fs/nfs/nfs4xdr.c:910!
[ 9839.008069] invalid opcode: 0000 [#1] SMP
[ 9839.008069] Modules linked in: blocklayoutdriver rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm joydev iosf_mbi crct10dif_pclmul snd_timer crc32_pclmul crc32c_intel ghash_clmulni_intel snd soundcore ppdev pvpanic parport_pc i2c_piix4 serio_raw virtio_balloon parport acpi_cpufreq nfsd nfs_acl lockd grace auth_rpcgss sunrpc qxl drm_kms_helper virtio_net virtio_console virtio_blk ttm drm virtio_pci virtio_ring virtio ata_generic pata_acpi
[ 9839.008069] CPU: 0 PID: 308 Comm: kworker/0:1H Not tainted 4.0.0-0.rc4.git1.3.fc23.x86_64 #1
[ 9839.008069] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 9839.008069] Workqueue: rpciod rpc_async_schedule [sunrpc]
[ 9839.008069] task: ffff8800d8b4d8e0 ti: ffff880036678000 task.ti: ffff880036678000
[ 9839.008069] RIP: 0010:[<ffffffffa0339cc9>] [<ffffffffa0339cc9>] reserve_space.part.73+0x9/0x10 [nfsv4]
[ 9839.008069] RSP: 0018:ffff88003667ba58 EFLAGS: 00010246
[ 9839.008069] RAX: 0000000000000000 RBX: 000000001fc15e18 RCX: ffff8800c0193800
[ 9839.008069] RDX: ffff8800e4ae3f24 RSI: 000000001fc15e2c RDI: ffff88003667bcd0
[ 9839.008069] RBP: ffff88003667ba58 R08: ffff8800d9173008 R09: 0000000000000003
[ 9839.008069] R10: ffff88003667bcd0 R11: 000000000000000c R12: 0000000000010000
[ 9839.008069] R13: ffff8800d9173350 R14: 0000000000000000 R15: ffff8800c0067b98
[ 9839.008069] FS: 0000000000000000(0000) GS:ffff88011fc00000(0000) knlGS:0000000000000000
[ 9839.008069] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9839.008069] CR2: 00007f988c9c8bb0 CR3: 00000000d99b6000 CR4: 00000000000407f0
[ 9839.008069] Stack:
[ 9839.008069] ffff88003667bbc8 ffffffffa03412c5 00000000c6c55680 ffff880000000003
[ 9839.008069] 0000000000000088 00000010c6c55680 0001000000000002 ffffffff816e87e9
[ 9839.008069] 0000000000000000 00000000477290e2 ffff88003667bab8 ffffffff81327ba3
[ 9839.008069] Call Trace:
[ 9839.008069] [<ffffffffa03412c5>] encode_attrs+0x435/0x530 [nfsv4]
[ 9839.008069] [<ffffffff816e87e9>] ? inet_sendmsg+0x69/0xb0
[ 9839.008069] [<ffffffff81327ba3>] ? selinux_socket_sendmsg+0x23/0x30
[ 9839.008069] [<ffffffff8164c1df>] ? do_sock_sendmsg+0x9f/0xc0
[ 9839.008069] [<ffffffff8164c278>] ? kernel_sendmsg+0x58/0x70
[ 9839.008069] [<ffffffffa011acc0>] ? xdr_reserve_space+0x20/0x170 [sunrpc]
[ 9839.008069] [<ffffffffa011acc0>] ? xdr_reserve_space+0x20/0x170 [sunrpc]
[ 9839.008069] [<ffffffffa0341b40>] ? nfs4_xdr_enc_open_noattr+0x130/0x130 [nfsv4]
[ 9839.008069] [<ffffffffa03419a5>] encode_open+0x2d5/0x340 [nfsv4]
[ 9839.008069] [<ffffffffa0341b40>] ? nfs4_xdr_enc_open_noattr+0x130/0x130 [nfsv4]
[ 9839.008069] [<ffffffffa011ab89>] ? xdr_encode_opaque+0x19/0x20 [sunrpc]
[ 9839.008069] [<ffffffffa0339cfb>] ? encode_string+0x2b/0x40 [nfsv4]
[ 9839.008069] [<ffffffffa0341bf3>] nfs4_xdr_enc_open+0xb3/0x140 [nfsv4]
[ 9839.008069] [<ffffffffa0110a4c>] rpcauth_wrap_req+0xac/0xf0 [sunrpc]
[ 9839.008069] [<ffffffffa01017db>] call_transmit+0x18b/0x2d0 [sunrpc]
[ 9839.008069] [<ffffffffa0101650>] ? call_decode+0x860/0x860 [sunrpc]
[ 9839.008069] [<ffffffffa0101650>] ? call_decode+0x860/0x860 [sunrpc]
[ 9839.008069] [<ffffffffa010caa0>] __rpc_execute+0x90/0x460 [sunrpc]
[ 9839.008069] [<ffffffffa010ce85>] rpc_async_schedule+0x15/0x20 [sunrpc]
[ 9839.008069] [<ffffffff810b452b>] process_one_work+0x1bb/0x410
[ 9839.008069] [<ffffffff810b47d3>] worker_thread+0x53/0x470
[ 9839.008069] [<ffffffff810b4780>] ? process_one_work+0x410/0x410
[ 9839.008069] [<ffffffff810b4780>] ? process_one_work+0x410/0x410
[ 9839.008069] [<ffffffff810ba7b8>] kthread+0xd8/0xf0
[ 9839.008069] [<ffffffff810ba6e0>] ? kthread_worker_fn+0x180/0x180
[ 9839.008069] [<ffffffff81786418>] ret_from_fork+0x58/0x90
[ 9839.008069] [<ffffffff810ba6e0>] ? kthread_worker_fn+0x180/0x180
[ 9839.008069] Code: 00 00 48 c7 c7 21 fa 37 a0 e8 94 1c d6 e0 c6 05 d2 17 05 00 01 8b 03 eb d7 66 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 <0f> 0b 0f 1f 44 00 00 66 66 66 66 90 55 48 89 e5 41 54 53 89 f3
[ 9839.008069] RIP [<ffffffffa0339cc9>] reserve_space.part.73+0x9/0x10 [nfsv4]
[ 9839.008069] RSP <ffff88003667ba58>
[ 9839.071114] ---[ end trace cc14c03adb522e94 ]---
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
"data" is currently leaked when the prepare_layoutcommit operation
returns an error. Put the cred before taking the spinlock in that
case, take the lock and then goto out_unlock which will drop the
lock and then free "data".
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Handle NFS-specific llseek errors instead of letting them leak out to
userspace.
Reported-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Recoalescing does not affect whether or not we've already sent off
I/O, and doing so means that we end up sending a bunch of synchronous
for cases where we actually need to be using unstable writes.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
If the function exits early, then we must put those requests that were
not processed back onto the &mirror->pg_list so they can be cleaned up
by nfs_pgio_error().
Fixes: a7d42ddb30997 ("nfs: add mirroring support to pgio layer")
Cc: stable@vger.kernel.org # v4.0+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Setting the change attribute has been mandatory for all NFS versions, since
commit 3a1556e8662c ("NFSv2/v3: Simulate the change attribute"). We should
therefore not have anything be conditional on it being set/unset.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
We can't allow caching of data until the change attribute has been
initialised correctly.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
If we've ensured that the size and the change attribute are both correct,
then there is no point in marking those attributes as needing revalidation
again. Only do so if we know the size is incorrect and was not updated.
Fixes: f2467b6f64da ("NFS: Clear NFS_INO_REVAL_PAGECACHE when...")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
pNFS writes don't return attributes, however that doesn't mean that we
should ignore the fact that they may be extending the file. This patch
ensures that if a write is seen to extend the file, then we always set
an attribute barrier, and update the cached file size.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Otherwise, nfs4_select_rw_stateid() will always return the zero stateid
instead of the correct open stateid.
Fixes: f95549cf24660 ("NFSv4: More CLOSE/OPEN races")
Cc: stable@vger.kernel.org # 4.0+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Now that we have file locking helpers that can deal with an inode
instead of a filp, we can change the NFSv4 locking code to use that
instead.
This should fix the case where we have a filp that is closed while flock
or OFD locks are set on it, and the task is signaled so that it doesn't
wait for the LOCKU reply to come in before the filp is freed. At that
point we can end up with a use-after-free with the current code, which
relies on dereferencing the fl_file in the lock request.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Reviewed-by: "J. Bruce Fields" <bfields@fieldses.org>
Tested-by: "J. Bruce Fields" <bfields@fieldses.org>
|
|
This reverts commit db2efec0caba4f81a22d95a34da640b86c313c8e.
William reported that he was seeing instability with this patch, which
is likely due to the fact that it can cause the kernel to take a new
reference to a filp after the last reference has already been put.
Revert this patch for now, as we'll need to fix this in another way.
Cc: stable@vger.kernel.org
Reported-by: William Dauchy <william@gandi.net>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Reviewed-by: "J. Bruce Fields" <bfields@fieldses.org>
Tested-by: "J. Bruce Fields" <bfields@fieldses.org>
|
|
It is OK for layout segments to remain hashed even if no-one holds any
references to them, provided that the segments are still valid.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
If a process reopens the file before we can send off the CLOSE/DELEGRETURN,
then pnfs_roc_drain() may end up waiting for a new set of layout segments
that are marked as return-on-close, but haven't yet been returned.
Fix this by only waiting for those layout segments that were invalidated in
pnfs_roc().
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
If one or more of the layout segments reports an error during I/O, then
we may have to send a layoutreturn to report the error back to the NFS
metadata server.
This patch ensures that the return-on-close code can detect the
outstanding layoutreturn, and not preempt it.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Also clean up the case where we don't find a return-on-close layout segment.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Ensure that the calls to renew_lease() in open_done() etc. only apply
to session-less versions of NFSv4.x (i.e. NFSv4.0).
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Instead of just kicking off lease recovery, we should look into the
sequence flag errors and handle them.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
RFC5661 states:
The server has encountered an unrecoverable fault with the
backchannel (e.g., it has lost track of the sequence ID for a slot
in the backchannel). The client MUST stop sending more requests
on the session's fore channel, wait for all outstanding requests
to complete on the fore and back channel, and then destroy the
session.
Ensure we do so...
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Try to handle this for now by invalidating all outstanding layouts for this
server and then testing all the open+lock+delegation stateids.
At some later stage, we may want to optimise by separating out the testing of
delegation stateids only, and adding testing of layout stateids.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
If the server tells us that only some state has been revoked, then we
need to run the full TEST_STATEID dog and pony show in order to discover
which locks and delegations are still OK. Currently we blow away all
state, which means that we lose all locks!
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull more vfs updates from Al Viro:
"Assorted VFS fixes and related cleanups (IMO the most interesting in
that part are f_path-related things and Eric's descriptor-related
stuff). UFS regression fixes (it got broken last cycle). 9P fixes.
fs-cache series, DAX patches, Jan's file_remove_suid() work"
[ I'd say this is much more than "fixes and related cleanups". The
file_table locking rule change by Eric Dumazet is a rather big and
fundamental update even if the patch isn't huge. - Linus ]
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (49 commits)
9p: cope with bogus responses from server in p9_client_{read,write}
p9_client_write(): avoid double p9_free_req()
9p: forgetting to cancel request on interrupted zero-copy RPC
dax: bdev_direct_access() may sleep
block: Add support for DAX reads/writes to block devices
dax: Use copy_from_iter_nocache
dax: Add block size note to documentation
fs/file.c: __fget() and dup2() atomicity rules
fs/file.c: don't acquire files->file_lock in fd_install()
fs:super:get_anon_bdev: fix race condition could cause dev exceed its upper limitation
vfs: avoid creation of inode number 0 in get_next_ino
namei: make set_root_rcu() return void
make simple_positive() public
ufs: use dir_pages instead of ufs_dir_pages()
pagemap.h: move dir_pages() over there
remove the pointless include of lglock.h
fs: cleanup slight list_entry abuse
xfs: Correctly lock inode when removing suid and file capabilities
fs: Call security_ops->inode_killpriv on truncate
fs: Provide function telling whether file_remove_privs() will do anything
...
|
|
Pull NFS client updates from Trond Myklebust:
"Highlights include:
Stable patches:
- Fix a crash in the NFSv4 file locking code.
- Fix an fsync() regression, where we were failing to retry I/O in
some circumstances.
- Fix an infinite loop in NFSv4.0 OPEN stateid recovery
- Fix a memory leak when an attempted pnfs fails.
- Fix a memory leak in the backchannel code
- Large hostnames were not supported correctly in NFSv4.1
- Fix a pNFS/flexfiles bug that was impeding error reporting on I/O.
- Fix a couple of credential issues in pNFS/flexfiles
Bugfixes + cleanups:
- Open flag sanity checks in the NFSv4 atomic open codepath
- More NFSv4 delegation related bugfixes
- Various NFSv4.1 backchannel bugfixes and cleanups
- Fix the NFS swap socket code
- Various cleanups of the NFSv4 SETCLIENTID and EXCHANGE_ID code
- Fix a UDP transport deadlock issue
Features:
- More RDMA client transport improvements
- NFSv4.2 LAYOUTSTATS functionality for pnfs flexfiles"
* tag 'nfs-for-4.2-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (87 commits)
nfs: Remove invalid tk_pid from debug message
nfs: Remove invalid NFS_ATTR_FATTR_V4_REFERRAL checking in nfs4_get_rootfh
nfs: Drop bad comment in nfs41_walk_client_list()
nfs: Remove unneeded micro checking of CONFIG_PROC_FS
nfs: Don't setting FILE_CREATED flags always
nfs: Use remove_proc_subtree() instead remove_proc_entry()
nfs: Remove unused argument in nfs_server_set_fsinfo()
nfs: Fix a memory leak when meeting an unsupported state protect
nfs: take extra reference to fl->fl_file when running a LOCKU operation
NFSv4: When returning a delegation, don't reclaim an incompatible open mode.
NFSv4.2: LAYOUTSTATS is optional to implement
NFSv4.2: Fix up a decoding error in layoutstats
pNFS/flexfiles: Fix the reset of struct pgio_header when resending
pNFS/flexfiles: Turn off layoutcommit for servers that don't need it
pnfs/flexfiles: protect ktime manipulation with mirror lock
nfs: provide pnfs_report_layoutstat when NFS42 is disabled
nfs: verify open flags before allowing open
nfs: always update creds in mirror, even when we have an already connected ds
nfs: fix potential credential leak in ff_layout_update_mirror_cred
pnfs/flexfiles: report layoutstat regularly
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull module updates from Rusty Russell:
"Main excitement here is Peter Zijlstra's lockless rbtree optimization
to speed module address lookup. He found some abusers of the module
lock doing that too.
A little bit of parameter work here too; including Dan Streetman's
breaking up the big param mutex so writing a parameter can load
another module (yeah, really). Unfortunately that broke the usual
suspects, !CONFIG_MODULES and !CONFIG_SYSFS, so those fixes were
appended too"
* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (26 commits)
modules: only use mod->param_lock if CONFIG_MODULES
param: fix module param locks when !CONFIG_SYSFS.
rcu: merge fix for Convert ACCESS_ONCE() to READ_ONCE() and WRITE_ONCE()
module: add per-module param_lock
module: make perm const
params: suppress unused variable error, warn once just in case code changes.
modules: clarify CONFIG_MODULE_COMPRESS help, suggest 'N'.
kernel/module.c: avoid ifdefs for sig_enforce declaration
kernel/workqueue.c: remove ifdefs over wq_power_efficient
kernel/params.c: export param_ops_bool_enable_only
kernel/params.c: generalize bool_enable_only
kernel/module.c: use generic module param operaters for sig_enforce
kernel/params: constify struct kernel_param_ops uses
sysfs: tightened sysfs permission checks
module: Rework module_addr_{min,max}
module: Use __module_address() for module_address_lookup()
module: Make the mod_tree stuff conditional on PERF_EVENTS || TRACING
module: Optimize __module_address() using a latched RB-tree
rbtree: Implement generic latch_tree
seqlock: Introduce raw_read_seqcount_latch()
...
|
|
Before rpc_run_task(), tk_pid is uninitiated as 0 always.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
NFS_ATTR_FATTR_V4_REFERRAL is only set in nfs4_proc_lookup_common.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Commit 7b1f1fd184 "NFSv4/4.1: Fix bugs in nfs4[01]_walk_client_list"
have change the logical of the list_for_each_entry().
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Have checking CONFIG_PROC_FS in include/linux/sunrpc/stats.h.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Commit 5bc2afc2b5 "NFSv4: Honour the 'opened' parameter in the atomic_open()
filesystem method" have support the opened arguments now.
Also,
Commit 03da633aa7 "atomic_open: take care of EEXIST in no-open case with
O_CREAT|O_EXCL in fs/namei.c" have change vfs's logical.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Thanks for Al Viro's comments of killing proc_fs_nfs completely.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Commit e38eb6506f "NFS: set_pnfs_layoutdriver() from nfs4_proc_fsinfo()"
have remove the using of mntfh from nfs_server_set_fsinfo().
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Jean reported another crash, similar to the one fixed by feaff8e5b2cf:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000148
IP: [<ffffffff8124ef7f>] locks_get_lock_context+0xf/0xa0
PGD 0
Oops: 0000 [#1] SMP
Modules linked in: nfsv3 nfs_layout_flexfiles rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache vmw_vsock_vmci_transport vsock cfg80211 rfkill coretemp crct10dif_pclmul ppdev vmw_balloon crc32_pclmul crc32c_intel ghash_clmulni_intel pcspkr vmxnet3 parport_pc i2c_piix4 microcode serio_raw parport nfsd floppy vmw_vmci acpi_cpufreq auth_rpcgss shpchp nfs_acl lockd grace sunrpc vmwgfx drm_kms_helper ttm drm mptspi scsi_transport_spi mptscsih ata_generic mptbase i2c_core pata_acpi
CPU: 0 PID: 329 Comm: kworker/0:1H Not tainted 4.1.0-rc7+ #2
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/30/2013
Workqueue: rpciod rpc_async_schedule [sunrpc]
30ec000
RIP: 0010:[<ffffffff8124ef7f>] [<ffffffff8124ef7f>] locks_get_lock_context+0xf/0xa0
RSP: 0018:ffff8802330efc08 EFLAGS: 00010296
RAX: ffff8802330efc58 RBX: ffff880097187c80 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000000
RBP: ffff8802330efc18 R08: ffff88023fc173d8 R09: 3038b7bf00000000
R10: 00002f1a02000000 R11: 3038b7bf00000000 R12: 0000000000000000
R13: 0000000000000000 R14: ffff8802337a2300 R15: 0000000000000020
FS: 0000000000000000(0000) GS:ffff88023fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000148 CR3: 000000003680f000 CR4: 00000000001407f0
Stack:
ffff880097187c80 ffff880097187cd8 ffff8802330efc98 ffffffff81250281
ffff8802330efc68 ffffffffa013e7df ffff8802330efc98 0000000000000246
ffff8801f6901c00 ffff880233d2b8d8 ffff8802330efc58 ffff8802330efc58
Call Trace:
[<ffffffff81250281>] __posix_lock_file+0x31/0x5e0
[<ffffffffa013e7df>] ? rpc_wake_up_task_queue_locked.part.35+0xcf/0x240 [sunrpc]
[<ffffffff8125088b>] posix_lock_file_wait+0x3b/0xd0
[<ffffffffa03890b2>] ? nfs41_wake_and_assign_slot+0x32/0x40 [nfsv4]
[<ffffffffa0365808>] ? nfs41_sequence_done+0xd8/0x300 [nfsv4]
[<ffffffffa0367525>] do_vfs_lock+0x35/0x40 [nfsv4]
[<ffffffffa03690c1>] nfs4_locku_done+0x81/0x120 [nfsv4]
[<ffffffffa013e310>] ? rpc_destroy_wait_queue+0x20/0x20 [sunrpc]
[<ffffffffa013e310>] ? rpc_destroy_wait_queue+0x20/0x20 [sunrpc]
[<ffffffffa013e33c>] rpc_exit_task+0x2c/0x90 [sunrpc]
[<ffffffffa0134400>] ? call_refreshresult+0x170/0x170 [sunrpc]
[<ffffffffa013ece4>] __rpc_execute+0x84/0x410 [sunrpc]
[<ffffffffa013f085>] rpc_async_schedule+0x15/0x20 [sunrpc]
[<ffffffff810add67>] process_one_work+0x147/0x400
[<ffffffff810ae42b>] worker_thread+0x11b/0x460
[<ffffffff810ae310>] ? rescuer_thread+0x2f0/0x2f0
[<ffffffff810b35d9>] kthread+0xc9/0xe0
[<ffffffff81010000>] ? perf_trace_xen_mmu_set_pmd+0xa0/0x160
[<ffffffff810b3510>] ? kthread_create_on_node+0x170/0x170
[<ffffffff8173c222>] ret_from_fork+0x42/0x70
[<ffffffff810b3510>] ? kthread_create_on_node+0x170/0x170
Code: a5 81 e8 85 75 e4 ff c6 05 31 ee aa 00 01 eb 98 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 54 49 89 fc 53 <48> 8b 9f 48 01 00 00 48 85 db 74 08 48 89 d8 5b 41 5c 5d c3 83
RIP [<ffffffff8124ef7f>] locks_get_lock_context+0xf/0xa0
RSP <ffff8802330efc08>
CR2: 0000000000000148
---[ end trace 64484f16250de7ef ]---
The problem is almost exactly the same as the one fixed by feaff8e5b2cf.
We must take a reference to the struct file when running the LOCKU
compound to prevent the final fput from running until the operation is
complete.
Reported-by: Jean Spector <jean@primarydata.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
It is possible to have an active open with one mode, and a delegation
for the same file with a different mode.
In particular, a WR_ONLY open and an RD_ONLY delegation.
This happens if a WR_ONLY open is followed by a RD_ONLY open which
provides a delegation, but is then close.
When returning the delegation, we currently try to claim opens for
every open type (n_rdwr, n_rdonly, n_wronly). As there is no harm
in claiming an open for a mode that we already have, this is often
simplest.
However if the delegation only provides a subset of the modes that we
currently have open, this will produce an error from the server.
So when claiming open modes prior to returning a delegation, skip the
open request if the mode is not covered by the delegation - the open_stateid
must already cover that mode, so there is nothing to do.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Make it so, by checking the return value for NFS4ERR_MOTSUPP and
caching the information as a server capability.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
According to the spec, the server is only returning the status,
which we decode in the op header.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
hdr->good_bytes needs to be set to the length of the request, not
zero.
Cc: stable@vger.kernel.org # 4.0+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
This patch ensures that we record the value of 'ffl_flags' from
the layout, and then checks for the presence of the
FF_FLAGS_NO_LAYOUTCOMMIT flag before deciding whether or not to
call pnfs_set_layoutcommit().
The effect is that servers now can decide whether or not they want
the client to call layoutcommit before returning a writeable layout.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
* layoutstats:
pnfs/flexfiles: protect ktime manipulation with mirror lock
nfs: provide pnfs_report_layoutstat when NFS42 is disabled
pnfs/flexfiles: report layoutstat regularly
nfs42: serialize LAYOUTSTATS calls of the same file
pnfs/flexfiles: encode LAYOUTSTATS flexfiles specific data
pnfs/flexfiles: add ff_layout_prepare_layoutstats
pNFS/flexfiles: track when layout is first used
pNFS/flexfiles: add layoutstats tracking
pNFS/flexfiles: Remove unused struct members user_name, group_name
pnfs: add pnfs_report_layoutstat helper function
pNFS: fill in nfs42_layoutstat_ops
NFSv.2/pnfs Add a LAYOUTSTATS rpc function
|
|
It looks as if xchg() and cmpxchg() are not available for 64-bit integers on sparc32:
> New breakage seen in linux-next today:
>
> ERROR: "__xchg_called_with_bad_pointer" [fs/nfs/flexfilelayout/nfs_layout_flexfiles.ko] undefined!
> ERROR: "__cmpxchg_called_with_bad_pointer" [fs/nfs/flexfilelayout/nfs_layout_flexfiles.ko] undefined!
> make[2]: *** [__modpost] Error 1
> make[1]: *** [modules] Error 2
Given that mirror ktime manipulation is already under mirror->lock, let's make use of the fact.
Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
kbuild test robot reported:
fs/built-in.o: In function `pnfs_report_layoutstat':
>> (.text+0x151a1c): undefined reference to `nfs42_proc_layoutstats_generic'
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Commit 9597c13b forbade opens with O_APPEND|O_DIRECT for NFSv4:
nfs: verify open flags before allowing an atomic open
Currently, you can open a NFSv4 file with O_APPEND|O_DIRECT, but cannot
fcntl(F_SETFL,...) with those flags. This flag combination is explicitly
forbidden on NFSv3 opens, and it seems like it should also be on NFSv4.
However, you can still open a file with O_DIRECT|O_APPEND if there exists a
cached dentry for the file because nfs4_file_open() is used instead of
nfs_atomic_open() and the check is bypassed. Add the check in
nfs4_file_open() as well.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
A ds can be associated with more than one mirror, but we currently skip
setting a mirror's credentials if we find that it's already set up with
a connected client.
The upshot is that we can end up sending DS writes with MDS credentials
instead of properly setting them up. Fix nfs4_ff_layout_prepare_ds to
always verify that the mirror's credentials are set up, even when we
have a DS that's already connected.
Reported-by: Tom Haynes <thomas.haynes@primarydata.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Cc: stable@vger.kernel.org # 4.0+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
If we have two tasks racing to update a mirror's credentials, then they
can end up leaking one (or more) sets of credentials. The first task
will set mirror->cred and then the second task will just overwrite it.
Use a cmpxchg to ensure that the creds are only set once. If we get to
the point where we would set mirror->cred and find that they're already
set, then we just release the creds that were just found.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Cc: stable@vger.kernel.org # 4.0+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Pull cgroup writeback support from Jens Axboe:
"This is the big pull request for adding cgroup writeback support.
This code has been in development for a long time, and it has been
simmering in for-next for a good chunk of this cycle too. This is one
of those problems that has been talked about for at least half a
decade, finally there's a solution and code to go with it.
Also see last weeks writeup on LWN:
http://lwn.net/Articles/648292/"
* 'for-4.2/writeback' of git://git.kernel.dk/linux-block: (85 commits)
writeback, blkio: add documentation for cgroup writeback support
vfs, writeback: replace FS_CGROUP_WRITEBACK with SB_I_CGROUPWB
writeback: do foreign inode detection iff cgroup writeback is enabled
v9fs: fix error handling in v9fs_session_init()
bdi: fix wrong error return value in cgwb_create()
buffer: remove unusued 'ret' variable
writeback: disassociate inodes from dying bdi_writebacks
writeback: implement foreign cgroup inode bdi_writeback switching
writeback: add lockdep annotation to inode_to_wb()
writeback: use unlocked_inode_to_wb transaction in inode_congested()
writeback: implement unlocked_inode_to_wb transaction and use it for stat updates
writeback: implement [locked_]inode_to_wb_and_lock_list()
writeback: implement foreign cgroup inode detection
writeback: make writeback_control track the inode being written back
writeback: relocate wb[_try]_get(), wb_put(), inode_{attach|detach}_wb()
mm: vmscan: disable memcg direct reclaim stalling if cgroup writeback support is in use
writeback: implement memcg writeback domain based throttling
writeback: reset wb_domain->dirty_limit[_tstmp] when memcg domain size changes
writeback: implement memcg wb_domain
writeback: update wb_over_bg_thresh() to use wb_domain aware operations
...
|
|
As a simple scheme, report every minute if IO is still going on.
Reviewed-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
There is no need to report concurrently.
Reviewed-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|