Age | Commit message (Collapse) | Author | Files | Lines |
|
Since we're tracking modifications to the page cache on a per-page
basis, it makes sense to express the limit to how much we may cache
in units of pages.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
This reverts commit f895c53f8ace3c3e49ebf9def90e63fc6d46d2bf.
This commit causes a NFSv4 regression in that close()+unlink() can end
up failing. The reason is that we no longer have a guarantee that the
CLOSE has completed on the server, meaning that the subsequent call to
REMOVE may fail with NFS4ERR_FILE_OPEN if the server implements Windows
unlink() semantics.
Reported-by: <Olga Kornievskaia <aglo@umich.edu>
Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
If there is no cached data, then there is no need to track the file
change attribute on close.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
When I/O cannot complete due to a fatal error on the DS, ensure that we
invalidate the corresponding layout segment and return it.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
The return value from scnprintf always less than the buffer length.
So, result >= len always false. This patch removes those checking.
int vscnprintf(char *buf, size_t size, const char *fmt, va_list args)
{
int i;
i = vsnprintf(buf, size, fmt, args);
if (likely(i < size))
return i;
if (size != 0)
return size - 1;
return 0;
}
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
The length of "Linux NFSv4.0 " is 14, not 10.
Without this patch, I get a truncated client owner id as,
"Linux NFSv4.0 ::1/::1"
With this patch,
"Linux NFSv4.0 ::1/::1 tcp"
Fixes: a319268891 ("nfs: make nfs4_init_nonuniform_client_string use a dynamically allocated buffer")
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
If a read-write layout has an invalid mirror, then we should
mark it as invalid, and return it.
If a read-only layout has an invalid mirror, then mark it as invalid
and check if there is still at least one valid mirror before we return
it.
Note: Also fix incorrect use of pnfs_generic_mark_devid_invalid().
We really want nfs4_mark_deviceid_unavailable().
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Unlike read layouts, the writeable layout cannot fall back to using only
one of the mirrors. It need to write to all of them.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Unlike the files layout, flexfiles does not test for the NFS_DEVICEID_INVALID
flag. Instead it relies on NFS_DEVICEID_UNAVAILABLE.
Fix is to replace with nfs4_mark_deviceid_unavailable().
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Mirrors are now shared objects, so we should not be freeing them directly
inside ff_layout_free_lseg(). We should already be doing the right thing
in _ff_layout_free_lseg(), so just let it handle things.
Also ensure that ff_layout_free_mirror() frees the RPC credential if it
is set.
Fixes: 28a0d72c6867 ("Add refcounting to struct nfs4_ff_layout_mirror")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
If we have a read layout, then sanity check the minimal layout length
so that it does not extend beyond the end of file.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
According to RFC5661 section 18.43.3, if the server cannot satisfy
the loga_minlength argument to LAYOUTGET, there are 2 cases:
1) If loga_minlength == 0, it returns NFS4ERR_LAYOUTTRYLATER
2) If loga_minlength != 0, it returns NFS4ERR_BADLAYOUT
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
According to RFC5661 Section 18.2.4, CLOSE is supposed to return
the zero stateid. This means that nfs_clear_open_stateid_locked()
cannot assume that the result stateid will always match the 'other'
field of the existing open stateid when trying to determine a race
with a parallel OPEN.
Instead, we look at the argument, and check for matches.
Cc: stable@vger.kernel.org # v4.0+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
If the file was fenced and/or has been deleted on the DS, then we want
to retry pNFS after a layoutreturn with error report. If the server
cannot fix the problem, then we rely on it to tell us so in the
response to the LAYOUTGET.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
The "FIXME" is outdated. Flexfiles does add a payload.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
According to the flexfiles protocol, the layoutreturn should specify an
array of errors in the following format:
struct ff_ioerr4 {
offset4 ffie_offset;
length4 ffie_length;
stateid4 ffie_stateid;
device_error4 ffie_errors<>;
};
This patch fixes up the code to ensure that our ffie_errors is indeed
encoded as an array (albeit with only a single entry).
Reported-by: Tom Haynes <thomas.haynes@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Client sends a SETATTR request after OPEN for updating attributes.
For create file with S_ISGID is set, the S_ISGID in SETATTR will be
ignored at nfs server as chmod of no PERMISSION.
v3, same as v2.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Create file with attributs as NFS4_CREATE_EXCLUSIVE4_1 mode
depends on suppattr_exclcreat attribut.
v3, same as v2.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Check opened, only update it when non-NULL.
It's not needs define an unused value for the opened
when calling _nfs4_do_open.
v3, same as v2.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Set rlimit for NFS's files is useless right now.
For local process's rlimit, it should be checked by nfs client.
The same, CIFS also call inode_change_ok checking rlimit at its client
in cifs_setattr_nounix() and cifs_setattr_unix().
v3, fix bad using of error
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
It's not sufficient to just mark the layout segment for layout return. We
also need to set the NFS_LAYOUT_RETURN_BEFORE_CLOSE flag in the layout header.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
In order to ensure atomicity of updates, we merge the old layout segments
into the new ones, and then invalidate the old ones.
Also ensure that we order the list of layout segments so that
RO segments are preferred over RW.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
This is needed in order to allow merging of contiguous layout segments,
and also to correct the ordering of layouts for those device drivers that
don't necessarily want to place the read-write layouts first.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Eliminate a couple of holes in the structure, and move the 2 atomics
into the same cacheline.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Allow advanced users to set the layoutstats timer in order to lengthen
or shorten the period between layoutstat transmissions to the server.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Keep the full list of mirrors in the struct nfs4_ff_layout_mirror so that
they can be shared among the layout segments that use them.
Also ensure that we send out only one copy of the layoutstats per mirror.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
When we start sharing mirrors between several lsegs, we won't be able to
keep it.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
We do want to share mirrors between layout segments, so add a refcount
to enable that.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
We do not want to update inode attributes with DS values.
Cc: stable@vger.kernel.org # v4.0+
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
If we return delegation before closing, we fail to do roc check
during close because NFS_LAYOUT_ROC is cleared by delegreturn
and it causes layouts to be still hanging around after delegreturn
+ close, which is a voilation against protocol.
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Only support for single file layoutrecall for now.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Allow tracing of return-on-close.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
If the ctime or mtime or change attribute have changed because
of an operation we initiated, we should make sure that we force
an attribute update. However we do not want to mark the page cache
for revalidation.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org # v4.0+
|
|
Make sure that we also handle RPC level connection and protocol
negotiation errors.
Reported-by: Tom Haynes <loghyr@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
We want to ensure that the stopwatches for the busy timer and the
aggregate timer are consistent. This means that they need to use
the same start/stop times.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
We want to move commiting pages to pages list instead.
Otherwise it causes pnfs small writes crash like:
[34560.037692] BUG: unable to handle kernel NULL pointer dereference at 0000000000000068
[34560.038557] IP: [<ffffffffa05423d6>] nfs_init_commit+0x26/0x130 [nfs]
[34560.039400] PGD 69f5a067 PUD 69f59067 PMD 0
[34560.040207] Oops: 0000 [#1] SMP
[34560.041014] Modules linked in: nfsv3(OE) nfs_layout_flexfiles(OE) nfsv4(OE) nfs(OE) fscache(E) rpcsec_gss_krb5(E) xt_addrtype(E) xt_conntrack(E) ipt_MASQUERADE(E) nf_nat_masquerade_ipv4(E) iptable_nat(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) nf_nat_ipv4(E) iptable_filter(E) ip_tables(E) x_tables(E) nf_nat(E) nf_conntrack(E) bridge(E) stp(E) llc(E) dm_thin_pool(E) dm_persistent_data(E) dm_bio_prison(E) dm_bufio(E) ppdev(E) vmw_balloon(E) coretemp(E) crc32_pclmul(E) ghash_clmulni_intel(E) aesni_intel(E) aes_x86_64(E) glue_helper(E) lrw(E) gf128mul(E) ablk_helper(E) cryptd(E) psmouse(E) serio_raw(E) vmw_vmci(E) i2c_piix4(E) shpchp(E) parport_pc(E) parport(E) mac_hid(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) sunrpc(E) xfs(E) libcrc32c(E) hid_generic(E) usbhid(E) hid(E) e1000(E) mptspi(E)
[34560.045106] mptscsih(E) mptbase(E) vmwgfx(E) drm_kms_helper(E) ttm(E) drm(E) autofs4(E) [last unloaded: fscache]
[34560.045897] CPU: 0 PID: 130543 Comm: bash Tainted: G OE 4.2.0-rc5-dp-00057-gf993a93 #11
[34560.046699] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/20/2014
[34560.047525] task: ffff880031b0a980 ti: ffff880045fec000 task.ti: ffff880045fec000
[34560.048264] RIP: 0010:[<ffffffffa05423d6>] [<ffffffffa05423d6>] nfs_init_commit+0x26/0x130 [nfs]
[34560.049000] RSP: 0018:ffff880045fefc18 EFLAGS: 00010246
[34560.049717] RAX: 0000000000000000 RBX: ffff8800208fbc80 RCX: ffff880045fefd50
[34560.050396] RDX: ffff880031c19ec0 RSI: ffff880045fefc88 RDI: ffff8800208fbc80
[34560.051041] RBP: ffff880045fefc28 R08: ffff8800208fbe68 R09: ffff880045fefc88
[34560.051666] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880045fefc78
[34560.052247] R13: ffff880045fefc88 R14: ffff880045fefa90 R15: ffff880045fefd50
[34560.052825] FS: 00007fa02d58c740(0000) GS:ffff88006d600000(0000) knlGS:0000000000000000
[34560.053410] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[34560.053992] CR2: 0000000000000068 CR3: 000000003b37a000 CR4: 00000000001406f0
[34560.054615] Stack:
[34560.055200] ffff8800208fbc80 ffff8800208fbc80 ffff880045fefcc8 ffffffffa05c1a5b
[34560.055800] ffff880045fefcc8 ffff880045fefd50 0000000045fefcb8 ffff880045fefd40
[34560.056418] ffff8800420608e0 ffffffffa04f3910 0000000100000001 ffff880045fefd50
[34560.057013] Call Trace:
[34560.057672] [<ffffffffa05c1a5b>] pnfs_generic_commit_pagelist+0x1cb/0x300 [nfsv4]
[34560.058277] [<ffffffffa04f3910>] ? ff_layout_commit_pagelist+0x20/0x20 [nfs_layout_flexfiles]
[34560.058907] [<ffffffffa04f3905>] ff_layout_commit_pagelist+0x15/0x20 [nfs_layout_flexfiles]
[34560.059557] [<ffffffffa0543fc1>] nfs_generic_commit_list+0xb1/0xf0 [nfs]
[34560.060214] [<ffffffffa0543e47>] ? nfs_scan_commit+0x37/0xa0 [nfs]
[34560.060825] [<ffffffffa0544081>] nfs_commit_inode+0x81/0x150 [nfs]
[34560.061432] [<ffffffffa05443ae>] nfs_wb_all+0x1ae/0x400 [nfs]
[34560.062035] [<ffffffffa05380ad>] nfs_getattr+0x33d/0x510 [nfs]
[34560.062630] [<ffffffff8122499c>] vfs_getattr_nosec+0x2c/0x40
[34560.063223] [<ffffffff81224a66>] vfs_getattr+0x26/0x30
[34560.063818] [<ffffffff81224b35>] vfs_fstatat+0x65/0xa0
[34560.064413] [<ffffffff81224f3f>] SYSC_newstat+0x1f/0x40
[34560.065016] [<ffffffff8102b176>] ? do_audit_syscall_entry+0x66/0x70
[34560.065626] [<ffffffff8102c773>] ? syscall_trace_enter_phase1+0x113/0x170
[34560.066245] [<ffffffff81003017>] ? trace_hardirqs_on_thunk+0x17/0x19
[34560.066868] [<ffffffff812251ae>] SyS_newstat+0xe/0x10
[34560.067533] [<ffffffff817a5df2>] entry_SYSCALL_64_fastpath+0x16/0x7a
[34560.068173] Code: 0f 1f 44 00 00 0f 1f 44 00 00 55 4c 8d 87 e8 01 00 00 48 89 e5 53 48 89 fb 48 83 ec 08 4c 8b 0e 49 8b 41 18 4c 39 ce 48 8b 40 40 <4c> 8b 50 68 74 24 48 8b 87 e8 01 00 00 48 8b 7e 08 4d 89 41 08
[34560.069609] RIP [<ffffffffa05423d6>] nfs_init_commit+0x26/0x130 [nfs]
[34560.070295] RSP <ffff880045fefc18>
[34560.071008] CR2: 0000000000000068
[34560.073207] ---[ end trace f85f873260977406 ]---
[fixes 27571297a7e(pNFS: Tighten up locking around DS commit buckets)]
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Unlike the previous attempt, this takes into account the fact that
we may be calling it from the recovery thread itself. Detect this
by looking at what kind of open we're doing, and checking the state
of the NFS_DELEGATION_NEED_RECLAIM if it turns out we're doing a
reboot reclaim-type open.
Cc: Olga Kornievskaia <aglo@umich.edu>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Otherwise we break fstest case tests/read_write/mctime.t
Does files layout need the same fix as well?
Cc: stable@vger.kernel.org # v4.0+
Cc: Anna Schumaker <anna.schumaker@netapp.com>
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
If layout is marked by NFS_LAYOUT_RETURN_BEFORE_CLOSE, we should always
send LAYOUTRETURN before close, and we don't need to do ROC drain if we
do send LAYOUTRETURN.
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
This reverts commit 4e379d36c050b0117b5d10048be63a44f5036115.
This commit opens up a race between the recovery code and the open code.
Reported-by: Olga Kornievskaia <aglo@umich.edu>
Cc: stable@vger.kernel # v4.0+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
If we have an OPEN_DOWNGRADE and CLOSE race with one another, we want
to ensure that the layout is forgotten by the client, so that we
start afresh with a new layoutget.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
The helper pnfs_roc() has already verified that we have no delegations,
and no further open files, hence no outstanding I/O and it has marked
all the return-on-close lsegs as being invalid.
Furthermore, it sets the NFS_LAYOUT_RETURN bit, thus serialising the
close/delegreturn with all future layoutget calls on this inode.
The checks in pnfs_roc_drain() for valid layout segments are therefore
redundant: those cannot exist until another layoutget completes.
The other check for whether or not NFS_LAYOUT_RETURN is set, actually
causes a hang, since we already know that we hold that flag.
To fix, we therefore strip out all the functionality in pnfs_roc_drain()
except the retrieval of the barrier state, and then rename the function
accordingly.
Reported-by: Christoph Hellwig <hch@infradead.org>
Fixes: 5c4a79fb2b1c ("Don't prevent layoutgets when doing return-on-close")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|