summaryrefslogtreecommitdiffstats
path: root/fs/nfs/pnfs.c
AgeCommit message (Collapse)AuthorFilesLines
2016-08-23pNFS: The client must not do I/O to the DS if it's lease has expiredTrond Myklebust1-0/+1
Ensure that the client conforms to the normative behaviour described in RFC5661 Section 12.7.2: "If a client believes its lease has expired, it MUST NOT send I/O to the storage device until it has validated its lease." So ensure that we wait for the lease to be validated before using the layout. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Cc: stable@vger.kernel.org # v3.20+
2016-08-19pNFS: Handle NFS4ERR_OLD_STATEID correctly in LAYOUTSTAT callsTrond Myklebust1-1/+0
We normally want to update the stateid and then retry, Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24Merge branch 'pnfs'Trond Myklebust1-62/+89
2016-07-24Merge branch 'writeback'Trond Myklebust1-1/+4
2016-07-24pNFS: Remove redundant smp_mb() from pnfs_init_lseg()Trond Myklebust1-1/+0
It's not visible yet, and won't be until after we grab the inode->i_lock. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24pNFS: Cleanup - do layout segment initialisation in one placeTrond Myklebust1-4/+6
...instead of splitting the initialisation over init_lseg() and pnfs_layout_process(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24pNFS: Remove redundant stateid invalidationTrond Myklebust1-1/+0
The layout stateid will be invalidated once it holds no more layout segments anyway. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24pNFS: Remove redundant pnfs_mark_layout_returned_if_empty()Trond Myklebust1-1/+0
That's already being taken care of in pnfs_layout_remove_lseg(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24pNFS: Clear the layout metadata if the server changed the layout stateidTrond Myklebust1-1/+1
If the server changed the layout stateid's "other" field, then we should treat the old layout as being completely gone. In that case, we want to clear the metadata such as scheduled layoutreturns. Do this by calling pnfs_mark_layout_stateid_invalid(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24pNFS: Cleanup - don't open code pnfs_mark_layout_stateid_invalid()Trond Myklebust1-1/+1
Ensure nfs42_layoutstat_done() layoutget don't open code layout stateid invalidation. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24NFS: pnfs_mark_matching_lsegs_return() should match the layout sequence idTrond Myklebust1-14/+23
When determining which layout segments to return, we do want pnfs_mark_matching_lsegs_return to check that they match the layout sequence id. This ensures that we don't waste time if the server is replaying a layout recall that has already been satisfied. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24pNFS: Do not set plh_return_seq for non-callback related layoutreturnsTrond Myklebust1-7/+6
In cases where we need to send a layoutreturn in order to propagate an error, we should not tie that to a specific layout stateid. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24pNFS: Ensure layoutreturn acts as a completion for layout callbacksTrond Myklebust1-15/+25
When we return NFS_OK to the CB_LAYOUTRECALL, we are required to send a layoutreturn that "completes" that layout recall request, using the correct stateid. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24pNFS: Always update the layout barrier seqid on LAYOUTGETTrond Myklebust1-13/+14
Currently, pnfs_set_layout_stateid() will update the layout sequence id barrier only if the stateid itself is newer than the current layout stateid. However in a situation where multiple LAYOUTGET calls and a LAYOUTRETURN raced, it is entirely possible for one of the LAYOUTGET to set the current stateid to something newer than the LAYOUTRETURN that needs to set the barrier. The fix is to allow the "update_barrier" flag to force a check as to whether or not the barrier needs to be updated. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24pNFS: Always update the layout stateid if NFS_LAYOUT_INVALID_STID is setTrond Myklebust1-1/+1
If the layout stateid is invalid, then pnfs_set_layout_stateid() must always initialise it. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24pNFS: Clear the layout return tracking on layout reinitialisationTrond Myklebust1-5/+14
Ensure that we don't carry over layoutreturn info from a previous incarnation of this layout. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-19pNFS: Handle NFS4ERR_RECALLCONFLICT correctly in LAYOUTGETTrond Myklebust1-2/+11
Instead of giving up altogether and falling back to doing I/O through the MDS, which may make the situation worse, wait for 2 lease periods for the callback to resolve itself, and then try destroying the existing layout. Only if this was an attempt at getting a first layout, do we give up altogether, as the server is clearly crazy. Fixes: 183d9e7b112aa ("pnfs: rework LAYOUTGET retry handling") Cc: stable@vger.kernel.org # 4.7 Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Jeff Layton <jlayton@redhat.com>
2016-07-19pNFS: Separate handling of NFS4ERR_LAYOUTTRYLATER and RECALLCONFLICTTrond Myklebust1-0/+1
They are not the same error, and need to be handled differently. Fixes: 183d9e7b112aa ("pnfs: rework LAYOUTGET retry handling") Cc: stable@vger.kernel.org # 4.7 Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Jeff Layton <jlayton@redhat.com>
2016-07-19pNFS: Fix post-layoutget error handling in pnfs_update_layout()Trond Myklebust1-10/+11
The non-retry error path is currently broken and ends up releasing the reference to the layout twice. It also can end up clearing the NFS_LAYOUT_FIRST_LAYOUTGET flag twice, causing a race. In addition, the retry path will fail to decrement the plh_outstanding counter. Fixes: 183d9e7b112aa ("pnfs: rework LAYOUTGET retry handling") Cc: stable@vger.kernel.org # 4.7 Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Jeff Layton <jlayton@redhat.com>
2016-07-05pNFS: Files and flexfiles always need to commit before layoutcommitTrond Myklebust1-1/+4
So ensure that we mark the layout for commit once the write is done, and then ensure that the commit to ds is finished before sending layoutcommit. Note that by doing this, we're able to optimise away the commit for the case of servers that don't need layoutcommit in order to return updated attributes. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-06-24NFSv4.1/pnfs: Mark the layout stateid invalid when all segments are removedTrond Myklebust1-1/+3
According to RFC5661, section 12.5.3. the layout stateid is no longer valid once the client no longer holds any layout segments. Ensure that we mark it invalid. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-06-24NFSv4.1/pnfs: Add sparse lock annotations for pnfs_find_alloc_layoutTrond Myklebust1-0/+2
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Jeff Layton <jlayton@poochiereds.net> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-06-24NFSv4.1/pnfs: Layout stateids start out as being invalidTrond Myklebust1-2/+2
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Jeff Layton <jlayton@poochiereds.net> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-26pnfs: pnfs_update_layout needs to consider if strict iomode checking is onTom Haynes1-12/+22
As flexfiles has FF_FLAGS_NO_READ_IO, there is a need to generically support enforcing that a IOMODE_RW segment will not allow READ I/O. Signed-off-by: Tom Haynes <loghyr@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17pnfs: make pnfs_layout_process more robustJeff Layton1-16/+11
It can return NULL if layoutgets are blocked currently. Fix it to return -EAGAIN in that case, so we can properly handle it in pnfs_update_layout. Also, clean up and simplify the error handling -- eliminate "status" and just use "lseg". Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17pnfs: rework LAYOUTGET retry handlingJeff Layton1-68/+76
There are several problems in the way a stateid is selected for a LAYOUTGET operation: We pick a stateid to use in the RPC prepare op, but that makes it difficult to serialize LAYOUTGETs that use the open stateid. That serialization is done in pnfs_update_layout, which occurs well before the rpc_prepare operation. Between those two events, the i_lock is dropped and reacquired. pnfs_update_layout can find that the list has lsegs in it and not do any serialization, but then later pnfs_choose_layoutget_stateid ends up choosing the open stateid. This patch changes the client to select the stateid to use in the LAYOUTGET earlier, when we're searching for a usable layout segment. This way we can do it all while holding the i_lock the first time, and ensure that we serialize any LAYOUTGET call that uses a non-layout stateid. This also means a rework of how LAYOUTGET replies are handled, as we must now get the latest stateid if we want to retransmit in response to a retryable error. Most of those errors boil down to the fact that the layout state has changed in some fashion. Thus, what we really want to do is to re-search for a layout when it fails with a retryable error, so that we can avoid reissuing the RPC at all if possible. While the LAYOUTGET RPC is async, the initiating thread always waits for it to complete, so it's effectively synchronous anyway. Currently, when we need to retry a LAYOUTGET because of an error, we drive that retry via the rpc state machine. This means that once the call has been submitted, it runs until it completes. So, we must move the error handling for this RPC out of the rpc_call_done operation and into the caller. In order to handle errors like NFS4ERR_DELAY properly, we must also pass a pointer to the sliding timeout, which is now moved to the stack in pnfs_update_layout. The complicating errors are -NFS4ERR_RECALLCONFLICT and -NFS4ERR_LAYOUTTRYLATER, as those involve a timeout after which we give up and return NULL back to the caller. So, there is some special handling for those errors to ensure that the layers driving the retries can handle that appropriately. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17pnfs: lift retry logic from send_layoutget to pnfs_update_layoutJeff Layton1-36/+36
If we get back something like NFS4ERR_OLD_STATEID, that will be translated into -EAGAIN, and the do/while loop in send_layoutget will drive the call again. This is not quite what we want, I think. An error like that is a sign that something has changed. That something could have been a concurrent LAYOUTGET that would give us a usable lseg. Lift the retry logic into pnfs_update_layout instead. That allows us to redo the layout search, and may spare us from having to issue an RPC. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17pnfs: fix bad error handling in send_layoutgetJeff Layton1-3/+8
Currently, the code will clear the fail bit if we get back a fatal error. I don't think that's correct -- we want to clear that bit if we do not get a fatal error. Fixes: 0bcbf039f6 (nfs: handle request add failure properly) Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17pnfs: only tear down lsegs that precede seqid in LAYOUTRETURN argsJeff Layton1-22/+42
LAYOUTRETURN is "special" in that servers and clients are expected to work with old stateids. When the client sends a LAYOUTRETURN with an old stateid in it then the server is expected to only tear down layout segments that were present when that seqid was current. Ensure that the client handles its accounting accordingly. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17pnfs: keep track of the return sequence number in pnfs_layout_hdrJeff Layton1-3/+8
When we want to selectively do a LAYOUTRETURN, we need to specify a stateid that represents most recent layout acquisition that is to be returned. When we mark a layout stateid to be returned, we update the return sequence number in the layout header with that value, if it's newer than the existing one. Then, when we go to do a LAYOUTRETURN on layout header put, we overwrite the seqid in the stateid with the saved one, and then zero it out. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17pnfs: record sequence in pnfs_layout_segment when it's createdJeff Layton1-0/+1
In later patches, we're going to teach the client to be more selective about how it returns layouts. This means keeping a record of what the stateid's seqid was at the time that the server handed out a layout segment. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17pNFS: Fix a leaked layoutstats flagTrond Myklebust1-1/+2
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-09pnfs: set NFS_IOHDR_REDO in pnfs_read_resend_pnfsWeston Andros Adamson1-6/+8
Like other resend paths, mark the (old) hdr as NFS_IOHDR_REDO. This ensures the hdr completion function will not count the (old) hdr as good bytes. Also, vector the error back through the hdr->task.tk_status like other retry calls. This fixes a bug with the FlexFiles layout where libaio was reporting more bytes read than requested. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-04-04mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macrosKirill A. Shutemov1-3/+3
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time ago with promise that one day it will be possible to implement page cache with bigger chunks than PAGE_SIZE. This promise never materialized. And unlikely will. We have many places where PAGE_CACHE_SIZE assumed to be equal to PAGE_SIZE. And it's constant source of confusion on whether PAGE_CACHE_* or PAGE_* constant should be used in a particular case, especially on the border between fs and mm. Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much breakage to be doable. Let's stop pretending that pages in page cache are special. They are not. The changes are pretty straight-forward: - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN}; - page_cache_get() -> get_page(); - page_cache_release() -> put_page(); This patch contains automated changes generated with coccinelle using script below. For some reason, coccinelle doesn't patch header files. I've called spatch for them manually. The only adjustment after coccinelle is revert of changes to PAGE_CAHCE_ALIGN definition: we are going to drop it later. There are few places in the code where coccinelle didn't reach. I'll fix them manually in a separate patch. Comments and documentation also will be addressed with the separate patch. virtual patch @@ expression E; @@ - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ expression E; @@ - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ @@ - PAGE_CACHE_SHIFT + PAGE_SHIFT @@ @@ - PAGE_CACHE_SIZE + PAGE_SIZE @@ @@ - PAGE_CACHE_MASK + PAGE_MASK @@ expression E; @@ - PAGE_CACHE_ALIGN(E) + PAGE_ALIGN(E) @@ expression E; @@ - page_cache_get(E) + get_page(E) @@ expression E; @@ - page_cache_release(E) + put_page(E) Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-22NFSv4.x/pnfs: Fix a race between layoutget and bulk recallsTrond Myklebust1-11/+6
Replace another case where the layout 'plh_block_lgets' can trigger infinite loops in send_layoutget(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-02-22NFSv4.x/pnfs: Fix a race between layoutget and pnfs_destroy_layoutTrond Myklebust1-2/+22
If the server reboots while there is a layoutget outstanding, then the call to pnfs_choose_layoutget_stateid() will fail with an EAGAIN error, which causes an infinite loop in send_layoutget(). The reason why we never break out of the loop is that the layout 'plh_block_lgets' field is never cleared. Fix is to replace plh_block_lgets with NFS_LAYOUT_INVALID_STID, which can be reset after a new layoutget. Fixes: ab7d763e477c5 ("pNFS: Ensure nfs4_layoutget_prepare returns...") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-02-15pNFS: Always set NFS_LAYOUT_RETURN_REQUESTED with lo->plh_return_iomodeTrond Myklebust1-2/+1
When setting the layout return mode, we must always also set the NFS_LAYOUT_RETURN_REQUESTED flag to ensure that we send a layoutreturn. Otherwise pnfs_error_mark_layout_for_return() could set the mode, but fail to send the layoutreturn because another is already in flight. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-02-15pNFS: Fix pnfs_mark_matching_lsegs_return()Trond Myklebust1-2/+13
We don't need to schedule a layoutreturn if the layout segment can be freed immediately. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-01-27NFS: Cleanup - rename NFS_LAYOUT_RETURN_BEFORE_CLOSETrond Myklebust1-5/+5
NFS_LAYOUT_RETURN_BEFORE_CLOSE is being used to signal that a layoutreturn is needed, either due to a layout recall or to a layout error. Rename it to NFS_LAYOUT_RETURN_REQUESTED in order to clarify its purpose. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-01-26pNFS: Fix missing layoutreturn callsTrond Myklebust1-62/+56
The layoutreturn code currently relies on pnfs_put_lseg() to initiate the RPC call when conditions are right. A problem arises when we want to free the layout segment from inside an inode->i_lock section (e.g. in pnfs_clear_request_commit()), since we cannot sleep. The workaround is to move the actual call to pnfs_send_layoutreturn() to pnfs_put_layout_hdr(), which doesn't have this restriction. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-01-04Merge branch 'pnfs_generic'Trond Myklebust1-27/+55
* pnfs_generic: NFSv4.1/pNFS: Cleanup constify struct pnfs_layout_range arguments NFSv4.1/pnfs: Cleanup copying of pnfs_layout_range structures NFSv4.1/pNFS: Cleanup pnfs_mark_matching_lsegs_invalid() NFSv4.1/pNFS: Fix a race in initiate_file_draining() NFSv4.1/pNFS: pnfs_error_mark_layout_for_return() must always return layout NFSv4.1/pNFS: pnfs_mark_matching_lsegs_return() should set the iomode NFSv4.1/pNFS: Use nfs4_stateid_copy for copying stateids NFSv4.1/pNFS: Don't pass stateids by value to pnfs_send_layoutreturn() NFS: Relax requirements in nfs_flush_incompatible NFSv4.1/pNFS: Don't queue up a new commit if the layout segment is invalid NFS: Allow multiple commit requests in flight per file NFS/pNFS: Fix up pNFS write reschedule layering violations and bugs NFSv4: List stateid information in the callback tracepoints NFSv4.1/pNFS: Don't return NFS4ERR_DELAY unnecessarily in CB_LAYOUTRECALL NFSv4.1/pNFS: Ensure we enforce RFC5661 Section 12.5.5.2.1 pNFS: If we have to delay the layout callback, mark the layout for return NFSv4.1/pNFS: Add a helper to mark the layout as returned pNFS: Ensure nfs4_layoutget_prepare returns the correct error
2016-01-04NFSv4.1/pNFS: Cleanup constify struct pnfs_layout_range argumentsTrond Myklebust1-3/+3
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-01-04NFSv4.1/pnfs: Cleanup copying of pnfs_layout_range structuresTrond Myklebust1-2/+2
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-01-04NFSv4.1/pNFS: Cleanup pnfs_mark_matching_lsegs_invalid()Trond Myklebust1-5/+5
Make it more obvious what we're returning... Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-01-04NFSv4.1/pNFS: pnfs_error_mark_layout_for_return() must always return layoutTrond Myklebust1-6/+20
Fix a bug whereby if all the layout segments could be immediately freed, the call to pnfs_error_mark_layout_for_return() would never result in a layoutreturn. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-01-04NFSv4.1/pNFS: pnfs_mark_matching_lsegs_return() should set the iomodeTrond Myklebust1-4/+12
If pnfs_mark_matching_lsegs_return() needs to mark a layout segment for return, then it must also set the return iomode. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-01-04NFSv4.1/pNFS: Use nfs4_stateid_copy for copying stateidsTrond Myklebust1-3/+3
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-01-04NFSv4.1/pNFS: Don't pass stateids by value to pnfs_send_layoutreturn()Trond Myklebust1-6/+6
A stateid is a structure, pass it as a pointer. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-31NFSv4.1/pNFS: Don't queue up a new commit if the layout segment is invalidTrond Myklebust1-0/+3
If the layout segment is invalid, then we should not be adding more write requests to the commit list. Instead, those writes should be replayed after requesting a new layout. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28pNFS: If we have to delay the layout callback, mark the layout for returnTrond Myklebust1-1/+3
If the client needs to delay the layout callback, then speed up the recall process by marking the remaining layout segments to be actively returned by the client. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>