summaryrefslogtreecommitdiffstats
path: root/fs/ceph/caps.c
AgeCommit message (Collapse)AuthorFilesLines
2021-08-25ceph: correctly handle releasing an embedded cap flushXiubo Li1-8/+13
The ceph_cap_flush structures are usually dynamically allocated, but the ceph_cap_snap has an embedded one. When force umounting, the client will try to remove all the session caps. During this, it will free them, but that should not be done with the ones embedded in a capsnap. Fix this by adding a new boolean that indicates that the cap flush is embedded in a capsnap, and skip freeing it if that's set. At the same time, switch to using list_del_init() when detaching the i_list and g_list heads. It's possible for a forced umount to remove these objects but then handle_cap_flushsnap_ack() races in and does the list_del_init() again, corrupting memory. Cc: stable@vger.kernel.org URL: https://tracker.ceph.com/issues/52283 Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-08-04ceph: reduce contention in ceph_check_delayed_caps()Luis Henriques1-1/+16
Function ceph_check_delayed_caps() is called from the mdsc->delayed_work workqueue and it can be kept looping for quite some time if caps keep being added back to the mdsc->cap_delay_list. This may result in the watchdog tainting the kernel with the softlockup flag. This patch breaks this loop if the caps have been recently (i.e. during the loop execution). Any new caps added to the list will be handled in the next run. Also, allow schedule_delayed() callers to explicitly set the delay value instead of defaulting to 5s, so we can ensure that it runs soon afterward if it looks like there is more work. Cc: stable@vger.kernel.org URL: https://tracker.ceph.com/issues/46284 Signed-off-by: Luis Henriques <lhenriques@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-06-29ceph: eliminate ceph_async_iput()Jeff Layton1-6/+3
Now that we don't need to hold session->s_mutex or the snap_rwsem when calling ceph_check_caps, we can eliminate ceph_async_iput and just use normal iput calls. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-06-29ceph: don't take s_mutex in ceph_flush_snapsJeff Layton1-10/+3
Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Luis Henriques <lhenriques@suse.de> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-06-29ceph: don't take s_mutex in try_flush_capsJeff Layton1-14/+2
The s_mutex doesn't protect anything in this codepath. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Luis Henriques <lhenriques@suse.de> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-06-29ceph: don't take s_mutex or snap_rwsem in ceph_check_capsJeff Layton1-61/+11
These locks appear to be completely unnecessary. Almost all of this function is done under the inode->i_ceph_lock, aside from the actual sending of the message. Don't take either lock in this function. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Luis Henriques <lhenriques@suse.de> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-06-29ceph: eliminate session->s_gen_ttl_lockJeff Layton1-9/+6
Turn s_cap_gen field into an atomic_t, and just rely on the fact that we hold the s_mutex when changing the s_cap_ttl field. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Luis Henriques <lhenriques@suse.de> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-05-06Merge tag 'ceph-for-5.13-rc1' of git://github.com/ceph/ceph-clientLinus Torvalds1-18/+9
Pull ceph updates from Ilya Dryomov: "Notable items here are - a series to take advantage of David Howells' netfs helper library from Jeff - three new filesystem client metrics from Xiubo - ceph.dir.rsnaps vxattr from Yanhu - two auth-related fixes from myself, marked for stable. Interspersed is a smattering of assorted fixes and cleanups across the filesystem" * tag 'ceph-for-5.13-rc1' of git://github.com/ceph/ceph-client: (24 commits) libceph: allow addrvecs with a single NONE/blank address libceph: don't set global_id until we get an auth ticket libceph: bump CephXAuthenticate encoding version ceph: don't allow access to MDS-private inodes ceph: fix up some bare fetches of i_size ceph: convert some PAGE_SIZE invocations to thp_size() ceph: support getting ceph.dir.rsnaps vxattr ceph: drop pinned_page parameter from ceph_get_caps ceph: fix inode leak on getattr error in __fh_to_dentry ceph: only check pool permissions for regular files ceph: send opened files/pinned caps/opened inodes metrics to MDS daemon ceph: avoid counting the same request twice or more ceph: rename the metric helpers ceph: fix kerneldoc copypasta over ceph_start_io_direct ceph: use attach/detach_page_private for tracking snap context ceph: don't use d_add in ceph_handle_snapdir ceph: don't clobber i_snap_caps on non-I_NEW inode ceph: fix fall-through warnings for Clang ceph: convert ceph_readpages to ceph_readahead ceph: convert ceph_write_begin to netfs_write_begin ...
2021-04-27ceph: fix up some bare fetches of i_sizeJeff Layton1-3/+3
We need to use i_size_read(), which properly handles the torn read case on 32-bit arches. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-04-27ceph: drop pinned_page parameter from ceph_get_capsJeff Layton1-6/+5
All of the existing callers that don't set this to NULL just drop the page reference at some arbitrary point later in processing. There's no point in keeping a page reference that we don't use, so just drop the reference immediately after checking the Uptodate flag. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-04-27ceph: fix fscache invalidationJeff Layton1-0/+1
Ensure that we invalidate the fscache whenever we invalidate the pagecache. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-04-27ceph: rip out old fscache readpage handlingJeff Layton1-9/+0
With the new netfs read helper functions, we won't need a lot of this infrastructure as it handles the pagecache pages itself. Rip out the read handling for now, and much of the old infrastructure that deals in individual pages. The cookie handling is mostly unchanged, however. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-03-08ceph: don't allow type or device number to change on non-I_NEW inodesJeff Layton1-1/+7
Al pointed out that a malicious or broken MDS could change the type or device number of a given inode number. It may also be possible for the MDS to reuse an old inode number. Ensure that we never allow fill_inode to change the type part of the i_mode or the i_rdev unless I_NEW is set. Throw warnings if the MDS ever changes these on us mid-stream, and return an error. Don't set i_rdev directly, and rely on init_special_inode to do it. Also, fix up error handling in the callers of ceph_get_inode. In handle_cap_grant, check for and warn if the inode type changes, and only overwrite the mode if it didn't. Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-02-16ceph: defer flushing the capsnap if the Fb is usedXiubo Li1-13/+20
If the Fb cap is used it means the current inode is flushing the dirty data to OSD, just defer flushing the capsnap. URL: https://tracker.ceph.com/issues/48640 Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-02-16ceph: allow queueing cap/snap handling after putting cap referencesJeff Layton1-4/+25
Testing with the fscache overhaul has triggered some lockdep warnings about circular lock dependencies involving page_mkwrite and the mmap_lock. It'd be better to do the "real work" without the mmap lock being held. Change the skip_checking_caps parameter in __ceph_put_cap_refs to an enum, and use that to determine whether to queue check_caps, do it synchronously or not at all. Change ceph_page_mkwrite to do a ceph_put_cap_refs_async(). Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-02-16ceph: fix flush_snap logic after putting capsJeff Layton1-4/+6
A primary reason for skipping ceph_check_caps after putting the references was to avoid the locking in ceph_check_caps during a reconnect. __ceph_put_cap_refs can still call ceph_flush_snaps in that case though, and that takes many of the same inconvenient locks. Fix the logic in __ceph_put_cap_refs to skip flushing snaps when the skip_checking_caps flag is set. Fixes: e64f44a88465 ("ceph: skip checking caps when session reconnecting and releasing reqs") Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14ceph: fix race in concurrent __ceph_remove_cap invocationsLuis Henriques1-2/+9
A NULL pointer dereference may occur in __ceph_remove_cap with some of the callbacks used in ceph_iterate_session_caps, namely trim_caps_cb and remove_session_caps_cb. Those callers hold the session->s_mutex, so they are prevented from concurrent execution, but ceph_evict_inode does not. Since the callers of this function hold the i_ceph_lock, the fix is simply a matter of returning immediately if caps->ci is NULL. Cc: stable@vger.kernel.org URL: https://tracker.ceph.com/issues/43272 Suggested-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Luis Henriques <lhenriques@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14ceph: fix up some warnings on W=1 buildsJeff Layton1-7/+4
Convert some decodes into unused variables into skips, and fix up some non-kerneldoc comment headers to not start with "/**". Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14ceph: add new RECOVER mount_state when recovering sessionJeff Layton1-1/+1
When recovering a session (a'la recover_session=clean), we want to do all of the operations that we do on a forced umount, but changing the mount state to SHUTDOWN is can cause queued MDS requests to fail when the session comes back. Most of those can idle until the session is recovered in this situation. Reserve SHUTDOWN state for forced umount, and make a new RECOVER state for the forced reconnect situation. Change several tests for equality with SHUTDOWN to test for that or RECOVER. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14ceph: don't WARN when removing caps due to blocklistingJeff Layton1-1/+2
We expect to remove dirty caps when the client is blocklisted. Don't throw a warning in that case. [ idryomov: break unnecessarily long line ] Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-11-04ceph: check session state after bumping session->s_seqJeff Layton1-1/+1
Some messages sent by the MDS entail a session sequence number increment, and the MDS will drop certain types of requests on the floor when the sequence numbers don't match. In particular, a REQUEST_CLOSE message can cross with one of the sequence morphing messages from the MDS which can cause the client to stall, waiting for a response that will never come. Originally, this meant an up to 5s delay before the recurring workqueue job kicked in and resent the request, but a recent change made it so that the client would never resend, causing a 60s stall unmounting and sometimes a blockisting event. Add a new helper for incrementing the session sequence and then testing to see whether a REQUEST_CLOSE needs to be resent, and move the handling of CEPH_MDS_SESSION_CLOSING into that function. Change all of the bare sequence counter increments to use the new helper. Reorganize check_session_state with a switch statement. It should no longer be called when the session is CLOSING, so throw a warning if it ever is (but still handle that case sanely). [ idryomov: whitespace, pr_err() call fixup ] URL: https://tracker.ceph.com/issues/47563 Fixes: fa9967734227 ("ceph: fix potential mdsc use-after-free crash") Reported-by: Patrick Donnelly <pdonnell@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-10-12ceph: comment cleanups and clarificationsJeff Layton1-0/+16
Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-10-12ceph: break up send_cap_msgJeff Layton1-32/+28
Push the allocation of the msg and the send into the caller. Rename the function to encode_cap_msg and make it void return. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-10-12ceph: drop separate mdsc argument from __send_capJeff Layton1-6/+5
We can get it from the session if we need it. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-10-12ceph: metrics for opened files, pinned caps and opened inodesXiubo Li1-2/+36
In client for each inode, it may have many opened files and may have been pinned in more than one MDS servers. And some inodes are idle, which have no any opened files. This patch will show these metrics in the debugfs, likes: item total ----------------------------------------- opened files / total inodes 14 / 5 pinned i_caps / total inodes 7 / 5 opened inodes / total inodes 3 / 5 Will send these metrics to ceph, which will be used by the `fs top`, later. [ jlayton: drop unrelated hunk, count hashed inodes instead of allocated ones ] URL: https://tracker.ceph.com/issues/47005 Signed-off-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-10-12ceph: add ceph_sb_to_mdsc helper support to parse the mdscXiubo Li1-2/+1
This will help simplify the code. [ jlayton: fix minor merge conflict in quota.c ] Signed-off-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-08-24ceph: fix inode number handling on arches with 32-bit ino_tJeff Layton1-7/+7
Tuan and Ulrich mentioned that they were hitting a problem on s390x, which has a 32-bit ino_t value, even though it's a 64-bit arch (for historical reasons). I think the current handling of inode numbers in the ceph driver is wrong. It tries to use 32-bit inode numbers on 32-bit arches, but that's actually not a problem. 32-bit arches can deal with 64-bit inode numbers just fine when userland code is compiled with LFS support (the common case these days). What we really want to do is just use 64-bit numbers everywhere, unless someone has mounted with the ino32 mount option. In that case, we want to ensure that we hash the inode number down to something that will fit in 32 bits before presenting the value to userland. Add new helper functions that do this, and only do the conversion before presenting these values to userland in getattr and readdir. The inode table hashvalue is changed to just cast the inode number to unsigned long, as low-order bits are the most likely to vary anyway. While it's not strictly required, we do want to put something in inode->i_ino. Instead of basing it on BITS_PER_LONG, however, base it on the size of the ino_t type. NOTE: This is a user-visible change on 32-bit arches: 1/ inode numbers will be seen to have changed between kernel versions. 32-bit arches will see large inode numbers now instead of the hashed ones they saw before. 2/ any really old software not built with LFS support may start failing stat() calls with -EOVERFLOW on inode numbers >2^32. Nothing much we can do about these, but hopefully the intersection of people running such code on ceph will be very small. The workaround for both problems is to mount with "-o ino32". [ idryomov: changelog tweak ] URL: https://tracker.ceph.com/issues/46828 Reported-by: Ulrich Weigand <Ulrich.Weigand@de.ibm.com> Reported-and-Tested-by: Tuan Hoang1 <Tuan.Hoang1@ibm.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-08-03ceph: clean up and optimize ceph_check_delayed_caps()Jeff Layton1-6/+4
Make this loop look a bit more sane. Also optimize away the spinlock release/reacquire if we can't get an inode reference. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-08-03ceph: add global total_caps to count the mdsc's total caps numberXiubo Li1-0/+2
This will help to reduce using the global mdsc->mutex lock in many places. Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-06-01ceph: skip checking caps when session reconnecting and releasing reqsXiubo Li1-2/+13
It make no sense to check the caps when reconnecting to mds. And for the async dirop caps, they will be put by its _cb() function, so when releasing the requests, it will make no sense too. URL: https://tracker.ceph.com/issues/45635 Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-06-01ceph: ceph_kick_flushing_caps needs the s_mutexJeff Layton1-0/+2
The mdsc->cap_dirty_lock is not held while walking the list in ceph_kick_flushing_caps, which is not safe. ceph_early_kick_flushing_caps does something similar, but the s_mutex is held while it's called and I think that guards against changes to the list. Ensure we hold the s_mutex when calling ceph_kick_flushing_caps, and add some clarifying comments. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-06-01ceph: request expedited service on session's last cap flushJeff Layton1-2/+6
When flushing a lot of caps to the MDS's at once (e.g. for syncfs), we can end up waiting a substantial amount of time for MDS replies, due to the fact that it may delay some of them so that it can batch them up together in a single journal transaction. This can lead to stalls when calling sync or syncfs. What we'd really like to do is request expedited service on the _last_ cap we're flushing back to the server. If the CHECK_CAPS_FLUSH flag is set on the request and the current inode was the last one on the session->s_cap_dirty list, then mark the request with CEPH_CLIENT_CAPS_SYNC. Note that this heuristic is not perfect. New inodes can race onto the list after we've started flushing, but it does seem to fix some common use cases. URL: https://tracker.ceph.com/issues/44744 Reported-by: Jan Fajerski <jfajerski@suse.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-06-01ceph: convert mdsc->cap_dirty to a per-session listJeff Layton1-13/+65
This is a per-sb list now, but that makes it difficult to tell when the cap is the last dirty one associated with the session. Switch this to be a per-session list, but continue using the mdsc->cap_dirty_lock to protect the lists. This list is only ever walked in ceph_flush_dirty_caps, so change that to walk the sessions array and then flush the caps for inodes on each session's list. If the auth cap ever changes while the inode has dirty caps, then move the inode to the appropriate session for the new auth_cap. Also, ensure that we never remove an auth cap while the inode is still on the s_cap_dirty list. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-06-01ceph: reset i_requested_max_size if file write is not wantedYan, Zheng1-10/+19
write can stuck at waiting for larger max_size in following sequence of events: - client opens a file and writes to position 'A' (larger than unit of max size increment) - client closes the file handle and updates wanted caps (not wanting file write caps) - client opens and truncates the file, writes to position 'A' again. At the 1st event, client set inode's requested_max_size to 'A'. At the 2nd event, mds removes client's writable range, but client does not reset requested_max_size. At the 3rd event, client does not request max size because requested_max_size is already larger than 'A'. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-06-01ceph: fix potential race in ceph_check_capsJeff Layton1-1/+13
Nothing ensures that session will still be valid by the time we dereference the pointer. Take and put a reference. In principle, we should always be able to get a reference here, but throw a warning if that's ever not the case. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-06-01ceph: don't take i_ceph_lock in handle_cap_importJeff Layton1-3/+2
Just take it before calling it. This means we have to do a couple of minor in-memory operations under the spinlock now, but those shouldn't be an issue. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-06-01ceph: don't release i_ceph_lock in handle_cap_truncJeff Layton1-8/+10
There's no reason to do this here. Just have the caller handle it. Also, add a lockdep assertion. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-06-01ceph: add comments for handle_cap_flush_ack logicJeff Layton1-1/+13
Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-06-01ceph: split up __finish_cap_flushJeff Layton1-31/+29
This function takes a mdsc argument or ci argument, but if both are passed in, it ignores the ci arg. Fortunately, nothing does that, but there's no good reason to have the same function handle both cases. Also, get rid of some branches and just use |= to set the wake_* vals. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-06-01ceph: reorganize __send_cap for less spinlock abuseJeff Layton1-79/+86
Get rid of the __releases annotation by breaking it up into two functions: __prep_cap which is done under the spinlock and __send_cap that is done outside it. Add new fields to cap_msg_args for the wake boolean and old_xattr_buf pointer. Nothing checks the return value from __send_cap, so make it void return. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-06-01ceph: add caps perf metric for each superblockXiubo Li1-0/+19
Count hits and misses in the caps cache. If the client has all of the necessary caps when a task needs references, then it's counted as a hit. Any other situation is a miss. URL: https://tracker.ceph.com/issues/43215 Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-05-27ceph: flush release queue when handling caps for unknown inodeJeff Layton1-1/+1
It's possible for the VFS to completely forget about an inode, but for it to still be sitting on the cap release queue. If the MDS sends the client a cap message for such an inode, it just ignores it today, which can lead to a stall of up to 5s until the cap release queue is flushed. If we get a cap message for an inode that can't be located, then go ahead and flush the cap release queue. Cc: stable@vger.kernel.org URL: https://tracker.ceph.com/issues/45532 Fixes: 1e9c2eb6811e ("ceph: delete stale dentry when last reference is dropped") Reported-and-Tested-by: Andrej Filipčič <andrej.filipcic@ijs.si> Suggested-by: Yan, Zheng <zyan@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-05-04ceph: fix double unlock in handle_cap_export()Wu Bo1-0/+1
If the ceph_mdsc_open_export_target_session() return fails, it will do a "goto retry", but the session mutex has already been unlocked. Re-lock the mutex in that case to ensure that we don't unlock it twice. Signed-off-by: Wu Bo <wubo40@huawei.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-05-04ceph: fix special error code in ceph_try_get_caps()Wu Bo1-1/+1
There are 3 speical error codes: -EAGAIN/-EFBIG/-ESTALE. After calling try_get_cap_refs, ceph_try_get_caps test for the -EAGAIN twice. Ensure that it tests for -ESTALE instead. Signed-off-by: Wu Bo <wubo40@huawei.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-03-30ceph: wait for async creating inode before requesting new max sizeYan, Zheng1-0/+5
ceph_check_caps() can't request new max size for async creating inode. This may make ceph_get_caps() loop busily until getting reply of the async create. Also, wait for async creating reply before calling ceph_renew_caps(). Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-03-30ceph: don't skip updating wanted caps when cap is staleYan, Zheng1-2/+6
1. try_get_cap_refs() fails to get caps and finds that mds_wanted does not include what it wants. It returns -ESTALE. 2. ceph_get_caps() calls ceph_renew_caps(). ceph_renew_caps() finds that inode has cap, so it calls ceph_check_caps(). 3. ceph_check_caps() finds that issued caps (without checking if it's stale) already includes caps wanted by open file, so it skips updating wanted caps. Above events can cause an infinite loop inside ceph_get_caps(). Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-03-30ceph: request new max size only when there is auth capYan, Zheng1-1/+1
When there is no auth cap, check_max_size() can't do anything and may cause an infinite loop inside ceph_get_caps(). Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-03-30ceph: cleanup return error of try_get_cap_refs()Yan, Zheng1-11/+15
Returns 0 if caps were not able to be acquired (yet), 1 if cap acquisition succeeded, or a negative error code. There are 3 special error codes: -EAGAIN: need to sleep but non-blocking is specified -EFBIG: ask caller to call check_max_size() and try again. -ESTALE: ask caller to call ceph_renew_caps() and try again. [ jlayton: add WARN_ON_ONCE check for -EAGAIN ] Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-03-30ceph: check all mds' caps after page writebackYan, Zheng1-1/+1
If an inode has caps from multiple mds's, the following can happen: - non-auth mds revokes Fsc. Fcb is used, so page writeback is queued. - when writeback finishes, ceph_check_caps() is called with auth only flag. ceph_check_caps() invalidates pagecache, but skips checking any non-auth caps. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-03-30ceph: update i_requested_max_size only when sending cap msg to auth mdsYan, Zheng1-1/+2
Non-auth mds can't do anything to 'update max' cap message. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>