Age | Commit message (Collapse) | Author | Files | Lines |
|
The commit 643fa9612bf1 ("fscrypt: remove filesystem specific
build config option") removed modular support for fs/crypto. This
causes the Crypto API to be built-in whenever fscrypt is enabled.
This makes it very difficult for me to test modular builds of
the Crypto API without disabling fscrypt which is a pain.
As fscrypt is still evolving and it's developing new ties with the
fs layer, it's hard to build it as a module for now.
However, the actual algorithms are not required until a filesystem
is mounted. Therefore we can allow them to be built as modules.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Link: https://lore.kernel.org/r/20191227024700.7vrzuux32uyfdgum@gondor.apana.org.au
Signed-off-by: Eric Biggers <ebiggers@google.com>
|
|
fscrypt_get_encryption_info() returns 0 if the encryption key is
unavailable; it never returns ENOKEY. So remove checks for ENOKEY.
Link: https://lore.kernel.org/r/20191209212348.243331-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
|
|
fscrypt_is_direct_key_policy() is no longer used, so remove it.
Link: https://lore.kernel.org/r/20191209211829.239800-5-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
|
|
fscrypt_valid_enc_modes() is only used by policy.c, so move it to there.
Also adjust the order of the checks to be more natural, matching the
numerical order of the constants and also keeping AES-256 (the
recommended default) first in the list.
No change in behavior.
Link: https://lore.kernel.org/r/20191209211829.239800-4-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
|
|
FSCRYPT_POLICY_FLAG_DIRECT_KEY is currently only allowed with Adiantum
encryption. But FS_IOC_SET_ENCRYPTION_POLICY allowed it in combination
with other encryption modes, and an error wasn't reported until later
when the encrypted directory was actually used.
Fix it to report the error earlier by validating the correct use of the
DIRECT_KEY flag in fscrypt_supported_policy(), similar to how we
validate the IV_INO_LBLK_64 flag.
Link: https://lore.kernel.org/r/20191209211829.239800-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
|
|
Make fscrypt_supported_policy() call new functions
fscrypt_supported_v1_policy() and fscrypt_supported_v2_policy(), to
reduce the indentation level and make the code easier to read.
Also adjust the function comment to mention that whether the encryption
policy is supported can also depend on the inode.
No change in behavior.
Link: https://lore.kernel.org/r/20191209211829.239800-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
|
|
fscrypt_d_revalidate() and fscrypt_d_ops really belong in fname.c, since
they're specific to filenames encryption. crypto.c is for contents
encryption and general fs/crypto/ initialization and utilities.
Link: https://lore.kernel.org/r/20191209204359.228544-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
|
|
Constify the struct inode parameter to fscrypt_fname_disk_to_usr() and
the other filename encryption functions so that users don't have to pass
in a non-const inode when they are dealing with a const one, as in [1].
[1] https://lkml.kernel.org/linux-ext4/20191203051049.44573-6-drosen@google.com/
Cc: Daniel Rosenberg <drosen@google.com>
Link: https://lore.kernel.org/r/20191215213947.9521-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
|
|
Constify the struct fscrypt_hkdf parameter to fscrypt_hkdf_expand().
This makes it clearer that struct fscrypt_hkdf contains the key only,
not any per-request state.
Link: https://lore.kernel.org/r/20191209204054.227736-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
|
|
As a sanity check, verify that the allocated crypto_skcipher actually
has the ivsize that fscrypt is assuming it has. This will always be the
case unless there's a bug. But if there ever is such a bug (e.g. like
there was in earlier versions of the ESSIV conversion patch [1]) it's
preferable for it to be immediately obvious, and not rely on the
ciphertext verification tests failing due to uninitialized IV bytes.
[1] https://lkml.kernel.org/linux-crypto/20190702215517.GA69157@gmail.com/
Link: https://lore.kernel.org/r/20191209203918.225691-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
|
|
Crypto API users shouldn't really be accessing struct skcipher_alg
directly. <crypto/skcipher.h> already has a function
crypto_skcipher_driver_name(), so use that instead.
No change in behavior.
Link: https://lore.kernel.org/r/20191209203810.225302-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
|
|
Extend the FS_IOC_ADD_ENCRYPTION_KEY ioctl to allow the raw key to be
specified by a Linux keyring key, rather than specified directly.
This is useful because fscrypt keys belong to a particular filesystem
instance, so they are destroyed when that filesystem is unmounted.
Usually this is desired. But in some cases, userspace may need to
unmount and re-mount the filesystem while keeping the keys, e.g. during
a system update. This requires keeping the keys somewhere else too.
The keys could be kept in memory in a userspace daemon. But depending
on the security architecture and assumptions, it can be preferable to
keep them only in kernel memory, where they are unreadable by userspace.
We also can't solve this by going back to the original fscrypt API
(where for each file, the master key was looked up in the process's
keyring hierarchy) because that caused lots of problems of its own.
Therefore, add the ability for FS_IOC_ADD_ENCRYPTION_KEY to accept a
Linux keyring key. This solves the problem by allowing userspace to (if
needed) save the keys securely in a Linux keyring for re-provisioning,
while still using the new fscrypt key management ioctls.
This is analogous to how dm-crypt accepts a Linux keyring key, but the
key is then stored internally in the dm-crypt data structures rather
than being looked up again each time the dm-crypt device is accessed.
Use a custom key type "fscrypt-provisioning" rather than one of the
existing key types such as "logon". This is strongly desired because it
enforces that these keys are only usable for a particular purpose: for
fscrypt as input to a particular KDF. Otherwise, the keys could also be
passed to any kernel API that accepts a "logon" key with any service
prefix, e.g. dm-crypt, UBIFS, or (recently proposed) AF_ALG. This would
risk leaking information about the raw key despite it ostensibly being
unreadable. Of course, this mistake has already been made for multiple
kernel APIs; but since this is a new API, let's do it right.
This patch has been tested using an xfstest which I wrote to test it.
Link: https://lore.kernel.org/r/20191119222447.226853-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux
Pull /proc/locks formatting fix from Jeff Layton:
"This is a trivial fix for a _very_ long standing bug in /proc/locks
formatting. Ordinarily, I'd wait for the merge window for something
like this, but it is making it difficult to validate some overlayfs
fixes.
I've also gone ahead and marked this for stable"
* tag 'locks-v5.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
locks: print unsigned ino in /proc/locks
|
|
Pull cifs fixes from Steve French:
"One performance fix for large directory searches, and one minor style
cleanup noticed by Clang"
* tag '5.5-rc3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Optimize readdir on reparse points
cifs: Adjust indentation in smb2_open_file
|
|
An ino is unsigned, so display it as such in /proc/locks.
Cc: stable@vger.kernel.org
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
|
|
Pull io_uring fixes from Jens Axboe:
- Removal of now unused busy wqe list (Hillf)
- Add cond_resched() to io-wq work processing (Hillf)
- And then the series that I hinted at from last week, which removes
the sqe from the io_kiocb and keeps all sqe handling on the prep
side. This guarantees that an opcode can't do the wrong thing and
read the sqe more than once. This is unchanged from last week, no
issues have been observed with this in testing. Hence I really think
we should fold this into 5.5.
* tag 'io_uring-5.5-20191226' of git://git.kernel.dk/linux-block:
io-wq: add cond_resched() to worker thread
io-wq: remove unused busy list from io_sqe
io_uring: pass in 'sqe' to the prep handlers
io_uring: standardize the prep methods
io_uring: read 'count' for IORING_OP_TIMEOUT in prep handler
io_uring: move all prep state for IORING_OP_{SEND,RECV}_MGS to prep handler
io_uring: move all prep state for IORING_OP_CONNECT to prep handler
io_uring: add and use struct io_rw for read/writes
io_uring: use u64_to_user_ptr() consistently
|
|
Reschedule the current IO worker to cut the risk that it is becoming
a cpu hog.
Signed-off-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Commit e61df66c69b1 ("io-wq: ensure free/busy list browsing see all
items") added a list for io workers in addition to the free and busy
lists, not only making worker walk cleaner, but leaving the busy list
unused. Let's remove it.
Signed-off-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
When listing a directory with thounsands of files and most of them are
reparse points, we simply marked all those dentries for revalidation
and then sending additional (compounded) create/getinfo/close requests
for each of them.
Instead, upon receiving a response from an SMB2_QUERY_DIRECTORY
(FileIdFullDirectoryInformation) command, the directory entries that
have a file attribute of FILE_ATTRIBUTE_REPARSE_POINT will contain an
EaSize field with a reparse tag in it, so we parse it and mark the
dentry for revalidation only if it is a DFS or a symlink.
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Clang warns:
../fs/cifs/smb2file.c:70:3: warning: misleading indentation; statement
is not part of the previous 'if' [-Wmisleading-indentation]
if (oparms->tcon->use_resilient) {
^
../fs/cifs/smb2file.c:66:2: note: previous statement is here
if (rc)
^
1 warning generated.
This warning occurs because there is a space after the tab on this line.
Remove it so that the indentation is consistent with the Linux kernel
coding style and clang no longer warns.
Fixes: 592fafe644bf ("Add resilienthandles mount parm")
Link: https://github.com/ClangBuiltLinux/linux/issues/826
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Pull vfs fixes from Al Viro:
"Eric's s_inodes softlockup fixes + Jan's fix for recent regression
from pipe rework"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: call fsnotify_sb_delete after evict_inodes
fs: avoid softlockups in s_inodes iterators
pipe: Fix bogus dereference in iov_iter_alignment()
|
|
Pull xfs fixes from Darrick Wong:
"Fix a few bugs that could lead to corrupt files, fsck complaints, and
filesystem crashes:
- Minor documentation fixes
- Fix a file corruption due to read racing with an insert range
operation.
- Fix log reservation overflows when allocating large rt extents
- Fix a buffer log item flags check
- Don't allow administrators to mount with sunit= options that will
cause later xfs_repair complaints about the root directory being
suspicious because the fs geometry appeared inconsistent
- Fix a non-static helper that should have been static"
* tag 'xfs-5.5-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: Make the symbol 'xfs_rtalloc_log_count' static
xfs: don't commit sunit/swidth updates to disk if that would cause repair failures
xfs: split the sunit parameter update into two parts
xfs: refactor agfl length computation function
libxfs: resync with the userspace libxfs
xfs: use bitops interface for buf log item AIL flag check
xfs: fix log reservation overflows when allocating large rt extents
xfs: stabilize insert range start boundary to avoid COW writeback race
xfs: fix Sphinx documentation warning
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 bug fixes from Ted Ts'o:
"Ext4 bug fixes, including a regression fix"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: clarify impact of 'commit' mount option
ext4: fix unused-but-set-variable warning in ext4_add_entry()
jbd2: fix kernel-doc notation warning
ext4: use RCU API in debug_print_tree
ext4: validate the debug_want_extra_isize mount option at parse time
ext4: reserve revoke credits in __ext4_new_inode
ext4: unlock on error in ext4_expand_extra_isize()
ext4: optimize __ext4_check_dir_entry()
ext4: check for directory entries too close to block end
ext4: fix ext4_empty_dir() for directories with holes
|
|
LTP pipeio_1 test is hanging with v5.5-rc2-385-gb8e382a185eb,
with read side observing empty pipe and sleeping and write
side running out of space and then sleeping as well. In this
scenario there are 5 writers and 1 reader.
Problem is that after pipe_write() reacquires pipe lock, it
re-checks for empty pipe with potentially stale 'head' and
doesn't wake up read side anymore. pipe->tail can advance
beyond 'head', because there are multiple writers.
Use pipe->head for empty pipe check after reacquiring lock
to observe current state.
Testing: With patch, LTP pipeio_1 ran successfully in loop for 1 hour.
Without patch it hanged within a minute.
Fixes: 1b6b26ae7053 ("pipe: fix and clarify pipe write wakeup logic")
Reported-by: Rachel Sibley <rasibley@redhat.com>
Signed-off-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Warning is found when compile with "-Wunused-but-set-variable":
fs/ext4/namei.c: In function ‘ext4_add_entry’:
fs/ext4/namei.c:2167:23: warning: variable ‘sbi’ set but not used
[-Wunused-but-set-variable]
struct ext4_sb_info *sbi;
^~~
Fix this by moving the variable @sbi under CONFIG_UNICODE.
Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/cb5eb904-224a-9701-c38f-cb23514b1fff@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Pull io_uring fixes from Jens Axboe:
"Here's a set of fixes that should go into 5.5-rc3 for io_uring.
This is bigger than I'd like it to be, mainly because we're fixing the
case where an application reuses sqe data right after issue. This
really must work, or it's confusing. With 5.5 we're flagging us as
submit stable for the actual data, this must also be the case for
SQEs.
Honestly, I'd really like to add another series on top of this, since
it cleans it up considerable and prevents any SQE reuse by design. I
posted that here:
https://lore.kernel.org/io-uring/20191220174742.7449-1-axboe@kernel.dk/T/#u
and may still send it your way early next week once it's been looked
at and had some more soak time (does pass all regression tests). With
that series, we've unified the prep+issue handling, and only the prep
phase even has access to the SQE.
Anyway, outside of that, fixes in here for a few other issues that
have been hit in testing or production"
* tag 'io_uring-5.5-20191220' of git://git.kernel.dk/linux-block:
io_uring: io_wq_submit_work() should not touch req->rw
io_uring: don't wait when under-submitting
io_uring: warn about unhandled opcode
io_uring: read opcode and user_data from SQE exactly once
io_uring: make IORING_OP_TIMEOUT_REMOVE deferrable
io_uring: make IORING_OP_CANCEL_ASYNC deferrable
io_uring: make IORING_POLL_ADD and IORING_POLL_REMOVE deferrable
io_uring: make HARDLINK imply LINK
io_uring: any deferred command must have stable sqe data
io_uring: remove 'sqe' parameter to the OP helpers that take it
io_uring: fix pre-prepped issue with force_nonblock == true
io-wq: re-add io_wq_current_is_worker()
io_uring: fix sporadic -EFAULT from IORING_OP_RECVMSG
io_uring: fix stale comment and a few typos
|
|
This moves the prep handlers outside of the opcode handlers, and allows
us to pass in the sqe directly. If the sqe is non-NULL, it means that
the request should be prepared for the first time.
With the opcode handlers not having access to the sqe at all, we are
guaranteed that the prep handler has setup the request fully by the
time we get there. As before, for opcodes that need to copy in more
data then the io_kiocb allows for, the io_async_ctx holds that info. If
a prep handler is invoked with req->io set, it must use that to retain
information for later.
Finally, we can remove io_kiocb->sqe as well.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We currently have a mix of use cases. Most of the newer ones are pretty
uniform, but we have some older ones that use different calling
calling conventions. This is confusing.
For the opcodes that currently rely on the req->io->sqe copy saving
them from reuse, add a request type struct in the io_kiocb command
union to store the data they need.
Prepare for all opcodes having a standard prep method, so we can call
it in a uniform fashion and outside of the opcode handler. This is in
preparation for passing in the 'sqe' pointer, rather than storing it
in the io_kiocb. Once we have uniform prep handlers, we can leave all
the prep work to that part, and not even pass in the sqe to the opcode
handler. This ensures that we don't reuse sqe data inadvertently.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add the count field to struct io_timeout, and ensure the prep handler
has read it. Timeout also needs an async context always, set it up
in the prep handler if we don't have one.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add struct io_sr_msg in our io_kiocb per-command union, and ensure that
the send/recvmsg prep handlers have grabbed what they need from the SQE
by the time prep is done.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add struct io_connect in our io_kiocb per-command union, and ensure
that io_connect_prep() has grabbed what it needs from the SQE.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Put the kiocb in struct io_rw, and add the addr/len for the request as
well. Use the kiocb->private field for the buffer index for fixed reads
and writes.
Any use of kiocb->ki_filp is flipped to req->file. It's the same thing,
and less confusing.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Fix the following sparse warning:
fs/xfs/libxfs/xfs_trans_resv.c:206:1: warning: symbol 'xfs_rtalloc_log_count' was not declared. Should it be static?
Fixes: b1de6fc7520f ("xfs: fix log reservation overflows when allocating large rt extents")
Signed-off-by: Chen Wandun <chenwandun@huawei.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
We use it in some spots, but not consistently. Convert the rest over,
makes it easier to read as well.
No functional changes in this patch.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
failures
Alex Lyakas reported[1] that mounting an xfs filesystem with new sunit
and swidth values could cause xfs_repair to fail loudly. The problem
here is that repair calculates the where mkfs should have allocated the
root inode, based on the superblock geometry. The allocation decisions
depend on sunit, which means that we really can't go updating sunit if
it would lead to a subsequent repair failure on an otherwise correct
filesystem.
Port from xfs_repair some code that computes the location of the root
inode and teach mount to skip the ondisk update if it would cause
problems for repair. Along the way we'll update the documentation,
provide a function for computing the minimum AGFL size instead of
open-coding it, and cut down some indenting in the mount code.
Note that we allow the mount to proceed (and new allocations will
reflect this new geometry) because we've never screened this kind of
thing before. We'll have to wait for a new future incompat feature to
enforce correct behavior, alas.
Note that the geometry reporting always uses the superblock values, not
the incore ones, so that is what xfs_info and xfs_growfs will report.
[1] https://lore.kernel.org/linux-xfs/20191125130744.GA44777@bfoster/T/#m00f9594b511e076e2fcdd489d78bc30216d72a7d
Reported-by: Alex Lyakas <alex@zadara.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
|
|
If the administrator provided a sunit= mount option, we need to validate
the raw parameter, convert the mount option units (512b blocks) into the
internal unit (fs blocks), and then validate that the (now cooked)
parameter doesn't screw anything up on disk. The incore inode geometry
computation can depend on the new sunit option, but a subsequent patch
will make validating the cooked value depends on the computed inode
geometry, so break the sunit update into two steps.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
|
|
Refactor xfs_alloc_min_freelist to accept a NULL @pag argument, in which
case it returns the largest possible minimum length. This will be used
in an upcoming patch to compute the length of the AGFL at mkfs time.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
|
|
Prepare to resync the userspace libxfs with the kernel libxfs. There
were a few things I missed -- a couple of static inline directory
functions that have to be exported for xfs_repair; a couple of directory
naming functions that make porting much easier if they're /not/ static
inline; and a u16 usage that should have been uint16_t.
None of these things are bugs in their own right; this just makes
porting xfsprogs easier.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
|
|
The xfs_log_item flags were converted to atomic bitops as of commit
22525c17ed ("xfs: log item flags are racy"). The assert check for
AIL presence in xfs_buf_item_relse() still uses the old value based
check. This likely went unnoticed as XFS_LI_IN_AIL evaluates to 0
and causes the assert to unconditionally pass. Fix up the check.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Fixes: 22525c17ed ("xfs: log item flags are racy")
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
I've been chasing a weird and obscure crash that was userspace stack
corruption, and finally narrowed it down to a bit flip that made a
stack address invalid. io_wq_submit_work() unconditionally flips
the req->rw.ki_flags IOCB_NOWAIT bit, but since it's a generic work
handler, this isn't valid. Normal read/write operations own that
part of the request, on other types it could be something else.
Move the IOCB_NOWAIT clear to the read/write handlers where it belongs.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
There is no reliable way to submit and wait in a single syscall, as
io_submit_sqes() may under-consume sqes (in case of an early error).
Then it will wait for not-yet-submitted requests, deadlocking the user
in most cases.
Don't wait/poll if can't submit all sqes
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
When a filesystem is unmounted, we currently call fsnotify_sb_delete()
before evict_inodes(), which means that fsnotify_unmount_inodes()
must iterate over all inodes on the superblock looking for any inodes
with watches. This is inefficient and can lead to livelocks as it
iterates over many unwatched inodes.
At this point, SB_ACTIVE is gone and dropping refcount to zero kicks
the inode out out immediately, so anything processed by
fsnotify_sb_delete / fsnotify_unmount_inodes gets evicted in that loop.
After that, the call to evict_inodes will evict everything else with a
zero refcount.
This should speed things up overall, and avoid livelocks in
fsnotify_unmount_inodes().
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Anything that walks all inodes on sb->s_inodes list without rescheduling
risks softlockups.
Previous efforts were made in 2 functions, see:
c27d82f fs/drop_caches.c: avoid softlockups in drop_pagecache_sb()
ac05fbb inode: don't softlockup when evicting inodes
but there hasn't been an audit of all walkers, so do that now. This
also consistently moves the cond_resched() calls to the bottom of each
loop in cases where it already exists.
One loop remains: remove_dquot_ref(), because I'm not quite sure how
to deal with that one w/o taking the i_lock.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Now that we have all the opcodes handled in terms of command prep and
SQE reuse, add a printk_once() to warn about any potentially new and
unhandled ones.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
If we defer a request, we can't be reading the opcode again. Ensure that
the user_data and opcode fields are stable. For the user_data we already
have a place for it, for the opcode we can fill a one byte hold and store
that as well. For both of them, assign them when we originally read the
SQE in io_get_sqring(). Any code that uses sqe->opcode or sqe->user_data
is switched to req->opcode and req->user_data.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
If we defer this command as part of a link, we have to make sure that
the SQE data has been read upfront. Integrate the timeout remove op into
the prep handling to make it safe for SQE reuse.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
If we defer this command as part of a link, we have to make sure that
the SQE data has been read upfront. Integrate the async cancel op into
the prep handling to make it safe for SQE reuse.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
If we defer these commands as part of a link, we have to make sure that
the SQE data has been read upfront. Integrate the poll add/remove into
the prep handling to make it safe for SQE reuse.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The rules are as follows, if IOSQE_IO_HARDLINK is specified, then it's a
link and there is no need to set IOSQE_IO_LINK separately, though it
could be there. Add proper check and ensure that IOSQE_IO_HARDLINK
implies IOSQE_IO_LINK.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We're currently not retaining sqe data for accept, fsync, and
sync_file_range. None of these commands need data outside of what
is directly provided, hence it can't go stale when the request is
deferred. However, it can get reused, if an application reuses
SQE entries.
Ensure that we retain the information we need and only read the sqe
contents once, off the submission path. Most of this is just moving
code into a prep and finish function.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|