Age | Commit message (Collapse) | Author | Files | Lines |
|
Unfortunately multiple kmap() within a single thread are deadlockable,
so writing out multiple buffers with writev() isn't possible.
Change the implementation so that it does a separate write() for each
buffer. This actually simplifies the code a lot since the
splice_from_pipe() helper can be used.
This limitation is caused by HIGHMEM pages, and so only affects a
subset of architectures and configurations. In the future it may be
worth to implement default_file_splice_write() in a more efficient way
on configs that allow it.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
When a read bio_copy_kern() request fails, the content of the bounce
buffer is not copied back. However, as request failure doesn't
necessarily mean complete failure, the buffer state can be useful.
This behavior is also inconsistent with the user map counterpart and
causes the subtle difference between bounced and unbounced IO causes
confusion.
This patch makes bio_copy_kern_endio() ignore @err and always copy
back data on request completion.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Boaz Harrosh <bharrosh@panasas.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
fs/splice.c: In function 'default_file_splice_read':
fs/splice.c:566: warning: 'error' may be used uninitialized in this function
which is sort-of true. The code will in fact return -ENOMEM instead of the
kernel_readv() return value.
Cc: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
We cannot reliably map more than one page at the time, or we risk
deadlocking. Just allocate the pages from low mem instead.
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
If f_op->splice_write() is not implemented, fall back to a plain write.
Use vfs_writev() to write from the pipe buffers.
This will allow splice on all filesystems and file types. This
includes "direct_io" files in fuse which bypass the page cache.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
If f_op->splice_read() is not implemented, fall back to a plain read.
Use vfs_readv() to read into previously allocated pages.
This will allow splice and functions using splice, such as the loop
device, to work on all filesystems. This includes "direct_io" files
in fuse which bypass the page cache.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
Allow splice(2) to work when both the input and the output is a pipe.
Based on the impementation of the tee(2) syscall, but instead of
duplicating the buffer references move the buffers from the input pipe
to the output pipe.
Moving the whole buffer only succeeds if the full length of the buffer
is spliced. Otherwise duplicate the buffer, just like tee(2), set the
length of the output buffer and advance the offset on the input
buffer.
Since splice is operating on two pipes, special care needs to be taken
with locking to prevent AN ABBA deadlock. Again this is done
similarly to the tee(2) syscall, first preparing the input and output
pipes so there's data to consume and space for that data, and then
doing the move operation while holding both locks.
If other processes are doing I/O on the same pipes parallel to the
splice, then by the time both inodes are locked there might be no
buffers left to move, or no space to move them to. In this case retry
the whole operation, including the preparation phase. This could lead
to starvation, but I'm not sure if that's serious enough to worry
about.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
rq->data_len served two purposes - the length of data buffer on issue
and the residual count on completion. This duality creates some
headaches.
First of all, block layer and low level drivers can't really determine
what rq->data_len contains while a request is executing. It could be
the total request length or it coulde be anything else one of the
lower layers is using to keep track of residual count. This
complicates things because blk_rq_bytes() and thus
[__]blk_end_request_all() relies on rq->data_len for PC commands.
Drivers which want to report residual count should first cache the
total request length, update rq->data_len and then complete the
request with the cached data length.
Secondly, it makes requests default to reporting full residual count,
ie. reporting that no data transfer occurred. The residual count is
an exception not the norm; however, the driver should clear
rq->data_len to zero to signify the normal cases while leaving it
alone means no data transfer occurred at all. This reverse default
behavior complicates code unnecessarily and renders block PC on some
drivers (ide-tape/floppy) unuseable.
This patch adds rq->resid_len which is used only for residual count.
While at it, remove now unnecessasry blk_rq_bytes() caching in
ide_pc_intr() as rq->data_len is not changed anymore.
Boaz : spotted missing conversion in osd
Sergei : spotted too early conversion to blk_rq_bytes() in ide-tape
[ Impact: cleanup residual count handling, report 0 resid by default ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: Borislav Petkov <petkovbb@googlemail.com>
Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Cc: Mike Miller <mike.miller@hp.com>
Cc: Eric Moore <Eric.Moore@lsi.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Doug Gilbert <dgilbert@interlog.com>
Cc: Mike Miller <mike.miller@hp.com>
Cc: Eric Moore <Eric.Moore@lsi.com>
Cc: Darrick J. Wong <djwong@us.ibm.com>
Cc: Pete Zaitcev <zaitcev@redhat.com>
Cc: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
Doing a proper block dev ->readpages() speeds up the crazy dump(8)
approach of using interleaved process IO.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
RomFS should advance the destination buffer pointer when reading data from a
blockdev source (the data may be split over multiple blocks, each requiring its
own sb_read() call). Without this, all the data is copied to the beginning of
the output buffer.
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Michal Simek <monstr@monstr.eu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
romfs_lookup() should be using a routine akin to strcmp() on the backing store,
rather than one akin to strncmp(). If it uses the latter, it's liable to match
/bin/shutdown when looking up /bin/sh.
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Michal Simek <monstr@monstr.eu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: Fix potential inode allocation soft lockup in Orlov allocator
ext4: Make the extent validity check more paranoid
jbd: use SWRITE_SYNC_PLUG when writing synchronous revoke records
jbd2: use SWRITE_SYNC_PLUG when writing synchronous revoke records
ext4: really print the find_group_flex fallback warning only once
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6:
eCryptfs: Larger buffer for encrypted symlink targets
eCryptfs: Lock lower directory inode mutex during lookup
eCryptfs: Remove ecryptfs_unlink_sigs warnings
eCryptfs: Fix data corruption when using ecryptfs_passthrough
eCryptfs: Print FNEK sig properly in /proc/mounts
eCryptfs: NULL pointer dereference in ecryptfs_send_miscdev()
eCryptfs: Copy lower inode attrs before dentry instantiation
|
|
* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6:
[S390] update default configuration.
[S390] omit frame pointers on s390 when possible
[S390] Use tape_generic_offline directly.
[S390] /proc/stat idle field for idle cpus
[S390] appldata: avoid deadlock with appldata_mem
[S390] ipl: fix compile breakage
|
|
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes:
GFS2: Ensure that the inode goal block settings are updated
GFS2: Fix bug in block allocation
bitops: Add __ffs64 bitop
|
|
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
cfq-iosched: cache prio_tree root in cfqq->p_root
cfq-iosched: fix bug with aliased request and cooperation detection
cfq-iosched: clear ->prio_trees[] on cfqd alloc
block: fix intermittent dm timeout based oops
umem: fix request_queue lock warning
block: simplify I/O stat accounting
pktcdvd.h should include mempool.h
cfq-iosched: use the default seek distance when there aren't enough seek samples
cfq-iosched: make seek_mean converge more quickly
block: make blk_abort_queue() ignore non-request based devices
block: include empty disks in /proc/diskstats
bio: use bio_kmalloc() in copy/map functions
bio: fix bio_kmalloc()
block: fix queue bounce limit setting
block: fix SG_IO vector request data length handling
scatterlist: make sure sg_miter_next() doesn't return 0 sized mappings
|
|
write_lock(¤t->fs->lock) guarantees we can't wrongly miss
LSM_UNSAFE_SHARE, this is what we care about. Use rcu_read_lock()
instead of ->siglock to iterate over the sub-threads. We must see
all CLONE_THREAD|CLONE_FS threads which didn't pass exit_fs(), it
takes fs->lock too.
With or without this patch we can miss the freshly cloned thread
and set LSM_UNSAFE_SHARE, we don't care.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
[ Fixed lock/unlock typo - Hugh ]
Acked-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
If do_execve() fails after check_unsafe_exec(), it clears fs->in_exec
unconditionally. This is wrong if we race with our sub-thread which
also does do_execve:
Two threads T1 and T2 and another process P, all share the same
->fs.
T1 starts do_execve(BAD_FILE). It calls check_unsafe_exec(), since
->fs is shared, we set LSM_UNSAFE but not ->in_exec.
P exits and decrements fs->users.
T2 starts do_execve(), calls check_unsafe_exec(), now ->fs is not
shared, we set fs->in_exec.
T1 continues, open_exec(BAD_FILE) fails, we clear ->in_exec and
return to the user-space.
T1 does clone(CLONE_FS /* without CLONE_THREAD */).
T2 continues without LSM_UNSAFE_SHARE while ->fs is shared with
another process.
Change check_unsafe_exec() to return res = 1 if we set ->in_exec, and change
do_execve() to clear ->in_exec depending on res.
When do_execve() suceeds, it is safe to clear ->in_exec unconditionally.
It can be set only if we don't share ->fs with another process, and since
we already killed all sub-threads either ->in_exec == 0 or we are the
only user of this ->fs.
Also, we do not need fs->lock to clear fs->in_exec.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
Acked-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The cpu idle field in the output of /proc/stat is too small for cpus
that have been idle for more than a tick. Add the architecture hook
arch_idle_time that allows to add the not accounted idle time of a
sleeping cpu without waking the cpu.
The s390 implementation of arch_idle_time uses the already existing
s390_idle_data per_cpu variable to find the sleep time of a neighboring
idle cpu.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
GFS2 has a goal block associated with each inode indicating the
search start position for future block allocations (in fact there
are two, but thats for backward compatibility with GFS1 as they
are set to identical locations in GFS2).
In some circumstances, depending on the ordering of updates to
the inode it was possible for the goal block settings to not
be updated on disk. This patch ensures that the goal block will
always get updated, thus reducing the potential for searching
the same (already allocated) blocks again when looking for free
space during block allocation.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
The new bitfit algorithm was counting from the wrong end of
64 bit words in the bitfield. This fixes it by using __ffs64
instead of fls64
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
If the Orlov allocator is having trouble finding an appropriate block
group, the fallback code could loop forever, causing a soft lockup
warning in find_group_orlov():
BUG: soft lockup - CPU#0 stuck for 61s! [cp:11728]
...
Pid: 11728, comm: cp Not tainted (2.6.30-rc1-dirty #77) Lenovo
EIP: 0060:[<c021650e>] EFLAGS: 00000246 CPU: 0
EIP is at ext4_get_group_desc+0x54/0x9d
...
Call Trace:
[<c0218021>] find_group_orlov+0x2ee/0x334
[<c0120a5f>] ? sched_clock+0x8/0xb
[<c02188e3>] ext4_new_inode+0x2cf/0xb1a
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
Instead of just checking that the extent block number is greater or
equal than s_first_data_block, make sure it it is not pointing into
the block group descriptors, since that is clearly wrong. This helps
prevent filesystem from getting very badly corrupted in case an extent
block is corrupted.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
When using filename encryption with eCryptfs, the value of the symlink
in the lower filesystem is encrypted and stored as a Tag 70 packet.
This results in a longer symlink target than if the target value wasn't
encrypted.
Users were reporting these messages in their syslog:
[ 45.653441] ecryptfs_parse_tag_70_packet: max_packet_size is [56]; real
packet size is [51]
[ 45.653444] ecryptfs_decode_and_decrypt_filename: Could not parse tag
70 packet from filename; copying through filename as-is
This was due to bufsiz, one the arguments in readlink(), being used to
when allocating the buffer passed to the lower inode's readlink().
That symlink target may be very large, but when decoded and decrypted,
could end up being smaller than bufsize.
To fix this, the buffer passed to the lower inode's readlink() will
always be PATH_MAX in size when filename encryption is enabled. Any
necessary truncation occurs after the decoding and decrypting.
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
|
|
This patch locks the lower directory inode's i_mutex before calling
lookup_one_len() to find the appropriate dentry in the lower filesystem.
This bug was found thanks to the warning set in commit 2f9092e1.
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
|
|
A feature was added to the eCryptfs umount helper to automatically
unlink the keys used for an eCryptfs mount from the kernel keyring upon
umount. This patch keeps the unrecognized mount option warnings for
ecryptfs_unlink_sigs out of the logs.
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
|
|
ecryptfs_passthrough is a mount option that allows eCryptfs to allow
data to be written to non-eCryptfs files in the lower filesystem. The
passthrough option was causing data corruption due to it not always
being treated as a non-eCryptfs file.
The first 8 bytes of an eCryptfs file contains the decrypted file size.
This value was being written to the non-eCryptfs files, too. Also,
extra 0x00 characters were being written to make the file size a
multiple of PAGE_CACHE_SIZE.
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
|
|
The filename encryption key signature is not properly displayed in
/proc/mounts. The "ecryptfs_sig=" mount option name is displayed for
all global authentication tokens, included those for filename keys.
This patch checks the global authentication token flags to determine if
the key is a FEKEK or FNEK and prints the appropriate mount option name
before the signature.
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
|
|
If data is NULL, msg_ctx->msg is set to NULL and then dereferenced
afterwards. ecryptfs_send_raw_message() is the only place that
ecryptfs_send_miscdev() is called with data being NULL, but the only
caller of that function (ecryptfs_process_helo()) is never called. In
short, there is currently no way to trigger the NULL pointer
dereference.
This patch removes the two unused functions and modifies
ecryptfs_send_miscdev() to remove the NULL dereferences.
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
|
|
Copies the lower inode attributes to the upper inode before passing the
upper inode to d_instantiate(). This is important for
security_d_instantiate().
The problem was discovered by a user seeing SELinux denials like so:
type=AVC msg=audit(1236812817.898:47): avc: denied { 0x100000 } for
pid=3584 comm="httpd" name="testdir" dev=ecryptfs ino=943872
scontext=root:system_r:httpd_t:s0
tcontext=root:object_r:httpd_sys_content_t:s0 tclass=file
Notice target class is file while testdir is really a directory,
confusing the permission translation (0x100000) due to the wrong i_mode.
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
|
|
Impact: remove possible deadlock condition
There is no reason to use mempool backed allocation for map functions.
Also, because kern mapping is used inside LLDs (e.g. for EH), using
mempool backed allocation can lead to deadlock under extreme
conditions (mempool already consumed by the time a request reached EH
and requests are blocked on EH).
Switch copy/map functions to bio_kmalloc().
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
Impact: fix bio_kmalloc() and its destruction path
bio_kmalloc() was broken in two ways.
* bvec_alloc_bs() first allocates bvec using kmalloc() and then
ignores it and allocates again like non-kmalloc bvecs.
* bio_kmalloc_destructor() didn't check for and free bio integrity
data.
This patch fixes the above problems. kmalloc patch is separated out
from bio_alloc_bioset() and allocates the requested number of bvecs as
inline bvecs.
* bio_alloc_bioset() no longer takes NULL @bs. None other than
bio_kmalloc() used it and outside users can't know how it was
allocated anyway.
* Define and use BIO_POOL_NONE so that pool index check in
bvec_free_bs() triggers if inline or kmalloc allocated bvec gets
there.
* Relocate destructors on top of each allocation function so that how
they're used is more clear.
Jens Axboe suggested allocating bvecs inline.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
Btrfs: fix btrfs fallocate oops and deadlock
Btrfs: use the right node in reada_for_balance
Btrfs: fix oops on page->mapping->host during writepage
Btrfs: add a priority queue to the async thread helpers
Btrfs: use WRITE_SYNC for synchronous writes
|
|
This fixes the following BUG:
# mount -o size=MM -t hugetlbfs none /huge
hugetlbfs: Bad value 'MM' for mount option 'size=MM'
------------[ cut here ]------------
kernel BUG at fs/super.c:996!
Due to
BUG_ON(!mnt->mnt_sb);
in vfs_kern_mount().
Also, remove unused #include <linux/quotaops.h>
Cc: William Irwin <wli@holomorphy.com>
Cc: <stable@kernel.org>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Btrfs fallocate was incorrectly starting a transaction with a lock held
on the extent_io tree for the file, which could deadlock. Strictly
speaking it was using join_transaction which would be safe, but it is better
to move the transaction outside of the lock.
When preallocated extents are overwritten, btrfs_mark_buffer_dirty was
being called on an unlocked buffer. This was triggering an assertion and
oops because the lock is supposed to be held.
The bug was calling btrfs_mark_buffer_dirty on a leaf after btrfs_del_item had
been run. btrfs_del_item takes care of dirtying things, so the solution is a
to skip the btrfs_mark_buffer_dirty call in this case.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes:
GFS2: Fix page_mkwrite() return code
GFS2: Clear dirty bit at end of inode glock sync
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
reiserfs: fix j_last_flush_trans_id type
fs: Mark get_filesystem_list() as __init function.
kill vfs_stat_fd / vfs_lstat_fd
Separate out common fstatat code into vfs_fstatat
ecryptfs: use memdup_user()
ncpfs: use memdup_user()
xfs: use memdup_user()
sysfs: use memdup_user()
btrfs: use memdup_user()
xattr: use memdup_user()
autofs4: use memchr() in invalid_string()
Documentation/filesystems: remove out of date reference to BKL being held
Fix i_mutex vs. readdir handling in nfsd
fs/compat_ioctl: fix build when !BLOCK
Fix autofs_expire()
No need for crossing to mountpoint in audit_tag_tree()
Safer nfsd_cross_mnt()
Touch all affected namespaces on propagation of mount
Fix AUTOFS_DEV_IOCTL_REQUESTER_CMD
|
|
Commit ae46141ff08f1965b17c531b571953c39ce8b9e2 (NFSv3: Fix posix ACL code)
introduces a bug in the calculation of the XDR header iovec. In the case
where we are inlining the acls, we need to adjust the length of the iovec
req->rq_svec, in addition to adjusting the total buffer length.
Tested-by: Leonardo Chiquitto <leonardo.lists@gmail.com>
Tested-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
"int get_filesystem_list(char * buf)" is called by only
"static void __init get_fs_names(char *page)".
We can mark get_filesystem_list() as "__init".
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
There's really no reason to keep vfs_stat_fd and vfs_lstat_fd with
Oleg's vfs_fstatat. Use vfs_fstatat for the few cases having the
directory fd, and switch all others to vfs_stat / vfs_lstat.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
This is a version incorporating Christoph's suggestion.
Separate out common *fstatat functionality into a single function
instead of duplicating it all over the code.
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Remove open-coded memdup_user().
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Remove open-coded memdup_user()
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Remove open-coded memdup_user()
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Remove open-coded memdup_user().
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Remove open-coded memdup_user().
Note this changes some GFP_NOFS to GFP_KERNEL, since copy_from_user() may
cause pagefault, it's pointless to pass GFP_NOFS to kmalloc().
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Remove open-coded memdup_user()
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Commit 14f7dd63 ("Copy XFS readdir hack into nfsd code") introduced a
bug to generic code which had been extant for a long time in the XFS
version -- it started to call through into lookup_one_len() and hence
into the file systems' ->lookup() methods without i_mutex held on the
directory.
This patch fixes it by locking the directory's i_mutex again before
calling the filldir functions. The original deadlocks which commit
14f7dd63 was designed to avoid are still avoided, because they were due
to fs-internal locking, not i_mutex.
While we're at it, fix the return type of nfsd_buffered_readdir() which
should be a __be32 not an int -- it's an NFS errno, not a Linux errno.
And return nfserrno(-ENOMEM) when allocation fails, not just -ENOMEM.
Sparse would have caught that, if it wasn't so busy bitching about
__cold__.
Commit 05f4f678 ("nfsd4: don't do lookup within readdir in recovery
code") introduced a similar problem with calling lookup_one_len()
without i_mutex, which this patch also addresses. To fix that, it was
necessary to fix the called functions so that they expect i_mutex to be
held; that part was done by J. Bruce Fields.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Umm-I-can-live-with-that-by: Al Viro <viro@zeniv.linux.org.uk>
Reported-by: J. R. Okajima <hooanon05@yahoo.co.jp>
Tested-by: J. Bruce Fields <bfields@citi.umich.edu>
LKML-Reference: <8036.1237474444@jrobl>
Cc: stable@kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
In file included from fs/compat_ioctl.c:61:
include/linux/loop.h:59: error: field 'lo_bio_list' has incomplete type
Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|