summaryrefslogtreecommitdiffstats
path: root/lib/iov_iter.c
AgeCommit message (Collapse)AuthorFilesLines
2022-12-13Merge tag 'for-6.2/block-2022-12-08' of git://git.kernel.dk/linuxLinus Torvalds1-8/+24
Pull block updates from Jens Axboe: - NVMe pull requests via Christoph: - Support some passthrough commands without CAP_SYS_ADMIN (Kanchan Joshi) - Refactor PCIe probing and reset (Christoph Hellwig) - Various fabrics authentication fixes and improvements (Sagi Grimberg) - Avoid fallback to sequential scan due to transient issues (Uday Shankar) - Implement support for the DEAC bit in Write Zeroes (Christoph Hellwig) - Allow overriding the IEEE OUI and firmware revision in configfs for nvmet (Aleksandr Miloserdov) - Force reconnect when number of queue changes in nvmet (Daniel Wagner) - Minor fixes and improvements (Uros Bizjak, Joel Granados, Sagi Grimberg, Christoph Hellwig, Christophe JAILLET) - Fix and cleanup nvme-fc req allocation (Chaitanya Kulkarni) - Use the common tagset helpers in nvme-pci driver (Christoph Hellwig) - Cleanup the nvme-pci removal path (Christoph Hellwig) - Use kstrtobool() instead of strtobool (Christophe JAILLET) - Allow unprivileged passthrough of Identify Controller (Joel Granados) - Support io stats on the mpath device (Sagi Grimberg) - Minor nvmet cleanup (Sagi Grimberg) - MD pull requests via Song: - Code cleanups (Christoph) - Various fixes - Floppy pull request from Denis: - Fix a memory leak in the init error path (Yuan) - Series fixing some batch wakeup issues with sbitmap (Gabriel) - Removal of the pktcdvd driver that was deprecated more than 5 years ago, and subsequent removal of the devnode callback in struct block_device_operations as no users are now left (Greg) - Fix for partition read on an exclusively opened bdev (Jan) - Series of elevator API cleanups (Jinlong, Christoph) - Series of fixes and cleanups for blk-iocost (Kemeng) - Series of fixes and cleanups for blk-throttle (Kemeng) - Series adding concurrent support for sync queues in BFQ (Yu) - Series bringing drbd a bit closer to the out-of-tree maintained version (Christian, Joel, Lars, Philipp) - Misc drbd fixes (Wang) - blk-wbt fixes and tweaks for enable/disable (Yu) - Fixes for mq-deadline for zoned devices (Damien) - Add support for read-only and offline zones for null_blk (Shin'ichiro) - Series fixing the delayed holder tracking, as used by DM (Yu, Christoph) - Series enabling bio alloc caching for IRQ based IO (Pavel) - Series enabling userspace peer-to-peer DMA (Logan) - BFQ waker fixes (Khazhismel) - Series fixing elevator refcount issues (Christoph, Jinlong) - Series cleaning up references around queue destruction (Christoph) - Series doing quiesce by tagset, enabling cleanups in drivers (Christoph, Chao) - Series untangling the queue kobject and queue references (Christoph) - Misc fixes and cleanups (Bart, David, Dawei, Jinlong, Kemeng, Ye, Yang, Waiman, Shin'ichiro, Randy, Pankaj, Christoph) * tag 'for-6.2/block-2022-12-08' of git://git.kernel.dk/linux: (247 commits) blktrace: Fix output non-blktrace event when blk_classic option enabled block: sed-opal: Don't include <linux/kernel.h> sed-opal: allow using IOC_OPAL_SAVE for locking too blk-cgroup: Fix typo in comment block: remove bio_set_op_attrs nvmet: don't open-code NVME_NS_ATTR_RO enumeration nvme-pci: use the tagset alloc/free helpers nvme: add the Apple shared tag workaround to nvme_alloc_io_tag_set nvme: only set reserved_tags in nvme_alloc_io_tag_set for fabrics controllers nvme: consolidate setting the tagset flags nvme: pass nr_maps explicitly to nvme_alloc_io_tag_set block: bio_copy_data_iter nvme-pci: split out a nvme_pci_ctrl_is_dead helper nvme-pci: return early on ctrl state mismatch in nvme_reset_work nvme-pci: rename nvme_disable_io_queues nvme-pci: cleanup nvme_suspend_queue nvme-pci: remove nvme_pci_disable nvme-pci: remove nvme_disable_admin_queue nvme: merge nvme_shutdown_ctrl into nvme_disable_ctrl nvme: use nvme_wait_ready in nvme_shutdown_ctrl ...
2022-11-25iov_iter: saner checks for attempt to copy to/from iteratorAl Viro1-14/+17
instead of "don't do it to ITER_PIPE" check for ->data_source being false on copying from iterator. Check for !->data_source for copying to iterator, while we are at it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-11-25csum_and_copy_to_iter(): handle ITER_DISCARDAl Viro1-2/+6
Not hard to implement - we are not copying anything here, so csum_and_memcpy() is not usable, but calculating a checksum of source directly is trivial... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-11-25get rid of unlikely() on page_copy_sane() callsAl Viro1-6/+5
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-11-09iov_iter: introduce iov_iter_get_pages_[alloc_]flags()Logan Gunthorpe1-8/+24
Add iov_iter_get_pages_flags() and iov_iter_get_pages_alloc_flags() which take a flags argument that is passed to get_user_pages_fast(). This is so that FOLL_PCI_P2PDMA can be passed when appropriate. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20221021174116.7200-4-logang@deltatee.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-10-03instrumented.h: allow instrumenting both sides of copy_from_user()Alexander Potapenko1-3/+6
Introduce instrument_copy_from_user_before() and instrument_copy_from_user_after() hooks to be invoked before and after the call to copy_from_user(). KASAN and KCSAN will be only using instrument_copy_from_user_before(), but for KMSAN we'll need to insert code after copy_from_user(). Link: https://lkml.kernel.org/r/20220915150417.722975-4-glider@google.com Signed-off-by: Alexander Potapenko <glider@google.com> Reviewed-by: Marco Elver <elver@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Eric Biggers <ebiggers@google.com> Cc: Eric Biggers <ebiggers@kernel.org> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kees Cook <keescook@chromium.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vegard Nossum <vegard.nossum@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-08-08fix copy_page_from_iter() for compound destinationsAl Viro1-4/+18
had been broken for ITER_BVEC et.al. since ever (OK, v3.17 when ITER_BVEC had first appeared)... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08copy_page_to_iter(): don't split high-order page in case of ITER_PIPEAl Viro1-15/+6
... just shove it into one pipe_buffer. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08expand those iov_iter_advance()...Al Viro1-2/+9
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08pipe_get_pages(): switch to append_pipe()Al Viro1-29/+6
now that we are advancing the iterator, there's no need to treat the first page separately - just call append_pipe() in a loop. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08get rid of non-advancing variantsAl Viro1-9/+18
mechanical change; will be further massaged in subsequent commits Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08iov_iter: saner helper for page array allocationAl Viro1-45/+32
All call sites of get_pages_array() are essenitally identical now. Replace with common helper... Returns number of slots available in resulting array or 0 on OOM; it's up to the caller to make sure it doesn't ask to zero-entry array (i.e. neither maxpages nor size are allowed to be zero). Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08fold __pipe_get_pages() into pipe_get_pages()Al Viro1-37/+38
... and don't mangle maxsize there - turn the loop into counting one instead. Easier to see that we won't run out of array that way. Note that special treatment of the partial buffer in that thing is an artifact of the non-advancing semantics of iov_iter_get_pages() - if not for that, it would be append_pipe(), same as the body of the loop that follows it. IOW, once we make iov_iter_get_pages() advancing, the whole thing will turn into calculate how many pages do we want allocate an array (if needed) call append_pipe() that many times. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08ITER_XARRAY: don't open-code DIV_ROUND_UP()Al Viro1-9/+1
Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08unify the rest of iov_iter_get_pages()/iov_iter_get_pages_alloc() gutsAl Viro1-59/+27
same as for pipes and xarrays; after that iov_iter_get_pages() becomes a wrapper for __iov_iter_get_pages_alloc(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08unify xarray_get_pages() and xarray_get_pages_alloc()Al Viro1-39/+10
same as for pipes Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08unify pipe_get_pages() and pipe_get_pages_alloc()Al Viro1-32/+17
The differences between those two are * pipe_get_pages() gets a non-NULL struct page ** value pointing to preallocated array + array size. * pipe_get_pages_alloc() gets an address of struct page ** variable that contains NULL, allocates the array and (on success) stores its address in that variable. Not hard to combine - always pass struct page ***, have the previous pipe_get_pages_alloc() caller pass ~0U as cap for array size. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08iov_iter_get_pages(): sanity-check argumentsAl Viro1-7/+2
zero maxpages is bogus, but best treated as "just return 0"; NULL pages, OTOH, should be treated as a hard bug. get rid of now completely useless checks in xarray_get_pages{,_alloc}(). Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08iov_iter_get_pages_alloc(): lift freeing pages array on failure exits into ↵Al Viro1-16/+22
wrapper Incidentally, ITER_XARRAY did *not* free the sucker in case when iter_xarray_populate_pages() returned 0... Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08ITER_PIPE: fold data_start() and pipe_space_for_user() togetherAl Viro1-25/+19
All their callers are next to each other; all of them want the total amount of pages and, possibly, the offset in the partial final buffer. Combine into a new helper (pipe_npages()), fix the bogosity in pipe_space_for_user(), while we are at it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08ITER_PIPE: cache the type of last bufferAl Viro1-39/+38
We often need to find whether the last buffer is anon or not, and currently it's rather clumsy: check if ->iov_offset is non-zero (i.e. that pipe is not empty) if so, get the corresponding pipe_buffer and check its ->ops if it's &default_pipe_buf_ops, we have an anon buffer. Let's replace the use of ->iov_offset (which is nowhere near similar to its role for other flavours) with signed field (->last_offset), with the following rules: empty, no buffers occupied: 0 anon, with bytes up to N-1 filled: N zero-copy, with bytes up to N-1 filled: -N That way abs(i->last_offset) is equal to what used to be in i->iov_offset and empty vs. anon vs. zero-copy can be distinguished by the sign of i->last_offset. Checks for "should we extend the last buffer or should we start a new one?" become easier to follow that way. Note that most of the operations can only be done in a sane state - i.e. when the pipe has nothing past the current position of iterator. About the only thing that could be done outside of that state is iov_iter_advance(), which transitions to the sane state by truncating the pipe. There are only two cases where we leave the sane state: 1) iov_iter_get_pages()/iov_iter_get_pages_alloc(). Will be dealt with later, when we make get_pages advancing - the callers are actually happier that way. 2) iov_iter copied, then something is put into the copy. Since they share the underlying pipe, the original gets behind. When we decide that we are done with the copy (original is not usable until then) we advance the original. direct_io used to be done that way; nowadays it operates on the original and we do iov_iter_revert() to discard the excessive data. At the moment there's nothing in the kernel that could do that to ITER_PIPE iterators, so this reason for insane state is theoretical right now. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08ITER_PIPE: clean iov_iter_revert()Al Viro1-46/+14
Fold pipe_truncate() into it, clean up. We can release buffers in the same loop where we walk backwards to the iterator beginning looking for the place where the new position will be. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08ITER_PIPE: clean pipe_advance() upAl Viro1-17/+17
instead of setting ->iov_offset for new position and calling pipe_truncate() to adjust ->len of the last buffer and discard everything after it, adjust ->len at the same time we set ->iov_offset and use pipe_discard_from() to deal with buffers past that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08ITER_PIPE: lose iter_head argument of __pipe_get_pages()Al Viro1-4/+3
it's only used to get to the partial buffer we can add to, and that's always the last one, i.e. pipe->head - 1. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08ITER_PIPE: fold push_pipe() into __pipe_get_pages()Al Viro1-55/+25
Expand the only remaining call of push_pipe() (in __pipe_get_pages()), combine it with the page-collecting loop there. Note that the only reason it's not a loop doing append_pipe() is that append_pipe() is advancing, while iov_iter_get_pages() is not. As soon as it switches to saner semantics, this thing will switch to using append_pipe(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08ITER_PIPE: allocate buffers as we go in copy-to-pipe primitivesAl Viro1-73/+98
New helper: append_pipe(). Extends the last buffer if possible, allocates a new one otherwise. Returns page and offset in it on success, NULL on failure. iov_iter is advanced past the data we've got. Use that instead of push_pipe() in copy-to-pipe primitives; they get simpler that way. Handling of short copy (in "mc" one) is done simply by iov_iter_revert() - iov_iter is in consistent state after that one, so we can use that. [Fix for braino caught by Liu Xinpeng <liuxp11@chinatelecom.cn> folded in] [another braino fix, this time in copy_pipe_to_iter() and pipe_zero(); caught by testcase from Hugh Dickins] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08ITER_PIPE: helpers for adding pipe buffersAl Viro1-42/+46
There are only two kinds of pipe_buffer in the area used by ITER_PIPE. 1) anonymous - copy_to_iter() et.al. end up creating those and copying data there. They have zero ->offset, and their ->ops points to default_pipe_page_ops. 2) zero-copy ones - those come from copy_page_to_iter(), and page comes from caller. ->offset is also caller-supplied - it might be non-zero. ->ops points to page_cache_pipe_buf_ops. Move creation and insertion of those into helpers - push_anon(pipe, size) and push_page(pipe, page, offset, size) resp., separating them from the "could we avoid creating a new buffer by merging with the current head?" logics. Acked-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08ITER_PIPE: helper for getting pipe buffer by indexAl Viro1-6/+9
pipe_buffer instances of a pipe are organized as a ring buffer, with power-of-2 size. Indices are kept *not* reduced modulo ring size, so the buffer refered to by index N is pipe->bufs[N & (pipe->ring_size - 1)]. Ring size can change over the lifetime of a pipe, but not while the pipe is locked. So for any iov_iter primitives it's a constant. Original conversion of pipes to this layout went overboard trying to microoptimize that - calculating pipe->ring_size - 1, storing it in a local variable and using through the function. In some cases it might be warranted, but most of the times it only obfuscates what's going on in there. Introduce a helper (pipe_buf(pipe, N)) that would encapsulate that and use it in the obvious cases. More will follow... Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08new iov_iter flavour - ITER_UBUFAl Viro1-18/+69
Equivalent of single-segment iovec. Initialized by iov_iter_ubuf(), checked for by iter_is_ubuf(), otherwise behaves like ITER_IOVEC ones. We are going to expose the things like ->write_iter() et.al. to those in subsequent commits. New predicate (user_backed_iter()) that is true for ITER_IOVEC and ITER_UBUF; places like direct-IO handling should use that for checking that pages we modify after getting them from iov_iter_get_pages() would need to be dirtied. DO NOT assume that replacing iter_is_iovec() with user_backed_iter() will solve all problems - there's code that uses iter_is_iovec() to decide how to poke around in iov_iter guts and for that the predicate replacement obviously won't suffice. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-03Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds1-4/+11
Pull copy_to_iter_mc fix from Al Viro: "Backportable fix for copy_to_iter_mc() - the second part of iov_iter work will pretty much overwrite this, but would be much harder to backport" * tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fix short copy handling in copy_mc_pipe_to_iter()
2022-08-03Merge tag 'pull-work.iov_iter-base' of ↵Linus Torvalds1-231/+52
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs iov_iter updates from Al Viro: "Part 1 - isolated cleanups and optimizations. One of the goals is to reduce the overhead of using ->read_iter() and ->write_iter() instead of ->read()/->write(). new_sync_{read,write}() has a surprising amount of overhead, in particular inside iocb_flags(). That's the explanation for the beginning of the series is in this pile; it's not directly iov_iter-related, but it's a part of the same work..." * tag 'pull-work.iov_iter-base' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: first_iovec_segment(): just return address iov_iter: massage calling conventions for first_{iovec,bvec}_segment() iov_iter: first_{iovec,bvec}_segment() - simplify a bit iov_iter: lift dealing with maxpages out of first_{iovec,bvec}_segment() iov_iter_get_pages{,_alloc}(): cap the maxsize with MAX_RW_COUNT iov_iter_bvec_advance(): don't bother with bvec_iter copy_page_{to,from}_iter(): switch iovec variants to generic keep iocb_flags() result cached in struct file iocb: delay evaluation of IS_SYNC(...) until we want to check IOCB_DSYNC struct file: use anonymous union member for rcuhead and llist btrfs: use IOMAP_DIO_NOSYNC teach iomap_dio_rw() to suppress dsync No need of likely/unlikely on calls of check_copy_size()
2022-07-06first_iovec_segment(): just return addressAl Viro1-7/+8
... and calculate the offset in the caller Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-07-06iov_iter: massage calling conventions for first_{iovec,bvec}_segment()Al Viro1-24/+18
Pass maxsize by reference, return length via the same. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-07-06iov_iter: first_{iovec,bvec}_segment() - simplify a bitAl Viro1-16/+16
We return length + offset in page via *size. Don't bother - the caller can do that arithmetics just as well; just report the length to it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-07-06iov_iter: lift dealing with maxpages out of first_{iovec,bvec}_segment()Al Viro1-10/+10
caller can do that just as easily Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-07-06iov_iter_get_pages{,_alloc}(): cap the maxsize with MAX_RW_COUNTAl Viro1-0/+4
All callers can and should handle iov_iter_get_pages() returning fewer pages than requested. All in-kernel ones do. And it makes the arithmetical overflow analysis much simpler... Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-07-06iov_iter_bvec_advance(): don't bother with bvec_iterAl Viro1-9/+14
do what we do for iovec/kvec; that ends up generating better code, AFAICS. Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-06-28fix short copy handling in copy_mc_pipe_to_iter()Al Viro1-4/+11
Unlike other copying operations on ITER_PIPE, copy_mc_to_iter() can result in a short copy. In that case we need to trim the unused buffers, as well as the length of partially filled one - it's not enough to set ->head, ->iov_offset and ->count to reflect how much had we copied. Not hard to fix, fortunately... I'd put a helper (pipe_discard_from(pipe, head)) into pipe_fs_i.h, rather than iov_iter.c - it has nothing to do with iov_iter and having it will allow us to avoid an ugly kludge in fs/splice.c. We could put it into lib/iov_iter.c for now and move it later, but I don't see the point going that way... Cc: stable@kernel.org # 4.19+ Fixes: ca146f6f091e "lib/iov_iter: Fix pipe handling in _copy_to_iter_mcsafe()" Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-06-28copy_page_{to,from}_iter(): switch iovec variants to genericAl Viro1-187/+4
we can do copyin/copyout under kmap_local_page(); it shouldn't overflow the kmap stack - the maximal footprint increase only by one here. Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-06-27iov: introduce iov_iter_alignedKeith Busch1-0/+92
The existing iov_iter_alignment() function returns the logical OR of address and length. For cases where address and length need to be considered separately, introduce a helper function that a caller can specificy length and address masks that indicate if the iov is unaligned. Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220610195830.3574005-9-kbusch@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-11iov_iter: fix build issue due to possible type mis-matchLinus Torvalds1-2/+2
Commit 6c77676645ad ("iov_iter: Fix iter_xarray_get_pages{,_alloc}()") introduced a problem on some 32-bit architectures (at least arm, xtensa, csky,sparc and mips), that have a 'size_t' that is 'unsigned int'. The reason is that we now do min(nr * PAGE_SIZE - offset, maxsize); where 'nr' and 'offset' and both 'unsigned int', and PAGE_SIZE is 'unsigned long'. As a result, the normal C type rules means that the first argument to 'min()' ends up being 'unsigned long'. In contrast, 'maxsize' is of type 'size_t'. Now, 'size_t' and 'unsigned long' are always the same physical type in the kernel, so you'd think this doesn't matter, and from an actual arithmetic standpoint it doesn't. But on 32-bit architectures 'size_t' is commonly 'unsigned int', even if it could also be 'unsigned long'. In that situation, both are unsigned 32-bit types, but they are not the *same* type. And as a result 'min()' will complain about the distinct types (ignore the "pointer types" part of the error message: that's an artifact of the way we have made 'min()' check types for being the same): lib/iov_iter.c: In function 'iter_xarray_get_pages': include/linux/minmax.h:20:35: error: comparison of distinct pointer types lacks a cast [-Werror] 20 | (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1))) | ^~ lib/iov_iter.c:1464:16: note: in expansion of macro 'min' 1464 | return min(nr * PAGE_SIZE - offset, maxsize); | ^~~ This was not visible on 64-bit architectures (where we always define 'size_t' to be 'unsigned long'). Force these cases to use 'min_t(size_t, x, y)' to make the type explicit and avoid the issue. [ Nit-picky note: technically 'size_t' doesn't have to match 'unsigned long' arithmetically. We've certainly historically seen environments with 16-bit address spaces and 32-bit 'unsigned long'. Similarly, even in 64-bit modern environments, 'size_t' could be its own type distinct from 'unsigned long', even if it were arithmetically identical. So the above type commentary is only really descriptive of the kernel environment, not some kind of universal truth for the kinds of wild and crazy situations that are allowed by the C standard ] Reported-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Link: https://lore.kernel.org/all/YqRyL2sIqQNDfky2@debian/ Cc: Jeff Layton <jlayton@kernel.org> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-06-10iov_iter: Fix iter_xarray_get_pages{,_alloc}()David Howells1-16/+4
The maths at the end of iter_xarray_get_pages() to calculate the actual size doesn't work under some circumstances, such as when it's been asked to extract a partial single page. Various terms of the equation cancel out and you end up with actual == offset. The same issue exists in iter_xarray_get_pages_alloc(). Fix these to just use min() to select the lesser amount from between the amount of page content transcribed into the buffer, minus the offset, and the size limit specified. This doesn't appear to have caused a problem yet upstream because network filesystems aren't getting the pages from an xarray iterator, but rather passing it directly to the socket, which just iterates over it. Cachefiles *does* do DIO from one to/from ext4/xfs/btrfs/etc. but it always asks for whole pages to be written or read. Fixes: 7ff5062079ef ("iov_iter: Add ITER_XARRAY") Reported-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: David Howells <dhowells@redhat.com> cc: Alexander Viro <viro@zeniv.linux.org.uk> cc: Dominique Martinet <asmadeus@codewreck.org> cc: Mike Marshall <hubcap@omnibond.com> cc: Gao Xiang <xiang@kernel.org> cc: linux-afs@lists.infradead.org cc: v9fs-developer@lists.sourceforge.net cc: devel@lists.orangefs.org cc: linux-erofs@lists.ozlabs.org cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-02-21lib/iov_iter: initialize "flags" in new pipe_bufferMax Kellermann1-0/+2
The functions copy_page_to_iter_pipe() and push_pipe() can both allocate a new pipe_buffer, but the "flags" member initializer is missing. Fixes: 241699cd72a8 ("new iov_iter flavour: pipe-backed") To: Alexander Viro <viro@zeniv.linux.org.uk> To: linux-fsdevel@vger.kernel.org To: linux-kernel@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-01-04iov_iter: Convert iter_xarray to use foliosMatthew Wilcox (Oracle)1-16/+14
Take advantage of how kmap_local_folio() works to simplify the loop. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: William Kucharski <william.kucharski@oracle.com>
2021-10-24iov_iter: Introduce nofault flag to disable page faultsAndreas Gruenbacher1-5/+15
Introduce a new nofault flag to indicate to iov_iter_get_pages not to fault in user pages. This is implemented by passing the FOLL_NOFAULT flag to get_user_pages, which causes get_user_pages to fail when it would otherwise fault in a page. We'll use the ->nofault flag to prevent iomap_dio_rw from faulting in pages when page faults are not allowed. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-10-20iov_iter: Introduce fault_in_iov_iter_writeableAndreas Gruenbacher1-0/+39
Introduce a new fault_in_iov_iter_writeable helper for safely faulting in an iterator for writing. Uses get_user_pages() to fault in the pages without actually writing to them, which would be destructive. We'll use fault_in_iov_iter_writeable in gfs2 once we've determined that the iterator passed to .read_iter isn't in memory. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-10-18iov_iter: Turn iov_iter_fault_in_readable into fault_in_iov_iter_readableAndreas Gruenbacher1-12/+21
Turn iov_iter_fault_in_readable into a function that returns the number of bytes not faulted in, similar to copy_to_user, instead of returning a non-zero value when any of the requested pages couldn't be faulted in. This supports the existing users that require all pages to be faulted in as well as new users that are happy if any pages can be faulted in. Rename iov_iter_fault_in_readable to fault_in_iov_iter_readable to make sure this change doesn't silently break things. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-10-18gup: Turn fault_in_pages_{readable,writeable} into fault_in_{readable,writeable}Andreas Gruenbacher1-6/+4
Turn fault_in_pages_{readable,writeable} into versions that return the number of bytes not faulted in, similar to copy_to_user, instead of returning a non-zero value when any of the requested pages couldn't be faulted in. This supports the existing users that require all pages to be faulted in as well as new users that are happy if any pages can be faulted in. Rename the functions to fault_in_{readable,writeable} to make sure this change doesn't silently break things. Neither of these functions is entirely trivial and it doesn't seem useful to inline them, so move them to mm/gup.c. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-10-12iov_iter: Fix iov_iter_get_pages{,_alloc} page fault return valueAndreas Gruenbacher1-2/+3
Both iov_iter_get_pages and iov_iter_get_pages_alloc return the number of bytes of the iovec they could get the pages for. When they cannot get any pages, they're supposed to return 0, but when the start of the iovec isn't page aligned, the calculation goes wrong and they return a negative value. Fix both functions. In addition, change iov_iter_get_pages_alloc to return NULL in that case to prevent resource leaks. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2021-09-14iov_iter: add helper to save iov_iter stateJens Axboe1-0/+36
In an ideal world, when someone is passed an iov_iter and returns X bytes, then X bytes would have been consumed/advanced from the iov_iter. But we have use cases that always consume the entire iterator, a few examples of that are iomap and bdev O_DIRECT. This means we cannot rely on the state of the iov_iter once we've called ->read_iter() or ->write_iter(). This would be easier if we didn't always have to deal with truncate of the iov_iter, as rewinding would be trivial without that. We recently added a commit to track the truncate state, but that grew the iov_iter by 8 bytes and wasn't the best solution. Implement a helper to save enough of the iov_iter state to sanely restore it after we've called the read/write iterator helpers. This currently only works for IOVEC/BVEC/KVEC as that's all we need, support for other iterator types are left as an exercise for the reader. Link: https://lore.kernel.org/linux-fsdevel/CAHk-=wiacKV4Gh-MYjteU0LwNBSGpWrK-Ov25HdqB1ewinrFPg@mail.gmail.com/ Signed-off-by: Jens Axboe <axboe@kernel.dk>