summaryrefslogtreecommitdiffstats
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2018-10-30ocfs2: support partial clone range and dedupe rangeDarrick J. Wong3-45/+46
Change the ocfs2 remap code to allow for returning partial results. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30ocfs2: fix pagecache truncation prior to reflinkDarrick J. Wong1-2/+3
Prior to remapping blocks, it is necessary to remove pages from the destination file's page cache. Unfortunately, the truncation is not aggressive enough -- if page size > block size, we'll end up zeroing subpage blocks instead of removing them. So, round the start offset down and the end offset up to page boundaries. We already wrote all the dirty data so the larger range should be fine. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30ocfs2: truncate page cache for clone destination file before remappingDarrick J. Wong1-6/+4
When cloning blocks into another file, truncate the page cache before we start remapping blocks so that concurrent reads wait for us to finish. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30vfs: clean up generic_remap_file_range_prep return valueDarrick J. Wong3-6/+6
Since the remap prep function can update the length of the remap request, we can change this function to return the usual return status instead of the odd behavior it has now. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30vfs: hide file range comparison functionDarrick J. Wong1-96/+91
There are no callers of vfs_dedupe_file_range_compare, so we might as well make it a static helper and remove the export. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30vfs: enable remap callers that can handle short operationsDarrick J. Wong1-8/+20
Plumb in a remap flag that enables the filesystem remap handler to shorten remapping requests for callers that can handle it. Now copy_file_range can report partial success (in case we run up against alignment problems, resource limits, etc.). We also enable CAN_SHORTEN for fideduperange to maintain existing userspace-visible behavior where xfs/btrfs shorten the dedupe range to avoid stale post-eof data exposure. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30vfs: plumb remap flags through the vfs dedupe functionsDarrick J. Wong2-4/+8
Plumb a remap_flags argument through the vfs_dedupe_file_range_one functions so that dedupe can take advantage of it. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30vfs: plumb remap flags through the vfs clone functionsDarrick J. Wong5-10/+15
Plumb a remap_flags argument through the {do,vfs}_clone_file_range functions so that clone can take advantage of it. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30vfs: make remap_file_range functions take and return bytes completedDarrick J. Wong15-64/+87
Change the remap_file_range functions to take a number of bytes to operate upon and return the number of bytes they operated on. This is a requirement for allowing fs implementations to return short clone/dedupe results to the user, which will enable us to obey resource limits in a graceful manner. A subsequent patch will enable copy_file_range to signal to the ->clone_file_range implementation that it can handle a short length, which will be returned in the function's return value. For now the short return is not implemented anywhere so the behavior won't change -- either copy_file_range manages to clone the entire range or it tries an alternative. Neither clone ioctl can take advantage of this, alas. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30vfs: remap helper should update destination inode metadataDarrick J. Wong2-23/+19
Extend generic_remap_file_range_prep to handle inode metadata updates when remapping into a file. If the operation can possibly alter the file contents, we must update the ctime and mtime and remove security privileges, just like we do for regular file writes. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30vfs: pass remap flags to generic_remap_checksDarrick J. Wong1-1/+1
Pass the same remap flags to generic_remap_checks for consistency. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30vfs: pass remap flags to generic_remap_file_range_prepDarrick J. Wong7-23/+25
Plumb the remap flags through the filesystem from the vfs function dispatcher all the way to the prep function to prepare for behavior changes in subsequent patches. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30vfs: combine the clone and dedupe into a single remap_file_rangeDarrick J. Wong9-95/+88
Combine the clone_file_range and dedupe_file_range operations into a single remap_file_range file operation dispatch since they're fundamentally the same operation. The differences between the two can be made in the prep functions. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30vfs: rename clone_verify_area to remap_verify_areaDarrick J. Wong1-5/+5
Since we use clone_verify_area for both clone and dedupe range checks, rename the function to make it clear that it's for both. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30vfs: rename vfs_clone_file_prep to be more descriptiveDarrick J. Wong3-6/+6
The vfs_clone_file_prep is a generic function to be called by filesystem implementations only. Rename the prefix to generic_ and make it more clear that it applies to remap operations, not just clones. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30vfs: skip zero-length dedupe requestsDarrick J. Wong1-0/+5
Don't bother calling the filesystem for a zero-length dedupe request; we can return zero and exit. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30vfs: avoid problematic remapping requests into partial EOF blockDarrick J. Wong1-0/+33
A deduplication data corruption is exposed in XFS and btrfs. It is caused by extending the block match range to include the partial EOF block, but then allowing unknown data beyond EOF to be considered a "match" to data in the destination file because the comparison is only made to the end of the source file. This corrupts the destination file when the source extent is shared with it. The VFS remapping prep functions only support whole block dedupe, but we still need to appear to support whole file dedupe correctly. Hence if the dedupe request includes the last block of the souce file, don't include it in the actual dedupe operation. If the rest of the range dedupes successfully, then reject the entire request. A subsequent patch will enable us to shorten dedupe requests correctly. When reflinking sub-file ranges, a data corruption can occur when the source file range includes a partial EOF block. This shares the unknown data beyond EOF into the second file at a position inside EOF, exposing stale data in the second file. If the reflink request includes the last block of the souce file, only proceed with the reflink operation if it lands at or past the destination file's current EOF. If it lands within the destination file EOF, reject the entire request with -EINVAL and make the caller go the hard way. A subsequent patch will enable us to shorten reflink requests correctly. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30vfs: exit early from zero length remap operationsDarrick J. Wong1-0/+2
If a remap caller asks us to remap to the source file's EOF and the source file length leaves us with a zero byte request, exit early. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30vfs: check file ranges before cloning filesDarrick J. Wong3-44/+15
Move the file range checks from vfs_clone_file_prep into a separate generic_remap_checks function so that all the checks are collected in a central location. This forms the basis for adding more checks from generic_write_checks that will make cloning's input checking more consistent with write input checking. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30vfs: vfs_clone_file_prep_inodes should return EINVAL for a clone from beyond EOFDarrick J. Wong1-3/+0
vfs_clone_file_prep_inodes cannot return 0 if it is asked to remap from a zero byte file because that's what btrfs does. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-18fscache: Fix out of bound read in long cookie keysEric Sandeen1-3/+7
fscache_set_key() can incur an out-of-bounds read, reported by KASAN: BUG: KASAN: slab-out-of-bounds in fscache_alloc_cookie+0x5b3/0x680 [fscache] Read of size 4 at addr ffff88084ff056d4 by task mount.nfs/32615 and also reported by syzbot at https://lkml.org/lkml/2018/7/8/236 BUG: KASAN: slab-out-of-bounds in fscache_set_key fs/fscache/cookie.c:120 [inline] BUG: KASAN: slab-out-of-bounds in fscache_alloc_cookie+0x7a9/0x880 fs/fscache/cookie.c:171 Read of size 4 at addr ffff8801d3cc8bb4 by task syz-executor907/4466 This happens for any index_key_len which is not divisible by 4 and is larger than the size of the inline key, because the code allocates exactly index_key_len for the key buffer, but the hashing loop is stepping through it 4 bytes (u32) at a time in the buf[] array. Fix this by calculating how many u32 buffers we'll need by using DIV_ROUND_UP, and then using kcalloc() to allocate a precleared allocation buffer to hold the index_key, then using that same count as the hashing index limit. Fixes: ec0328e46d6e ("fscache: Maintain a catalogue of allocated cookies") Reported-by: syzbot+a95b989b2dde8e806af8@syzkaller.appspotmail.com Signed-off-by: Eric Sandeen <sandeen@redhat.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-18fscache: Fix incomplete initialisation of inline key spaceDavid Howells3-23/+5
The inline key in struct rxrpc_cookie is insufficiently initialized, zeroing only 3 of the 4 slots, therefore an index_key_len between 13 and 15 bytes will end up hashing uninitialized memory because the memcpy only partially fills the last buf[] element. Fix this by clearing fscache_cookie objects on allocation rather than using the slab constructor to initialise them. We're going to pretty much fill in the entire struct anyway, so bringing it into our dcache writably shouldn't incur much overhead. This removes the need to do clearance in fscache_set_key() (where we aren't doing it correctly anyway). Also, we don't need to set cookie->key_len in fscache_set_key() as we already did it in the only caller, so remove that. Fixes: ec0328e46d6e ("fscache: Maintain a catalogue of allocated cookies") Reported-by: syzbot+a95b989b2dde8e806af8@syzkaller.appspotmail.com Reported-by: Eric Sandeen <sandeen@redhat.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-18cachefiles: fix the race between cachefiles_bury_object() and rmdir(2)Al Viro1-1/+1
the victim might've been rmdir'ed just before the lock_rename(); unlike the normal callers, we do not look the source up after the parents are locked - we know it beforehand and just recheck that it's still the child of what used to be its parent. Unfortunately, the check is too weak - we don't spot a dead directory since its ->d_parent is unchanged, dentry is positive, etc. So we sail all the way to ->rename(), with hosting filesystems _not_ expecting to be asked renaming an rmdir'ed subdirectory. The fix is easy, fortunately - the lock on parent is sufficient for making IS_DEADDIR() on child safe. Cc: stable@vger.kernel.org Fixes: 9ae326a69004 (CacheFiles: A cache that backs onto a mounted filesystem) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-15afs: Fix clearance of replyDavid Howells2-4/+0
The recent patch to fix the afs_server struct leak didn't actually fix the bug, but rather fixed some of the symptoms. The problem is that an asynchronous call that holds a resource pointed to by call->reply[0] will find the pointer cleared in the call destructor, thereby preventing the resource from being cleaned up. In the case of the server record leak, the afs_fs_get_capabilities() function in devel code sets up a call with reply[0] pointing at the server record that should be altered when the result is obtained, but this was being cleared before the destructor was called, so the put in the destructor does nothing and the record is leaked. Commit f014ffb025c1 removed the additional ref obtained by afs_install_server(), but the removal of this ref is actually used by the garbage collector to mark a server record as being defunct after the record has expired through lack of use. The offending clearance of call->reply[0] upon completion in afs_process_async_call() has been there from the origin of the code, but none of the asynchronous calls actually use that pointer currently, so it should be safe to remove (note that synchronous calls don't involve this function). Fix this by the following means: (1) Revert commit f014ffb025c1. (2) Remove the clearance of reply[0] from afs_process_async_call(). Without this, afs_manage_servers() will suffer an assertion failure if it sees a server record that didn't get used because the usage count is not 1. Fixes: f014ffb025c1 ("afs: Fix afs_server struct leak") Fixes: 08e0e7c82eea ("[AF_RXRPC]: Make the in-kernel AFS filesystem use AF_RXRPC.") Signed-off-by: David Howells <dhowells@redhat.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-14Merge tag 'libnvdimm-fixes-4.19-rc8' of ↵Greg Kroah-Hartman1-2/+11
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Dan writes: "libnvdimm/dax 4.19-rc8 * Fix a livelock in dax_layout_busy_page() present since v4.18. The lockup triggers when truncating an actively mapped huge page out of a mapping pinned for direct-I/O. * Fix mprotect() clobbers of _PAGE_DEVMAP. Broken since v4.5 mprotect() clears this flag that is needed to communicate the liveness of device pages to the get_user_pages() path." * tag 'libnvdimm-fixes-4.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: mm: Preserve _PAGE_DEVMAP across mprotect() calls filesystem-dax: Fix dax_layout_busy_page() livelock
2018-10-13ubifs: Fix WARN_ON logic in exit pathRichard Weinberger1-2/+2
ubifs_assert() is not WARN_ON(), so we have to invert the checks. Randy faced this warning with UBIFS being a module, since most users use UBIFS as builtin because UBIFS is the rootfs nobody noticed so far. :-( Including me. Reported-by: Randy Dunlap <rdunlap@infradead.org> Fixes: 54169ddd382d ("ubifs: Turn two ubifs_assert() into a WARN_ON()") Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-13Merge branch 'akpm'Greg Kroah-Hartman2-0/+3
Fixes from Andrew: * akpm: fs/fat/fatent.c: add cond_resched() to fat_count_free_clusters() mm/thp: fix call to mmu_notifier in set_pmd_migration_entry() v2 mm/mmap.c: don't clobber partially overlapping VMA with MAP_FIXED_NOREPLACE ocfs2: fix a GCC warning
2018-10-13fs/fat/fatent.c: add cond_resched() to fat_count_free_clusters()Khazhismel Kumykov1-0/+1
On non-preempt kernels this loop can take a long time (more than 50 ticks) processing through entries. Link: http://lkml.kernel.org/r/20181010172623.57033-1-khazhy@google.com Signed-off-by: Khazhismel Kumykov <khazhy@google.com> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-13ocfs2: fix a GCC warningzhong jiang1-0/+2
Fix the following compile warning: fs/ocfs2/dlmglue.c:99:30: warning: ‘lockdep_keys’ defined but not used [-Wunused-variable] static struct lock_class_key lockdep_keys[OCFS2_NUM_LOCK_TYPES]; Link: http://lkml.kernel.org/r/1536938148-32110-1-git-send-email-zhongjiang@huawei.com Signed-off-by: zhong jiang <zhongjiang@huawei.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-13Merge tag 'gfs2-4.19.fixes3' of ↵Greg Kroah-Hartman1-5/+1
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Andreas writes: "gfs2 4.19 fixes Fix iomap buffered write support for journaled files" * tag 'gfs2-4.19.fixes3' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: Fix iomap buffered write support for journaled files (2)
2018-10-12afs: Fix afs_server struct leakDavid Howells1-0/+2
Fix a leak of afs_server structs. The routine that installs them in the various lookup lists and trees gets a ref on leaving the function, whether it added the server or a server already exists. It shouldn't increment the refcount if it added the server. The effect of this that "rmmod kafs" will hang waiting for the leaked server to become unused. Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-12gfs2: Fix iomap buffered write support for journaled files (2)Andreas Gruenbacher1-5/+1
It turns out that the fix in commit 6636c3cc56 is bad; the assertion that the iomap code no longer creates buffer heads is incorrect for filesystems that set the IOMAP_F_BUFFER_HEAD flag. Instead, what's happening is that gfs2_iomap_begin_write treats all files that have the jdata flag set as journaled files, which is incorrect as long as those files are inline ("stuffed"). We're handling stuffed files directly via the page cache, which is why we ended up with pages without buffer heads in gfs2_page_add_databufs. Fix this by handling stuffed journaled files correctly in gfs2_iomap_begin_write. This reverts commit 6636c3cc5690c11631e6366cf9a28fb99c8b25bb. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-10-12afs: Fix cell proc listDavid Howells5-10/+22
Access to the list of cells by /proc/net/afs/cells has a couple of problems: (1) It should be checking against SEQ_START_TOKEN for the keying the header line. (2) It's only holding the RCU read lock, so it can't just walk over the list without following the proper RCU methods. Fix these by using an hlist instead of an ordinary list and using the appropriate accessor functions to follow it with RCU. Since the code that adds a cell to the list must also necessarily change, sort the list on insertion whilst we're at it. Fixes: 989782dcdc91 ("afs: Overhaul cell database management") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-11Merge tag 'xfs-fixes-for-4.19-rc7' of ↵Greg Kroah-Hartman1-35/+165
git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Dave writes: "xfs: fixes for 4.19-rc7 Update for 4.19-rc7 to fix numerous file clone and deduplication issues." * tag 'xfs-fixes-for-4.19-rc7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: fix data corruption w/ unaligned reflink ranges xfs: fix data corruption w/ unaligned dedupe ranges xfs: update ctime and remove suid before cloning files xfs: zero posteof blocks when cloning above eof xfs: refactor clonerange preparation into a separate helper
2018-10-09gfs2: Fix iomap buffered write support for journaled filesAndreas Gruenbacher1-0/+4
Commit 64bc06bb32ee broke buffered writes to journaled files (chattr +j): we'll try to journal the buffer heads of the page being written to in gfs2_iomap_journaled_page_done. However, the iomap code no longer creates buffer heads, so we'll BUG() in gfs2_page_add_databufs. Fix that by creating buffer heads ourself when needed. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-10-08filesystem-dax: Fix dax_layout_busy_page() livelockDan Williams1-2/+11
In the presence of multi-order entries the typical pagevec_lookup_entries() pattern may loop forever: while (index < end && pagevec_lookup_entries(&pvec, mapping, index, min(end - index, (pgoff_t)PAGEVEC_SIZE), indices)) { ... for (i = 0; i < pagevec_count(&pvec); i++) { index = indices[i]; ... } index++; /* BUG */ } The loop updates 'index' for each index found and then increments to the next possible page to continue the lookup. However, if the last entry in the pagevec is multi-order then the next possible page index is more than 1 page away. Fix this locally for the filesystem-dax case by checking for dax-multi-order entries. Going forward new users of multi-order entries need to be similarly careful, or we need a generic way to report the page increment in the radix iterator. Fixes: 5fac7408d828 ("mm, fs, dax: handle layout changes to pinned dax...") Cc: <stable@vger.kernel.org> Cc: Ross Zwisler <zwisler@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-10-06xfs: fix data corruption w/ unaligned reflink rangesDave Chinner1-13/+34
When reflinking sub-file ranges, a data corruption can occur when the source file range includes a partial EOF block. This shares the unknown data beyond EOF into the second file at a position inside EOF, exposing stale data in the second file. XFS only supports whole block sharing, but we still need to support whole file reflink correctly. Hence if the reflink request includes the last block of the souce file, only proceed with the reflink operation if it lands at or past the destination file's current EOF. If it lands within the destination file EOF, reject the entire request with -EINVAL and make the caller go the hard way. This avoids the data corruption vector, but also avoids disruption of returning EINVAL to userspace for the common case of whole file cloning. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-06xfs: fix data corruption w/ unaligned dedupe rangesDave Chinner1-0/+21
A deduplication data corruption is Exposed by fstests generic/505 on XFS. It is caused by extending the block match range to include the partial EOF block, but then allowing unknown data beyond EOF to be considered a "match" to data in the destination file because the comparison is only made to the end of the source file. This corrupts the destination file when the source extent is shared with it. XFS only supports whole block dedupe, but we still need to appear to support whole file dedupe correctly. Hence if the dedupe request includes the last block of the souce file, don't include it in the actual XFS dedupe operation. If the rest of the range dedupes successfully, then report the partial last block as deduped, too, so that userspace sees it as a successful dedupe rather than return EINVAL because we can't dedupe unaligned blocks. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-05ocfs2: fix locking for res->tracking and dlm->tracking_listAshish Samant1-2/+2
In dlm_init_lockres() we access and modify res->tracking and dlm->tracking_list without holding dlm->track_lock. This can cause list corruptions and can end up in kernel panic. Fix this by locking res->tracking and dlm->tracking_list with dlm->track_lock instead of dlm->spinlock. Link: http://lkml.kernel.org/r/1529951192-4686-1-git-send-email-ashish.samant@oracle.com Signed-off-by: Ashish Samant <ashish.samant@oracle.com> Reviewed-by: Changwei Ge <ge.changwei@h3c.com> Acked-by: Joseph Qi <jiangqi903@gmail.com> Acked-by: Jun Piao <piaojun@huawei.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <ge.changwei@h3c.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-05proc: restrict kernel stack dumps to rootJann Horn1-0/+14
Currently, you can use /proc/self/task/*/stack to cause a stack walk on a task you control while it is running on another CPU. That means that the stack can change under the stack walker. The stack walker does have guards against going completely off the rails and into random kernel memory, but it can interpret random data from your kernel stack as instruction pointers and stack pointers. This can cause exposure of kernel stack contents to userspace. Restrict the ability to inspect kernel stacks of arbitrary tasks to root in order to prevent a local attacker from exploiting racy stack unwinding to leak kernel task stack contents. See the added comment for a longer rationale. There don't seem to be any users of this userspace API that can't gracefully bail out if reading from the file fails. Therefore, I believe that this change is unlikely to break things. In the case that this patch does end up needing a revert, the next-best solution might be to fake a single-entry stack based on wchan. Link: http://lkml.kernel.org/r/20180927153316.200286-1-jannh@google.com Fixes: 2ec220e27f50 ("proc: add /proc/*/stack") Signed-off-by: Jann Horn <jannh@google.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Ken Chen <kenchen@google.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-05ocfs2: fix crash in ocfs2_duplicate_clusters_by_page()Larry Chen1-4/+12
ocfs2_duplicate_clusters_by_page() may crash if one of the extent's pages is dirty. When a page has not been written back, it is still in dirty state. If ocfs2_duplicate_clusters_by_page() is called against the dirty page, the crash happens. To fix this bug, we can just unlock the page and wait until the page until its not dirty. The following is the backtrace: kernel BUG at /root/code/ocfs2/refcounttree.c:2961! [exception RIP: ocfs2_duplicate_clusters_by_page+822] __ocfs2_move_extent+0x80/0x450 [ocfs2] ? __ocfs2_claim_clusters+0x130/0x250 [ocfs2] ocfs2_defrag_extent+0x5b8/0x5e0 [ocfs2] __ocfs2_move_extents_range+0x2a4/0x470 [ocfs2] ocfs2_move_extents+0x180/0x3b0 [ocfs2] ? ocfs2_wait_for_recovery+0x13/0x70 [ocfs2] ocfs2_ioctl_move_extents+0x133/0x2d0 [ocfs2] ocfs2_ioctl+0x253/0x640 [ocfs2] do_vfs_ioctl+0x90/0x5f0 SyS_ioctl+0x74/0x80 do_syscall_64+0x74/0x140 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 Once we find the page is dirty, we do not wait until it's clean, rather we use write_one_page() to write it back Link: http://lkml.kernel.org/r/20180829074740.9438-1-lchen@suse.com [lchen@suse.com: update comments] Link: http://lkml.kernel.org/r/20180830075041.14879-1-lchen@suse.com [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Larry Chen <lchen@suse.com> Acked-by: Changwei Ge <ge.changwei@h3c.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-05Merge tag '4.19-rc6-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6Greg Kroah-Hartman4-6/+31
Steve writes: "SMB3 fixes four small SMB3 fixes: one for stable, the others to address a more recent regression" * tag '4.19-rc6-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: smb3: fix lease break problem introduced by compounding cifs: only wake the thread for the very last PDU in a compound cifs: add a warning if we try to to dequeue a deleted mid smb2: fix missing files in root share directory listing
2018-10-05xfs: update ctime and remove suid before cloning filesDarrick J. Wong1-0/+25
Before cloning into a file, update the ctime and remove sensitive attributes like suid, just like we'd do for a regular file write. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-05xfs: zero posteof blocks when cloning above eofDarrick J. Wong1-8/+25
When we're reflinking between two files and the destination file range is well beyond the destination file's EOF marker, zero any posteof speculative preallocations in the destination file so that we don't expose stale disk contents. The previous strategy of trying to clear the preallocations does not work if the destination file has the PREALLOC flag set. Uncovered by shared/010. Reported-by: Zorro Lang <zlang@redhat.com> Bugzilla-id: https://bugzilla.kernel.org/show_bug.cgi?id=201259 Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-05xfs: refactor clonerange preparation into a separate helperDarrick J. Wong1-27/+73
Refactor all the reflink preparation steps into a separate helper that we'll use to land all the upcoming fixes for insufficient input checks. This rework also moves the invalidation of the destination range to the prep function so that it is done before the range is remapped. This ensures that nobody can access the data in range being remapped until the remap is complete. [dgc: fix xfs_reflink_remap_prep() return value and caller check to handle vfs_clone_file_prep_inodes() returning 0 to mean "nothing to do". ] [dgc: make sure length changed by vfs_clone_file_prep_inodes() gets propagated back to XFS code that does the remapping. ] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-04Merge tag 'ovl-fixes-4.19-rc7' of ↵Greg Kroah-Hartman9-10/+27
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Miklos writes: "overlayfs fixes for 4.19-rc7 This update fixes a couple of regressions in the stacked file update added in this cycle, as well as some older bugs uncovered by syzkaller. There's also one trivial naming change that touches other parts of the fs subsystem." * tag 'ovl-fixes-4.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: fix format of setxattr debug ovl: fix access beyond unterminated strings ovl: make symbol 'ovl_aops' static vfs: swap names of {do,vfs}_clone_file_range() ovl: fix freeze protection bypass in ovl_clone_file_range() ovl: fix freeze protection bypass in ovl_write_iter() ovl: fix memory leak on unlink of indexed file
2018-10-04Merge tag 'xfs-fixes-for-4.19-rc6' of ↵Greg Kroah-Hartman18-264/+256
git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Dave writes: "XFS fixes for 4.19-rc6 Accumlated regression and bug fixes for 4.19-rc6, including: o make iomap correctly mark dirty pages for sub-page block sizes o fix regression in handling extent-to-btree format conversion errors o fix torn log wrap detection for new logs o various corrupt inode detection fixes o various delalloc state fixes o cleanup all the missed transaction cancel cases missed from changes merged in 4.19-rc1 o fix lockdep false positive on transaction allocation o fix locking and reference counting on buffer log items" * tag 'xfs-fixes-for-4.19-rc6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: fix error handling in xfs_bmap_extents_to_btree iomap: set page dirty after partial delalloc on mkwrite xfs: remove invalid log recovery first/last cycle check xfs: validate inode di_forkoff xfs: skip delalloc COW blocks in xfs_reflink_end_cow xfs: don't treat unknown di_flags2 as corruption in scrub xfs: remove duplicated include from alloc.c xfs: don't bring in extents in xfs_bmap_punch_delalloc_range xfs: fix transaction leak in xfs_reflink_allocate_cow() xfs: avoid lockdep false positives in xfs_trans_alloc xfs: refactor xfs_buf_log_item reference count handling xfs: clean up xfs_trans_brelse() xfs: don't unlock invalidated buf on aborted tx commit xfs: remove last of unnecessary xfs_defer_cancel() callers xfs: don't crash the vfs on a garbage inline symlink
2018-10-04ovl: fix format of setxattr debugMiklos Szeredi1-2/+2
Format has a typo: it was meant to be "%.*s", not "%*s". But at some point callers grew nonprintable values as well, so use "%*pE" instead with a maximized length. Reported-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up") Cc: <stable@vger.kernel.org> # v4.12
2018-10-04ovl: fix access beyond unterminated stringsAmir Goldstein1-1/+1
KASAN detected slab-out-of-bounds access in printk from overlayfs, because string format used %*s instead of %.*s. > BUG: KASAN: slab-out-of-bounds in string+0x298/0x2d0 lib/vsprintf.c:604 > Read of size 1 at addr ffff8801c36c66ba by task syz-executor2/27811 > > CPU: 0 PID: 27811 Comm: syz-executor2 Not tainted 4.19.0-rc5+ #36 ... > printk+0xa7/0xcf kernel/printk/printk.c:1996 > ovl_lookup_index.cold.15+0xe8/0x1f8 fs/overlayfs/namei.c:689 Reported-by: syzbot+376cea2b0ef340db3dd4@syzkaller.appspotmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 359f392ca53e ("ovl: lookup index entry for copy up origin") Cc: <stable@vger.kernel.org> # v4.13
2018-10-03Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsGreg Kroah-Hartman1-11/+13
Al writes: "xattrs regression fix from Andreas; sat in -next for quite a while." * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: sysfs: Do not return POSIX ACL xattrs via listxattr