summaryrefslogtreecommitdiffstats
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2021-05-05hugetlb/userfaultfd: unshare all pmds for hugetlbfs when register wpPeter Xu1-0/+4
Huge pmd sharing for hugetlbfs is racy with userfaultfd-wp because userfaultfd-wp is always based on pgtable entries, so they cannot be shared. Walk the hugetlb range and unshare all such mappings if there is, right before UFFDIO_REGISTER will succeed and return to userspace. This will pair with want_pmd_share() in hugetlb code so that huge pmd sharing is completely disabled for userfaultfd-wp registered range. Link: https://lkml.kernel.org/r/20210218231206.15524-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Peter Xu <peterx@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Adam Ruprecht <ruprecht@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Cannon Matthews <cannonmatthews@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chinwen Chang <chinwen.chang@mediatek.com> Cc: David Rientjes <rientjes@google.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Michal Koutn" <mkoutny@suse.com> Cc: Michel Lespinasse <walken@google.com> Cc: Mina Almasry <almasrymina@google.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oliver Upton <oupton@google.com> Cc: Shaohua Li <shli@fb.com> Cc: Shawn Anastasio <shawn@anastas.io> Cc: Steven Price <steven.price@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-05mm: remove nrexceptional from inode: remove BUG_ONHugh Dickins1-1/+8
clear_inode()'s BUG_ON(!mapping_empty(&inode->i_data)) is unsafe: we know of two ways in which nodes can and do (on rare occasions) get left behind. Until those are fixed, do not BUG_ON() nor even WARN_ON(). Yes, this will then leak those nodes (or the next user of the struct inode may use them); but this has been happening for years, and the new BUG_ON(!mapping_empty) was only guilty of revealing that. A proper fix will follow, but no hurry. Link: https://lkml.kernel.org/r/alpine.LSU.2.11.2104292229380.16080@eggly.anvils Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-05mm: remove nrexceptional from inodeMatthew Wilcox (Oracle)1-1/+1
We no longer track anything in nrexceptional, so remove it, saving 8 bytes per inode. Link: https://lkml.kernel.org/r/20201026151849.24232-5-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by: Vishal Verma <vishal.l.verma@intel.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-05dax: account DAX entries as nrpagesMatthew Wilcox (Oracle)1-3/+3
Simplify mapping_needs_writeback() by accounting DAX entries as pages instead of exceptional entries. Link: https://lkml.kernel.org/r/20201026151849.24232-4-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by: Vishal Verma <vishal.l.verma@intel.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-05mm: introduce and use mapping_empty()Matthew Wilcox (Oracle)3-4/+3
Patch series "Remove nrexceptional tracking", v2. We actually use nrexceptional for very little these days. It's a minor pain to keep in sync with nrpages, but the pain becomes much bigger with the THP patches because we don't know how many indices a shadow entry occupies. It's easier to just remove it than keep it accurate. Also, we save 8 bytes per inode which is nothing to sneeze at; on my laptop, it would improve shmem_inode_cache from 22 to 23 objects per 16kB, and inode_cache from 26 to 27 objects. Combined, that saves a megabyte of memory from a combined usage of 25MB for both caches. Unfortunately, ext4 doesn't cross a magic boundary, so it doesn't save any memory for ext4. This patch (of 4): Instead of checking the two counters (nrpages and nrexceptional), we can just check whether i_pages is empty. Link: https://lkml.kernel.org/r/20201026151849.24232-1-willy@infradead.org Link: https://lkml.kernel.org/r/20201026151849.24232-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by: Vishal Verma <vishal.l.verma@intel.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-04Merge tag 'for-linus-5.13-rc1' of ↵Linus Torvalds6-12/+19
git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs Pull JFFS2, UBI and UBIFS updates from Richard Weinberger: "JFFS2: - Use splice_write() - Fix for a slab-out-of-bounds bug UBI: - Fix for clang related warnings - Code cleanup UBIFS: - Fix for inode rebirth at replay - Set s_uuid - Use zstd for default filesystem" * tag 'for-linus-5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs: ubi: Remove unnecessary struct declaration jffs2: Hook up splice_write callback jffs2: avoid Wempty-body warnings jffs2: Fix kasan slab-out-of-bounds problem ubi: Fix fall-through warnings for Clang ubifs: Report max LEB count at mount time ubifs: Set s_uuid in super block to support ima/evm uuid options ubifs: Default to zstd compression ubifs: Only check replay with inode type to judge if inode linked
2021-05-04Merge tag 'f2fs-for-5.13-rc1' of ↵Linus Torvalds24-219/+615
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this round, we added a new mount option, "checkpoint_merge", which introduces a kernel thread dealing with the f2fs checkpoints. Once we start to manage the IO priority along with blk-cgroup, the checkpoint operation can be processed in a lower priority under the process context. Since the checkpoint holds all the filesystem operations, we give a higher priority to the checkpoint thread all the time. Enhancements: - introduce gc_merge mount option to introduce a checkpoint thread - improve to run discard thread efficiently - allow modular compression algorithms - expose # of overprivision segments to sysfs - expose runtime compression stat to sysfs Bug fixes: - fix OOB memory access by the node id lookup - avoid touching checkpointed data in the checkpoint-disabled mode - fix the resizing flow to avoid kernel panic and race conditions - fix block allocation issues on pinned files - address some swapfile issues - fix hugtask problem and kernel panic during atomic write operations - don't start checkpoint thread in RO And, we've cleaned up some kernel coding style and build warnings. In addition, we fixed some minor race conditions and error handling routines" * tag 'f2fs-for-5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (48 commits) f2fs: drop inplace IO if fs status is abnormal f2fs: compress: remove unneed check condition f2fs: clean up left deprecated IO trace codes f2fs: avoid using native allocate_segment_by_default() f2fs: remove unnecessary struct declaration f2fs: fix to avoid NULL pointer dereference f2fs: avoid duplicated codes for cleanup f2fs: document: add description about compressed space handling f2fs: clean up build warnings f2fs: fix the periodic wakeups of discard thread f2fs: fix to avoid accessing invalid fio in f2fs_allocate_data_block() f2fs: fix to avoid GC/mmap race with f2fs_truncate() f2fs: set checkpoint_merge by default f2fs: Fix a hungtask problem in atomic write f2fs: fix to restrict mount condition on readonly block device f2fs: introduce gc_merge mount option f2fs: fix to cover __allocate_new_section() with curseg_lock f2fs: fix wrong alloc_type in f2fs_do_replace_block f2fs: delete empty compress.h f2fs: fix a typo in inode.c ...
2021-05-04Merge tag 'm68knommu-for-v5.13' of ↵Linus Torvalds2-5/+16
git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu Pull m68knommu updates from Greg Ungerer: - a fix for interrupt number range checking for the ColdFire SIMR interrupt controller. - changes for the binfmt_flat binary loader to allow RISC-V nommu support it needs to be able to accept flat binaries that have no gap between the text and data sections. * tag 'm68knommu-for-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu: m68k: coldfire: fix irq ranges riscv: Disable data start offset in flat binaries binfmt_flat: allow not offsetting data start
2021-05-04fs/cifs: Fix resource leakKhaled ROMDHANI1-1/+5
The -EIO error return path is leaking memory allocated to page. Fix this by moving the allocation block after the check of cifs_forced_shutdown. Addresses-Coverity: ("Resource leak") Fixes: 087f757b0129 ("cifs: add shutdown support") Signed-off-by: Khaled ROMDHANI <khaledromdhani216@gmail.com> Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-05-04Cifs: Fix kernel oops caused by deferred close for files.Rohith Surabattula4-5/+33
Fix regression issue caused by deferred close for files. Signed-off-by: Rohith Surabattula <rohiths@microsoft.com> Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-05-04cifs: fix regression when mounting shares with prefix pathsPaulo Alcantara3-13/+23
The commit 315db9a05b7a ("cifs: fix leak in cifs_smb3_do_mount() ctx") revealed an existing bug when mounting shares that contain a prefix path or DFS links. cifs_setup_volume_info() requires the @devname to contain the full path (UNC + prefix) to update the fs context with the new UNC and prepath values, however we were passing only the UNC path (old_ctx->UNC) in @device thus discarding any prefix paths. Instead of concatenating both old_ctx->{UNC,prepath} and pass it in @devname, just keep the dup'ed values of UNC and prepath in cifs_sb->ctx after calling smb3_fs_context_dup(), and fix smb3_parse_devname() to correctly parse and not leak the new UNC and prefix paths. Cc: <stable@vger.kernel.org> # v5.11+ Fixes: 315db9a05b7a ("cifs: fix leak in cifs_smb3_do_mount() ctx") Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Acked-by: David Disseldorp <ddiss@suse.de> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-05-04btrfs: initialize return variable in cleanup_free_space_cache_v1Tom Rix1-1/+1
Static analysis reports this problem free-space-cache.c:3965:2: warning: Undefined or garbage value returned return ret; ^~~~~~~~~~ ret is set in the node handling loop. Treat doing nothing as a success and initialize ret to 0, although it's unlikely the loop would be skipped. We always have block groups, but as it could lead to transaction abort in the caller it's better to be safe. CC: stable@vger.kernel.org # 5.12+ Signed-off-by: Tom Rix <trix@redhat.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-05-04iomap: remove unused private field from ioendBrian Foster2-7/+2
The only remaining user of ->io_private is the generic ioend merging infrastructure. The only user of that is XFS, which no longer sets ->io_private or passes an associated merge callback. Remove the unused parameter and the ->io_private field. CC: linux-fsdevel@vger.kernel.org Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-05-04xfs: don't allow log writes if the data device is readonlyDarrick J. Wong1-4/+6
While running generic/050 with an external log, I observed this warning in dmesg: Trying to write to read-only block-device sda4 (partno 4) WARNING: CPU: 2 PID: 215677 at block/blk-core.c:704 submit_bio_checks+0x256/0x510 Call Trace: submit_bio_noacct+0x2c/0x430 _xfs_buf_ioapply+0x283/0x3c0 [xfs] __xfs_buf_submit+0x6a/0x210 [xfs] xfs_buf_delwri_submit_buffers+0xf8/0x270 [xfs] xfsaild+0x2db/0xc50 [xfs] kthread+0x14b/0x170 I think this happened because we tried to cover the log after a readonly mount, and the AIL tried to write the primary superblock to the data device. The test marks the data device readonly, but it doesn't do the same to the external log device. Therefore, XFS thinks that the log is writable, even though AIL writes whine to dmesg because the data device is read only. Fix this by amending xfs_log_writable to prevent writes when the AIL can't possible write anything into the filesystem. Note: As for the external log or the rt devices being readonly-- xfs_blkdev_get will complain about that if we aren't doing a norecovery mount. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com>
2021-05-04btrfs: zoned: sanity check zone typeNaohiro Aota1-0/+5
The fstests test case generic/475 creates a dm-linear device that gets changed to a dm-error device. This leads to errors in loading the block group's zone information when running on a zoned file system, ultimately resulting in a list corruption. When running on a kernel with list debugging enabled this leads to the following crash. BTRFS: error (device dm-2) in cleanup_transaction:1953: errno=-5 IO failure kernel BUG at lib/list_debug.c:54! invalid opcode: 0000 [#1] SMP PTI CPU: 1 PID: 2433 Comm: umount Tainted: G W 5.12.0+ #1018 RIP: 0010:__list_del_entry_valid.cold+0x1d/0x47 RSP: 0018:ffffc90001473df0 EFLAGS: 00010296 RAX: 0000000000000054 RBX: ffff8881038fd000 RCX: ffffc90001473c90 RDX: 0000000100001a31 RSI: 0000000000000003 RDI: 0000000000000003 RBP: ffff888308871108 R08: 0000000000000003 R09: 0000000000000001 R10: 3961373532383838 R11: 6666666620736177 R12: ffff888308871000 R13: ffff8881038fd088 R14: ffff8881038fdc78 R15: dead000000000100 FS: 00007f353c9b1540(0000) GS:ffff888627d00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f353cc2c710 CR3: 000000018e13c000 CR4: 00000000000006a0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: btrfs_free_block_groups+0xc9/0x310 [btrfs] close_ctree+0x2ee/0x31a [btrfs] ? call_rcu+0x8f/0x270 ? mutex_lock+0x1c/0x40 generic_shutdown_super+0x67/0x100 kill_anon_super+0x14/0x30 btrfs_kill_super+0x12/0x20 [btrfs] deactivate_locked_super+0x31/0x90 cleanup_mnt+0x13e/0x1b0 task_work_run+0x63/0xb0 exit_to_user_mode_loop+0xd9/0xe0 exit_to_user_mode_prepare+0x3e/0x60 syscall_exit_to_user_mode+0x1d/0x50 entry_SYSCALL_64_after_hwframe+0x44/0xae As dm-error has no support for zones, btrfs will run it's zone emulation mode on this device. The zone emulation mode emulates conventional zones, so bail out if the zone bitmap that gets populated on mount sees the zone as sequential while we're thinking it's a conventional zone when creating a block group. Note: this scenario is unlikely in a real wold application and can only happen by this (ab)use of device-mapper targets. CC: stable@vger.kernel.org # 5.12+ Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-05-04btrfs: fix unmountable seed device after fstrimAnand Jain1-1/+5
The following test case reproduces an issue of wrongly freeing in-use blocks on the readonly seed device when fstrim is called on the rw sprout device. As shown below. Create a seed device and add a sprout device to it: $ mkfs.btrfs -fq -dsingle -msingle /dev/loop0 $ btrfstune -S 1 /dev/loop0 $ mount /dev/loop0 /btrfs $ btrfs dev add -f /dev/loop1 /btrfs BTRFS info (device loop0): relocating block group 290455552 flags system BTRFS info (device loop0): relocating block group 1048576 flags system BTRFS info (device loop0): disk added /dev/loop1 $ umount /btrfs Mount the sprout device and run fstrim: $ mount /dev/loop1 /btrfs $ fstrim /btrfs $ umount /btrfs Now try to mount the seed device, and it fails: $ mount /dev/loop0 /btrfs mount: /btrfs: wrong fs type, bad option, bad superblock on /dev/loop0, missing codepage or helper program, or other error. Block 5292032 is missing on the readonly seed device: $ dmesg -kt | tail <snip> BTRFS error (device loop0): bad tree block start, want 5292032 have 0 BTRFS warning (device loop0): couldn't read-tree root BTRFS error (device loop0): open_ctree failed From the dump-tree of the seed device (taken before the fstrim). Block 5292032 belonged to the block group starting at 5242880: $ btrfs inspect dump-tree -e /dev/loop0 | grep -A1 BLOCK_GROUP <snip> item 3 key (5242880 BLOCK_GROUP_ITEM 8388608) itemoff 16169 itemsize 24 block group used 114688 chunk_objectid 256 flags METADATA <snip> From the dump-tree of the sprout device (taken before the fstrim). fstrim used block-group 5242880 to find the related free space to free: $ btrfs inspect dump-tree -e /dev/loop1 | grep -A1 BLOCK_GROUP <snip> item 1 key (5242880 BLOCK_GROUP_ITEM 8388608) itemoff 16226 itemsize 24 block group used 32768 chunk_objectid 256 flags METADATA <snip> BPF kernel tracing the fstrim command finds the missing block 5292032 within the range of the discarded blocks as below: kprobe:btrfs_discard_extent { printf("freeing start %llu end %llu num_bytes %llu:\n", arg1, arg1+arg2, arg2); } freeing start 5259264 end 5406720 num_bytes 147456 <snip> Fix this by avoiding the discard command to the readonly seed device. Reported-by: Chris Murphy <lists@colorremedies.com> CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-05-03Merge tag 'trace-v5.13' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "New feature: - A new "func-no-repeats" option in tracefs/options directory. When set the function tracer will detect if the current function being traced is the same as the previous one, and instead of recording it, it will keep track of the number of times that the function is repeated in a row. And when another function is recorded, it will write a new event that shows the function that repeated, the number of times it repeated and the time stamp of when the last repeated function occurred. Enhancements: - In order to implement the above "func-no-repeats" option, the ring buffer timestamp can now give the accurate timestamp of the event as it is being recorded, instead of having to record an absolute timestamp for all events. This helps the histogram code which no longer needs to waste ring buffer space. - New validation logic to make sure all trace events that access dereferenced pointers do so in a safe way, and will warn otherwise. Fixes: - No longer limit the PIDs of tasks that are recorded for "saved_cmdlines" to PID_MAX_DEFAULT (32768), as systemd now allows for a much larger range. This caused the mapping of PIDs to the task names to be dropped for all tasks with a PID greater than 32768. - Change trace_clock_global() to never block. This caused a deadlock. Clean ups: - Typos, prototype fixes, and removing of duplicate or unused code. - Better management of ftrace_page allocations" * tag 'trace-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (32 commits) tracing: Restructure trace_clock_global() to never block tracing: Map all PIDs to command lines ftrace: Reuse the output of the function tracer for func_repeats tracing: Add "func_no_repeats" option for function tracing tracing: Unify the logic for function tracing options tracing: Add method for recording "func_repeats" events tracing: Add "last_func_repeats" to struct trace_array tracing: Define new ftrace event "func_repeats" tracing: Define static void trace_print_time() ftrace: Simplify the calculation of page number for ftrace_page->records some more ftrace: Store the order of pages allocated in ftrace_page tracing: Remove unused argument from "ring_buffer_time_stamp() tracing: Remove duplicate struct declaration in trace_events.h tracing: Update create_system_filter() kernel-doc comment tracing: A minor cleanup for create_system_filter() kernel: trace: Mundane typo fixes in the file trace_events_filter.c tracing: Fix various typos in comments scripts/recordmcount.pl: Make vim and emacs indent the same scripts/recordmcount.pl: Make indent spacing consistent tracing: Add a verifier to check string pointers for trace events ...
2021-05-03Merge branch 'work.file' of ↵Linus Torvalds1-20/+19
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull receive_fd update from Al Viro: "Cleanup of receive_fd mess" * 'work.file' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: split receive_fd_replace from __receive_fd
2021-05-03cifs: use echo_interval even when connection not ready.Shyam Prasad N1-11/+1
When the tcp connection is not ready to send requests, we keep retrying echo with an interval of zero. This seems unnecessary, and this fix changes the interval between echoes to what is specified as echo_interval. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-05-03cifs: detect dead connections only when echoes are enabled.Shyam Prasad N1-0/+1
We can detect server unresponsiveness only if echoes are enabled. Echoes can be disabled under two scenarios: 1. The connection is low on credits, so we've disabled echoes/oplocks. 2. The connection has not seen any request till now (other than negotiate/sess-setup), which is when we enable these two, based on the credits available. So this fix will check for dead connection, only when echo is enabled. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> CC: <stable@vger.kernel.org> # v5.8+ Signed-off-by: Steve French <stfrench@microsoft.com>
2021-05-03smb3.1.1: allow dumping keys for multiuser mountsSteve French1-20/+46
When mounted multiuser it is hard to dump keys for the other sessions which makes it hard to debug using network traces (e.g. using wireshark). Suggested-by: Shyam Prasad N <sprasad@microsoft.com> Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-05-03smb3.1.1: allow dumping GCM256 keys to improve debugging of encrypted sharesSteve French2-0/+52
Previously we were only able to dump CCM or GCM-128 keys (see "smbinfo keys" e.g.) to allow network debugging (e.g. wireshark) of mounts to SMB3.1.1 encrypted shares. But with the addition of GCM-256 support, we have to be able to dump 32 byte instead of 16 byte keys which requires adding an additional ioctl for that. Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-05-03cifs: add shutdown supportSteve French9-2/+121
Various filesystem support the shutdown ioctl which is used by various xfstests. The shutdown ioctl sets a flag on the superblock which prevents open, unlink, symlink, hardlink, rmdir, create etc. on the file system until unmount and remounted. The two flags supported in this patch are: FSOP_GOING_FLAGS_LOGFLUSH and FSOP_GOING_FLAGS_NOLOGFLUSH which require very little other than blocking new operations (since we do not cache writes to metadata on the client with cifs.ko). FSOP_GOING_FLAGS_DEFAULT is not supported yet, but could be added in the future but would need to call syncfs or equivalent to write out pending data on the mount. With this patch various xfstests now work including tests 043 through 046 for example. Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com>
2021-05-03cifs: Deferred close for filesRohith Surabattula6-3/+187
When file is closed, SMB2 close request is not sent to server immediately and is deferred for acregmax defined interval. When file is reopened by same process for read or write, the file handle is reused if an oplock is held. When client receives a oplock/lease break, file is closed immediately if reference count is zero, else oplock is downgraded. Signed-off-by: Rohith Surabattula <rohiths@microsoft.com> Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-05-02Merge tag 'for-linus-5.13-ofs-1' of ↵Linus Torvalds3-104/+54
git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux Pull orangefs updates from Mike Marshall: "orangefs: implement orangefs_readahead mm/readahead.c/read_pages was quite a bit different back when I put my open-coded readahead logic into orangefs_readpage. That logic seemed to work as designed back then, it is a trainwreck now. This implements orangefs_readahead using the new xarray and readahead_expand features and removes all my open-coded readahead logic. This results in an extreme read performance improvement, these sample numbers are from my test VM: Here's an example of what's upstream in 5.11.8-200.fc33.x86_64: 30+0 records in 30+0 records out 125829120 bytes (126 MB, 120 MiB) copied, 5.77943 s, 21.8 MB/s And here's this version of orangefs_readahead on top of 5.12.0-rc4: 30+0 records in 30+0 records out 125829120 bytes (126 MB, 120 MiB) copied, 0.325919 s, 386 MB/s There are four xfstest regressions with this patch. David Howells and Matthew Wilcox have been helping me work with this code" * tag 'for-linus-5.13-ofs-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux: orangefs: leave files in the page cache for a few micro seconds at least Orangef: implement orangefs_readahead.
2021-05-02Merge branch 'work.misc' of ↵Linus Torvalds15-85/+35
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc vfs updates from Al Viro: "Assorted stuff all over the place" * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: useful constants: struct qstr for ".." hostfs_open(): don't open-code file_dentry() whack-a-mole: kill strlen_user() (again) autofs: should_expire() argument is guaranteed to be positive apparmor:match_mn() - constify devpath argument buffer: a small optimization in grow_buffers get rid of autofs_getpath() constify dentry argument of dentry_path()/dentry_path_raw()
2021-05-02Merge branch 'work.ecryptfs' of ↵Linus Torvalds2-105/+75
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull exryptfs updates from Al Viro: "The interesting part here is (ecryptfs) lock_parent() fixes - its treatment of ->d_parent had been very wrong. The rest is trivial cleanups" * 'work.ecryptfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: ecryptfs: ecryptfs_dentry_info->crypt_stat is never used ecryptfs: get rid of unused accessors ecryptfs: saner API for lock_parent() ecryptfs: get rid of pointless dget/dput in ->symlink() and ->link()
2021-05-01Merge tag 'landlock_v34' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull Landlock LSM from James Morris: "Add Landlock, a new LSM from Mickaël Salaün. Briefly, Landlock provides for unprivileged application sandboxing. From Mickaël's cover letter: "The goal of Landlock is to enable to restrict ambient rights (e.g. global filesystem access) for a set of processes. Because Landlock is a stackable LSM [1], it makes possible to create safe security sandboxes as new security layers in addition to the existing system-wide access-controls. This kind of sandbox is expected to help mitigate the security impact of bugs or unexpected/malicious behaviors in user-space applications. Landlock empowers any process, including unprivileged ones, to securely restrict themselves. Landlock is inspired by seccomp-bpf but instead of filtering syscalls and their raw arguments, a Landlock rule can restrict the use of kernel objects like file hierarchies, according to the kernel semantic. Landlock also takes inspiration from other OS sandbox mechanisms: XNU Sandbox, FreeBSD Capsicum or OpenBSD Pledge/Unveil. In this current form, Landlock misses some access-control features. This enables to minimize this patch series and ease review. This series still addresses multiple use cases, especially with the combined use of seccomp-bpf: applications with built-in sandboxing, init systems, security sandbox tools and security-oriented APIs [2]" The cover letter and v34 posting is here: https://lore.kernel.org/linux-security-module/20210422154123.13086-1-mic@digikod.net/ See also: https://landlock.io/ This code has had extensive design discussion and review over several years" Link: https://lore.kernel.org/lkml/50db058a-7dde-441b-a7f9-f6837fe8b69f@schaufler-ca.com/ [1] Link: https://lore.kernel.org/lkml/f646e1c7-33cf-333f-070c-0a40ad0468cd@digikod.net/ [2] * tag 'landlock_v34' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: landlock: Enable user space to infer supported features landlock: Add user and kernel documentation samples/landlock: Add a sandbox manager example selftests/landlock: Add user space tests landlock: Add syscall implementations arch: Wire up Landlock syscalls fs,security: Add sb_delete hook landlock: Support filesystem access-control LSM: Infrastructure management of the superblock landlock: Add ptrace restrictions landlock: Set up the security framework and manage credentials landlock: Add ruleset and domain management landlock: Add object management
2021-05-01afs: Fix speculative status fetchesDavid Howells6-2/+23
The generic/464 xfstest causes kAFS to emit occasional warnings of the form: kAFS: vnode modified {100055:8a} 30->31 YFS.StoreData64 (c=6015) This indicates that the data version received back from the server did not match the expected value (the DV should be incremented monotonically for each individual modification op committed to a vnode). What is happening is that a lookup call is doing a bulk status fetch speculatively on a bunch of vnodes in a directory besides getting the status of the vnode it's actually interested in. This is racing with a StoreData operation (though it could also occur with, say, a MakeDir op). On the client, a modification operation locks the vnode, but the bulk status fetch only locks the parent directory, so no ordering is imposed there (thereby avoiding an avenue to deadlock). On the server, the StoreData op handler doesn't lock the vnode until it's received all the request data, and downgrades the lock after committing the data until it has finished sending change notifications to other clients - which allows the status fetch to occur before it has finished. This means that: - a status fetch can access the target vnode either side of the exclusive section of the modification - the status fetch could start before the modification, yet finish after, and vice-versa. - the status fetch and the modification RPCs can complete in either order. - the status fetch can return either the before or the after DV from the modification. - the status fetch might regress the locally cached DV. Some of these are handled by the previous fix[1], but that's not sufficient because it checks the DV it received against the DV it cached at the start of the op, but the DV might've been updated in the meantime by a locally generated modification op. Fix this by the following means: (1) Keep track of when we're performing a modification operation on a vnode. This is done by marking vnode parameters with a 'modification' note that causes the AFS_VNODE_MODIFYING flag to be set on the vnode for the duration. (2) Alter the speculation race detection to ignore speculative status fetches if either the vnode is marked as being modified or the data version number is not what we expected. Note that whilst the "vnode modified" warning does get recovered from as it causes the client to refetch the status at the next opportunity, it will also invalidate the pagecache, so changes might get lost. Fixes: a9e5c87ca744 ("afs: Fix speculative status fetch going out of order wrt to modifications") Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> Tested-and-reviewed-by: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/160605082531.252452.14708077925602709042.stgit@warthog.procyon.org.uk/ [1] Link: https://lore.kernel.org/linux-fsdevel/161961335926.39335.2552653972195467566.stgit@warthog.procyon.org.uk/ # v1 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-02.gitignore: prefix local generated files with a slashMasahiro Yamada1-2/+2
The pattern prefixed with '/' matches files in the same directory, but not ones in sub-directories. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Miguel Ojeda <ojeda@kernel.org> Acked-by: Rob Herring <robh@kernel.org> Acked-by: Andra Paraschiv <andraprs@amazon.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Gabriel Krisman Bertazi <krisman@collabora.com>
2021-04-30Merge tag 'ext4_for_linus' of ↵Linus Torvalds23-239/+1096
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "New features for ext4 this cycle include support for encrypted casefold, ensure that deleted file names are cleared in directory blocks by zeroing directory entries when they are unlinked or moved as part of a hash tree node split. We also improve the block allocator's performance on a freshly mounted file system by prefetching block bitmaps. There are also the usual cleanups and bug fixes, including fixing a page cache invalidation race when there is mixed buffered and direct I/O and the block size is less than page size, and allow the dax flag to be set and cleared on inline directories" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (32 commits) ext4: wipe ext4_dir_entry2 upon file deletion ext4: Fix occasional generic/418 failure fs: fix reporting supported extra file attributes for statx() ext4: allow the dax flag to be set and cleared on inline directories ext4: fix debug format string warning ext4: fix trailing whitespace ext4: fix various seppling typos ext4: fix error return code in ext4_fc_perform_commit() ext4: annotate data race in jbd2_journal_dirty_metadata() ext4: annotate data race in start_this_handle() ext4: fix ext4_error_err save negative errno into superblock ext4: fix error code in ext4_commit_super ext4: always panic when errors=panic is specified ext4: delete redundant uptodate check for buffer ext4: do not set SB_ACTIVE in ext4_orphan_cleanup() ext4: make prefetch_block_bitmaps default ext4: add proc files to monitor new structures ext4: improve cr 0 / cr 1 group scanning ext4: add MB_NUM_ORDERS macro ext4: add mballoc stats proc file ...
2021-04-30Merge tag 'dlm-5.13' of ↵Linus Torvalds9-142/+202
git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm Pull dlm updates from David Teigland: "This includes more dlm networking cleanups and improvements for making dlm shutdowns more robust" * tag 'dlm-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm: fs: dlm: fix missing unlock on error in accept_from_sock() fs: dlm: add shutdown hook fs: dlm: flush swork on shutdown fs: dlm: remove unaligned memory access handling fs: dlm: check on minimum msglen size fs: dlm: simplify writequeue handling fs: dlm: use GFP_ZERO for page buffer fs: dlm: change allocation limits fs: dlm: add check if dlm is currently running fs: dlm: add errno handling to check callback fs: dlm: set subclass for othercon sock_mutex fs: dlm: set connected bit after accept fs: dlm: fix mark setting deadlock fs: dlm: fix debugfs dump
2021-04-30Merge tag 'fuse-update-5.13' of ↵Linus Torvalds8-57/+97
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse updates from Miklos Szeredi: - Fix a page locking bug in write (introduced in 2.6.26) - Allow sgid bit to be killed in setacl() - Miscellaneous fixes and cleanups * tag 'fuse-update-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: cuse: simplify refcount cuse: prevent clone virtiofs: fix userns virtiofs: remove useless function virtiofs: split requests that exceed virtqueue size virtiofs: fix memory leak in virtio_fs_probe() fuse: invalidate attrs when page writeback completes fuse: add a flag FUSE_SETXATTR_ACL_KILL_SGID to kill SGID fuse: extend FUSE_SETXATTR request fuse: fix matching of FUSE_DEV_IOC_CLONE command fuse: fix a typo fuse: don't zero pages twice fuse: fix typo for fuse_conn.max_pages comment fuse: fix write deadlock
2021-04-30Merge tag 'ovl-update-5.13' of ↵Linus Torvalds8-67/+124
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs update from Miklos Szeredi: - Fix a regression introduced in 5.2 that resulted in valid overlayfs mounts being rejected with ELOOP (Too many levels of symbolic links) - Fix bugs found by various tools - Miscellaneous improvements and cleanups * tag 'ovl-update-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: add debug print to ovl_do_getxattr() ovl: invalidate readdir cache on changes to dir with origin ovl: allow upperdir inside lowerdir ovl: show "userxattr" in the mount data ovl: trivial typo fixes in the file inode.c ovl: fix misspellings using codespell tool ovl: do not copy attr several times ovl: remove ovl_map_dev_ino() return value ovl: fix error for ovl_fill_super() ovl: fix missing revert_creds() on error path ovl: fix leaked dentry ovl: restrict lower null uuid for "xino=auto" ovl: check that upperdir path is not on a read-only mount ovl: plumb through flush method
2021-04-30Revert "mremap: don't allow MREMAP_DONTUNMAP on special_mappings and aio"Brian Geffon1-4/+1
This reverts commit cd544fd1dc9293c6702fab6effa63dac1cc67e99. As discussed in [1] this commit was a no-op because the mapping type was checked in vma_to_resize before move_vma is ever called. This meant that vm_ops->mremap() would never be called on such mappings. Furthermore, we've since expanded support of MREMAP_DONTUNMAP to non-anonymous mappings, and these special mappings are still protected by the existing check of !VM_DONTEXPAND and !VM_PFNMAP which will result in a -EINVAL. 1. https://lkml.org/lkml/2020/12/28/2340 Link: https://lkml.kernel.org/r/20210323182520.2712101-2-bgeffon@google.com Signed-off-by: Brian Geffon <bgeffon@google.com> Acked-by: Hugh Dickins <hughd@google.com> Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Alejandro Colomar <alx.manpages@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: "Michael S . Tsirkin" <mst@redhat.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Peter Xu <peterx@redhat.com> Cc: Sonny Rao <sonnyrao@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30iomap: use filemap_range_needs_writeback() for O_DIRECT readsJens Axboe1-8/+16
For reads, use the better variant of checking for the need to call filemap_write_and_wait_range() when doing O_DIRECT. This avoids falling back to the slow path for IOCB_NOWAIT, if there are no pages to wait for (or write out). Link: https://lkml.kernel.org/r/20210224164455.1096727-4-axboe@kernel.dk Signed-off-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30vfs: fs_parser: clean up kernel-doc warningsRandy Dunlap1-1/+1
Fix kernel-doc notation function arguments to eliminate two kernel-doc warnings: fs_parser.c:322: warning: Excess function parameter 'name' description in 'validate_constant_table' fs_parser.c:367: warning: Function parameter or member 'name' not described in 'fs_validate_description' Link: https://lkml.kernel.org/r/20210407033743.9701-1-rdunlap@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30ocfs2/dlm: remove unused functionJiapeng Chong1-7/+0
Fix the following clang warning: fs/ocfs2/dlm/dlmrecovery.c:129:20: warning: unused function 'dlm_reset_recovery' [-Wunused-function]. Link: https://lkml.kernel.org/r/1618382761-5784-1-git-send-email-jiapeng.chong@linux.alibaba.com Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Reported-by: Abaci Robot <abaci@linux.alibaba.com> Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30ocfs2: fix a typoBhaskar Chowdhury1-1/+1
s/cluter/cluster/ Link: https://lkml.kernel.org/r/20210324072931.5056-1-unixbhaskar@gmail.com Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30ocfs2: map flags directly in flags_to_o2dlm()Joseph Qi1-18/+18
Use macro map_flag() is tricky and coccicheck outputs the following warning: fs/ocfs2/stack_o2cb.c:69:5-16: Unneeded variable: "o2dlm_flags" So map flags directly in flags_to_o2dlm() to make coccicheck happy. And remove BUG_ON() here as well to simplify code since it runs well a long time. Link: https://lkml.kernel.org/r/1616138664-35935-1-git-send-email-joseph.qi@linux.alibaba.com Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30ocfs2: replace DEFINE_SIMPLE_ATTRIBUTE with DEFINE_DEBUGFS_ATTRIBUTEYang Li1-1/+1
Fix the following coccicheck warning: fs/ocfs2/blockcheck.c:232:0-23: WARNING: blockcheck_fops should be defined with DEFINE_DEBUGFS_ATTRIBUTE Link: https://lkml.kernel.org/r/1614155230-57292-1-git-send-email-yang.lee@linux.alibaba.com Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Reported-by: Abaci Robot <abaci@linux.alibaba.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30io_uring: Fix memory leak in io_sqe_buffers_register()Zqiang1-2/+2
unreferenced object 0xffff8881123bf0a0 (size 32): comm "syz-executor557", pid 8384, jiffies 4294946143 (age 12.360s) backtrace: [<ffffffff81469b71>] kmalloc_node include/linux/slab.h:579 [inline] [<ffffffff81469b71>] kvmalloc_node+0x61/0xf0 mm/util.c:587 [<ffffffff815f0b3f>] kvmalloc include/linux/mm.h:795 [inline] [<ffffffff815f0b3f>] kvmalloc_array include/linux/mm.h:813 [inline] [<ffffffff815f0b3f>] kvcalloc include/linux/mm.h:818 [inline] [<ffffffff815f0b3f>] io_rsrc_data_alloc+0x4f/0xc0 fs/io_uring.c:7164 [<ffffffff815f26d8>] io_sqe_buffers_register+0x98/0x3d0 fs/io_uring.c:8383 [<ffffffff815f84a7>] __io_uring_register+0xf67/0x18c0 fs/io_uring.c:9986 [<ffffffff81609222>] __do_sys_io_uring_register fs/io_uring.c:10091 [inline] [<ffffffff81609222>] __se_sys_io_uring_register fs/io_uring.c:10071 [inline] [<ffffffff81609222>] __x64_sys_io_uring_register+0x112/0x230 fs/io_uring.c:10071 [<ffffffff842f616a>] do_syscall_64+0x3a/0xb0 arch/x86/entry/common.c:47 [<ffffffff84400068>] entry_SYSCALL_64_after_hwframe+0x44/0xae Fix data->tags memory leak, through io_rsrc_data_free() to release data memory space. Reported-by: syzbot+0f32d05d8b6cd8d7ea3e@syzkaller.appspotmail.com Signed-off-by: Zqiang <qiang.zhang@windriver.com> Link: https://lore.kernel.org/r/20210430082515.13886-1-qiang.zhang@windriver.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-29Merge tag 'kbuild-v5.13' of ↵Linus Torvalds1-6/+1
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - Evaluate $(call cc-option,...) etc. only for build targets - Add CONFIG_VMLINUX_MAP to generate .map file when linking vmlinux - Remove unnecessary --gcc-toolchains Clang flag because the --prefix flag finds the toolchains - Do not pass Clang's --prefix flag when using the integrated as - Check the assembler version in Kconfig time - Add new CONFIG options, AS_VERSION, AS_IS_GNU, AS_IS_LLVM to clean up some dependencies in Kconfig - Fix invalid Module.symvers creation when building only modules without vmlinux - Fix false-positive modpost warnings when CONFIG_TRIM_UNUSED_KSYMS is set, but there is no module to build - Refactor module installation Makefile - Support zstd for module compression - Convert alpha and ia64 to use generic shell scripts to generate the syscall headers - Add a new elfnote to indicate if the kernel was built with LTO, which will be used by pahole - Flatten the directory structure under include/config/ so CONFIG options and filenames match - Change the deb source package name from linux-$(KERNELRELEASE) to linux-upstream * tag 'kbuild-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (42 commits) kbuild: Add $(KBUILD_HOSTLDFLAGS) to 'has_libelf' test kbuild: deb-pkg: change the source package name to linux-upstream tools: do not include scripts/Kbuild.include kbuild: redo fake deps at include/config/*.h kbuild: remove TMPO from try-run MAINTAINERS: add pattern for dummy-tools kbuild: add an elfnote for whether vmlinux is built with lto ia64: syscalls: switch to generic syscallhdr.sh ia64: syscalls: switch to generic syscalltbl.sh alpha: syscalls: switch to generic syscallhdr.sh alpha: syscalls: switch to generic syscalltbl.sh sysctl: use min() helper for namecmp() kbuild: add support for zstd compressed modules kbuild: remove CONFIG_MODULE_COMPRESS kbuild: merge scripts/Makefile.modsign to scripts/Makefile.modinst kbuild: move module strip/compression code into scripts/Makefile.modinst kbuild: refactor scripts/Makefile.modinst kbuild: rename extmod-prefix to extmod_prefix kbuild: check module name conflict for external modules as well kbuild: show the target directory for depmod log ...
2021-04-29io_uring: Fix premature return from loop and memory leakColin Ian King1-4/+8
Currently the -EINVAL error return path is leaking memory allocated to data. Fix this by not returning immediately but instead setting the error return variable to -EINVAL and breaking out of the loop. Kudos to Pavel Begunkov for suggesting a correct fix. Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/20210429104602.62676-1-colin.king@canonical.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-29io_uring: fix unchecked error in switch_start()Pavel Begunkov1-1/+3
io_rsrc_node_switch_start() can fail, don't forget to check returned error code. Reported-by: syzbot+a4715dd4b7c866136f79@syzkaller.appspotmail.com Fixes: eae071c9b4cef ("io_uring: prepare fixed rw for dynanic buffers") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/c4c06e2f3f0c8e43bd8d0a266c79055bcc6b6e60.1619693112.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-29io_uring: allow empty slots for reg buffersPavel Begunkov1-7/+29
Allow empty reg buffer slots any request using which should fail. This allows users to not register all buffers in advance, but do it lazily and/or on demand via updates. That is achieved by setting iov_base and iov_len to zero for registration and/or buffer updates. Empty buffer can't have a non-zero tag. Implementation details: to not add extra overhead to io_import_fixed(), create a dummy buffer crafted to fail any request using it, and set it to all empty buffer slots. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/7e95e4d700082baaf010c648c72ac764c9cc8826.1619611868.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-29io_uring: add more build check for uapiPavel Begunkov1-0/+7
Add a couple of BUILD_BUG_ON() checking some rsrc uapi structs and SQE flags. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/ff960df4d5026b9fb5bfd80994b9d3667d3926da.1619536280.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-29io_uring: dont overlap internal and user req flagsPavel Begunkov1-6/+3
CQE flags take one byte that we store in req->flags together with other REQ_F_* internal flags. CQE flags are copied directly into req and then verified that requires some handling on failures, e.g. to make sure that that copy doesn't set some of the internal flags. Move all internal flags to take bits after the first byte, so we don't need extra handling and make it safer overall. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/b8b5b02d1ab9d786fcc7db4a3fe86db6b70b8987.1619536280.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-29io_uring: fix drain with rsrc CQEsPavel Begunkov1-0/+1
Resource emitted CQEs are not bound to requests, so fix up counters used for DRAIN/defer logic. Fixes: b60c8dce33895 ("io_uring: preparation for rsrc tagging") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/2b32f5f0a40d5928c3466d028f936e167f0654be.1619536280.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-29Merge tag 'net-next-5.13' of ↵Linus Torvalds4-51/+24
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core: - bpf: - allow bpf programs calling kernel functions (initially to reuse TCP congestion control implementations) - enable task local storage for tracing programs - remove the need to store per-task state in hash maps, and allow tracing programs access to task local storage previously added for BPF_LSM - add bpf_for_each_map_elem() helper, allowing programs to walk all map elements in a more robust and easier to verify fashion - sockmap: support UDP and cross-protocol BPF_SK_SKB_VERDICT redirection - lpm: add support for batched ops in LPM trie - add BTF_KIND_FLOAT support - mostly to allow use of BTF on s390 which has floats in its headers files - improve BPF syscall documentation and extend the use of kdoc parsing scripts we already employ for bpf-helpers - libbpf, bpftool: support static linking of BPF ELF files - improve support for encapsulation of L2 packets - xdp: restructure redirect actions to avoid a runtime lookup, improving performance by 4-8% in microbenchmarks - xsk: build skb by page (aka generic zerocopy xmit) - improve performance of software AF_XDP path by 33% for devices which don't need headers in the linear skb part (e.g. virtio) - nexthop: resilient next-hop groups - improve path stability on next-hops group changes (incl. offload for mlxsw) - ipv6: segment routing: add support for IPv4 decapsulation - icmp: add support for RFC 8335 extended PROBE messages - inet: use bigger hash table for IP ID generation - tcp: deal better with delayed TX completions - make sure we don't give up on fast TCP retransmissions only because driver is slow in reporting that it completed transmitting the original - tcp: reorder tcp_congestion_ops for better cache locality - mptcp: - add sockopt support for common TCP options - add support for common TCP msg flags - include multiple address ids in RM_ADDR - add reset option support for resetting one subflow - udp: GRO L4 improvements - improve 'forward' / 'frag_list' co-existence with UDP tunnel GRO, allowing the first to take place correctly even for encapsulated UDP traffic - micro-optimize dev_gro_receive() and flow dissection, avoid retpoline overhead on VLAN and TEB GRO - use less memory for sysctls, add a new sysctl type, to allow using u8 instead of "int" and "long" and shrink networking sysctls - veth: allow GRO without XDP - this allows aggregating UDP packets before handing them off to routing, bridge, OvS, etc. - allow specifing ifindex when device is moved to another namespace - netfilter: - nft_socket: add support for cgroupsv2 - nftables: add catch-all set element - special element used to define a default action in case normal lookup missed - use net_generic infra in many modules to avoid allocating per-ns memory unnecessarily - xps: improve the xps handling to avoid potential out-of-bound accesses and use-after-free when XPS change race with other re-configuration under traffic - add a config knob to turn off per-cpu netdev refcnt to catch underflows in testing Device APIs: - add WWAN subsystem to organize the WWAN interfaces better and hopefully start driving towards more unified and vendor- independent APIs - ethtool: - add interface for reading IEEE MIB stats (incl. mlx5 and bnxt support) - allow network drivers to dump arbitrary SFP EEPROM data, current offset+length API was a poor fit for modern SFP which define EEPROM in terms of pages (incl. mlx5 support) - act_police, flow_offload: add support for packet-per-second policing (incl. offload for nfp) - psample: add additional metadata attributes like transit delay for packets sampled from switch HW (and corresponding egress and policy-based sampling in the mlxsw driver) - dsa: improve support for sandwiched LAGs with bridge and DSA - netfilter: - flowtable: use direct xmit in topologies with IP forwarding, bridging, vlans etc. - nftables: counter hardware offload support - Bluetooth: - improvements for firmware download w/ Intel devices - add support for reading AOSP vendor capabilities - add support for virtio transport driver - mac80211: - allow concurrent monitor iface and ethernet rx decap - set priority and queue mapping for injected frames - phy: add support for Clause-45 PHY Loopback - pci/iov: add sysfs MSI-X vector assignment interface to distribute MSI-X resources to VFs (incl. mlx5 support) New hardware/drivers: - dsa: mv88e6xxx: add support for Marvell mv88e6393x - 11-port Ethernet switch with 8x 1-Gigabit Ethernet and 3x 10-Gigabit interfaces. - dsa: support for legacy Broadcom tags used on BCM5325, BCM5365 and BCM63xx switches - Microchip KSZ8863 and KSZ8873; 3x 10/100Mbps Ethernet switches - ath11k: support for QCN9074 a 802.11ax device - Bluetooth: Broadcom BCM4330 and BMC4334 - phy: Marvell 88X2222 transceiver support - mdio: add BCM6368 MDIO mux bus controller - r8152: support RTL8153 and RTL8156 (USB Ethernet) chips - mana: driver for Microsoft Azure Network Adapter (MANA) - Actions Semi Owl Ethernet MAC - can: driver for ETAS ES58X CAN/USB interfaces Pure driver changes: - add XDP support to: enetc, igc, stmmac - add AF_XDP support to: stmmac - virtio: - page_to_skb() use build_skb when there's sufficient tailroom (21% improvement for 1000B UDP frames) - support XDP even without dedicated Tx queues - share the Tx queues with the stack when necessary - mlx5: - flow rules: add support for mirroring with conntrack, matching on ICMP, GTP, flex filters and more - support packet sampling with flow offloads - persist uplink representor netdev across eswitch mode changes - allow coexistence of CQE compression and HW time-stamping - add ethtool extended link error state reporting - ice, iavf: support flow filters, UDP Segmentation Offload - dpaa2-switch: - move the driver out of staging - add spanning tree (STP) support - add rx copybreak support - add tc flower hardware offload on ingress traffic - ionic: - implement Rx page reuse - support HW PTP time-stamping - octeon: support TC hardware offloads - flower matching on ingress and egress ratelimitting. - stmmac: - add RX frame steering based on VLAN priority in tc flower - support frame preemption (FPE) - intel: add cross time-stamping freq difference adjustment - ocelot: - support forwarding of MRP frames in HW - support multiple bridges - support PTP Sync one-step timestamping - dsa: mv88e6xxx, dpaa2-switch: offload bridge port flags like learning, flooding etc. - ipa: add IPA v4.5, v4.9 and v4.11 support (Qualcomm SDX55, SM8350, SC7280 SoCs) - mt7601u: enable TDLS support - mt76: - add support for 802.3 rx frames (mt7915/mt7615) - mt7915 flash pre-calibration support - mt7921/mt7663 runtime power management fixes" * tag 'net-next-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2451 commits) net: selftest: fix build issue if INET is disabled net: netrom: nr_in: Remove redundant assignment to ns net: tun: Remove redundant assignment to ret net: phy: marvell: add downshift support for M88E1240 net: dsa: ksz: Make reg_mib_cnt a u8 as it never exceeds 255 net/sched: act_ct: Remove redundant ct get and check icmp: standardize naming of RFC 8335 PROBE constants bpf, selftests: Update array map tests for per-cpu batched ops bpf: Add batched ops support for percpu array bpf: Implement formatted output helpers with bstr_printf seq_file: Add a seq_bprintf function sfc: adjust efx->xdp_tx_queue_count with the real number of initialized queues net:nfc:digital: Fix a double free in digital_tg_recv_dep_req net: fix a concurrency bug in l2tp_tunnel_register() net/smc: Remove redundant assignment to rc mpls: Remove redundant assignment to err llc2: Remove redundant assignment to rc net/tls: Remove redundant initialization of record rds: Remove redundant assignment to nr_sig dt-bindings: net: mdio-gpio: add compatible for microchip,mdio-smi0 ...