summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2020-05-29powerpc/64s: Disable sanitisers for C syscall/interrupt entry/exit codeDaniel Axtens1-0/+3
syzkaller is picking up a bunch of crashes that look like this: Unrecoverable exception 380 at c00000000037ed60 (msr=8000000000001031) Oops: Unrecoverable exception, sig: 6 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries Modules linked in: CPU: 0 PID: 874 Comm: syz-executor.0 Not tainted 5.7.0-rc7-syzkaller-00016-gb0c3ba31be3e #0 NIP: c00000000037ed60 LR: c00000000004bac8 CTR: c000000000030990 REGS: c0000000555a7230 TRAP: 0380 Not tainted (5.7.0-rc7-syzkaller-00016-gb0c3ba31be3e) MSR: 8000000000001031 <SF,ME,IR,DR,LE> CR: 48222882 XER: 20000000 CFAR: c00000000004bac4 IRQMASK: 0 GPR00: c00000000004bb68 c0000000555a74c0 c0000000024b3500 0000000000000005 GPR04: 0000000000000000 0000000000000000 c00000000004bb88 c008000000910000 GPR08: 00000000000b0000 c00000000004bac8 0000000000016000 c000000002503500 GPR12: c000000000030990 c000000003190000 00000000106a5898 00000000106a0000 GPR16: 00000000106a5890 c000000007a92000 c000000008180e00 c000000007a8f700 GPR20: c000000007a904b0 0000000010110000 c00000000259d318 5deadbeef0000100 GPR24: 5deadbeef0000122 c000000078422700 c000000009ee88b8 c000000078422778 GPR28: 0000000000000001 800000000280b033 0000000000000000 c0000000555a75a0 NIP [c00000000037ed60] __sanitizer_cov_trace_pc+0x40/0x50 LR [c00000000004bac8] interrupt_exit_kernel_prepare+0x118/0x310 Call Trace: [c0000000555a74c0] [c00000000004bb68] interrupt_exit_kernel_prepare+0x1b8/0x310 (unreliable) [c0000000555a7530] [c00000000000f9a8] interrupt_return+0x118/0x1c0 --- interrupt: 900 at __sanitizer_cov_trace_pc+0x0/0x50 ...<random previous call chain>... This is caused by __sanitizer_cov_trace_pc() causing an SLB fault after MSR[RI] has been cleared by __hard_EE_RI_disable(), which we can not recover from. Do not instrument the new syscall/interrupt entry/exit code with KCOV, GCOV or UBSAN. Reported-by: syzbot-ppc64 <ozlabsyz@au1.ibm.com> Fixes: 68b34588e202 ("powerpc/64/sycall: Implement syscall entry/exit logic in C") Signed-off-by: Daniel Axtens <dja@axtens.net> Acked-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2020-05-29xfrm: fix a NULL-ptr deref in xfrm_local_errorXin Long1-1/+2
This patch is to fix a crash: [ ] kasan: GPF could be caused by NULL-ptr deref or user memory access [ ] general protection fault: 0000 [#1] SMP KASAN PTI [ ] RIP: 0010:ipv6_local_error+0xac/0x7a0 [ ] Call Trace: [ ] xfrm6_local_error+0x1eb/0x300 [ ] xfrm_local_error+0x95/0x130 [ ] __xfrm6_output+0x65f/0xb50 [ ] xfrm6_output+0x106/0x46f [ ] udp_tunnel6_xmit_skb+0x618/0xbf0 [ip6_udp_tunnel] [ ] vxlan_xmit_one+0xbc6/0x2c60 [vxlan] [ ] vxlan_xmit+0x6a0/0x4276 [vxlan] [ ] dev_hard_start_xmit+0x165/0x820 [ ] __dev_queue_xmit+0x1ff0/0x2b90 [ ] ip_finish_output2+0xd3e/0x1480 [ ] ip_do_fragment+0x182d/0x2210 [ ] ip_output+0x1d0/0x510 [ ] ip_send_skb+0x37/0xa0 [ ] raw_sendmsg+0x1b4c/0x2b80 [ ] sock_sendmsg+0xc0/0x110 This occurred when sending a v4 skb over vxlan6 over ipsec, in which case skb->protocol == htons(ETH_P_IPV6) while skb->sk->sk_family == AF_INET in xfrm_local_error(). Then it will go to xfrm6_local_error() where it tries to get ipv6 info from a ipv4 sk. This issue was actually fixed by Commit 628e341f319f ("xfrm: make local error reporting more robust"), but brought back by Commit 844d48746e4b ("xfrm: choose protocol family by skb protocol"). So to fix it, we should call xfrm6_local_error() only when skb->protocol is htons(ETH_P_IPV6) and skb->sk->sk_family is AF_INET6. Fixes: 844d48746e4b ("xfrm: choose protocol family by skb protocol") Reported-by: Xiumei Mu <xmu@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2020-05-29Merge branch 'fixes' of ↵Ingo Molnar1-38/+48
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs into x86/urgent Pick up FPU register dump fixes from Al Viro. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-05-29Merge tag 'drm-misc-fixes-2020-05-28' of ↵Dave Airlie1-3/+3
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes Two ingenic fixes, one for a wrong cast, the other for a typo in a comparison Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <maxime@cerno.tech> Link: https://patchwork.freedesktop.org/patch/msgid/20200528110944.hanv4qgc6w7whnj3@gilmour.lan
2020-05-28net: be more gentle about silly gso requests coming from userEric Dumazet1-8/+9
Recent change in virtio_net_hdr_to_skb() broke some packetdrill tests. When --mss=XXX option is set, packetdrill always provide gso_type & gso_size for its inbound packets, regardless of packet size. if (packet->tcp && packet->mss) { if (packet->ipv4) gso.gso_type = VIRTIO_NET_HDR_GSO_TCPV4; else gso.gso_type = VIRTIO_NET_HDR_GSO_TCPV6; gso.gso_size = packet->mss; } Since many other programs could do the same, relax virtio_net_hdr_to_skb() to no longer return an error, but instead ignore gso settings. This keeps Willem intent to make sure no malicious packet could reach gso stack. Note that TCP stack has a special logic in tcp_set_skb_tso_segs() to clear gso_size for small packets. Fixes: 6dd912f82680 ("net: check untrusted gso_size at kernel entry") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28Merge branch 'akpm' (patches from Andrew)Linus Torvalds5-4/+19
Merge misc fixes from Andrew Morton: "5 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: include/asm-generic/topology.h: guard cpumask_of_node() macro argument fs/binfmt_elf.c: allocate initialized memory in fill_thread_core_info() mm: remove VM_BUG_ON(PageSlab()) from page_mapcount() mm,thp: stop leaking unreleased file pages mm/z3fold: silence kmemleak false positives of slots
2020-05-28sctp: check assoc before SCTP_ADDR_{MADE_PRIM, ADDED} eventJonas Falkevik1-0/+3
Make sure SCTP_ADDR_{MADE_PRIM,ADDED} are sent only for associations that have been established. These events are described in rfc6458#section-6.1 SCTP_PEER_ADDR_CHANGE: This tag indicates that an address that is part of an existing association has experienced a change of state (e.g., a failure or return to service of the reachability of an endpoint via a specific transport address). Signed-off-by: Jonas Falkevik <jonas.falkevik@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28Merge branch 'for-linus' of ↵Linus Torvalds13-72/+88
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input fixes from Dmitry Torokhov: "Just a few random driver fixups" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: synaptics - add a second working PNP_ID for Lenovo T470s Input: applespi - replace zero-length array with flexible-array Input: axp20x-pek - always register interrupt handlers Input: lm8333 - update contact email Input: synaptics-rmi4 - fix error return code in rmi_driver_probe() Input: synaptics-rmi4 - really fix attn_data use-after-free Input: i8042 - add ThinkPad S230u to i8042 reset list Revert "Input: i8042 - add ThinkPad S230u to i8042 nomux list" Input: dlink-dir685-touchkeys - fix a typo in driver name Input: xpad - add custom init packet for Xbox One S controllers Input: evdev - call input_flush_device() on release(), not flush() Input: i8042 - add ThinkPad S230u to i8042 nomux list Input: usbtouchscreen - add support for BonXeon TP Input: cros_ec_keyb - use cros_ec_cmd_xfer_status helper Input: mms114 - fix handling of mms345l Input: elants_i2c - support palm detection
2020-05-28x86/ioperm: Prevent a memory leak when fork failsJay Lang3-15/+15
In the copy_process() routine called by _do_fork(), failure to allocate a PID (or further along in the function) will trigger an invocation to exit_thread(). This is done to clean up from an earlier call to copy_thread_tls(). Naturally, the child task is passed into exit_thread(), however during the process, io_bitmap_exit() nullifies the parent's io_bitmap rather than the child's. As copy_thread_tls() has been called ahead of the failure, the reference count on the calling thread's io_bitmap is incremented as we would expect. However, io_bitmap_exit() doesn't accept any arguments, and thus assumes it should trash the current thread's io_bitmap reference rather than the child's. This is pretty sneaky in practice, because in all instances but this one, exit_thread() is called with respect to the current task and everything works out. A determined attacker can issue an appropriate ioctl (i.e. KDENABIO) to get a bitmap allocated, and force a clone3() syscall to fail by passing in a zeroed clone_args structure. The kernel handles the erroneous struct and the buggy code path is followed, and even though the parent's reference to the io_bitmap is trashed, the child still holds a reference and thus the structure will never be freed. Fix this by tweaking io_bitmap_exit() and its subroutines to accept a task_struct argument which to operate on. Fixes: ea5f1cd7ab49 ("x86/ioperm: Remove bitmap if all permissions dropped") Signed-off-by: Jay Lang <jaytlang@mit.edu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable#@vger.kernel.org Link: https://lkml.kernel.org/r/20200524162742.253727-1-jaytlang@mit.edu
2020-05-28Merge tag 'csky-for-linus-5.7-rc8' of git://github.com/c-sky/csky-linuxLinus Torvalds4-71/+66
Pull csky fixes from Guo Ren: "Another four fixes for csky: - fix req_syscall debug - fix abiv2 syscall_trace - fix preempt enable - clean up regs usage in entry.S" * tag 'csky-for-linus-5.7-rc8' of git://github.com/c-sky/csky-linux: csky: Fixup CONFIG_DEBUG_RSEQ csky: Coding convention in entry.S csky: Fixup abiv2 syscall_trace break a4 & a5 csky: Fixup CONFIG_PREEMPT panic
2020-05-28Revert "block: end bio with BLK_STS_AGAIN in case of non-mq devs and REQ_NOWAIT"Jens Axboe1-7/+4
This reverts commit c58c1f83436b501d45d4050fd1296d71a9760bcb. io_uring does do the right thing for this case, and we're still returning -EAGAIN to userspace for the cases we don't support. Revert this change to avoid doing endless spins of resubmits. Cc: stable@vger.kernel.org # v5.6 Reported-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-28include/asm-generic/topology.h: guard cpumask_of_node() macro argumentArnd Bergmann1-1/+1
drivers/hwmon/amd_energy.c:195:15: error: invalid operands to binary expression ('void' and 'int') (channel - data->nr_cpus)); ~~~~~~~~~^~~~~~~~~~~~~~~~~ include/asm-generic/topology.h:51:42: note: expanded from macro 'cpumask_of_node' #define cpumask_of_node(node) ((void)node, cpu_online_mask) ^~~~ include/linux/cpumask.h:618:72: note: expanded from macro 'cpumask_first_and' #define cpumask_first_and(src1p, src2p) cpumask_next_and(-1, (src1p), (src2p)) ^~~~~ Fixes: f0b848ce6fe9 ("cpumask: Introduce cpumask_of_{node,pcibus} to replace {node,pcibus}_to_cpumask") Fixes: 8abee9566b7e ("hwmon: Add amd_energy driver to report energy counters") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Guenter Roeck <linux@roeck-us.net> Link: http://lkml.kernel.org/r/20200527134623.930247-1-arnd@arndb.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-28fs/binfmt_elf.c: allocate initialized memory in fill_thread_core_info()Alexander Potapenko1-1/+1
KMSAN reported uninitialized data being written to disk when dumping core. As a result, several kilobytes of kmalloc memory may be written to the core file and then read by a non-privileged user. Reported-by: sam <sunhaoyl@outlook.com> Signed-off-by: Alexander Potapenko <glider@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Kees Cook <keescook@chromium.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200419100848.63472-1-glider@google.com Link: https://github.com/google/kmsan/issues/76 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-28mm: remove VM_BUG_ON(PageSlab()) from page_mapcount()Konstantin Khlebnikov1-2/+13
Replace superfluous VM_BUG_ON() with comment about correct usage. Technically reverts commit 1d148e218a0d ("mm: add VM_BUG_ON_PAGE() to page_mapcount()"), but context lines have changed. Function isolate_migratepages_block() runs some checks out of lru_lock when choose pages for migration. After checking PageLRU() it checks extra page references by comparing page_count() and page_mapcount(). Between these two checks page could be removed from lru, freed and taken by slab. As a result this race triggers VM_BUG_ON(PageSlab()) in page_mapcount(). Race window is tiny. For certain workload this happens around once a year. page:ffffea0105ca9380 count:1 mapcount:0 mapping:ffff88ff7712c180 index:0x0 compound_mapcount: 0 flags: 0x500000000008100(slab|head) raw: 0500000000008100 dead000000000100 dead000000000200 ffff88ff7712c180 raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000 page dumped because: VM_BUG_ON_PAGE(PageSlab(page)) ------------[ cut here ]------------ kernel BUG at ./include/linux/mm.h:628! invalid opcode: 0000 [#1] SMP NOPTI CPU: 77 PID: 504 Comm: kcompactd1 Tainted: G W 4.19.109-27 #1 Hardware name: Yandex T175-N41-Y3N/MY81-EX0-Y3N, BIOS R05 06/20/2019 RIP: 0010:isolate_migratepages_block+0x986/0x9b0 The code in isolate_migratepages_block() was added in commit 119d6d59dcc0 ("mm, compaction: avoid isolating pinned pages") before adding VM_BUG_ON into page_mapcount(). This race has been predicted in 2015 by Vlastimil Babka (see link below). [akpm@linux-foundation.org: comment tweaks, per Hugh] Fixes: 1d148e218a0d ("mm: add VM_BUG_ON_PAGE() to page_mapcount()") Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: David Rientjes <rientjes@google.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/159032779896.957378.7852761411265662220.stgit@buzz Link: https://lore.kernel.org/lkml/557710E1.6060103@suse.cz/ Link: https://lore.kernel.org/linux-mm/158937872515.474360.5066096871639561424.stgit@buzz/T/ (v1) Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-28mm,thp: stop leaking unreleased file pagesHugh Dickins1-0/+1
When collapse_file() calls try_to_release_page(), it has already isolated the page: so if releasing buffers happens to fail (as it sometimes does), remember to putback_lru_page(): otherwise that page is left unreclaimable and unfreeable, and the file extent uncollapsible. Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS") Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Rik van Riel <riel@surriel.com> Cc: <stable@vger.kernel.org> [5.4+] Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2005231837500.1766@eggly.anvils Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-28mm/z3fold: silence kmemleak false positives of slotsQian Cai1-0/+3
Kmemleak reported many leaks while under memory pressue in, slots = alloc_slots(pool, gfp); which is referenced by "zhdr" in init_z3fold_page(), zhdr->slots = slots; However, "zhdr" could be gone without freeing slots as the later will be freed separately when the last "handle" off of "handles" array is freed. It will be within "slots" which is always aligned. unreferenced object 0xc000000fdadc1040 (size 104): comm "oom04", pid 140476, jiffies 4295359280 (age 3454.970s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: z3fold_zpool_malloc+0x7b0/0xe10 alloc_slots at mm/z3fold.c:214 (inlined by) init_z3fold_page at mm/z3fold.c:412 (inlined by) z3fold_alloc at mm/z3fold.c:1161 (inlined by) z3fold_zpool_malloc at mm/z3fold.c:1735 zpool_malloc+0x34/0x50 zswap_frontswap_store+0x60c/0xda0 zswap_frontswap_store at mm/zswap.c:1093 __frontswap_store+0x128/0x330 swap_writepage+0x58/0x110 pageout+0x16c/0xa40 shrink_page_list+0x1ac8/0x25c0 shrink_inactive_list+0x270/0x730 shrink_lruvec+0x444/0xf30 shrink_node+0x2a4/0x9c0 do_try_to_free_pages+0x158/0x640 try_to_free_pages+0x1bc/0x5f0 __alloc_pages_slowpath.constprop.60+0x4dc/0x15a0 __alloc_pages_nodemask+0x520/0x650 alloc_pages_vma+0xc0/0x420 handle_mm_fault+0x1174/0x1bf0 Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Vitaly Wool <vitaly.wool@konsulko.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: http://lkml.kernel.org/r/20200522220052.2225-1-cai@lca.pw Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-28x86/dma: Fix max PFN arithmetic overflow on 32 bit systemsAlexander Dahl1-1/+1
The intermediate result of the old term (4UL * 1024 * 1024 * 1024) is 4 294 967 296 or 0x100000000 which is no problem on 64 bit systems. The patch does not change the later overall result of 0x100000 for MAX_DMA32_PFN (after it has been shifted by PAGE_SHIFT). The new calculation yields the same result, but does not require 64 bit arithmetic. On 32 bit systems the old calculation suffers from an arithmetic overflow in that intermediate term in braces: 4UL aka unsigned long int is 4 byte wide and an arithmetic overflow happens (the 0x100000000 does not fit in 4 bytes), the in braces result is truncated to zero, the following right shift does not alter that, so MAX_DMA32_PFN evaluates to 0 on 32 bit systems. That wrong value is a problem in a comparision against MAX_DMA32_PFN in the init code for swiotlb in pci_swiotlb_detect_4gb() to decide if swiotlb should be active. That comparison yields the opposite result, when compiling on 32 bit systems. This was not possible before 1b7e03ef7570 ("x86, NUMA: Enable emulation on 32bit too") when that MAX_DMA32_PFN was first made visible to x86_32 (and which landed in v3.0). In practice this wasn't a problem, unless CONFIG_SWIOTLB is active on x86-32. However if one has set CONFIG_IOMMU_INTEL, since c5a5dc4cbbf4 ("iommu/vt-d: Don't switch off swiotlb if bounce page is used") there's a dependency on CONFIG_SWIOTLB, which was not necessarily active before. That landed in v5.4, where we noticed it in the fli4l Linux distribution. We have CONFIG_IOMMU_INTEL active on both 32 and 64 bit kernel configs there (I could not find out why, so let's just say historical reasons). The effect is at boot time 64 MiB (default size) were allocated for bounce buffers now, which is a noticeable amount of memory on small systems like pcengines ALIX 2D3 with 256 MiB memory, which are still frequently used as home routers. We noticed this effect when migrating from kernel v4.19 (LTS) to v5.4 (LTS) in fli4l and got that kernel messages for example: Linux version 5.4.22 (buildroot@buildroot) (gcc version 7.3.0 (Buildroot 2018.02.8)) #1 SMP Mon Nov 26 23:40:00 CET 2018 … Memory: 183484K/261756K available (4594K kernel code, 393K rwdata, 1660K rodata, 536K init, 456K bss , 78272K reserved, 0K cma-reserved, 0K highmem) … PCI-DMA: Using software bounce buffering for IO (SWIOTLB) software IO TLB: mapped [mem 0x0bb78000-0x0fb78000] (64MB) The initial analysis and the suggested fix was done by user 'sourcejedi' at stackoverflow and explicitly marked as GPLv2 for inclusion in the Linux kernel: https://unix.stackexchange.com/a/520525/50007 The new calculation, which does not suffer from that overflow, is the same as for arch/mips now as suggested by Robin Murphy. The fix was tested by fli4l users on round about two dozen different systems, including both 32 and 64 bit archs, bare metal and virtualized machines. [ bp: Massage commit message. ] Fixes: 1b7e03ef7570 ("x86, NUMA: Enable emulation on 32bit too") Reported-by: Alan Jenkins <alan.christopher.jenkins@gmail.com> Suggested-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Alexander Dahl <post@lespocky.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: stable@vger.kernel.org Link: https://unix.stackexchange.com/q/520065/50007 Link: https://web.nettworks.org/bugs/browse/FFL-2560 Link: https://lkml.kernel.org/r/20200526175749.20742-1-post@lespocky.de
2020-05-28bonding: Fix reference count leak in bond_sysfs_slave_add.Qiushi Wu1-1/+3
kobject_init_and_add() takes reference even when it fails. If this function returns an error, kobject_put() must be called to properly clean up the memory associated with the object. Previous commit "b8eb718348b8" fixed a similar problem. Fixes: 07699f9a7c8d ("bonding: add sysfs /slave dir for bond slave devices.") Signed-off-by: Qiushi Wu <wu000273@umn.edu> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller3-6/+6
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Uninitialized when used in __nf_conntrack_update(), from Nathan Chancellor. 2) Comparison of unsigned expression in nf_confirm_cthelper(). 3) Remove 'const' type qualifier with no effect. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28powerpc/bpf: Enable bpf_probe_read{, str}() on powerpc againPetr Mladek1-0/+1
The commit 0ebeea8ca8a4d1d453a ("bpf: Restrict bpf_probe_read{, str}() only to archs where they work") caused that bpf_probe_read{, str}() functions were not longer available on architectures where the same logical address might have different content in kernel and user memory mapping. These architectures should use probe_read_{user,kernel}_str helpers. For backward compatibility, the problematic functions are still available on architectures where the user and kernel address spaces are not overlapping. This is defined CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE. At the moment, these backward compatible functions are enabled only on x86_64, arm, and arm64. Let's do it also on powerpc that has the non overlapping address space as well. Fixes: 0ebeea8ca8a4 ("bpf: Restrict bpf_probe_read{, str}() only to archs where they work") Signed-off-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/lkml/20200527122844.19524-1-pmladek@suse.com
2020-05-28Merge branch 'nvme-5.7' of git://git.infradead.org/nvme into block-5.7Jens Axboe1-4/+7
Pull NVMe poll fix from Christoph. * 'nvme-5.7' of git://git.infradead.org/nvme: nvme-pci: avoid race between nvme_reap_pending_cqes() and nvme_poll()
2020-05-28arm64/kernel: Fix return value when cpu_online() fails in __cpu_up()Nobuhiro Iwamatsu1-1/+1
If boot_secondary() was successful, and cpu_online() was an error in __cpu_up(), -EIO was returned, but 0 is returned by commit d22b115cbfbb7 ("arm64/kernel: Simplify __cpu_up() by bailing out early"). Therefore, bringup_wait_for_ap() causes the primary core to wait for a long time, which may cause boot failure. This commit sets -EIO to return code under the same conditions. Fixes: d22b115cbfbb ("arm64/kernel: Simplify __cpu_up() by bailing out early") Signed-off-by: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp> Tested-by: Yuji Ishikawa <yuji2.ishikawa@toshiba.co.jp> Acked-by: Will Deacon <will@kernel.org> Cc: Gavin Shan <gshan@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20200527233457.2531118-1-nobuhiro1.iwamatsu@toshiba.co.jp [catalin.marinas@arm.com: return -EIO at the end of the function] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-05-28Merge tag 'amd-drm-fixes-5.7-2020-05-27' of ↵Dave Airlie2-7/+2
git://people.freedesktop.org/~agd5f/linux into drm-fixes amd-drm-fixes-5.7-2020-05-27: amdgpu: - Display atomic test fix - Fix soft hang in display vupdate code Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexdeucher@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200527222700.4378-1-alexander.deucher@amd.com
2020-05-28csky: Fixup CONFIG_DEBUG_RSEQGuo Ren1-5/+10
Put the rseq_syscall check point at the prologue of the syscall will break the a0 ... a7. This will casue system call bug when DEBUG_RSEQ is enabled. So move it to the epilogue of syscall, but before syscall_trace. Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
2020-05-28csky: Coding convention in entry.SGuo Ren4-40/+42
There is no fixup or feature in the patch, we only cleanup with: - Remove unnecessary reg used (r11, r12), just use r9 & r10 & syscallid regs as temp useage. - Add _TIF_SYSCALL_WORK and _TIF_WORK_MASK to gather macros. Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
2020-05-28csky: Fixup abiv2 syscall_trace break a4 & a5Guo Ren2-2/+6
Current implementation could destory a4 & a5 when strace, so we need to get them from pt_regs by SAVE_ALL. Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
2020-05-28csky: Fixup CONFIG_PREEMPT panicGuo Ren3-27/+11
log: [    0.13373200] Calibrating delay loop... [    0.14077600] ------------[ cut here ]------------ [    0.14116700] WARNING: CPU: 0 PID: 0 at kernel/sched/core.c:3790 preempt_count_add+0xc8/0x11c [    0.14348000] DEBUG_LOCKS_WARN_ON((preempt_count() < 0))Modules linked in: [    0.14395100] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.6.0 #7 [    0.14410800] [    0.14427400] Call Trace: [    0.14450700] [<807cd226>] dump_stack+0x8a/0xe4 [    0.14473500] [<80072792>] __warn+0x10e/0x15c [    0.14495900] [<80072852>] warn_slowpath_fmt+0x72/0xc0 [    0.14518600] [<800a5240>] preempt_count_add+0xc8/0x11c [    0.14544900] [<807ef918>] _raw_spin_lock+0x28/0x68 [    0.14572600] [<800e0eb8>] vprintk_emit+0x84/0x2d8 [    0.14599000] [<800e113a>] vprintk_default+0x2e/0x44 [    0.14625100] [<800e2042>] vprintk_func+0x12a/0x1d0 [    0.14651300] [<800e1804>] printk+0x30/0x48 [    0.14677600] [<80008052>] lockdep_init+0x12/0xb0 [    0.14703800] [<80002080>] start_kernel+0x558/0x7f8 [    0.14730000] [<800052bc>] csky_start+0x58/0x94 [    0.14756600] irq event stamp: 34 [    0.14775100] hardirqs last  enabled at (33): [<80067370>] ret_from_exception+0x2c/0x72 [    0.14793700] hardirqs last disabled at (34): [<800e0eae>] vprintk_emit+0x7a/0x2d8 [    0.14812300] softirqs last  enabled at (32): [<800655b0>] __do_softirq+0x578/0x6d8 [    0.14830800] softirqs last disabled at (25): [<8007b3b8>] irq_exit+0xec/0x128 The preempt_count of reg could be destroyed after csky_do_IRQ without reload from memory. After reference to other architectures (arm64, riscv), we move preempt entry into ret_from_exception and disable irq at the beginning of ret_from_exception instead of RESTORE_ALL. Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Reported-by: Lu Baoquan <lu.baoquan@intellif.com>
2020-05-27IB/ipoib: Fix double free of skb in case of multicast traffic in CM modeValentine Fatiev4-12/+26
When connected mode is set, and we have connected and datagram traffic in parallel, ipoib might crash with double free of datagram skb. The current mechanism assumes that the order in the completion queue is the same as the order of sent packets for all QPs. Order is kept only for specific QP, in case of mixed UD and CM traffic we have few QPs (one UD and few CM's) in parallel. The problem: ---------------------------------------------------------- Transmit queue: ----------------- UD skb pointer kept in queue itself, CM skb kept in spearate queue and uses transmit queue as a placeholder to count the number of total transmitted packets. 0 1 2 3 4 5 6 7 8 9 10 11 12 13 .........127 ------------------------------------------------------------ NL ud1 UD2 CM1 ud3 cm2 cm3 ud4 cm4 ud5 NL NL NL ........... ------------------------------------------------------------ ^ ^ tail head Completion queue (problematic scenario) - the order not the same as in the transmit queue: 1 2 3 4 5 6 7 8 9 ------------------------------------ ud1 CM1 UD2 ud3 cm2 cm3 ud4 cm4 ud5 ------------------------------------ 1. CM1 'wc' processing - skb freed in cm separate ring. - tx_tail of transmit queue increased although UD2 is not freed. Now driver assumes UD2 index is already freed and it could be used for new transmitted skb. 0 1 2 3 4 5 6 7 8 9 10 11 12 13 .........127 ------------------------------------------------------------ NL NL UD2 CM1 ud3 cm2 cm3 ud4 cm4 ud5 NL NL NL ........... ------------------------------------------------------------ ^ ^ ^ (Bad)tail head (Bad - Could be used for new SKB) In this case (due to heavy load) UD2 skb pointer could be replaced by new transmitted packet UD_NEW, as the driver assumes its free. At this point we will have to process two 'wc' with same index but we have only one pointer to free. During second attempt to free the same skb we will have NULL pointer exception. 2. UD2 'wc' processing - skb freed according the index we got from 'wc', but it was already overwritten by mistake. So actually the skb that was released is the skb of the new transmitted packet and not the original one. 3. UD_NEW 'wc' processing - attempt to free already freed skb. NUll pointer exception. The fix: ----------------------------------------------------------------------- The fix is to stop using the UD ring as a placeholder for CM packets, the cyclic ring variables tx_head and tx_tail will manage the UD tx_ring, a new cyclic variables global_tx_head and global_tx_tail are introduced for managing and counting the overall outstanding sent packets, then the send queue will be stopped and waken based on these variables only. Note that no locking is needed since global_tx_head is updated in the xmit flow and global_tx_tail is updated in the NAPI flow only. A previous attempt tried to use one variable to count the outstanding sent packets, but it did not work since xmit and NAPI flows can run at the same time and the counter will be updated wrongly. Thus, we use the same simple cyclic head and tail scheme that we have today for the UD tx_ring. Fixes: 2c104ea68350 ("IB/ipoib: Get rid of the tx_outstanding variable in all modes") Link: https://lore.kernel.org/r/20200527134705.480068-1-leon@kernel.org Signed-off-by: Valentine Fatiev <valentinef@mellanox.com> Signed-off-by: Alaa Hleihel <alaa@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-27drm/amd/display: Fix potential integer wraparound resulting in a hangAric Cyr1-0/+2
[Why] If VUPDATE_END is before VUPDATE_START the delay calculated can become very large, causing a soft hang. [How] Take the absolute value of the difference between START and END. Signed-off-by: Aric Cyr <aric.cyr@amd.com> Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2020-05-27drm/amd/display: drop cursor position check in atomic testSimon Ser1-7/+0
get_cursor_position already handles the case where the cursor has negative off-screen coordinates by not setting dc_cursor_position.enabled. Signed-off-by: Simon Ser <contact@emersion.fr> Fixes: 626bf90fe03f ("drm/amd/display: add basic atomic check for cursor plane") Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2020-05-27net: dsa: declare lockless TX feature for slave portsVladimir Oltean1-0/+1
Be there a platform with the following layout: Regular NIC | +----> DSA master for switch port | +----> DSA master for another switch port After changing DSA back to static lockdep class keys in commit 1a33e10e4a95 ("net: partially revert dynamic lockdep key changes"), this kernel splat can be seen: [ 13.361198] ============================================ [ 13.366524] WARNING: possible recursive locking detected [ 13.371851] 5.7.0-rc4-02121-gc32a05ecd7af-dirty #988 Not tainted [ 13.377874] -------------------------------------------- [ 13.383201] swapper/0/0 is trying to acquire lock: [ 13.388004] ffff0000668ff298 (&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at: __dev_queue_xmit+0x84c/0xbe0 [ 13.397879] [ 13.397879] but task is already holding lock: [ 13.403727] ffff0000661a1698 (&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at: __dev_queue_xmit+0x84c/0xbe0 [ 13.413593] [ 13.413593] other info that might help us debug this: [ 13.420140] Possible unsafe locking scenario: [ 13.420140] [ 13.426075] CPU0 [ 13.428523] ---- [ 13.430969] lock(&dsa_slave_netdev_xmit_lock_key); [ 13.435946] lock(&dsa_slave_netdev_xmit_lock_key); [ 13.440924] [ 13.440924] *** DEADLOCK *** [ 13.440924] [ 13.446860] May be due to missing lock nesting notation [ 13.446860] [ 13.453668] 6 locks held by swapper/0/0: [ 13.457598] #0: ffff800010003de0 ((&idev->mc_ifc_timer)){+.-.}-{0:0}, at: call_timer_fn+0x0/0x400 [ 13.466593] #1: ffffd4d3fb478700 (rcu_read_lock){....}-{1:2}, at: mld_sendpack+0x0/0x560 [ 13.474803] #2: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2}, at: ip6_finish_output2+0x64/0xb10 [ 13.483886] #3: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x6c/0xbe0 [ 13.492793] #4: ffff0000661a1698 (&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at: __dev_queue_xmit+0x84c/0xbe0 [ 13.503094] #5: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x6c/0xbe0 [ 13.512000] [ 13.512000] stack backtrace: [ 13.516369] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.7.0-rc4-02121-gc32a05ecd7af-dirty #988 [ 13.530421] Call trace: [ 13.532871] dump_backtrace+0x0/0x1d8 [ 13.536539] show_stack+0x24/0x30 [ 13.539862] dump_stack+0xe8/0x150 [ 13.543271] __lock_acquire+0x1030/0x1678 [ 13.547290] lock_acquire+0xf8/0x458 [ 13.550873] _raw_spin_lock+0x44/0x58 [ 13.554543] __dev_queue_xmit+0x84c/0xbe0 [ 13.558562] dev_queue_xmit+0x24/0x30 [ 13.562232] dsa_slave_xmit+0xe0/0x128 [ 13.565988] dev_hard_start_xmit+0xf4/0x448 [ 13.570182] __dev_queue_xmit+0x808/0xbe0 [ 13.574200] dev_queue_xmit+0x24/0x30 [ 13.577869] neigh_resolve_output+0x15c/0x220 [ 13.582237] ip6_finish_output2+0x244/0xb10 [ 13.586430] __ip6_finish_output+0x1dc/0x298 [ 13.590709] ip6_output+0x84/0x358 [ 13.594116] mld_sendpack+0x2bc/0x560 [ 13.597786] mld_ifc_timer_expire+0x210/0x390 [ 13.602153] call_timer_fn+0xcc/0x400 [ 13.605822] run_timer_softirq+0x588/0x6e0 [ 13.609927] __do_softirq+0x118/0x590 [ 13.613597] irq_exit+0x13c/0x148 [ 13.616918] __handle_domain_irq+0x6c/0xc0 [ 13.621023] gic_handle_irq+0x6c/0x160 [ 13.624779] el1_irq+0xbc/0x180 [ 13.627927] cpuidle_enter_state+0xb4/0x4d0 [ 13.632120] cpuidle_enter+0x3c/0x50 [ 13.635703] call_cpuidle+0x44/0x78 [ 13.639199] do_idle+0x228/0x2c8 [ 13.642433] cpu_startup_entry+0x2c/0x48 [ 13.646363] rest_init+0x1ac/0x280 [ 13.649773] arch_call_rest_init+0x14/0x1c [ 13.653878] start_kernel+0x490/0x4bc Lockdep keys themselves were added in commit ab92d68fc22f ("net: core: add generic lockdep keys"), and it's very likely that this splat existed since then, but I have no real way to check, since this stacked platform wasn't supported by mainline back then. >From Taehee's own words: This patch was considered that all stackable devices have LLTX flag. But the dsa doesn't have LLTX, so this splat happened. After this patch, dsa shares the same lockdep class key. On the nested dsa interface architecture, which you illustrated, the same lockdep class key will be used in __dev_queue_xmit() because dsa doesn't have LLTX. So that lockdep detects deadlock because the same lockdep class key is used recursively although actually the different locks are used. There are some ways to fix this problem. 1. using NETIF_F_LLTX flag. If possible, using the LLTX flag is a very clear way for it. But I'm so sorry I don't know whether the dsa could have LLTX or not. 2. using dynamic lockdep again. It means that each interface uses a separate lockdep class key. So, lockdep will not detect recursive locking. But this way has a problem that it could consume lockdep class key too many. Currently, lockdep can have 8192 lockdep class keys. - you can see this number with the following command. cat /proc/lockdep_stats lock-classes: 1251 [max: 8192] ... The [max: 8192] means that the maximum number of lockdep class keys. If too many lockdep class keys are registered, lockdep stops to work. So, using a dynamic(separated) lockdep class key should be considered carefully. In addition, updating lockdep class key routine might have to be existing. (lockdep_register_key(), lockdep_set_class(), lockdep_unregister_key()) 3. Using lockdep subclass. A lockdep class key could have 8 subclasses. The different subclass is considered different locks by lockdep infrastructure. But "lock-classes" is not counted by subclasses. So, it could avoid stopping lockdep infrastructure by an overflow of lockdep class keys. This approach should also have an updating lockdep class key routine. (lockdep_set_subclass()) 4. Using nonvalidate lockdep class key. The lockdep infrastructure supports nonvalidate lockdep class key type. It means this lockdep is not validated by lockdep infrastructure. So, the splat will not happen but lockdep couldn't detect real deadlock case because lockdep really doesn't validate it. I think this should be used for really special cases. (lockdep_set_novalidate_class()) Further discussion here: https://patchwork.ozlabs.org/project/netdev/patch/20200503052220.4536-2-xiyou.wangcong@gmail.com/ There appears to be no negative side-effect to declaring lockless TX for the DSA virtual interfaces, which means they handle their own locking. So that's what we do to make the splat go away. Patch tested in a wide variety of cases: unicast, multicast, PTP, etc. Fixes: ab92d68fc22f ("net: core: add generic lockdep keys") Suggested-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-27copy_xstate_to_kernel(): don't leave parts of destination uninitializedAl Viro1-38/+48
copy the corresponding pieces of init_fpstate into the gaps instead. Cc: stable@kernel.org Tested-by: Alexander Potapenko <glider@google.com> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-05-27net: dsa: felix: send VLANs on CPU port as egress-taggedVladimir Oltean1-2/+6
As explained in other commits before (b9cd75e66895 and 87b0f983f66f), ocelot switches have a single egress-untagged VLAN per port, and the driver would deny adding a second one while an egress-untagged VLAN already exists. But on the CPU port (where the VLAN configuration is implicit, because there is no net device for the bridge to control), the DSA core attempts to add a VLAN using the same flags as were used for the front-panel port. This would make adding any untagged VLAN fail due to the CPU port rejecting the configuration: bridge vlan add dev swp0 vid 100 pvid untagged [ 1865.854253] mscc_felix 0000:00:00.5: Port already has a native VLAN: 1 [ 1865.860824] mscc_felix 0000:00:00.5: Failed to add VLAN 100 to port 5: -16 (note that port 5 is the CPU port and not the front-panel swp0). So this hardware will send all VLANs as tagged towards the CPU. Fixes: 56051948773e ("net: dsa: ocelot: add driver for Felix switch family") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-27bridge: multicast: work around clang bugArnd Bergmann1-1/+2
Clang-10 and clang-11 run into a corner case of the register allocator on 32-bit ARM, leading to excessive stack usage from register spilling: net/bridge/br_multicast.c:2422:6: error: stack frame size of 1472 bytes in function 'br_multicast_get_stats' [-Werror,-Wframe-larger-than=] Work around this by marking one of the internal functions as noinline_for_stack. Link: https://bugs.llvm.org/show_bug.cgi?id=45802#c9 Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-27nvme-pci: avoid race between nvme_reap_pending_cqes() and nvme_poll()Dongli Zhang1-4/+7
There may be a race between nvme_reap_pending_cqes() and nvme_poll(), e.g., when doing live reset while polling the nvme device. CPU X CPU Y nvme_poll() nvme_dev_disable() -> nvme_stop_queues() -> nvme_suspend_io_queues() -> nvme_suspend_queue() -> spin_lock(&nvmeq->cq_poll_lock); -> nvme_reap_pending_cqes() -> nvme_process_cq() -> nvme_process_cq() In the above scenario, the nvme_process_cq() for the same queue may be running on both CPU X and CPU Y concurrently. It is much more easier to reproduce the issue when CONFIG_PREEMPT is enabled in kernel. When CONFIG_PREEMPT is disabled, it would take longer time for nvme_stop_queues()-->blk_mq_quiesce_queue() to wait for grace period. This patch protects nvme_process_cq() with nvmeq->cq_poll_lock in nvme_reap_pending_cqes(). Fixes: fa46c6fb5d61 ("nvme/pci: move cqe check after device shutdown") Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-05-27vsock: fix timeout in vsock_accept()Stefano Garzarella1-1/+1
The accept(2) is an "input" socket interface, so we should use SO_RCVTIMEO instead of SO_SNDTIMEO to set the timeout. So this patch replace sock_sndtimeo() with sock_rcvtimeo() to use the right timeout in the vsock_accept(). Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-27nfp: flower: fix used time of merge flow statisticsHeinrich Kuhn1-1/+2
Prior to this change the correct value for the used counter is calculated but not stored nor, therefore, propagated to user-space. In use-cases such as OVS use-case at least this results in active flows being removed from the hardware datapath. Which results in both unnecessary flow tear-down and setup, and packet processing on the host. This patch addresses the problem by saving the calculated used value which allows the value to propagate to user-space. Found by inspection. Fixes: aa6ce2ea0c93 ("nfp: flower: support stats update for merge flows") Signed-off-by: Heinrich Kuhn <heinrich.kuhn@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-27net/sched: fix infinite loop in sch_fq_pieDavide Caratti2-2/+23
this command hangs forever: # tc qdisc add dev eth0 root fq_pie flows 65536 watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [tc:1028] [...] CPU: 1 PID: 1028 Comm: tc Not tainted 5.7.0-rc6+ #167 RIP: 0010:fq_pie_init+0x60e/0x8b7 [sch_fq_pie] Code: 4c 89 65 50 48 89 f8 48 c1 e8 03 42 80 3c 30 00 0f 85 2a 02 00 00 48 8d 7d 10 4c 89 65 58 48 89 f8 48 c1 e8 03 42 80 3c 30 00 <0f> 85 a7 01 00 00 48 8d 7d 18 48 c7 45 10 46 c3 23 00 48 89 f8 48 RSP: 0018:ffff888138d67468 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13 RAX: 1ffff9200018d2b2 RBX: ffff888139c1c400 RCX: ffffffffffffffff RDX: 000000000000c5e8 RSI: ffffc900000e5000 RDI: ffffc90000c69590 RBP: ffffc90000c69580 R08: fffffbfff79a9699 R09: fffffbfff79a9699 R10: 0000000000000700 R11: fffffbfff79a9698 R12: ffffc90000c695d0 R13: 0000000000000000 R14: dffffc0000000000 R15: 000000002347c5e8 FS: 00007f01e1850e40(0000) GS:ffff88814c880000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000067c340 CR3: 000000013864c000 CR4: 0000000000340ee0 Call Trace: qdisc_create+0x3fd/0xeb0 tc_modify_qdisc+0x3be/0x14a0 rtnetlink_rcv_msg+0x5f3/0x920 netlink_rcv_skb+0x121/0x350 netlink_unicast+0x439/0x630 netlink_sendmsg+0x714/0xbf0 sock_sendmsg+0xe2/0x110 ____sys_sendmsg+0x5b4/0x890 ___sys_sendmsg+0xe9/0x160 __sys_sendmsg+0xd3/0x170 do_syscall_64+0x9a/0x370 entry_SYSCALL_64_after_hwframe+0x44/0xa9 we can't accept 65536 as a valid number for 'nflows', because the loop on 'idx' in fq_pie_init() will never end. The extack message is correct, but it doesn't say that 0 is not a valid number for 'flows': while at it, fix this also. Add a tdc selftest to check correct validation of 'flows'. CC: Ivan Vecera <ivecera@redhat.com> Fixes: ec97ecf1ebe4 ("net: sched: add Flow Queue PIE packet scheduler") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-27Merge tag 'fsnotify_for_v5.7-rc8' of ↵Linus Torvalds2-3/+2
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fanotify FAN_DIR_MODIFY disabling from Jan Kara: "A single patch that disables FAN_DIR_MODIFY support that was merged in this merge window. When discussing further functionality we realized it may be more logical to guard it with a feature flag or to call things slightly differently (or maybe not) so let's not set the API in stone for now." * tag 'fsnotify_for_v5.7-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fanotify: turn off support for FAN_DIR_MODIFY
2020-05-27Merge branch 'for-5.7-fixes' of ↵Linus Torvalds6-30/+29
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: - Reverted stricter synchronization for cgroup recursive stats which was prepping it for event counter usage which never got merged. The change was causing performation regressions in some cases. - Restore bpf-based device-cgroup operation even when cgroup1 device cgroup is disabled. - An out-param init fix. * 'for-5.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: device_cgroup: Cleanup cgroup eBPF device filter code xattr: fix uninitialized out-param Revert "cgroup: Add memory barriers to plug cgroup_rstat_updated() race window"
2020-05-27RDMA/core: Fix double destruction of uobjectJason Gunthorpe2-8/+14
Fix use after free when user user space request uobject concurrently for the same object, within the RCU grace period. In that case, remove_handle_idr_uobject() is called twice and we will have an extra put on the uobject which cause use after free. Fix it by leaving the uobject write locked after it was removed from the idr. Call to rdma_lookup_put_uobject with UVERBS_LOOKUP_DESTROY instead of UVERBS_LOOKUP_WRITE will do the work. refcount_t: underflow; use-after-free. WARNING: CPU: 0 PID: 1381 at lib/refcount.c:28 refcount_warn_saturate+0xfe/0x1a0 Kernel panic - not syncing: panic_on_warn set ... CPU: 0 PID: 1381 Comm: syz-executor.0 Not tainted 5.5.0-rc3 #8 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack+0x94/0xce panic+0x234/0x56f __warn+0x1cc/0x1e1 report_bug+0x200/0x310 fixup_bug.part.11+0x32/0x80 do_error_trap+0xd3/0x100 do_invalid_op+0x31/0x40 invalid_op+0x1e/0x30 RIP: 0010:refcount_warn_saturate+0xfe/0x1a0 Code: 0f 0b eb 9b e8 23 f6 6d ff 80 3d 6c d4 19 03 00 75 8d e8 15 f6 6d ff 48 c7 c7 c0 02 55 bd c6 05 57 d4 19 03 01 e8 a2 58 49 ff <0f> 0b e9 6e ff ff ff e8 f6 f5 6d ff 80 3d 42 d4 19 03 00 0f 85 5c RSP: 0018:ffffc90002df7b98 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffff88810f6a193c RCX: ffffffffba649009 RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88811b0283cc RBP: 0000000000000003 R08: ffffed10236060e3 R09: ffffed10236060e3 R10: 0000000000000001 R11: ffffed10236060e2 R12: ffff88810f6a193c R13: ffffc90002df7d60 R14: 0000000000000000 R15: ffff888116ae6a08 uverbs_uobject_put+0xfd/0x140 __uobj_perform_destroy+0x3d/0x60 ib_uverbs_close_xrcd+0x148/0x170 ib_uverbs_write+0xaa5/0xdf0 __vfs_write+0x7c/0x100 vfs_write+0x168/0x4a0 ksys_write+0xc8/0x200 do_syscall_64+0x9c/0x390 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x465b49 Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f759d122c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 000000000073bfa8 RCX: 0000000000465b49 RDX: 000000000000000c RSI: 0000000020000080 RDI: 0000000000000003 RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f759d1236bc R13: 00000000004ca27c R14: 000000000070de40 R15: 00000000ffffffff Dumping ftrace buffer: (ftrace buffer empty) Kernel Offset: 0x39400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) Fixes: 7452a3c745a2 ("IB/uverbs: Allow RDMA_REMOVE_DESTROY to work concurrently with disassociate") Link: https://lore.kernel.org/r/20200527135534.482279-1-leon@kernel.org Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-27fanotify: turn off support for FAN_DIR_MODIFYAmir Goldstein2-3/+2
FAN_DIR_MODIFY has been enabled by commit 44d705b0370b ("fanotify: report name info for FAN_DIR_MODIFY event") in 5.7-rc1. Now we are planning further extensions to the fanotify API and during that we realized that FAN_DIR_MODIFY may behave slightly differently to be more consistent with extensions we plan. So until we finalize these extensions, let's not bind our hands with exposing FAN_DIR_MODIFY to userland. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-05-27Merge branch 'exec-linus' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull execve fix from Eric Biederman: "While working on my exec cleanups I found a bug in exec that winds up miscomputing the ambient credentials during exec. Andy appears to have to been confused as to why credentials are computed for both the script and the interpreter From the original patch description: [3] Linux very confusingly processes both the script and the interpreter if applicable, for reasons that elude me. The results from thinking about a script's file capabilities and/or setuid bits are mostly discarded. The only value in struct cred that gets changed in cap_bprm_set_creds that I could find that might persist between the script and the interpreter was cap_ambient. Which is fixed with this trivial change" * 'exec-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: exec: Always set cap_ambient in cap_bprm_set_creds
2020-05-27x86: Hide the archdata.iommu field behind generic IOMMU_APIKrzysztof Kozlowski1-1/+1
There is a generic, kernel wide configuration symbol for enabling the IOMMU specific bits: CONFIG_IOMMU_API. Implementations (including INTEL_IOMMU and AMD_IOMMU driver) select it so use it here as well. This makes the conditional archdata.iommu field consistent with other platforms and also fixes any compile test builds of other IOMMU drivers, when INTEL_IOMMU or AMD_IOMMU are not selected). For the case when INTEL_IOMMU/AMD_IOMMU and COMPILE_TEST are not selected, this should create functionally equivalent code/choice. With COMPILE_TEST this field could appear if other IOMMU drivers are chosen but neither INTEL_IOMMU nor AMD_IOMMU are not. Reported-by: kbuild test robot <lkp@intel.com> Fixes: e93a1695d7fb ("iommu: Enable compile testing for some of drivers") Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Acked-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20200518120855.27822-2-krzk@kernel.org Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-05-27ia64: Hide the archdata.iommu field behind generic IOMMU_APIKrzysztof Kozlowski1-1/+1
There is a generic, kernel wide configuration symbol for enabling the IOMMU specific bits: CONFIG_IOMMU_API. Implementations (including INTEL_IOMMU driver) select it so use it here as well. This makes the conditional archdata.iommu field consistent with other platforms and also fixes any compile test builds of other IOMMU drivers, when INTEL_IOMMU is not selected). For the case when INTEL_IOMMU and COMPILE_TEST are not selected, this should create functionally equivalent code/choice. With COMPILE_TEST this field could appear if other IOMMU drivers are chosen but INTEL_IOMMU not. Reported-by: kbuild test robot <lkp@intel.com> Fixes: e93a1695d7fb ("iommu: Enable compile testing for some of drivers") Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Link: https://lore.kernel.org/r/20200518120855.27822-1-krzk@kernel.org Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-05-27Merge tag 'gpio-fixes-for-v5.7' of ↵Linus Walleij3-6/+6
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux into fixes gpio fixes for v5.7 - fix mutex and spinlock ordering in gpio-mlxbf2 - fix the return value checks on devm_platform_ioremap_resource in gpio-pxa and gpio-bcm-kona
2020-05-27netfilter: nf_conntrack_pptp: fix compilation warning with W=1 buildPablo Neira Ayuso2-2/+2
>> include/linux/netfilter/nf_conntrack_pptp.h:13:20: warning: 'const' type qualifier on return type has no effect [-Wignored-qualifiers] extern const char *const pptp_msg_name(u_int16_t msg); ^~~~~~ Reported-by: kbuild test robot <lkp@intel.com> Fixes: 4c559f15efcc ("netfilter: nf_conntrack_pptp: prevent buffer overflows in debug code") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-05-27netfilter: conntrack: comparison of unsigned in cthelper confirmationPablo Neira Ayuso1-1/+1
net/netfilter/nf_conntrack_core.c: In function nf_confirm_cthelper: net/netfilter/nf_conntrack_core.c:2117:15: warning: comparison of unsigned expression in < 0 is always false [-Wtype-limits] 2117 | if (protoff < 0 || (frag_off & htons(~0x7)) != 0) | ^ ipv6_skip_exthdr() returns a signed integer. Reported-by: Colin Ian King <colin.king@canonical.com> Fixes: 703acd70f249 ("netfilter: nfnetlink_cthelper: unbreak userspace helper support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-05-27netfilter: conntrack: Pass value of ctinfo to __nf_conntrack_updateNathan Chancellor1-3/+3
Clang warns: net/netfilter/nf_conntrack_core.c:2068:21: warning: variable 'ctinfo' is uninitialized when used here [-Wuninitialized] nf_ct_set(skb, ct, ctinfo); ^~~~~~ net/netfilter/nf_conntrack_core.c:2024:2: note: variable 'ctinfo' is declared here enum ip_conntrack_info ctinfo; ^ 1 warning generated. nf_conntrack_update was split up into nf_conntrack_update and __nf_conntrack_update, where the assignment of ctinfo is in nf_conntrack_update but it is used in __nf_conntrack_update. Pass the value of ctinfo from nf_conntrack_update to __nf_conntrack_update so that uninitialized memory is not used and everything works properly. Fixes: ee04805ff54a ("netfilter: conntrack: make conntrack userspace helpers work again") Link: https://github.com/ClangBuiltLinux/linux/issues/1039 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-05-27ceph: flush release queue when handling caps for unknown inodeJeff Layton1-1/+1
It's possible for the VFS to completely forget about an inode, but for it to still be sitting on the cap release queue. If the MDS sends the client a cap message for such an inode, it just ignores it today, which can lead to a stall of up to 5s until the cap release queue is flushed. If we get a cap message for an inode that can't be located, then go ahead and flush the cap release queue. Cc: stable@vger.kernel.org URL: https://tracker.ceph.com/issues/45532 Fixes: 1e9c2eb6811e ("ceph: delete stale dentry when last reference is dropped") Reported-and-Tested-by: Andrej Filipčič <andrej.filipcic@ijs.si> Suggested-by: Yan, Zheng <zyan@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>