summaryrefslogtreecommitdiffstats
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2021-04-07ethtool: Remove link_mode param and derive link params from driverDanielle Ratson1-1/+8
Some drivers clear the 'ethtool_link_ksettings' struct in their get_link_ksettings() callback, before populating it with actual values. Such drivers will set the new 'link_mode' field to zero, resulting in user space receiving wrong link mode information given that zero is a valid value for the field. Another problem is that some drivers (notably tun) can report random values in the 'link_mode' field. This can result in a general protection fault when the field is used as an index to the 'link_mode_params' array [1]. This happens because such drivers implement their set_link_ksettings() callback by simply overwriting their private copy of 'ethtool_link_ksettings' struct with the one they get from the stack, which is not always properly initialized. Fix these problems by removing 'link_mode' from 'ethtool_link_ksettings' and instead have drivers call ethtool_params_from_link_mode() with the current link mode. The function will derive the link parameters (e.g., speed) from the link mode and fill them in the 'ethtool_link_ksettings' struct. v3: * Remove link_mode parameter and derive the link parameters in the driver instead of passing link_mode parameter to ethtool and derive it there. v2: * Introduce 'cap_link_mode_supported' instead of adding a validity field to 'ethtool_link_ksettings' struct. [1] general protection fault, probably for non-canonical address 0xdffffc00f14cc32c: 0000 [#1] PREEMPT SMP KASAN KASAN: probably user-memory-access in range [0x000000078a661960-0x000000078a661967] CPU: 0 PID: 8452 Comm: syz-executor360 Not tainted 5.11.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:__ethtool_get_link_ksettings+0x1a3/0x3a0 net/ethtool/ioctl.c:446 Code: b7 3e fa 83 fd ff 0f 84 30 01 00 00 e8 16 b0 3e fa 48 8d 3c ed 60 d5 69 8a 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03 +38 d0 7c 08 84 d2 0f 85 b9 RSP: 0018:ffffc900019df7a0 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: ffff888026136008 RCX: 0000000000000000 RDX: 00000000f14cc32c RSI: ffffffff873439ca RDI: 000000078a661960 RBP: 00000000ffff8880 R08: 00000000ffffffff R09: ffff88802613606f R10: ffffffff873439bc R11: 0000000000000000 R12: 0000000000000000 R13: ffff88802613606c R14: ffff888011d0c210 R15: ffff888011d0c210 FS: 0000000000749300(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000004b60f0 CR3: 00000000185c2000 CR4: 00000000001506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: linkinfo_prepare_data+0xfd/0x280 net/ethtool/linkinfo.c:37 ethnl_default_notify+0x1dc/0x630 net/ethtool/netlink.c:586 ethtool_notify+0xbd/0x1f0 net/ethtool/netlink.c:656 ethtool_set_link_ksettings+0x277/0x330 net/ethtool/ioctl.c:620 dev_ethtool+0x2b35/0x45d0 net/ethtool/ioctl.c:2842 dev_ioctl+0x463/0xb70 net/core/dev_ioctl.c:440 sock_do_ioctl+0x148/0x2d0 net/socket.c:1060 sock_ioctl+0x477/0x6a0 net/socket.c:1177 vfs_ioctl fs/ioctl.c:48 [inline] __do_sys_ioctl fs/ioctl.c:753 [inline] __se_sys_ioctl fs/ioctl.c:739 [inline] __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:739 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: c8907043c6ac9 ("ethtool: Get link mode in use instead of speed and duplex parameters") Signed-off-by: Danielle Ratson <danieller@nvidia.com> Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-07Merge tag 'mlx5-fixes-2021-04-06' of ↵David S. Miller1-4/+6
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5 fixes 2021-04-06 This series provides some fixes to mlx5 driver. Please pull and let me know if there is any problem. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-07ethtool: fix kdoc in headersJakub Kicinski2-2/+13
Fix remaining issues with kdoc in the ethtool headers. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-07ethtool: document reserved fields in the uAPIJakub Kicinski1-1/+21
Add a note on expected handling of reserved fields, and references to all kdocs. This fixes a bunch of kdoc warnings. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-07ethtool: un-kdocify extended link stateJakub Kicinski2-23/+7
Extended link state structures and enums use kdoc headers but then do not describe any of the members. Convert to normal comments. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-06net/mlx5: Fix PBMC register mappingAya Levin1-1/+1
Add reserved mapping to cover all the register in order to avoid setting arbitrary values to newer FW which implements the reserved fields. Fixes: 50b4a3c23646 ("net/mlx5: PPTB and PBMC register firmware command support") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-06net/mlx5: Fix PPLM register mappingAya Levin1-0/+2
Add reserved mapping to cover all the register in order to avoid setting arbitrary values to newer FW which implements the reserved fields. Fixes: a58837f52d43 ("net/mlx5e: Expose FEC feilds and related capability bit") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-06net/mlx5: Fix placement of log_max_flow_counterRaed Salem1-3/+3
The cited commit wrongly placed log_max_flow_counter field of mlx5_ifc_flow_table_prop_layout_bits, align it to the HW spec intended placement. Fixes: 16f1c5bb3ed7 ("net/mlx5: Check device capability for maximum flow counters") Signed-off-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-06virtio_net: Do not pull payload in skb->headEric Dumazet1-5/+9
Xuan Zhuo reported that commit 3226b158e67c ("net: avoid 32 x truesize under-estimation for tiny skbs") brought a ~10% performance drop. The reason for the performance drop was that GRO was forced to chain sk_buff (using skb_shinfo(skb)->frag_list), which uses more memory but also cause packet consumers to go over a lot of overhead handling all the tiny skbs. It turns out that virtio_net page_to_skb() has a wrong strategy : It allocates skbs with GOOD_COPY_LEN (128) bytes in skb->head, then copies 128 bytes from the page, before feeding the packet to GRO stack. This was suboptimal before commit 3226b158e67c ("net: avoid 32 x truesize under-estimation for tiny skbs") because GRO was using 2 frags per MSS, meaning we were not packing MSS with 100% efficiency. Fix is to pull only the ethernet header in page_to_skb() Then, we change virtio_net_hdr_to_skb() to pull the missing headers, instead of assuming they were already pulled by callers. This fixes the performance regression, but could also allow virtio_net to accept packets with more than 128bytes of headers. Many thanks to Xuan Zhuo for his report, and his tests/help. Fixes: 3226b158e67c ("net: avoid 32 x truesize under-estimation for tiny skbs") Reported-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Link: https://www.spinics.net/lists/netdev/msg731397.html Co-Developed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: virtualization@lists.linux-foundation.org Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller1-0/+2
Alexei Starovoitov says: ==================== pull-request: bpf 2021-04-01 The following pull-request contains BPF updates for your *net* tree. We've added 11 non-merge commits during the last 8 day(s) which contain a total of 10 files changed, 151 insertions(+), 26 deletions(-). The main changes are: 1) xsk creation fixes, from Ciara. 2) bpf_get_task_stack fix, from Dave. 3) trampoline in modules fix, from Jiri. 4) bpf_obj_get fix for links and progs, from Lorenz. 5) struct_ops progs must be gpl compatible fix, from Toke. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-31Revert "net: correct sk_acceptq_is_full()"Eric Dumazet1-1/+5
This reverts commit f211ac154577ec9ccf07c15f18a6abf0d9bdb4ab. We had similar attempt in the past, and we reverted it. History: 64a146513f8f12ba204b7bf5cb7e9505594ead42 [NET]: Revert incorrect accept queue backlog changes. 8488df894d05d6fa41c2bd298c335f944bb0e401 [NET]: Fix bugs in "Whether sock accept queue is full" checking I am adding a fat comment so that future attempts will be much harder. Fixes: f211ac154577 ("net: correct sk_acceptq_is_full()") Cc: iuyacan <yacanliu@163.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-31Merge branch 'master' of ↵David S. Miller2-3/+5
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2021-03-31 1) Fix ipv4 pmtu checks for xfrm anf vti interfaces. From Eyal Birger. 2) There are situations where the socket passed to xfrm_output_resume() is not the same as the one attached to the skb. Use the socket passed to xfrm_output_resume() to avoid lookup failures when xfrm is used with VRFs. From Evan Nimmo. 3) Make the xfrm_state_hash_generation sequence counter per network namespace because but its write serialization lock is also per network namespace. Write protection is insufficient otherwise. From Ahmed S. Darwish. 4) Fixup sctp featue flags when used with esp offload. From Xin Long. 5) xfrm BEET mode doesn't support fragments for inner packets. This is a limitation of the protocol, so no fix possible. Warn at least to notify the user about that situation. From Xin Long. 6) Fix NULL pointer dereference on policy lookup when namespaces are uses in combination with esp offload. 7) Fix incorrect transformation on esp offload when packets get segmented at layer 3. 8) Fix some user triggered usages of WARN_ONCE in the xfrm compat layer. From Dmitry Safonov. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-30net: ensure mac header is set in virtio_net_hdr_to_skb()Eric Dumazet1-0/+2
Commit 924a9bc362a5 ("net: check if protocol extracted by virtio_net_hdr_set_proto is correct") added a call to dev_parse_header_protocol() but mac_header is not yet set. This means that eth_hdr() reads complete garbage, and syzbot complained about it [1] This patch resets mac_header earlier, to get more coverage about this change. Audit of virtio_net_hdr_to_skb() callers shows that this change should be safe. [1] BUG: KASAN: use-after-free in eth_header_parse_protocol+0xdc/0xe0 net/ethernet/eth.c:282 Read of size 2 at addr ffff888017a6200b by task syz-executor313/8409 CPU: 1 PID: 8409 Comm: syz-executor313 Not tainted 5.12.0-rc2-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x141/0x1d7 lib/dump_stack.c:120 print_address_description.constprop.0.cold+0x5b/0x2f8 mm/kasan/report.c:232 __kasan_report mm/kasan/report.c:399 [inline] kasan_report.cold+0x7c/0xd8 mm/kasan/report.c:416 eth_header_parse_protocol+0xdc/0xe0 net/ethernet/eth.c:282 dev_parse_header_protocol include/linux/netdevice.h:3177 [inline] virtio_net_hdr_to_skb.constprop.0+0x99d/0xcd0 include/linux/virtio_net.h:83 packet_snd net/packet/af_packet.c:2994 [inline] packet_sendmsg+0x2325/0x52b0 net/packet/af_packet.c:3031 sock_sendmsg_nosec net/socket.c:654 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:674 sock_no_sendpage+0xf3/0x130 net/core/sock.c:2860 kernel_sendpage.part.0+0x1ab/0x350 net/socket.c:3631 kernel_sendpage net/socket.c:3628 [inline] sock_sendpage+0xe5/0x140 net/socket.c:947 pipe_to_sendpage+0x2ad/0x380 fs/splice.c:364 splice_from_pipe_feed fs/splice.c:418 [inline] __splice_from_pipe+0x43e/0x8a0 fs/splice.c:562 splice_from_pipe fs/splice.c:597 [inline] generic_splice_sendpage+0xd4/0x140 fs/splice.c:746 do_splice_from fs/splice.c:767 [inline] do_splice+0xb7e/0x1940 fs/splice.c:1079 __do_splice+0x134/0x250 fs/splice.c:1144 __do_sys_splice fs/splice.c:1350 [inline] __se_sys_splice fs/splice.c:1332 [inline] __x64_sys_splice+0x198/0x250 fs/splice.c:1332 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 Fixes: 924a9bc362a5 ("net: check if protocol extracted by virtio_net_hdr_set_proto is correct") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Balazs Nemeth <bnemeth@redhat.com> Cc: Willem de Bruijn <willemb@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-30net: let skb_orphan_partial wake-up waiters.Paolo Abeni1-0/+9
Currently the mentioned helper can end-up freeing the socket wmem without waking-up any processes waiting for more write memory. If the partially orphaned skb is attached to an UDP (or raw) socket, the lack of wake-up can hang the user-space. Even for TCP sockets not calling the sk destructor could have bad effects on TSQ. Address the issue using skb_orphan to release the sk wmem before setting the new sock_efree destructor. Additionally bundle the whole ownership update in a new helper, so that later other potential users could avoid duplicate code. v1 -> v2: - use skb_orphan() instead of sort of open coding it (Eric) - provide an helper for the ownership change (Eric) Fixes: f6ba8d33cfbb ("netem: fix skb_orphan_partial()") Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-29can: uapi: can.h: mark union inside struct can_frame packedMarc Kleine-Budde1-1/+1
In commit ea7800565a12 ("can: add optional DLC element to Classical CAN frame structure") the struct can_frame::can_dlc was put into an anonymous union with another u8 variable. For various reasons some members in struct can_frame and canfd_frame including the first 8 byes of data are expected to have the same memory layout. This is enforced by a BUILD_BUG_ON check in af_can.c. Since the above mentioned commit this check fails on ARM kernels compiled with the ARM OABI (which means CONFIG_AEABI not set). In this case -mabi=apcs-gnu is passed to the compiler, which leads to a structure size boundary of 32, instead of 8 compared to CONFIG_AEABI enabled. This means the the union in struct can_frame takes 4 bytes instead of the expected 1. Rong Chen illustrates the problem with pahole in the ARM OABI case: | struct can_frame { | canid_t can_id; /* 0 4 */ | union { | __u8 len; /* 4 1 */ | __u8 can_dlc; /* 4 1 */ | }; /* 4 4 */ | __u8 __pad; /* 8 1 */ | __u8 __res0; /* 9 1 */ | __u8 len8_dlc; /* 10 1 */ | | /* XXX 5 bytes hole, try to pack */ | | __u8 data[8] | __attribute__((__aligned__(8))); /* 16 8 */ | | /* size: 24, cachelines: 1, members: 6 */ | /* sum members: 19, holes: 1, sum holes: 5 */ | /* forced alignments: 1, forced holes: 1, sum forced holes: 5 */ | /* last cacheline: 24 bytes */ | } __attribute__((__aligned__(8))); Marking the anonymous union as __attribute__((packed)) fixes the BUILD_BUG_ON problem on these compilers. Fixes: ea7800565a12 ("can: add optional DLC element to Classical CAN frame structure") Reported-by: kernel test robot <lkp@intel.com> Suggested-by: Rong Chen <rong.a.chen@intel.com> Link: https://lore.kernel.org/linux-can/2c82ec23-3551-61b5-1bd8-178c3407ee83@hartkopp.net/ Link: https://lore.kernel.org/r/20210325125850.1620-3-socketcan@hartkopp.net Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-03-26bpf: Take module reference for trampoline in moduleJiri Olsa1-0/+2
Currently module can be unloaded even if there's a trampoline register in it. It's easily reproduced by running in parallel: # while :; do ./test_progs -t module_attach; done # while :; do rmmod bpf_testmod; sleep 0.5; done Taking the module reference in case the trampoline's ip is within the module code. Releasing it when the trampoline's ip is unregistered. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210326105900.151466-1-jolsa@kernel.org
2021-03-25Merge branch '40GbE' of ↵David S. Miller1-2/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2021-03-25 This series contains updates to virtchnl header file and i40e driver. Norbert removes added padding from virtchnl RSS structures as this causes issues when iterating over the arrays. Mateusz adds Asym_Pause as supported to allow these settings to be set as the hardware supports it. Eryk fixes an issue where encountering a VF reset alongside releasing VFs could cause a call trace. Arkadiusz moves TC setup before resource setup as previously it was possible to enter with a null q_vector causing a kernel oops. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-25sch_red: fix off-by-one checks in red_check_params()Eric Dumazet1-2/+2
This fixes following syzbot report: UBSAN: shift-out-of-bounds in ./include/net/red.h:237:23 shift exponent 32 is too large for 32-bit type 'unsigned int' CPU: 1 PID: 8418 Comm: syz-executor170 Not tainted 5.12.0-rc4-next-20210324-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x141/0x1d7 lib/dump_stack.c:120 ubsan_epilogue+0xb/0x5a lib/ubsan.c:148 __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:327 red_set_parms include/net/red.h:237 [inline] choke_change.cold+0x3c/0xc8 net/sched/sch_choke.c:414 qdisc_create+0x475/0x12f0 net/sched/sch_api.c:1247 tc_modify_qdisc+0x4c8/0x1a50 net/sched/sch_api.c:1663 rtnetlink_rcv_msg+0x44e/0xad0 net/core/rtnetlink.c:5553 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2502 netlink_unicast_kernel net/netlink/af_netlink.c:1312 [inline] netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1338 netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1927 sock_sendmsg_nosec net/socket.c:654 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:674 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2350 ___sys_sendmsg+0xf3/0x170 net/socket.c:2404 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2433 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x43f039 Code: 28 c3 e8 2a 14 00 00 66 2e 0f 1f 84 00 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffdfa725168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 0000000000400488 RCX: 000000000043f039 RDX: 0000000000000000 RSI: 0000000020000040 RDI: 0000000000000004 RBP: 0000000000403020 R08: 0000000000400488 R09: 0000000000400488 R10: 0000000000400488 R11: 0000000000000246 R12: 00000000004030b0 R13: 0000000000000000 R14: 00000000004ac018 R15: 0000000000400488 Fixes: 8afa10cbe281 ("net_sched: red: Avoid illegal values") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-25Merge branch 'akpm' (patches from Andrew)Linus Torvalds4-12/+35
Merge misc fixes from Andrew Morton: "14 patches. Subsystems affected by this patch series: mm (hugetlb, kasan, gup, selftests, z3fold, kfence, memblock, and highmem), squashfs, ia64, gcov, and mailmap" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mailmap: update Andrey Konovalov's email address mm/highmem: fix CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP mm: memblock: fix section mismatch warning again kfence: make compatible with kmemleak gcov: fix clang-11+ support ia64: fix format strings for err_inject ia64: mca: allocate early mca with GFP_ATOMIC squashfs: fix xattr id and id lookup sanity checks squashfs: fix inode lookup sanity checks z3fold: prevent reclaim/free race for headless pages selftests/vm: fix out-of-tree build mm/mmu_notifiers: ensure range_end() is paired with range_start() kasan: fix per-page tags for non-page_alloc pages hugetlb_cgroup: fix imbalanced css_get and css_put pair for shared mappings
2021-03-25virtchnl: Fix layout of RSS structuresNorbert Ciosek1-2/+0
Remove padding from RSS structures. Previous layout could lead to unwanted compiler optimizations in loops when iterating over key and lut arrays. Fixes: 65ece6de0114 ("virtchnl: Add missing explicit padding to structures") Signed-off-by: Norbert Ciosek <norbertx.ciosek@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-03-25mm: memblock: fix section mismatch warning againMike Rapoport1-2/+2
Commit 34dc2efb39a2 ("memblock: fix section mismatch warning") marked memblock_bottom_up() and memblock_set_bottom_up() as __init, but they could be referenced from non-init functions like memblock_find_in_range_node() on architectures that enable CONFIG_ARCH_KEEP_MEMBLOCK. For such builds kernel test robot reports: WARNING: modpost: vmlinux.o(.text+0x74fea4): Section mismatch in reference from the function memblock_find_in_range_node() to the function .init.text:memblock_bottom_up() The function memblock_find_in_range_node() references the function __init memblock_bottom_up(). This is often because memblock_find_in_range_node lacks a __init annotation or the annotation of memblock_bottom_up is wrong. Replace __init annotations with __init_memblock annotations so that the appropriate section will be selected depending on CONFIG_ARCH_KEEP_MEMBLOCK. Link: https://lore.kernel.org/lkml/202103160133.UzhgY0wt-lkp@intel.com Link: https://lkml.kernel.org/r/20210316171347.14084-1-rppt@kernel.org Fixes: 34dc2efb39a2 ("memblock: fix section mismatch warning") Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-25mm/mmu_notifiers: ensure range_end() is paired with range_start()Sean Christopherson1-5/+5
If one or more notifiers fails .invalidate_range_start(), invoke .invalidate_range_end() for "all" notifiers. If there are multiple notifiers, those that did not fail are expecting _start() and _end() to be paired, e.g. KVM's mmu_notifier_count would become imbalanced. Disallow notifiers that can fail _start() from implementing _end() so that it's unnecessary to either track which notifiers rejected _start(), or had already succeeded prior to a failed _start(). Note, the existing behavior of calling _start() on all notifiers even after a previous notifier failed _start() was an unintented "feature". Make it canon now that the behavior is depended on for correctness. As of today, the bug is likely benign: 1. The only caller of the non-blocking notifier is OOM kill. 2. The only notifiers that can fail _start() are the i915 and Nouveau drivers. 3. The only notifiers that utilize _end() are the SGI UV GRU driver and KVM. 4. The GRU driver will never coincide with the i195/Nouveau drivers. 5. An imbalanced kvm->mmu_notifier_count only causes soft lockup in the _guest_, and the guest is already doomed due to being an OOM victim. Fix the bug now to play nice with future usage, e.g. KVM has a potential use case for blocking memslot updates in KVM while an invalidation is in-progress, and failure to unblock would result in said updates being blocked indefinitely and hanging. Found by inspection. Verified by adding a second notifier in KVM that periodically returns -EAGAIN on non-blockable ranges, triggering OOM, and observing that KVM exits with an elevated notifier count. Link: https://lkml.kernel.org/r/20210311180057.1582638-1-seanjc@google.com Fixes: 93065ac753e4 ("mm, oom: distinguish blockable mode for mmu notifiers") Signed-off-by: Sean Christopherson <seanjc@google.com> Suggested-by: Jason Gunthorpe <jgg@ziepe.ca> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Cc: David Rientjes <rientjes@google.com> Cc: Ben Gardon <bgardon@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-25kasan: fix per-page tags for non-page_alloc pagesAndrey Konovalov1-3/+15
To allow performing tag checks on page_alloc addresses obtained via page_address(), tag-based KASAN modes store tags for page_alloc allocations in page->flags. Currently, the default tag value stored in page->flags is 0x00. Therefore, page_address() returns a 0x00ffff... address for pages that were not allocated via page_alloc. This might cause problems. A particular case we encountered is a conflict with KFENCE. If a KFENCE-allocated slab object is being freed via kfree(page_address(page) + offset), the address passed to kfree() will get tagged with 0x00 (as slab pages keep the default per-page tags). This leads to is_kfence_address() check failing, and a KFENCE object ending up in normal slab freelist, which causes memory corruptions. This patch changes the way KASAN stores tag in page-flags: they are now stored xor'ed with 0xff. This way, KASAN doesn't need to initialize per-page flags for every created page, which might be slow. With this change, page_address() returns natively-tagged (with 0xff) pointers for pages that didn't have tags set explicitly. This patch fixes the encountered conflict with KFENCE and prevents more similar issues that can occur in the future. Link: https://lkml.kernel.org/r/1a41abb11c51b264511d9e71c303bb16d5cb367b.1615475452.git.andreyknvl@google.com Fixes: 2813b9c02962 ("kasan, mm, arm64: tag non slab memory allocated via pagealloc") Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Marco Elver <elver@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Branislav Rankov <Branislav.Rankov@arm.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-25hugetlb_cgroup: fix imbalanced css_get and css_put pair for shared mappingsMiaohe Lin1-2/+13
The current implementation of hugetlb_cgroup for shared mappings could have different behavior. Consider the following two scenarios: 1.Assume initial css reference count of hugetlb_cgroup is 1: 1.1 Call hugetlb_reserve_pages with from = 1, to = 2. So css reference count is 2 associated with 1 file_region. 1.2 Call hugetlb_reserve_pages with from = 2, to = 3. So css reference count is 3 associated with 2 file_region. 1.3 coalesce_file_region will coalesce these two file_regions into one. So css reference count is 3 associated with 1 file_region now. 2.Assume initial css reference count of hugetlb_cgroup is 1 again: 2.1 Call hugetlb_reserve_pages with from = 1, to = 3. So css reference count is 2 associated with 1 file_region. Therefore, we might have one file_region while holding one or more css reference counts. This inconsistency could lead to imbalanced css_get() and css_put() pair. If we do css_put one by one (i.g. hole punch case), scenario 2 would put one more css reference. If we do css_put all together (i.g. truncate case), scenario 1 will leak one css reference. The imbalanced css_get() and css_put() pair would result in a non-zero reference when we try to destroy the hugetlb cgroup. The hugetlb cgroup directory is removed __but__ associated resource is not freed. This might result in OOM or can not create a new hugetlb cgroup in a busy workload ultimately. In order to fix this, we have to make sure that one file_region must hold exactly one css reference. So in coalesce_file_region case, we should release one css reference before coalescence. Also only put css reference when the entire file_region is removed. The last thing to note is that the caller of region_add() will only hold one reference to h_cg->css for the whole contiguous reservation region. But this area might be scattered when there are already some file_regions reside in it. As a result, many file_regions may share only one h_cg->css reference. In order to ensure that one file_region must hold exactly one css reference, we should do css_get() for each file_region and release the reference held by caller when they are done. [linmiaohe@huawei.com: fix imbalanced css_get and css_put pair for shared mappings] Link: https://lkml.kernel.org/r/20210316023002.53921-1-linmiaohe@huawei.com Link: https://lkml.kernel.org/r/20210301120540.37076-1-linmiaohe@huawei.com Fixes: 075a61d07a8e ("hugetlb_cgroup: add accounting for shared mappings") Reported-by: kernel test robot <lkp@intel.com> (auto build test ERROR) Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Wanpeng Li <liwp.linux@gmail.com> Cc: Mina Almasry <almasrymina@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds16-27/+104
Pull networking fixes from David Miller: "Various fixes, all over: 1) Fix overflow in ptp_qoriq_adjfine(), from Yangbo Lu. 2) Always store the rx queue mapping in veth, from Maciej Fijalkowski. 3) Don't allow vmlinux btf in map_create, from Alexei Starovoitov. 4) Fix memory leak in octeontx2-af from Colin Ian King. 5) Use kvalloc in bpf x86 JIT for storing jit'd addresses, from Yonghong Song. 6) Fix tx ptp stats in mlx5, from Aya Levin. 7) Check correct ip version in tun decap, fropm Roi Dayan. 8) Fix rate calculation in mlx5 E-Switch code, from arav Pandit. 9) Work item memork leak in mlx5, from Shay Drory. 10) Fix ip6ip6 tunnel crash with bpf, from Daniel Borkmann. 11) Lack of preemptrion awareness in macvlan, from Eric Dumazet. 12) Fix data race in pxa168_eth, from Pavel Andrianov. 13) Range validate stab in red_check_params(), from Eric Dumazet. 14) Inherit vlan filtering setting properly in b53 driver, from Florian Fainelli. 15) Fix rtnl locking in igc driver, from Sasha Neftin. 16) Pause handling fixes in igc driver, from Muhammad Husaini Zulkifli. 17) Missing rtnl locking in e1000_reset_task, from Vitaly Lifshits. 18) Use after free in qlcnic, from Lv Yunlong. 19) fix crash in fritzpci mISDN, from Tong Zhang. 20) Premature rx buffer reuse in igb, from Li RongQing. 21) Missing termination of ip[a driver message handler arrays, from Alex Elder. 22) Fix race between "x25_close" and "x25_xmit"/"x25_rx" in hdlc_x25 driver, from Xie He. 23) Use after free in c_can_pci_remove(), from Tong Zhang. 24) Uninitialized variable use in nl80211, from Jarod Wilson. 25) Off by one size calc in bpf verifier, from Piotr Krysiuk. 26) Use delayed work instead of deferrable for flowtable GC, from Yinjun Zhang. 27) Fix infinite loop in NPC unmap of octeontx2 driver, from Hariprasad Kelam. 28) Fix being unable to change MTU of dwmac-sun8i devices due to lack of fifo sizes, from Corentin Labbe. 29) DMA use after free in r8169 with WoL, fom Heiner Kallweit. 30) Mismatched prototypes in isdn-capi, from Arnd Bergmann. 31) Fix psample UAPI breakage, from Ido Schimmel" * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (171 commits) psample: Fix user API breakage math: Export mul_u64_u64_div_u64 ch_ktls: fix enum-conversion warning octeontx2-af: Fix memory leak of object buf ptp_qoriq: fix overflow in ptp_qoriq_adjfine() u64 calcalation net: bridge: don't notify switchdev for local FDB addresses net/sched: act_ct: clear post_ct if doing ct_clear net: dsa: don't assign an error value to tag_ops isdn: capi: fix mismatched prototypes net/mlx5: SF, do not use ecpu bit for vhca state processing net/mlx5e: Fix division by 0 in mlx5e_select_queue net/mlx5e: Fix error path for ethtool set-priv-flag net/mlx5e: Offload tuple rewrite for non-CT flows net/mlx5e: Allow to match on MPLS parameters only for MPLS over UDP net/mlx5: Add back multicast stats for uplink representor net: ipconfig: ic_dev can be NULL in ic_close_devs MAINTAINERS: Combine "QLOGIC QLGE 10Gb ETHERNET DRIVER" sections into one docs: networking: Fix a typo r8169: fix DMA being used after buffer free if WoL is enabled net: ipa: fix init header command validation ...
2021-03-24psample: Fix user API breakageIdo Schimmel1-4/+1
Cited commit added a new attribute before the existing group reference count attribute, thereby changing its value and breaking existing applications on new kernels. Before: # psample -l libpsample ERROR psample_group_foreach: failed to recv message: Operation not supported After: # psample -l Group Num Refcount Group Seq 1 1 0 Fix by restoring the value of the old attribute and remove the misleading comments from the enumerator to avoid future bugs. Cc: stable@vger.kernel.org Fixes: d8bed686ab96 ("net: psample: Add tunnel support") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reported-by: Adiel Bidani <adielb@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-24xfrm: Fix NULL pointer dereference on policy lookupSteffen Klassert1-1/+1
When xfrm interfaces are used in combination with namespaces and ESP offload, we get a dst_entry NULL pointer dereference. This is because we don't have a dst_entry attached in the ESP offloading case and we need to do a policy lookup before the namespace transition. Fix this by expicit checking of skb_dst(skb) before accessing it. Fixes: f203b76d78092 ("xfrm: Add virtual xfrm interfaces") Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-03-23mm/writeback: Add wait_on_page_writeback_killableMatthew Wilcox (Oracle)1-0/+1
This is the killable version of wait_on_page_writeback. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: kafs-testing@auristor.com cc: linux-afs@lists.infradead.org cc: linux-mm@kvack.org Link: https://lore.kernel.org/r/20210320054104.1300774-3-willy@infradead.org
2021-03-23fs/cachefiles: Remove wait_bit_key layout dependencyMatthew Wilcox (Oracle)1-1/+0
Cachefiles was relying on wait_page_key and wait_bit_key being the same layout, which is fragile. Now that wait_page_key is exposed in the pagemap.h header, we can remove that fragility A comment on the need to maintain structure layout equivalence was added by Linus[1] and that is no longer applicable. Fixes: 62906027091f ("mm: add PageWaiters indicating tasks are waiting for a page bit") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: kafs-testing@auristor.com cc: linux-cachefs@redhat.com cc: linux-mm@kvack.org Link: https://lore.kernel.org/r/20210320054104.1300774-2-willy@infradead.org/ Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3510ca20ece0150af6b10c77a74ff1b5c198e3e2 [1]
2021-03-22net: xfrm: Use sequence counter with associated spinlockAhmed S. Darwish1-1/+1
A sequence counter write section must be serialized or its internal state can get corrupted. A plain seqcount_t does not contain the information of which lock must be held to guaranteee write side serialization. For xfrm_state_hash_generation, use seqcount_spinlock_t instead of plain seqcount_t. This allows to associate the spinlock used for write serialization with the sequence counter. It thus enables lockdep to verify that the write serialization lock is indeed held before entering the sequence counter write section. If lockdep is disabled, this lock association is compiled out and has neither storage size nor runtime overhead. Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-03-22net: xfrm: Localize sequence counter per network namespaceAhmed S. Darwish1-1/+3
A sequence counter write section must be serialized or its internal state can get corrupted. The "xfrm_state_hash_generation" seqcount is global, but its write serialization lock (net->xfrm.xfrm_state_lock) is instantiated per network namespace. The write protection is thus insufficient. To provide full protection, localize the sequence counter per network namespace instead. This should be safe as both the seqcount read and write sections access data exclusively within the network namespace. It also lays the foundation for transforming "xfrm_state_hash_generation" data type from seqcount_t to seqcount_LOCKNAME_t in further commits. Fixes: b65e3d7be06f ("xfrm: state: add sequence count to detect hash resizes") Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-03-21Merge tag 'usb-5.12-rc4' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB and Thunderbolt driver fixes from Greg KH: "Here are some small Thunderbolt and USB driver fixes for some reported issues: - thunderbolt fixes for minor problems - typec fixes for power issues - usb-storage quirk addition - usbip bugfix - dwc3 bugfix when stopping transfers - cdnsp bugfix for isoc transfers - gadget use-after-free fix All have been in linux-next this week with no reported issues" * tag 'usb-5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: usb: typec: tcpm: Skip sink_cap query only when VDM sm is busy usb: dwc3: gadget: Prevent EP queuing while stopping transfers usb: typec: tcpm: Invoke power_supply_changed for tcpm-source-psy- usb: typec: Remove vdo[3] part of tps6598x_rx_identity_reg struct usb-storage: Add quirk to defeat Kindle's automatic unload usb: gadget: configfs: Fix KASAN use-after-free usbip: Fix incorrect double assignment to udc->ud.tcp_rx usb: cdnsp: Fixes incorrect value in ISOC TRB thunderbolt: Increase runtime PM reference count on DP tunnel discovery thunderbolt: Initialize HopID IDAs in tb_switch_alloc()
2021-03-21Merge tag 'locking-urgent-2021-03-21' of ↵Linus Torvalds1-2/+3
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Ingo Molnar: - Get static calls & modules right. Hopefully. - WW mutex fixes * tag 'locking-urgent-2021-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: static_call: Fix static_call_update() sanity check static_call: Align static_call_is_init() patching condition static_call: Fix static_call_set_init() locking/ww_mutex: Fix acquire/release imbalance in ww_acquire_init()/ww_acquire_fini() locking/ww_mutex: Simplify use_ww_ctx & ww_ctx handling
2021-03-21Merge tag 'efi-urgent-2021-03-21' of ↵Linus Torvalds1-2/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI fixes from Ingo Molnar: - another missing RT_PROP table related fix, to ensure that the efivarfs pseudo filesystem fails gracefully if variable services are unsupported - use the correct alignment for literal EFI GUIDs - fix a use after unmap issue in the memreserve code * tag 'efi-urgent-2021-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi: use 32-bit alignment for efi_guid_t literals firmware/efi: Fix a use after bug in efi_mem_reserve_persistent efivars: respect EFI_UNSUPPORTED return from firmware
2021-03-21Merge tag 'x86_urgent_for_v5.12-rc4' of ↵Linus Torvalds2-0/+14
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: "The freshest pile of shiny x86 fixes for 5.12: - Add the arch-specific mapping between physical and logical CPUs to fix devicetree-node lookups - Restore the IRQ2 ignore logic - Fix get_nr_restart_syscall() to return the correct restart syscall number. Split in a 4-patches set to avoid kABI breakage when backporting to dead kernels" * tag 'x86_urgent_for_v5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/apic/of: Fix CPU devicetree-node lookups x86/ioapic: Ignore IRQ2 again x86: Introduce restart_block->arch_data to remove TS_COMPAT_RESTART x86: Introduce TS_COMPAT_RESTART to fix get_nr_restart_syscall() x86: Move TS_COMPAT back to asm/thread_info.h kernel, fs: Introduce and use set_restart_fn() and arch_set_restart_data()
2021-03-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller1-0/+1
Alexei Starovoitov says: ==================== pull-request: bpf 2021-03-20 The following pull-request contains BPF updates for your *net* tree. We've added 5 non-merge commits during the last 3 day(s) which contain a total of 8 files changed, 155 insertions(+), 12 deletions(-). The main changes are: 1) Use correct nops in fexit trampoline, from Stanislav. 2) Fix BTF dump, from Jean-Philippe. 3) Fix umd memory leak, from Zqiang. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-19Merge tag 'io_uring-5.12-2021-03-19' of git://git.kernel.dk/linux-blockLinus Torvalds1-25/+0
Pull io_uring fixes from Jens Axboe: "Quieter week this time, which was both expected and desired. About half of the below is fixes for this release, the other half are just fixes in general. In detail: - Fix the freezing of IO threads, by making the freezer not send them fake signals. Make them freezable by default. - Like we did for personalities, move the buffer IDR to xarray. Kills some code and avoids a use-after-free on teardown. - SQPOLL cleanups and fixes (Pavel) - Fix linked timeout race (Pavel) - Fix potential completion post use-after-free (Pavel) - Cleanup and move internal structures outside of general kernel view (Stefan) - Use MSG_SIGNAL for send/recv from io_uring (Stefan)" * tag 'io_uring-5.12-2021-03-19' of git://git.kernel.dk/linux-block: io_uring: don't leak creds on SQO attach error io_uring: use typesafe pointers in io_uring_task io_uring: remove structures from include/linux/io_uring.h io_uring: imply MSG_NOSIGNAL for send[msg]()/recv[msg]() calls io_uring: fix sqpoll cancellation via task_work io_uring: add generic callback_head helpers io_uring: fix concurrent parking io_uring: halt SQO submission on ctx exit io_uring: replace sqd rw_semaphore with mutex io_uring: fix complete_post use ctx after free io_uring: fix ->flags races by linked timeouts io_uring: convert io_buffer_idr to XArray io_uring: allow IO worker threads to be frozen kernel: freezer should treat PF_IO_WORKER like PF_KTHREAD for freezing
2021-03-19bpf: Fix umd memory leak in copy_process()Zqiang1-0/+1
The syzbot reported a memleak as follows: BUG: memory leak unreferenced object 0xffff888101b41d00 (size 120): comm "kworker/u4:0", pid 8, jiffies 4294944270 (age 12.780s) backtrace: [<ffffffff8125dc56>] alloc_pid+0x66/0x560 [<ffffffff81226405>] copy_process+0x1465/0x25e0 [<ffffffff81227943>] kernel_clone+0xf3/0x670 [<ffffffff812281a1>] kernel_thread+0x61/0x80 [<ffffffff81253464>] call_usermodehelper_exec_work [<ffffffff81253464>] call_usermodehelper_exec_work+0xc4/0x120 [<ffffffff812591c9>] process_one_work+0x2c9/0x600 [<ffffffff81259ab9>] worker_thread+0x59/0x5d0 [<ffffffff812611c8>] kthread+0x178/0x1b0 [<ffffffff8100227f>] ret_from_fork+0x1f/0x30 unreferenced object 0xffff888110ef5c00 (size 232): comm "kworker/u4:0", pid 8414, jiffies 4294944270 (age 12.780s) backtrace: [<ffffffff8154a0cf>] kmem_cache_zalloc [<ffffffff8154a0cf>] __alloc_file+0x1f/0xf0 [<ffffffff8154a809>] alloc_empty_file+0x69/0x120 [<ffffffff8154a8f3>] alloc_file+0x33/0x1b0 [<ffffffff8154ab22>] alloc_file_pseudo+0xb2/0x140 [<ffffffff81559218>] create_pipe_files+0x138/0x2e0 [<ffffffff8126c793>] umd_setup+0x33/0x220 [<ffffffff81253574>] call_usermodehelper_exec_async+0xb4/0x1b0 [<ffffffff8100227f>] ret_from_fork+0x1f/0x30 After the UMD process exits, the pipe_to_umh/pipe_from_umh and tgid need to be released. Fixes: d71fa5c9763c ("bpf: Add kernel module with user mode driver that populates bpffs.") Reported-by: syzbot+44908bb56d2bfe56b28e@syzkaller.appspotmail.com Signed-off-by: Zqiang <qiang.zhang@windriver.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210317030915.2865-1-qiang.zhang@windriver.com
2021-03-19sch_red: Fix a typoBhaskar Chowdhury1-1/+1
s/recalcultion/recalculation/ Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-19Merge tag 'trace-v5.12-rc3' of ↵Linus Torvalds1-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull workqueue tracing fix from Steven Rostedt: "Fix workqueue trace event unsafe string reference After adding a verifier to test all strings printed in trace events to make sure they either point to a string on the ring buffer, or to read only core kernel memory, it triggered on a workqueue trace event. The trace event workqueue_queue_work references the allocated name of the workqueue in the output. If the workqueue is freed before the trace is read, then the trace will dereference freed memory. Update the trace event to use the __string(), __assign_str(), and __get_str() helpers to handle such cases" * tag 'trace-v5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: workqueue/tracing: Copy workqueue name to buffer in trace event
2021-03-19Merge tag 'efi-urgent-for-v5.12-rc3' of ↵Ingo Molnar1-2/+4
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi into efi/urgent Pull EFI fixes from Ard Biesheuvel: "- another missing RT_PROP table related fix, to ensure that the efivarfs pseudo filesystem fails gracefully if variable services are unsupported - use the correct alignment for literal EFI GUIDs - fix a use after unmap issue in the memreserve code" Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-03-19efi: use 32-bit alignment for efi_guid_t literalsArd Biesheuvel1-2/+4
Commit 494c704f9af0 ("efi: Use 32-bit alignment for efi_guid_t") updated the type definition of efi_guid_t to ensure that it always appears sufficiently aligned (the UEFI spec is ambiguous about this, but given the fact that its EFI_GUID type is defined in terms of a struct carrying a uint32_t, the natural alignment is definitely >= 32 bits). However, we missed the EFI_GUID() macro which is used to instantiate efi_guid_t literals: that macro is still based on the guid_t type, which does not have a minimum alignment at all. This results in warnings such as In file included from drivers/firmware/efi/mokvar-table.c:35: include/linux/efi.h:1093:34: warning: passing 1-byte aligned argument to 4-byte aligned parameter 2 of 'get_var' may result in an unaligned pointer access [-Walign-mismatch] status = get_var(L"SecureBoot", &EFI_GLOBAL_VARIABLE_GUID, NULL, &size, ^ include/linux/efi.h:1101:24: warning: passing 1-byte aligned argument to 4-byte aligned parameter 2 of 'get_var' may result in an unaligned pointer access [-Walign-mismatch] get_var(L"SetupMode", &EFI_GLOBAL_VARIABLE_GUID, NULL, &size, &setupmode); The distinction only matters on CPUs that do not support misaligned loads fully, but 32-bit ARM's load-multiple instructions fall into that category, and these are likely to be emitted by the compiler that built the firmware for loading word-aligned 128-bit GUIDs from memory So re-implement the initializer in terms of our own efi_guid_t type, so that the alignment becomes a property of the literal's type. Fixes: 494c704f9af0 ("efi: Use 32-bit alignment for efi_guid_t") Reported-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Link: https://github.com/ClangBuiltLinux/linux/issues/1327 Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2021-03-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller2-5/+5
Pablo Neira Ayuso says: ==================== Netfilter fixes for net 1) Several patches to testore use of memory barriers instead of RCU to ensure consistent access to ruleset, from Mark Tomlinson. 2) Fix dump of expectation via ctnetlink, from Florian Westphal. 3) GRE helper works for IPv6, from Ludovic Senecaux. 4) Set error on unsupported flowtable flags. 5) Use delayed instead of deferrable workqueue in the flowtable, from Yinjun Zhang. 6) Fix spurious EEXIST in case of add-after-delete flowtable in the same batch. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18Merge tag 'drm-fixes-2021-03-19' of git://anongit.freedesktop.org/drm/drmLinus Torvalds1-2/+4
Pull drm fixes from Dave Airlie: "Regular fixes pull, pretty small set of fixes, a couple of i915 and amdgpu, one ttm, one nouveau and one omap. Probably smaller than usual for this time, so we'll see if something pops up next week or if this will continue to stay small. Summary: ttm: - Make ttm_bo_unpin() not wraparound on too many unpins omap: - Fix coccicheck warning in omap amdgpu: - DCN 3.0 gamma fixes - DCN 2.1 corrupt screen fix i915: - Workaround async flip + VT-d frame corruption on HSW/BDW - Fix NMI watchdog crash due to uninitialized OA buffer use on gen12+ nouveau: - workaround oops with bo syncing" * tag 'drm-fixes-2021-03-19' of git://anongit.freedesktop.org/drm/drm: nouveau: Skip unvailable ttm page entries drm/amd/display: Remove MPC gamut remap logic for DCN30 drm/amd/display: Correct algorithm for reversed gamma drm/omap: dsi: fix unsigned expression compared with zero i915/perf: Start hrtimer only if sampling the OA buffer drm/i915: Workaround async flip + VT-d corruption on HSW/BDW drm/amd/display: Copy over soc values before bounding box creation drm/ttm: make ttm_bo_unpin more defensive
2021-03-19Merge tag 'drm-misc-fixes-2021-03-18' of ↵Dave Airlie1-2/+4
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes drm-misc-fixes for v5.12-rc4: - Make ttm_bo_unpin() not wraparound on too many unpins. - Fix coccicheck warning in omap. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/a0e13bbb-6ba6-ff24-4db8-0e02e605de18@linux.intel.com
2021-03-18Merge tag 'vfio-v5.12-rc4' of git://github.com/awilliam/linux-vfioLinus Torvalds1-0/+11
Pull VFIO fixes from Alex Williamson: - Fix 32-bit issue with new unmap-all flag (Steve Sistare) - Various Kconfig changes for better coverage (Jason Gunthorpe) - Fix to batch pinning support (Daniel Jordan) * tag 'vfio-v5.12-rc4' of git://github.com/awilliam/linux-vfio: vfio/type1: fix vaddr_get_pfns() return in vfio_pin_page_external() vfio: Depend on MMU ARM: amba: Allow some ARM_AMBA users to compile with COMPILE_TEST vfio-platform: Add COMPILE_TEST to VFIO_PLATFORM vfio: IOMMU_API should be selected vfio/type1: fix unmap all on ILP32
2021-03-18Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds2-7/+5
Pull virtio fixes from Michael Tsirkin: "Some fixes and cleanups all over the place" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vhost-vdpa: set v->config_ctx to NULL if eventfd_ctx_fdget() fails vhost-vdpa: fix use-after-free of v->config_ctx vhost: Fix vhost_vq_reset() vhost_vdpa: fix the missing irq_bypass_unregister_producer() invocation vdpa_sim: Skip typecasting from void* virtio: remove export for virtio_config_{enable, disable} virtio-mmio: Use to_virtio_mmio_device() to simply code vdpa: set the virtqueue num during register
2021-03-18workqueue/tracing: Copy workqueue name to buffer in trace eventSteven Rostedt (VMware)1-3/+3
The trace event "workqueue_queue_work" references an unsafe string in dereferencing the name of the workqueue. As the name is allocated, it could later be freed, and the pointer to that string could stay on the tracing buffer. If the trace buffer is read after the string is freed, it will reference an unsafe pointer. I added a new verifier to make sure that all strings referenced in the output of the trace buffer is safe to read and this triggered on the workqueue_queue_work trace event: workqueue_queue_work: work struct=00000000b2b235c7 function=gc_worker workqueue=(0xffff888100051160:events_power_efficient)[UNSAFE-MEMORY] req_cpu=256 cpu=1 workqueue_queue_work: work struct=00000000c344caec function=flush_to_ldisc workqueue=(0xffff888100054d60:events_unbound)[UNSAFE-MEMORY] req_cpu=256 cpu=4294967295 workqueue_queue_work: work struct=00000000b2b235c7 function=gc_worker workqueue=(0xffff888100051160:events_power_efficient)[UNSAFE-MEMORY] req_cpu=256 cpu=1 workqueue_queue_work: work struct=000000000b238b3f function=vmstat_update workqueue=(0xffff8881000c3760:mm_percpu_wq)[UNSAFE-MEMORY] req_cpu=1 cpu=1 Also, if this event is read via a user space application like perf or trace-cmd, the name would only be an address and useless information: workqueue_queue_work: work struct=0xffff953f80b4b918 function=disk_events_workfn workqueue=ffff953f8005d378 req_cpu=8192 cpu=5 Cc: Zqiang <qiang.zhang@windriver.com> Cc: Tejun Heo <tj@kernel.org> Fixes: 7bf9c4a88e3e3 ("workqueue: tracing the name of the workqueue instead of it's address") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-03-18io_uring: remove structures from include/linux/io_uring.hStefan Metzmacher1-25/+0
Link: https://lore.kernel.org/r/8c1d14f3748105f4caeda01716d47af2fa41d11c.1615809009.git.metze@samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller1-4/+20
Daniel Borkmann says: ==================== pull-request: bpf 2021-03-18 The following pull-request contains BPF updates for your *net* tree. We've added 10 non-merge commits during the last 4 day(s) which contain a total of 14 files changed, 336 insertions(+), 94 deletions(-). The main changes are: 1) Fix fexit/fmod_ret trampoline for sleepable programs, and also fix a ftrace splat in modify_ftrace_direct() on address change, from Alexei Starovoitov. 2) Fix two oob speculation possibilities that allows unprivileged to leak mem via side-channel, from Piotr Krysiuk and Daniel Borkmann. 3) Fix libbpf's netlink handling wrt SOCK_CLOEXEC, from Kumar Kartikeya Dwivedi. 4) Fix libbpf's error handling on failure in getting section names, from Namhyung Kim. 5) Fix tunnel collect_md BPF selftest wrt Geneve option handling, from Hangbin Liu. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>