summaryrefslogtreecommitdiffstats
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2021-11-19cfg80211: move offchan_cac_event to a dedicated workLorenzo Bianconi1-10/+4
In order to make cfg80211_offchan_cac_abort() (renamed from cfg80211_offchan_cac_event) callable in other contexts and without so much locking restrictions, make it trigger a new work instead of operating directly. Do some other renames while at it to clarify. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/6145c3d0f30400a568023f67981981d24c7c6133.1635325205.git.lorenzo@kernel.org [rewrite commit log] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-11-19mac80211: introduce set_radar_offchan callbackLorenzo Bianconi1-0/+10
Similar to cfg80211, introduce set_radar_offchan callback in mac80211_ops in order to configure a dedicated offchannel chain available on some hw (e.g. mt7915) to perform offchannel CAC detection and avoid tx/rx downtime. Tested-by: Evelyn Tsai <evelyn.tsai@mediatek.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/201110606d4f3a7dfdf31440e351f2e2c375d4f0.1634979655.git.lorenzo@kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-11-19cfg80211: implement APIs for dedicated radar detection HWLorenzo Bianconi2-0/+38
If a dedicated (off-channel) radar detection hardware (chain) is available in the hardware/driver, allow this to be used by calling the NL80211_CMD_RADAR_DETECT command with a new flag attribute requesting off-channel radar detection is used. Offchannel CAC (channel availability check) avoids the CAC downtime when switching to a radar channel or when turning on the AP. Drivers advertise support for this using the new feature flag NL80211_EXT_FEATURE_RADAR_OFFCHAN. Tested-by: Evelyn Tsai <evelyn.tsai@mediatek.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/7468e291ef5d05d692c1738d25b8f778d8ea5c3f.1634979655.git.lorenzo@kernel.org Link: https://lore.kernel.org/r/1e60e60fef00e14401adae81c3d49f3e5f307537.1634979655.git.lorenzo@kernel.org Link: https://lore.kernel.org/r/85fa50f57fc3adb2934c8d9ca0be30394de6b7e8.1634979655.git.lorenzo@kernel.org Link: https://lore.kernel.org/r/4b6c08671ad59aae0ac46fc94c02f31b1610eb72.1634979655.git.lorenzo@kernel.org Link: https://lore.kernel.org/r/241849ccaf2c228873c6f8495bf87b19159ba458.1634979655.git.lorenzo@kernel.org [remove offchan_mutex, fix cfg80211_stop_offchan_radar_detection(), remove gfp_t argument, fix documentation, fix tracing] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-11-18Merge tag 'regmap-no-bus-update-bits' of ↵Jakub Kicinski1-0/+7
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap Mark Brown says: =================== regmap: Allow regmap_update_bits() to be offloaded with no bus Some hardware can do this so let's use that capability. =================== Link: https://lore.kernel.org/all/YZWDOidBOssP10yS@sirena.org.uk/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski40-1065/+2932
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-18Merge tag 'net-5.16-rc2' of ↵Linus Torvalds7-7/+35
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bpf, mac80211. Current release - regressions: - devlink: don't throw an error if flash notification sent before devlink visible - page_pool: Revert "page_pool: disable dma mapping support...", turns out there are active arches who need it Current release - new code bugs: - amt: cancel delayed_work synchronously in amt_fini() Previous releases - regressions: - xsk: fix crash on double free in buffer pool - bpf: fix inner map state pruning regression causing program rejections - mac80211: drop check for DONT_REORDER in __ieee80211_select_queue, preventing mis-selecting the best effort queue - mac80211: do not access the IV when it was stripped - mac80211: fix radiotap header generation, off-by-one - nl80211: fix getting radio statistics in survey dump - e100: fix device suspend/resume Previous releases - always broken: - tcp: fix uninitialized access in skb frags array for Rx 0cp - bpf: fix toctou on read-only map's constant scalar tracking - bpf: forbid bpf_ktime_get_coarse_ns and bpf_timer_* in tracing progs - tipc: only accept encrypted MSG_CRYPTO msgs - smc: transfer remaining wait queue entries during fallback, fix missing wake ups - udp: validate checksum in udp_read_sock() (when sockmap is used) - sched: act_mirred: drop dst for the direction from egress to ingress - virtio_net_hdr_to_skb: count transport header in UFO, prevent allowing bad skbs into the stack - nfc: reorder the logic in nfc_{un,}register_device, fix unregister - ipsec: check return value of ipv6_skip_exthdr - usb: r8152: add MAC passthrough support for more Lenovo Docks" * tag 'net-5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (96 commits) ptp: ocp: Fix a couple NULL vs IS_ERR() checks net: ethernet: dec: tulip: de4x5: fix possible array overflows in type3_infoblock() net: tulip: de4x5: fix the problem that the array 'lp->phy[8]' may be out of bound ipv6: check return value of ipv6_skip_exthdr e100: fix device suspend/resume devlink: Don't throw an error if flash notification sent before devlink visible page_pool: Revert "page_pool: disable dma mapping support..." ethernet: hisilicon: hns: hns_dsaf_misc: fix a possible array overflow in hns_dsaf_ge_srst_by_port() octeontx2-af: debugfs: don't corrupt user memory NFC: add NCI_UNREG flag to eliminate the race NFC: reorder the logic in nfc_{un,}register_device NFC: reorganize the functions in nci_request tipc: check for null after calling kmemdup i40e: Fix display error code in dmesg i40e: Fix creation of first queue by omitting it if is not power of two i40e: Fix warning message and call stack during rmmod i40e driver i40e: Fix ping is lost after configuring ADq on VF i40e: Fix changing previously set num_queue_pairs for PFs i40e: Fix NULL ptr dereference on VSI filter sync i40e: Fix correct max_pkt_size on VF RX queue ...
2021-11-18Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2-12/+1
Pull KVM fixes from Paolo Bonzini: "Selftest changes: - Cleanups for the perf test infrastructure and mapping hugepages - Avoid contention on mmap_sem when the guests start to run - Add event channel upcall support to xen_shinfo_test x86 changes: - Fixes for Xen emulation - Kill kvm_map_gfn() / kvm_unmap_gfn() and broken gfn_to_pfn_cache - Fixes for migration of 32-bit nested guests on 64-bit hypervisor - Compilation fixes - More SEV cleanups Generic: - Cap the return value of KVM_CAP_NR_VCPUS to both KVM_CAP_MAX_VCPUS and num_online_cpus(). Most architectures were only using one of the two" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (42 commits) KVM: x86: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS KVM: s390: Cap KVM_CAP_NR_VCPUS by num_online_cpus() KVM: RISC-V: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS KVM: PPC: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS KVM: MIPS: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS KVM: arm64: Cap KVM_CAP_NR_VCPUS by kvm_arm_default_max_vcpus() KVM: x86: Assume a 64-bit hypercall for guests with protected state selftests: KVM: Add /x86_64/sev_migrate_tests to .gitignore riscv: kvm: fix non-kernel-doc comment block KVM: SEV: Fix typo in and tweak name of cmd_allowed_from_miror() KVM: SEV: Drop a redundant setting of sev->asid during initialization KVM: SEV: WARN if SEV-ES is marked active but SEV is not KVM: SEV: Set sev_info.active after initial checks in sev_guest_init() KVM: SEV: Disallow COPY_ENC_CONTEXT_FROM if target has created vCPUs KVM: Kill kvm_map_gfn() / kvm_unmap_gfn() and gfn_to_pfn_cache KVM: nVMX: Use a gfn_to_hva_cache for vmptrld KVM: nVMX: Use kvm_read_guest_offset_cached() for nested VMCS check KVM: x86/xen: Use sizeof_field() instead of open-coding it KVM: nVMX: Use kvm_{read,write}_guest_cached() for shadow_vmcs12 KVM: x86/xen: Fix get_attr of KVM_XEN_ATTR_TYPE_SHARED_INFO ...
2021-11-18Merge tag 'printk-for-5.16-fixup' of ↵Linus Torvalds2-1/+4
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk fixes from Petr Mladek: - Try to flush backtraces from other CPUs also on the local one. This was a regression caused by printk_safe buffers removal. - Remove header dependency warning. * tag 'printk-for-5.16-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: printk: Remove printk.h inclusion in percpu.h printk: restore flushing of NMI buffers on remote CPUs after NMI backtraces
2021-11-18net: dsa: felix: restrict psfp rules on ingress portXiaoliang Yang1-1/+2
PSFP rules take effect on the streams from any port of VSC9959 switch. This patch use ingress port to limit the rule only active on this port. Each stream can only match two ingress source ports in VSC9959. Streams from lowest port gets the configuration of SFID pointed by MAC Table lookup and streams from highest port gets the configuration of (SFID+1) pointed by MAC Table lookup. This patch defines the PSFP rule on highest port as dummy rule, which means that it does not modify the MAC table. Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-18net: mscc: ocelot: use index to set vcap policerXiaoliang Yang1-1/+13
Policer was previously automatically assigned from the highest index to the lowest index from policer pool. But police action of tc flower now uses index to set an police entry. This patch uses the police index to set vcap policers, so that one policer can be shared by multiple rules. Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-18net: dsa: felix: support psfp filter on vsc9959Xiaoliang Yang2-0/+18
VSC9959 supports Per-Stream Filtering and Policing(PSFP) that complies with the IEEE 802.1Qci standard. The stream is identified by Null stream identification(DMAC and VLAN ID) defined in IEEE802.1CB. For PSFP, four tables need to be set up: stream table, stream filter table, stream gate table, and flow meter table. Identify the stream by parsing the tc flower keys and add it to the stream table. The stream filter table is automatically maintained, and its index is determined by SGID(flow gate index) and FMID(flow meter index). Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-18net: mscc: ocelot: add gate and police action offload to PSFPXiaoliang Yang2-0/+6
PSFP support gate and police action. This patch add the gate and police action to flower parse action, check chain ID to determine which block to offload. Adding psfp callback functions to add, delete and update gate and police in PSFP table if hardware supports it. Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-18net: mscc: ocelot: add MAC table stream learn and lookup operationsXiaoliang Yang1-0/+22
ocelot_mact_learn_streamdata() can be used in VSC9959 to overwrite an FDB entry with stream data. The stream data includes SFID and SSID which can be used for PSFP and FRER set. ocelot_mact_lookup() can be used to check if the given {DMAC, VID} FDB entry is exist, and also can retrieve the DEST_IDX and entry type for the FDB entry. Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-18tcp: add missing htmldocs for skb->ll_node and sk->defer_listEric Dumazet2-0/+2
Add missing entries to fix these "make htmldocs" warnings. ./include/linux/skbuff.h:953: warning: Function parameter or member 'll_node' not described in 'sk_buff' ./include/net/sock.h:540: warning: Function parameter or member 'defer_list' not described in 'sock' Fixes: f35f821935d8 ("tcp: defer skb freeing after socket lock is released") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-18page_pool: Revert "page_pool: disable dma mapping support..."Yunsheng Lin2-2/+23
This reverts commit d00e60ee54b12de945b8493cf18c1ada9e422514. As reported by Guillaume in [1]: Enabling LPAE always enables CONFIG_ARCH_DMA_ADDR_T_64BIT in 32-bit systems, which breaks the bootup proceess when a ethernet driver is using page pool with PP_FLAG_DMA_MAP flag. As we were hoping we had no active consumers for such system when we removed the dma mapping support, and LPAE seems like a common feature for 32 bits system, so revert it. 1. https://www.spinics.net/lists/netdev/msg779890.html Fixes: d00e60ee54b1 ("page_pool: disable dma mapping support for 32-bit arch with 64-bit DMA") Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Reported-by: "kernelci.org bot" <bot@kernelci.org> Tested-by: "kernelci.org bot" <bot@kernelci.org> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-18Merge branch 'rework/printk_safe-removal' into for-linusPetr Mladek1-0/+4
2021-11-18Merge branch 'kvm-5.16-fixes' into kvm-masterPaolo Bonzini2-12/+1
* Fixes for Xen emulation * Kill kvm_map_gfn() / kvm_unmap_gfn() and broken gfn_to_pfn_cache * Fixes for migration of 32-bit nested guests on 64-bit hypervisor * Compilation fixes * More SEV cleanups
2021-11-18KVM: Kill kvm_map_gfn() / kvm_unmap_gfn() and gfn_to_pfn_cacheDavid Woodhouse2-12/+1
In commit 7e2175ebd695 ("KVM: x86: Fix recording of guest steal time / preempted status") I removed the only user of these functions because it was basically impossible to use them safely. There are two stages to the GFN->PFN mapping; first through the KVM memslots to a userspace HVA and then through the page tables to translate that HVA to an underlying PFN. Invalidations of the former were being handled correctly, but no attempt was made to use the MMU notifiers to invalidate the cache when the HVA->GFN mapping changed. As a prelude to reinventing the gfn_to_pfn_cache with more usable semantics, rip it out entirely and untangle the implementation of the unsafe kvm_vcpu_map()/kvm_vcpu_unmap() functions from it. All current users of kvm_vcpu_map() also look broken right now, and will be dealt with separately. They broadly fall into two classes: * Those which map, access the data and immediately unmap. This is mostly gratuitous and could just as well use the existing user HVA, and could probably benefit from a gfn_to_hva_cache as they do so. * Those which keep the mapping around for a longer time, perhaps even using the PFN directly from the guest. These will need to be converted to the new gfn_to_pfn_cache and then kvm_vcpu_map() can be removed too. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Message-Id: <20211115165030.7422-8-dwmw2@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-17ipv4/raw: support binding to nonlocal addressesRiccardo Paolo Bestetti1-0/+12
Add support to inet v4 raw sockets for binding to nonlocal addresses through the IP_FREEBIND and IP_TRANSPARENT socket options, as well as the ipv4.ip_nonlocal_bind kernel parameter. Add helper function to inet_sock.h to check for bind address validity on the base of the address type and whether nonlocal address are enabled for the socket via any of the sockopts/sysctl, deduplicating checks in ipv4/ping.c, ipv4/af_inet.c, ipv6/af_inet6.c (for mapped v4->v6 addresses), and ipv4/raw.c. Add test cases with IP[V6]_FREEBIND verifying that both v4 and v6 raw sockets support binding to nonlocal addresses after the change. Add necessary support for the test cases to nettest. Signed-off-by: Riccardo Paolo Bestetti <pbl@bestov.io> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20211117090010.125393-1-pbl@bestov.io Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-17NFC: add NCI_UNREG flag to eliminate the raceLin Ma1-0/+1
There are two sites that calls queue_work() after the destroy_workqueue() and lead to possible UAF. The first site is nci_send_cmd(), which can happen after the nci_close_device as below nfcmrvl_nci_unregister_dev | nfc_genl_dev_up nci_close_device | flush_workqueue | del_timer_sync | nci_unregister_device | nfc_get_device destroy_workqueue | nfc_dev_up nfc_unregister_device | nci_dev_up device_del | nci_open_device | __nci_request | nci_send_cmd | queue_work !!! Another site is nci_cmd_timer, awaked by the nci_cmd_work from the nci_send_cmd. ... | ... nci_unregister_device | queue_work destroy_workqueue | nfc_unregister_device | ... device_del | nci_cmd_work | mod_timer | ... | nci_cmd_timer | queue_work !!! For the above two UAF, the root cause is that the nfc_dev_up can race between the nci_unregister_device routine. Therefore, this patch introduce NCI_UNREG flag to easily eliminate the possible race. In addition, the mutex_lock in nci_close_device can act as a barrier. Signed-off-by: Lin Ma <linma@zju.edu.cn> Fixes: 6a2968aaf50c ("NFC: basic NCI protocol implementation") Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Link: https://lore.kernel.org/r/20211116152732.19238-1-linma@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-17net: add missing include in include/net/gro.hEric Dumazet1-0/+1
This is needed for some arches, as reported by Geert Uytterhoeven, Randy Dunlap and Stephen Rothwell Fixes: 4721031c3559 ("net: move gro definitions to include/net/gro.h") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Randy Dunlap <rdunlap@infradead.org> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Link: https://lore.kernel.org/r/20211117100130.2368319-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-17net: do not inline netif_tx_lock()/netif_tx_unlock()Eric Dumazet1-37/+2
These are not fast path, there is no point in inlining them. Also provide netif_freeze_queues()/netif_unfreeze_queues() so that we can use them from dev_watchdog() in the following patch. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-17net: annotate accesses to queue->trans_startEric Dumazet1-3/+13
In following patches, dev_watchdog() will no longer stop all queues. It will read queue->trans_start locklessly. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-17net: use an atomic_long_t for queue->trans_timeoutEric Dumazet1-1/+1
tx_timeout_show() assumed dev_watchdog() would stop all the queues, to fetch queue->trans_timeout under protection of the queue->_xmit_lock. As we want to no longer disrupt transmits, we use an atomic_long_t instead. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: david decotigny <david.decotigny@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-17Merge tag 'for-net-next-2021-11-16' of ↵David S. Miller5-18/+115
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Luiz Augusto von Dentz says: ==================== bluetooth-next pull request for net-next: - Add support for AOSP Bluetooth Quality Report - Enables AOSP extension for Mediatek Chip (MT7921 & MT7922) - Rework of HCI command execution serialization ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-17net: virtio_net_hdr_to_skb: count transport header in UFOJonathan Davies1-1/+6
virtio_net_hdr_to_skb does not set the skb's gso_size and gso_type correctly for UFO packets received via virtio-net that are a little over the GSO size. This can lead to problems elsewhere in the networking stack, e.g. ovs_vport_send dropping over-sized packets if gso_size is not set. This is due to the comparison if (skb->len - p_off > gso_size) not properly accounting for the transport layer header. p_off includes the size of the transport layer header (thlen), so skb->len - p_off is the size of the TCP/UDP payload. gso_size is read from the virtio-net header. For UFO, fragmentation happens at the IP level so does not need to include the UDP header. Hence the calculation could be comparing a TCP/UDP payload length with an IP payload length, causing legitimate virtio-net packets to have lack gso_type/gso_size information. Example: a UDP packet with payload size 1473 has IP payload size 1481. If the guest used UFO, it is not fragmented and the virtio-net header's flags indicate that it is a GSO frame (VIRTIO_NET_HDR_GSO_UDP), with gso_size = 1480 for an MTU of 1500. skb->len will be 1515 and p_off will be 42, so skb->len - p_off = 1473. Hence the comparison fails, and shinfo->gso_size and gso_type are not set as they should be. Instead, add the UDP header length before comparing to gso_size when using UFO. In this way, it is the size of the IP payload that is compared to gso_size. Fixes: 6dd912f82680 ("net: check untrusted gso_size at kernel entry") Signed-off-by: Jonathan Davies <jonathan.davies@nutanix.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-17Merge tag 'mlx5-fixes-2021-11-16' of ↵David S. Miller1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-fixes-2021-11-16 Please pull this mlx5 fixes series, or let me know in case of any problem. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-16net: document SMII and correct phylink's new validation mechanismRussell King (Oracle)1-1/+1
SMII has not been documented in the kernel, but information on this PHY interface mode has been recently found. Document it, and correct the recently introduced phylink handling for this interface mode. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/E1mmfVl-0075nP-14@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-16net: align static siphash keysEric Dumazet1-0/+2
siphash keys use 16 bytes. Define siphash_aligned_key_t macro so that we can make sure they are not crossing a cache line boundary. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-16net: use .data.once section in netdev_level_once()Eric Dumazet1-1/+1
Same rationale than prior patch : using the dedicated section avoid holes and pack all these bool values. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-16once: use __section(".data.once")Eric Dumazet1-1/+1
.data.once contains nicely packed bool variables. It is used already by DO_ONCE_LITE(). Using it also in DO_ONCE() removes holes in .data section. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-16Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfJakub Kicinski1-1/+2
Daniel Borkmann says: ==================== pull-request: bpf 2021-11-16 We've added 12 non-merge commits during the last 5 day(s) which contain a total of 23 files changed, 573 insertions(+), 73 deletions(-). The main changes are: 1) Fix pruning regression where verifier went overly conservative rejecting previsouly accepted programs, from Alexei Starovoitov and Lorenz Bauer. 2) Fix verifier TOCTOU bug when using read-only map's values as constant scalars during verification, from Daniel Borkmann. 3) Fix a crash due to a double free in XSK's buffer pool, from Magnus Karlsson. 4) Fix libbpf regression when cross-building runqslower, from Jean-Philippe Brucker. 5) Forbid use of bpf_ktime_get_coarse_ns() and bpf_timer_*() helpers in tracing programs due to deadlock possibilities, from Dmitrii Banshchikov. 6) Fix checksum validation in sockmap's udp_read_sock() callback, from Cong Wang. 7) Various BPF sample fixes such as XDP stats in xdp_sample_user, from Alexander Lobakin. 8) Fix libbpf gen_loader error handling wrt fd cleanup, from Kumar Kartikeya Dwivedi. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: udp: Validate checksum in udp_read_sock() bpf: Fix toctou on read-only map's constant scalar tracking samples/bpf: Fix build error due to -isystem removal selftests/bpf: Add tests for restricted helpers bpf: Forbid bpf_ktime_get_coarse_ns and bpf_timer_* in tracing progs libbpf: Perform map fd cleanup for gen_loader in case of error samples/bpf: Fix incorrect use of strlen in xdp_redirect_cpu tools/runqslower: Fix cross-build samples/bpf: Fix summary per-sec stats in xdp_sample_user selftests/bpf: Check map in map pruning bpf: Fix inner map state pruning regression. xsk: Fix crash on double free in buffer pool ==================== Link: https://lore.kernel.org/r/20211116141134.6490-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-16net/mlx5: E-Switch, Fix resetting of encap mode when entering switchdevPaul Blakey1-2/+2
E-Switch encap mode is relevant only when in switchdev mode. The RDMA driver can query the encap configuration via mlx5_eswitch_get_encap_mode(). Make sure it returns the currently used mode and not the set one. This reverts the cited commit which reset the encap mode on entering switchdev and fixes the original issue properly. Fixes: 9a64144d683a ("net/mlx5: E-Switch, Fix default encap mode") Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-11-16Bluetooth: Ignore HCI_ERROR_CANCELLED_BY_HOST on adv set terminated eventArchie Pusaka1-0/+1
This event is received when the controller stops advertising, specifically for these three reasons: (a) Connection is successfully created (success). (b) Timeout is reached (error). (c) Number of advertising events is reached (error). (*) This event is NOT generated when the host stops the advertisement. Refer to the BT spec ver 5.3 vol 4 part E sec 7.7.65.18. Note that the section was revised from BT spec ver 5.0 vol 2 part E sec 7.7.65.18 which was ambiguous about (*). Some chips (e.g. RTL8822CE) send this event when the host stops the advertisement with status = HCI_ERROR_CANCELLED_BY_HOST (due to (*) above). This is treated as an error and the advertisement will be removed and userspace will be informed via MGMT event. On suspend, we are supposed to temporarily disable advertisements, and continue advertising on resume. However, due to the behavior above, the advertisements are removed instead. This patch returns early if HCI_ERROR_CANCELLED_BY_HOST is received. Btmon snippet of the unexpected behavior: @ MGMT Command: Remove Advertising (0x003f) plen 1 Instance: 1 < HCI Command: LE Set Extended Advertising Enable (0x08|0x0039) plen 6 Extended advertising: Disabled (0x00) Number of sets: 1 (0x01) Entry 0 Handle: 0x01 Duration: 0 ms (0x00) Max ext adv events: 0 > HCI Event: LE Meta Event (0x3e) plen 6 LE Advertising Set Terminated (0x12) Status: Operation Cancelled by Host (0x44) Handle: 1 Connection handle: 0 Number of completed extended advertising events: 5 > HCI Event: Command Complete (0x0e) plen 4 LE Set Extended Advertising Enable (0x08|0x0039) ncmd 2 Status: Success (0x00) Signed-off-by: Archie Pusaka <apusaka@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-11-16Bluetooth: hci_request: Remove bg_scan_update workLuiz Augusto von Dentz1-1/+0
This work is no longer necessary since all the code using it has been converted to use hci_passive_scan/hci_passive_scan_sync. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-11-16Bluetooth: hci_sync: Convert MGMT_OP_SET_CONNECTABLE to use cmd_syncLuiz Augusto von Dentz2-2/+2
This makes MGMT_OP_SET_CONNEABLE use hci_cmd_sync_queue instead of use a dedicated connetable_update work. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-11-16Bluetooth: hci_sync: Convert MGMT_OP_SET_DISCOVERABLE to use cmd_syncLuiz Augusto von Dentz2-2/+3
This makes MGMT_OP_SET_DISCOVERABLE use hci_cmd_sync_queue instead of use a dedicated discoverable_update work. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-11-16net: drop nopreempt requirement on sock_prot_inuse_add()Eric Dumazet1-2/+2
This is distracting really, let's make this simpler, because many callers had to take care of this by themselves, even if on x86 this adds more code than really needed. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-16net: merge net->core.prot_inuse and net->core.sock_inuseEric Dumazet2-2/+2
net->core.sock_inuse is a per cpu variable (int), while net->core.prot_inuse is another per cpu variable of 64 integers. per cpu allocator tend to place them in very different places. Grouping them together makes sense, since it makes updates potentially faster, if hitting the same cache line. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-16net: make sock_inuse_add() availableEric Dumazet1-0/+10
MPTCP hard codes it, let us instead provide this helper. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-16net: inline sock_prot_inuse_add()Eric Dumazet1-3/+11
sock_prot_inuse_add() is very small, we can inline it. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-16net: gro: populate net/core/gro.cEric Dumazet2-0/+23
Move gro code and data from net/core/dev.c to net/core/gro.c to ease maintenance. gro_normal_list() and gro_normal_one() are inlined because they are called from both files. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-16net: gro: move skb_gro_receive into net/core/gro.cEric Dumazet2-1/+2
net/core/gro.c will contain all core gro functions, to shrink net/core/skbuff.c and net/core/dev.c Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-16net: gro: move skb_gro_receive_list to udp_offload.cEric Dumazet1-1/+0
This helper is used once, no need to keep it in fat net/core/skbuff.c Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-16net: move gro definitions to include/net/gro.hEric Dumazet5-390/+394
include/linux/netdevice.h became too big, move gro stuff into include/net/gro.h Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-16net: move early demux fields close to sk_refcntEric Dumazet1-3/+5
sk_rx_dst/sk_rx_dst_ifindex/sk_rx_dst_cookie are read in early demux, and currently spans two cache lines. Moving them close to sk_refcnt makes more sense, as only one cache line is needed. New layout for this hot cache line is : struct sock { struct sock_common __sk_common; /* 0 0x88 */ /* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */ struct dst_entry * sk_rx_dst; /* 0x88 0x8 */ int sk_rx_dst_ifindex; /* 0x90 0x4 */ u32 sk_rx_dst_cookie; /* 0x94 0x4 */ socket_lock_t sk_lock; /* 0x98 0x20 */ atomic_t sk_drops; /* 0xb8 0x4 */ int sk_rcvlowat; /* 0xbc 0x4 */ /* --- cacheline 3 boundary (192 bytes) --- */ Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-16tcp: defer skb freeing after socket lock is releasedEric Dumazet3-0/+15
tcp recvmsg() (or rx zerocopy) spends a fair amount of time freeing skbs after their payload has been consumed. A typical ~64KB GRO packet has to release ~45 page references, eventually going to page allocator for each of them. Currently, this freeing is performed while socket lock is held, meaning that there is a high chance that BH handler has to queue incoming packets to tcp socket backlog. This can cause additional latencies, because the user thread has to process the backlog at release_sock() time, and while doing so, additional frames can be added by BH handler. This patch adds logic to defer these frees after socket lock is released, or directly from BH handler if possible. Being able to free these skbs from BH handler helps a lot, because this avoids the usual alloc/free assymetry, when BH handler and user thread do not run on same cpu or NUMA node. One cpu can now be fully utilized for the kernel->user copy, and another cpu is handling BH processing and skb/page allocs/frees (assuming RFS is not forcing use of a single CPU) Tested: 100Gbit NIC Max throughput for one TCP_STREAM flow, over 10 runs MTU : 1500 Before: 55 Gbit After: 66 Gbit MTU : 4096+(headers) Before: 82 Gbit After: 95 Gbit Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-16tcp: annotate data-races on tp->segs_in and tp->data_segs_inEric Dumazet1-2/+6
tcp_segs_in() can be called from BH, while socket spinlock is held but socket owned by user, eventually reading these fields from tcp_get_info() Found by code inspection, no need to backport this patch to older kernels. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-16tcp: add RETPOLINE mitigation to sk_backlog_rcvEric Dumazet1-1/+7
Use INDIRECT_CALL_INET() to avoid an indirect call when/if CONFIG_RETPOLINE=y Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-16net: forward_alloc_get depends on CONFIG_MPTCPEric Dumazet1-4/+7
(struct proto)->sk_forward_alloc is currently only used by MPTCP. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>