path: root/include
Age | Commit message | Author | Files | Lines
2021-08-27 | ipv6: add IFLA_INET6_RA_MTU to expose mtu value | Rocco Yue | 2 | -0/+3
The kernel provides a "/proc/sys/net/ipv6/conf/<iface>/mtu" file, which can temporarily record the MTU value of the last received RA message when the RA MTU value is lower than the interface MTU, but this proc file has the following limitations: (1) when the interface MTU (/sys/class/net/<iface>/mtu) is updated, mtu6 (/proc/sys/net/ipv6/conf/<iface>/mtu) is updated to the value of the interface MTU; (2) mtu6 (/proc/sys/net/ipv6/conf/<iface>/mtu) only affects IPv6 connections, not IPv4. Therefore, when the MTU option is carried in the RA message, the user sometimes cannot obtain the RA MTU value correctly by reading mtu6. After this patch set, if an RA message carries the MTU option, you can send a netlink message whose nlmsg_type is RTM_GETLINK and then parse the IFLA_INET6_RA_MTU attribute to get the MTU value carried in the RA message received on the inet6 device. In addition, you can also get a link notification when ra_mtu is updated, so user space does not have to poll. In this way, if the MTU values that the device receives from the network in the PCO IPv4 and the RA IPv6 procedures are different, the user can obtain the correct IPv6 ra_mtu value, compare it with the IPv4 MTU, and then use the lower MTU value for both IPv4 and IPv6. Signed-off-by: Rocco Yue <rocco.yue@mediatek.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20210827150412.9267-1-rocco.yue@mediatek.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
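The final step described above (compare ra_mtu with the IPv4 MTU and use the lower one) can be sketched in user space as follows. This is only an illustration: `pick_common_mtu` is a hypothetical helper, and treating `ra_mtu == 0` as "no RA MTU option received" is an assumption of this sketch, not something the commit message states.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: choose the MTU to apply to both IPv4 and IPv6.
 * Assumption: ra_mtu == 0 means no RA carrying an MTU option was seen.
 */
static uint32_t pick_common_mtu(uint32_t ipv4_mtu, uint32_t ra_mtu)
{
	assert(ipv4_mtu != 0);	/* caller must supply a valid IPv4 MTU */

	if (ra_mtu == 0)
		return ipv4_mtu;	/* nothing to compare against */

	/* Use the lower of the two values for both address families. */
	return ra_mtu < ipv4_mtu ? ra_mtu : ipv4_mtu;
}
```

In a real client the `ra_mtu` argument would come from the IFLA_INET6_RA_MTU attribute of an RTM_GETLINK reply or link notification; that parsing is left out here.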
2021-08-27 | Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next | David S. Miller | 3 | -6/+48
Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2021-08-27 1) Remove an unneeded extra variable in esp4 esp_ssg_unref. From Corey Minyard. 2) Add a configuration option to change the default behaviour to block traffic if there is no matching policy. Joint work with Christian Langrock and Antony Antony. 3) Fix a shift-out-of-bounds bug reported by syzbot. From Pavel Skripkin. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-26 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net | Jakub Kicinski | 8 | -26/+527
drivers/net/wwan/mhi_wwan_mbim.c - drop the extra arg. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-26 | Merge tag 'net-5.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net | Linus Torvalds | 2 | -8/+3
Pull networking fixes from Jakub Kicinski: "Networking fixes, including fixes from can and bpf. Closing three hw-dependent regressions. Any fixes of note are in the 'old code' category. Nothing blocking release from our perspective. Current release - regressions: - stmmac: revert "stmmac: align RX buffers" - usb: asix: ax88772: move embedded PHY detection as early as possible - usb: asix: do not call phy_disconnect() for ax88178 - Revert "net: really fix the build...", from Kalle to fix QCA6390 Current release - new code bugs: - phy: mediatek: add the missing suspend/resume callbacks Previous releases - regressions: - qrtr: fix another OOB Read in qrtr_endpoint_post - stmmac: dwmac-rk: fix unbalanced pm_runtime_enable warnings Previous releases - always broken: - inet: use siphash in exception handling - ip_gre: add validation for csum_start - bpf: fix ringbuf helper function compatibility - rtnetlink: return correct error on changing device netns - e1000e: do not try to recover the NVM checksum on Tiger Lake" * tag 'net-5.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (43 commits) Revert "net: really fix the build..."
net: hns3: fix get wrong pfc_en when query PFC configuration net: hns3: fix GRO configuration error after reset net: hns3: change the method of getting cmd index in debugfs net: hns3: fix duplicate node in VLAN list net: hns3: fix speed unknown issue in bond 4 net: hns3: add waiting time before cmdq memory is released net: hns3: clear hardware resource when loading driver net: fix NULL pointer reference in cipso_v4_doi_free rtnetlink: Return correct error on changing device netns net: dsa: hellcreek: Adjust schedule look ahead window net: dsa: hellcreek: Fix incorrect setting of GCL cxgb4: dont touch blocked freelist bitmap after free ipv4: use siphash instead of Jenkins in fnhe_hashfun() ipv6: use siphash in rt6_exception_hash() can: usb: esd_usb2: esd_usb2_rx_event(): fix the interchange of the CAN RX and TX error counters net: usb: asix: ax88772: fix boolconv.cocci warnings net/sched: ets: fix crash when flipping from 'strict' to 'quantum' qede: Fix memset corruption net: stmmac: fix kernel panic due to NULL pointer dereference of buf->xdp ...
2021-08-26 | Revert "net: really fix the build..." | Kalle Valo | 1 | -6/+1
This reverts commit ce78ffa3ef1681065ba451cfd545da6126f5ca88. Wren and Nicolas reported that ath11k was failing to initialise QCA6390 Wi-Fi 6 device with error: qcom_mhi_qrtr: probe of mhi0_IPCR failed with error -22 Commit ce78ffa3ef16 ("net: really fix the build..."), introduced in v5.14-rc5, caused this regression in qrtr. Most likely all ath11k devices are broken, but I only tested QCA6390. Let's revert the broken commit so that ath11k works again. Reported-by: Wren Turkal <wt@penguintechs.org> Reported-by: Nicolas Schichan <nschichan@freebox.fr> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20210826172816.24478-1-kvalo@codeaurora.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-26 | Merge tag 'mac80211-next-for-net-next-2021-08-26' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next | David S. Miller | 2 | -1/+117
Johannes Berg says: ==================== A few more things: * Use correct DFS domain for self-managed devices * some preparations for transmit power element handling and other 6 GHz regulatory handling * TWT support in AP mode in mac80211 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-26 | sock: remove one redundant SKB_FRAG_PAGE_ORDER macro | Yunsheng Lin | 1 | -0/+1
Both SKB_FRAG_PAGE_ORDER macros are defined to the same value in net/core/sock.c and drivers/vhost/net.c. Move the SKB_FRAG_PAGE_ORDER definition to include/net/sock.h, as both net/core/sock.c and drivers/vhost/net.c include it, and it seems a reasonable file to hold the macro. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-26 | ieee80211: add definition for transmit power envelope element | Wen Gong | 1 | -1/+39
IEEE Std 802.11ax™-2021 makes changes to the transmit power envelope element; adjust the code accordingly. Signed-off-by: Wen Gong <wgong@codeaurora.org> Link: https://lore.kernel.org/r/20210820122041.12157-7-wgong@codeaurora.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-08-26 | ieee80211: add definition of regulatory info in 6 GHz operation information | Wen Gong | 1 | -0/+4
IEEE Std 802.11ax™-2021 added a regulatory info subfield to the HE operation element; add it to the header file. Signed-off-by: Wen Gong <wgong@codeaurora.org> Link: https://lore.kernel.org/r/20210820122041.12157-3-wgong@codeaurora.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-08-25 | mctp: Remove the repeated declaration | Shaokun Zhang | 1 | -1/+0
Function 'mctp_dev_get_rtnl' is declared twice, so remove the repeated declaration. Cc: Jeremy Kerr <jk@codeconstruct.com.au> Cc: Matt Johnston <matt@codeconstruct.com.au> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-25 | net: dsa: tag_sja1105: stop asking the sja1105 driver in sja1105_xmit_tpid | Vladimir Oltean | 1 | -1/+0
Introduced in commit 38b5beeae7a4 ("net: dsa: sja1105: prepare tagger for handling DSA tags and VLAN simultaneously"), the sja1105_xmit_tpid function solved quite a different problem than our needs are now. Then, we used best-effort VLAN filtering and we were using the xmit_tpid to tunnel packets coming from an 8021q upper through the TX VLAN allocated by tag_8021q to that egress port. The need for a different VLAN protocol depending on switch revision came from the fact that this in itself was more of a hack to trick the hardware into accepting tunneled VLANs in the first place. Right now, we deny 8021q uppers (see sja1105_prechangeupper). Even if we supported them again, we would not do that using the same method of {tunneling the VLAN on egress, retagging the VLAN on ingress} that we had in the best-effort VLAN filtering mode. It seems rather simpler that we just allocate a VLAN in the VLAN table that is simply not used by the bridge at all, or by any other port. Anyway, I have 2 gripes with the current sja1105_xmit_tpid: 1. When sending packets on behalf of a VLAN-aware bridge (with the new TX forwarding offload framework) plus untagged (with the tag_8021q VLAN added by the tagger) packets, we can see that on SJA1105P/Q/R/S and later (which have a qinq_tpid of ETH_P_8021AD), some packets sent through the DSA master have a VLAN protocol of 0x8100 and others of 0x88a8. This is strange and there is no reason for it now. If we have a bridge and are therefore forced to send using that bridge's TPID, we can as well blend with that bridge's VLAN protocol for all packets. 2. The sja1105_xmit_tpid introduces a dependency on the sja1105 driver, because it looks inside dp->priv. It is desirable to keep as much separation between taggers and switch drivers as possible. Now it doesn't do that anymore. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-25 | net: dsa: sja1105: drop untagged packets on the CPU and DSA ports | Vladimir Oltean | 1 | -0/+2
The sja1105 driver is a bit special in its use of VLAN headers as DSA tags. This is because in VLAN-aware mode, the VLAN headers use an actual TPID of 0x8100, which is understood even by the DSA master as an actual VLAN header. Furthermore, control packets such as PTP and STP are transmitted with no VLAN header as a DSA tag, because, depending on switch generation, there are ways to steer these control packets towards a precise egress port other than VLAN tags. Transmitting control packets as untagged means leaving a door open for traffic in general to be transmitted as untagged from the DSA master, and for it to traverse the switch and exit a random switch port according to the FDB lookup. This behavior is a bit out of line with other DSA drivers which have native support for DSA tagging. There, it is to be expected that the switch only accepts DSA-tagged packets on its CPU port, dropping everything that does not match this pattern. We perhaps rely a bit too much on the switches' hardware dropping on the CPU port, and place no other restrictions in the kernel data path to avoid that. For example, sja1105 is also a bit special in that STP/PTP packets are transmitted using "management routes" (sja1105_port_deferred_xmit): when sending a link-local packet from the CPU, we must first write a SPI message to the switch to tell it to expect a packet towards multicast MAC DA 01-80-c2-00-00-0e, and to route it towards port 3 when it gets it. This entry expires as soon as it matches a packet received by the switch, and it needs to be reinstalled for the next packet etc. All in all quite a ghetto mechanism, but it is all that the sja1105 switches offer for injecting a control packet. The driver takes a mutex for serializing control packets and making the pairs of SPI writes of a management route and its associated skb atomic, but to be honest, a mutex is only relevant as long as all parties agree to take it. 
With the DSA design, it is possible to open an AF_PACKET socket on the DSA master net device, and blast packets towards 01-80-c2-00-00-0e, and whatever locking the DSA switch driver might use, it all goes kaput because management routes installed by the driver will match skbs sent by the DSA master, and not skbs generated by the driver itself. So they will end up being routed on the wrong port. So through the lens of that, maybe it would make sense to avoid that from happening by doing something in the network stack, like: introduce a new bit in struct sk_buff, like xmit_from_dsa. Then, somewhere around dev_hard_start_xmit(), introduce the following check: if (netdev_uses_dsa(dev) && !skb->xmit_from_dsa) kfree_skb(skb); Ok, maybe that is a bit drastic, but that would at least prevent a bunch of problems. For example, right now, even though the majority of DSA switches drop packets without DSA tags sent by the DSA master (and therefore the majority of garbage that user space daemons like avahi and udhcpcd and friends create), it is still conceivable that an aggressive user space program can open an AF_PACKET socket and inject a spoofed DSA tag directly on the DSA master. We have no protection against that; the packet will be understood by the switch and be routed wherever user space says. Furthermore: there are some DSA switches where we even have register access over Ethernet, using DSA tags. So even user space drivers are possible in this way. This is a huge hole. However, the biggest thing that bothers me is that udhcpcd attempts to ask for an IP address on all interfaces by default, and with sja1105, it will attempt to get a valid IP address on both the DSA master as well as on sja1105 switch ports themselves. So with IP addresses in the same subnet on multiple interfaces, the routing table will be messed up and the system will be unusable for traffic until it is configured manually to not ask for an IP address on the DSA master itself. 
It turns out that it is possible to avoid that in the sja1105 driver, at least very superficially, by requesting the switch to drop VLAN-untagged packets on the CPU port. With the exception of control packets, all traffic originated from tag_sja1105.c is already VLAN-tagged, so only STP and PTP packets need to be converted. For that, we need to uphold the equivalence between an untagged and a pvid-tagged packet, and to remember that the CPU port of sja1105 uses a pvid of 4095. Now that we drop untagged traffic on the CPU port, non-aggressive user space applications like udhcpcd stop bothering us, and sja1105 effectively becomes just as vulnerable to the aggressive kind of user space programs as other DSA switches are (ok, users can also create 8021q uppers on top of the DSA master in the case of sja1105, but in future patches we can easily deny that, but it still doesn't change the fact that VLAN-tagged packets can still be injected over raw sockets). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
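The equivalence between an untagged packet and a pvid-tagged packet, and the effect of dropping untagged traffic on the CPU port, can be modelled in a few lines. This is a user-space sketch under stated assumptions: `struct frame` and `cpu_port_accepts()` are hypothetical names, and only the pvid value of 4095 comes from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CPU_PORT_PVID 4095	/* pvid of the sja1105 CPU port, per the text */

/* Hypothetical model of a frame arriving at the switch's CPU port. */
struct frame {
	bool tagged;	/* does it carry an 802.1Q header? */
	uint16_t vid;	/* meaningful only when tagged */
};

/*
 * Returns true if the port accepts the frame and stores the VLAN it is
 * classified to. An untagged frame is classified to the port's pvid,
 * unless the port is configured to drop untagged traffic.
 */
static bool cpu_port_accepts(const struct frame *f, bool drop_untagged,
			     uint16_t *classified_vid)
{
	assert(f && classified_vid);

	if (!f->tagged) {
		if (drop_untagged)
			return false;
		*classified_vid = CPU_PORT_PVID;
		return true;
	}
	*classified_vid = f->vid;
	return true;
}
```

In this model a control packet that used to be sent untagged has to be sent with an explicit tag of VID 4095 once `drop_untagged` is set; it is then classified exactly as before, while stray untagged traffic from user space is rejected.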
2021-08-25 | mptcp: MP_FAIL suboption sending | Geliang Tang | 1 | -1/+4
This patch added the MP_FAIL suboption sending support. Add a new flag named send_mp_fail in struct mptcp_subflow_context. If this flag is set, send out MP_FAIL suboption. Add a new member fail_seq in struct mptcp_out_options to save the data sequence number to put into the MP_FAIL suboption. An MP_FAIL option could be included in a RST or on the subflow-level ACK. Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliangtang@xiaomi.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-25 | mptcp: shrink mptcp_out_options struct | Paolo Abeni | 1 | -9/+17
After the previous patch we can alias several fields in mptcp_out_options with a union. This struct is stack allocated and memset() for each plain TCP out packet, so every saved byte counts. Before: pahole -EC mptcp_out_options # ... /* size: 136, cachelines: 3, members: 17 */ After: pahole -EC mptcp_out_options # ... /* size: 56, cachelines: 1, members: 9 */ Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
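The size reduction comes from letting fields that are never used at the same time share storage. A minimal sketch of the idea, using made-up field names rather than the real mptcp_out_options layout:

```c
#include <assert.h>
#include <stdint.h>

/* Flat layout: each option's fields get their own storage. */
struct opts_flat {
	uint16_t suboptions;	/* which option is being emitted */
	uint64_t sndr_key;	/* hypothetical: used only by option A */
	uint64_t rcvr_key;	/* hypothetical: used only by option A */
	uint64_t ack;		/* hypothetical: used only by option B */
	uint64_t data_seq;	/* hypothetical: used only by option B */
};

/* Aliased layout: mutually exclusive options overlap in a union. */
struct opts_union {
	uint16_t suboptions;
	union {
		struct { uint64_t sndr_key, rcvr_key; } a;
		struct { uint64_t ack, data_seq; } b;
	};
};

/* The union version must be strictly smaller for the change to pay off. */
static_assert(sizeof(struct opts_union) < sizeof(struct opts_flat),
	      "aliasing exclusive fields should shrink the struct");
```

Because the struct is zeroed for every outgoing packet, a smaller struct also means less memset() work, which is the motivation given in the commit message.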
2021-08-25 | Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue | David S. Miller | 1 | -0/+10
Tony Nguyen says: ==================== 1GbE Intel Wired LAN Driver Updates 2021-08-24 Vinicius Costa Gomes says: This adds support for PCIe PTM (Precision Time Measurement) to the igc driver. PCIe PTM allows the NIC and Host clocks to be compared more precisely, improving the clock synchronization accuracy. Patch 1/4 reverts a commit that made pci_enable_ptm() private to the PCI subsystem; reverting makes it possible for it to be called from the drivers. Patch 2/4 adds the pcie_ptm_enabled() helper. Patch 3/4 calls pci_enable_ptm() from the igc driver. Patch 4/4 implements the PCIe PTM support. Exposing it via the .getcrosststamp() API implies that the time measurements are made synchronously with the ioctl(). The hardware was implemented so the most convenient way to retrieve that information would be asynchronously. So, to follow the expectations of the ioctl() we have to use less convenient ways, triggering a PCIe PTM dialog every time an ioctl() is received. Some questions are raised (also pointed out in the commit message): 1. Using convert_art_ns_to_tsc() is too x86 specific; there should be a common way to create a 'system_counterval_t' from a timestamp. 2. convert_art_ns_to_tsc() says that it should only be used when X86_FEATURE_TSC_KNOWN_FREQ is true, but during tests it works even when it returns false. Should that check be done? ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-25 | net-next: When a bond has a massive amount of VLANs with IPv6 addresses, performance of changing link state, attaching a VRF, changing an IPv6 address, etc. goes down dramatically. | Gilad Naaman | 1 | -0/+5
The source of most of the slowdown is the `dev_addr_lists.c` module, which maintains a linked list of HW addresses. When using IPv6, this list grows for each IPv6 address added on a VLAN, since each IPv6 address has a multicast HW address associated with it. When performing any modification to the involved links, this list is traversed many times, often for nothing, all while holding the RTNL lock. Instead, this patch adds an auxiliary rbtree which cuts down traversal time significantly.

Performance can be seen with the following script:

    #!/bin/bash
    ip netns del test || true 2>/dev/null
    ip netns add test

    echo 1 | ip netns exec test tee /proc/sys/net/ipv6/conf/all/keep_addr_on_down > /dev/null

    set -e

    ip -n test link add foo type veth peer name bar
    ip -n test link add b1 type bond
    ip -n test link add florp type vrf table 10

    ip -n test link set bar master b1
    ip -n test link set foo up
    ip -n test link set bar up
    ip -n test link set b1 up
    ip -n test link set florp up

    VLAN_COUNT=1500
    BASE_DEV=b1

    echo Creating vlans
    ip netns exec test time -p bash -c "for i in \$(seq 1 $VLAN_COUNT); do ip -n test link add link $BASE_DEV name foo.\$i type vlan id \$i; done"

    echo Bringing them up
    ip netns exec test time -p bash -c "for i in \$(seq 1 $VLAN_COUNT); do ip -n test link set foo.\$i up; done"

    echo Assiging IPv6 Addresses
    ip netns exec test time -p bash -c "for i in \$(seq 1 $VLAN_COUNT); do ip -n test address add dev foo.\$i 2000::\$i/64; done"

    echo Attaching to VRF
    ip netns exec test time -p bash -c "for i in \$(seq 1 $VLAN_COUNT); do ip -n test link set foo.\$i master florp; done"

On an Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz machine, the performance before the patch is (truncated):

    Creating vlans
    real 108.35
    Bringing them up
    real 4.96
    Assiging IPv6 Addresses
    real 19.22
    Attaching to VRF
    real 458.84

After the patch:

    Creating vlans
    real 5.59
    Bringing them up
    real 5.07
    Assiging IPv6 Addresses
    real 5.64
    Attaching to VRF
    real 25.37

Cc: David S. Miller <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Lu Wei <luwei32@huawei.com> Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com> Cc: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Gilad Naaman <gnaaman@drivenets.com> Signed-off-by: David S. Miller <davem@davemloft.net>
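The gain comes from replacing repeated linear walks over the HW address list with lookups through an ordered index. The sketch below illustrates that trade-off in user space. Note the substitution: the patch uses an rbtree inside the kernel, whereas this example uses a sorted array with the C library's `qsort()` and `bsearch()` purely to keep it short. All names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define HW_ADDR_LEN 6

struct hw_addr {
	unsigned char b[HW_ADDR_LEN];
};

/* Total order on addresses; the same comparison a tree index would use. */
static int hw_addr_cmp(const void *x, const void *y)
{
	return memcmp(x, y, HW_ADDR_LEN);
}

/* O(n) per lookup: the cost of walking a plain list. */
static bool lookup_linear(const struct hw_addr *tbl, size_t n,
			  const struct hw_addr *key)
{
	for (size_t i = 0; i < n; i++)
		if (hw_addr_cmp(&tbl[i], key) == 0)
			return true;
	return false;
}

/* Sort once so that later lookups can use the ordering. */
static void build_index(struct hw_addr *tbl, size_t n)
{
	qsort(tbl, n, sizeof(*tbl), hw_addr_cmp);
}

/* O(log n) per lookup; tbl must have been sorted by build_index(). */
static bool lookup_indexed(const struct hw_addr *tbl, size_t n,
			   const struct hw_addr *key)
{
	assert(tbl != NULL || n == 0);
	return bsearch(key, tbl, n, sizeof(*tbl), hw_addr_cmp) != NULL;
}
```

A sorted array makes insertion O(n), which is why a balanced tree is the better fit when addresses are added and removed frequently, as in the VLAN scenario above.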
2021-08-24 | PCI: Add pcie_ptm_enabled() | Vinicius Costa Gomes | 1 | -0/+3
Add a predicate that returns whether PCIe PTM (Precision Time Measurement) is enabled. It only returns true if PTM is enabled in all the ports in the path from the device to the root. Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
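The "enabled on every port up to the root" condition can be pictured as a walk up the device hierarchy. The following is a user-space model only: `struct port` and `ptm_enabled_on_path()` are hypothetical, and the sketch makes no claim about how the kernel helper is actually implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one hop in the path from a device to the root. */
struct port {
	const struct port *parent;	/* NULL at the root */
	bool ptm_enabled;
};

/* True only if PTM is enabled on the device and on every upstream port. */
static bool ptm_enabled_on_path(const struct port *dev)
{
	assert(dev != NULL);

	for (const struct port *p = dev; p != NULL; p = p->parent)
		if (!p->ptm_enabled)
			return false;
	return true;
}
```

A single hop with PTM disabled, for example an intermediate switch port, is enough to make the predicate false for everything below it.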
2021-08-24 | Revert "PCI: Make pci_enable_ptm() private" | Vinicius Costa Gomes | 1 | -0/+7
Make pci_enable_ptm() accessible from the drivers. Exposing this to the driver enables the driver to use the 'ptm_enabled' field of 'pci_dev' to check if PTM is enabled or not. This reverts commit ac6c26da29c1 ("PCI: Make pci_enable_ptm() private"). Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-08-24 | ethtool: extend coalesce setting uAPI with CQE mode | Yufeng Mo | 1 | -2/+9
In order to support more coalesce parameters through netlink, add two new parameters, kernel_coal and extack, to .set_coalesce and .get_coalesce, so that extra information can be returned to the user through the netlink API. Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-24 | ethtool: add two coalesce attributes for CQE mode | Yufeng Mo | 2 | -1/+12
Currently, many drivers support CQE mode configuration: some configure it as a fixed value at initialization, while others provide an interface to change it through ethtool private flags. To make this more generic, add two new coalesce attributes, 'ETHTOOL_A_COALESCE_USE_CQE_TX' and 'ETHTOOL_A_COALESCE_USE_CQE_RX', so that these parameters can be accessed through the ethtool netlink coalesce uAPI. Also add a new structure, kernel_ethtool_coalesce, so that the new parameters can be added to this struct. Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-24 | netdevice: move xdp_rxq within netdev_rx_queue | Jakub Kicinski | 1 | -1/+1
Both struct netdev_rx_queue and struct xdp_rxq_info are cacheline aligned. This causes extra padding before and after the xdp_rxq member. Move the member upfront, so that it's naturally aligned. Before: /* size: 256, cachelines: 4, members: 6 */ /* sum members: 160, holes: 1, sum holes: 40 */ /* padding: 56 */ /* paddings: 1, sum paddings: 36 */ /* forced alignments: 1, forced holes: 1, sum forced holes: 40 */ After: /* size: 192, cachelines: 3, members: 6 */ /* padding: 32 */ /* paddings: 1, sum paddings: 36 */ /* forced alignments: 1 */ Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/r/20210823180135.1153608-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
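The padding effect is easy to reproduce outside the kernel. In this sketch the struct and member names are invented and a 64-byte cacheline is assumed; it only demonstrates why placing the cacheline-aligned member first removes the hole in front of it.

```c
#include <assert.h>
#include <stddef.h>

#define CACHELINE 64	/* assumed cacheline size for this sketch */

/* Stand-in for a member whose type is cacheline aligned. */
struct aligned_member {
	_Alignas(CACHELINE) unsigned char bytes[CACHELINE];
};

/* Aligned member in the middle: a hole opens up before it. */
struct queue_before {
	long a;
	struct aligned_member rxq;
	long b;
};

/* Aligned member first: it is naturally aligned at offset 0. */
struct queue_after {
	struct aligned_member rxq;
	long a;
	long b;
};

static_assert(offsetof(struct queue_before, rxq) == CACHELINE,
	      "the member is pushed to the next cacheline");
static_assert(sizeof(struct queue_before) == 3 * CACHELINE,
	      "hole before rxq plus tail padding costs a whole cacheline");
static_assert(sizeof(struct queue_after) == 2 * CACHELINE,
	      "only tail padding remains");
```

The same layout question can be checked on a real kernel build with pahole, as the Before/After output in the commit message shows.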
2021-08-24 | ipv6: correct comments about fib6_node sernum | zhang kai | 1 | -2/+2
correct comments in set and get fn_sernum Signed-off-by: zhang kai <zhangkaiheb@126.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-24 | net: dsa: let drivers state that they need VLAN filtering while standalone | Vladimir Oltean | 1 | -0/+3
As explained in commit e358bef7c392 ("net: dsa: Give drivers the chance to veto certain upper devices"), the hellcreek driver uses some tricks to comply with the network stack expectations: it enforces port separation in standalone mode using VLANs. For untagged traffic, bridging between ports is prevented by using different PVIDs, and for VLAN-tagged traffic, it never accepts 8021q uppers with the same VID on two ports, so packets with one VLAN cannot leak from one port to another. That is almost fine*, and has worked because hellcreek relied on an implicit behavior of the DSA core that was changed by the previous patch: the standalone ports declare the 'rx-vlan-filter' feature as 'on [fixed]'. Since most of the DSA drivers are actually VLAN-unaware in standalone mode, that feature was actually incorrectly reflecting the hardware/driver state, so there was a desire to fix it. This leaves the hellcreek driver in a situation where it has to explicitly request this behavior from the DSA framework. We configure the ports as follows: - Standalone: 'rx-vlan-filter' is on. An 8021q upper on top of a standalone hellcreek port will go through dsa_slave_vlan_rx_add_vid and will add a VLAN to the hardware tables, giving the driver the opportunity to refuse it through .port_prechangeupper. - Bridged with vlan_filtering=0: 'rx-vlan-filter' is off. An 8021q upper on top of a bridged hellcreek port will not go through dsa_slave_vlan_rx_add_vid, because there will not be any attempt to offload this VLAN. The driver already disables VLAN awareness, so that upper should receive the traffic it needs. - Bridged with vlan_filtering=1: 'rx-vlan-filter' is on. An 8021q upper on top of a bridged hellcreek port will call dsa_slave_vlan_rx_add_vid, and can again be vetoed through .port_prechangeupper. 
*It is not actually completely fine, because if I follow through correctly, we can have the following situation: ip link add br0 type bridge vlan_filtering 0 ip link set lan0 master br0 # lan0 now becomes VLAN-unaware ip link set lan0 nomaster # lan0 fails to become VLAN-aware again, therefore breaking isolation This patch fixes that corner case by extending the DSA core logic, based on this requested attribute, to change the VLAN awareness state of the switch (port) when it leaves the bridge. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
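The three port configurations listed above amount to a small decision table. The sketch below encodes it in user-space C. The enum, the function, and the `needs_standalone_filtering` flag are hypothetical stand-ins for the driver-requested attribute the message talks about, not the DSA core's actual code.

```c
#include <assert.h>
#include <stdbool.h>

enum port_mode {
	PORT_STANDALONE,
	PORT_BRIDGED_VLAN_UNAWARE,	/* bridge with vlan_filtering=0 */
	PORT_BRIDGED_VLAN_AWARE,	/* bridge with vlan_filtering=1 */
};

/*
 * Should 'rx-vlan-filter' be on for a port in the given mode?
 * needs_standalone_filtering is a hypothetical per-driver flag: true for
 * a driver like hellcreek that isolates standalone ports using VLANs.
 */
static bool rx_vlan_filter_on(enum port_mode mode,
			      bool needs_standalone_filtering)
{
	switch (mode) {
	case PORT_STANDALONE:
		return needs_standalone_filtering;
	case PORT_BRIDGED_VLAN_UNAWARE:
		return false;
	case PORT_BRIDGED_VLAN_AWARE:
		return true;
	}
	assert(!"unknown port mode");
	return false;
}
```

The corner case from the footnote corresponds to re-evaluating this function when a port returns to `PORT_STANDALONE`: for a driver that sets the flag, the result goes back to true instead of staying at the VLAN-unaware value inherited from the bridge.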
2021-08-24 | mac80211: introduce individual TWT support in AP mode | Lorenzo Bianconi | 1 | -0/+12
Introduce TWT action frames parsing support to mac80211. Currently only individual TWT agreements are supported in AP mode. Whenever the AP receives a TWT action frame from an associated client, after performing sanity checks, it will notify the underlying driver with the requested parameters in order to check whether they are supported and whether there is enough room for a new agreement. The driver is expected to set the agreement result and report it to mac80211. Drivers supporting this have two new callbacks: - add_twt_setup (mandatory) - twt_teardown_request (optional) mac80211 will send an action frame reply according to the result reported by the driver. Tested-by: Peter Chiu <chui-hao.chiu@mediatek.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/257512f2e22ba42b9f2624942a128dd8f141de4b.1629741512.git.lorenzo@kernel.org [use le16p_replace_bits(), minor cleanups, use (void *) casts, fix to use ieee80211_get_he_iftype_cap() correctly] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-08-24 | ieee80211: add TWT element definitions | Lorenzo Bianconi | 1 | -0/+62
Introduce TWT definitions and TWT Information element structure in ieee80211.h Tested-by: Peter Chiu <chui-hao.chiu@mediatek.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/71d8b581fe4b5abc5b92f8d77ac2de3e2f7591b6.1629741512.git.lorenzo@kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-08-23 | Revert "media: dvb header files: move some headers to staging" | Linus Torvalds | 3 | -0/+502
This reverts commit 819fbd3d8ef36c09576c2a0ffea503f5c46e9177. It turns out that some user-space applications use these uapi header files, so even though the only user of the interface is an old driver that was moved to staging, moving the header files causes unnecessary pain. Generally, we really don't want user space to use kernel headers directly (exactly because it causes pain when we re-organize), and instead copy them as needed. But these things happen, and the headers were in the uapi directory, so I guess it's not entirely unreasonable. Link: https://lore.kernel.org/lkml/4e3e0d40-df4a-94f8-7c2d-85010b0873c4@web.de/ Reported-by: Soeren Moch <smoch@web.de> Cc: stable@kernel.org # 5.13 Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-08-23 | Merge tag 'wireless-drivers-next-2021-08-22' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next | David S. Miller | 1 | -0/+1
Kalle Valo says: ==================== wireless-drivers-next patches for v5.15 First set of patches for v5.15. This got delayed as I have been mostly offline for the last few weeks. The biggest change is removal of prism54 driver, otherwise just smaller changes. Major changes: ath5k, ath9k, ath10k, ath11k: * switch from 'pci_' to 'dma_' API brcmfmac * allow per-board firmware binaries * add support 43752 SDIO device prism54 * remove the obsoleted driver, everyone should be using p54 driver instead ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-23 | net: dsa: track unique bridge numbers across all DSA switch trees | Vladimir Oltean | 1 | -5/+3
Right now, cross-tree bridging setups work somewhat by mistake. In the case of cross-tree bridging with sja1105, all switch instances need to agree upon a common VLAN ID for forwarding a packet that belongs to a certain bridging domain. With TX forwarding offload, the VLAN ID is the bridge VLAN for VLAN-aware bridging, and the tag_8021q TX forwarding offload VID (a VLAN which has non-zero VBID bits) for VLAN-unaware bridging. The VBID for VLAN-unaware bridging is derived from the dp->bridge_num value calculated by DSA independently for each switch tree. If ports from one tree join one bridge, and ports from another tree join another bridge, DSA will assign them the same bridge_num, even though the bridges are different. If cross-tree bridging is supported, this is an issue. Modify DSA to calculate the bridge_num globally across all switch trees. This has the implication for a driver that the dp->bridge_num value that DSA will assign to its ports might not be contiguous, if there are boards with multiple DSA drivers instantiated. Additionally, all bridge_num values eat up towards each switch's ds->num_fwd_offloading_bridges maximum, which is potentially unfortunate, and can be seen as a limitation introduced by this patch. However, that is the lesser evil for now. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
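Allocating bridge numbers from one pool shared by all switch trees can be sketched with a global bitmap and a first-free search. Everything in this example is illustrative: `MAX_BRIDGES`, `bridge_num_get()` and `bridge_num_put()` are hypothetical names, and the sketch leaves out the step of reusing a number for ports that join a bridge which already has one.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_BRIDGES 32	/* hypothetical limit for this sketch */

/* One bitmap for the whole system, not one per switch tree. */
static uint32_t bridge_num_bitmap;

/* Returns the lowest free bridge number, or -1 if none is left. */
static int bridge_num_get(void)
{
	for (int i = 0; i < MAX_BRIDGES; i++) {
		uint32_t bit = UINT32_C(1) << i;

		if (!(bridge_num_bitmap & bit)) {
			bridge_num_bitmap |= bit;
			return i;
		}
	}
	return -1;
}

/* Releases a number once the last port has left that bridge. */
static void bridge_num_put(int num)
{
	assert(num >= 0 && num < MAX_BRIDGES);
	bridge_num_bitmap &= ~(UINT32_C(1) << num);
}
```

With a shared pool, a bridge spanning tree A and a different bridge spanning tree B receive different numbers, which is also why the numbers a single driver sees may no longer be contiguous.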
2021-08-21 | brcmfmac: add 43752 SDIO ids and initialization | Angus Ainslie | 1 | -0/+1
Add HW and SDIO ids for use with the SparkLan AP6275S. Add the firmware mapping structures for the BRCM43752 chipset. The 43752 needs some things set up similarly to the 43012 chipset. The WATERMARK shows better performance when initialized to the 4373 value. Signed-off-by: Angus Ainslie <angus@akkea.ca> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20210812165218.2508258-2-angus@akkea.ca
2021-08-20 | Merge branch 'akpm' (patches from Andrew) | Linus Torvalds | 3 | -18/+22
Merge misc fixes from Andrew Morton: "10 patches. Subsystems affected by this patch series: MAINTAINERS and mm (shmem, pagealloc, tracing, memcg, memory-failure, vmscan, kfence, and hugetlb)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: hugetlb: don't pass page cache pages to restore_reserve_on_error kfence: fix is_kfence_address() for addresses below KFENCE_POOL_SIZE mm: vmscan: fix missing psi annotation for node_reclaim() mm/hwpoison: retry with shake_page() for unhandlable pages mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim MAINTAINERS: update ClangBuiltLinux IRC chat mmflags.h: add missing __GFP_ZEROTAGS and __GFP_SKIP_KASAN_POISON names mm/page_alloc: don't corrupt pcppage_migratetype Revert "mm: swap: check if swap backing device is congested or not" Revert "mm/shmem: fix shmem_swapin() race with swapoff"
2021-08-20kfence: fix is_kfence_address() for addresses below KFENCE_POOL_SIZEMarco Elver1-3/+4
Originally the addr != NULL check was meant to take care of the case where __kfence_pool == NULL (KFENCE is disabled). However, this does not work for addresses where addr > 0 && addr < KFENCE_POOL_SIZE. This can be the case on NULL-deref where addr > 0 && addr < PAGE_SIZE or any other faulting access with addr < KFENCE_POOL_SIZE. While the kernel would likely crash, the stack traces and report might be confusing due to double faults upon KFENCE's attempt to unprotect such an address. Fix it by just checking that __kfence_pool != NULL instead. Link: https://lkml.kernel.org/r/20210818130300.2482437-1-elver@google.com Fixes: 0ce20dd84089 ("mm: add Kernel Electric-Fence infrastructure") Signed-off-by: Marco Elver <elver@google.com> Reported-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com> Acked-by: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: <stable@vger.kernel.org> [5.12+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
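The difference between the two checks can be reproduced in a small user-space sketch. All names here (`demo_pool`, `DEMO_POOL_SIZE`, `in_pool_old()`, `in_pool_fixed()`) are stand-ins chosen for illustration and the function bodies are an approximation of the shape described above, not a copy of the kernel header.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_POOL_SIZE 4096UL	/* stand-in for KFENCE_POOL_SIZE */

/* Stand-in for __kfence_pool; NULL models "KFENCE is disabled". */
char *demo_pool;

/*
 * Pre-fix shape: the NULL test is applied to addr, so any small
 * non-NULL address still passes the range check when demo_pool is NULL.
 */
bool in_pool_old(const void *addr)
{
	return (uintptr_t)addr - (uintptr_t)demo_pool < DEMO_POOL_SIZE && addr;
}

/* Post-fix shape: nothing is a pool address while the pool is NULL. */
bool in_pool_fixed(const void *addr)
{
	return (uintptr_t)addr - (uintptr_t)demo_pool < DEMO_POOL_SIZE && demo_pool;
}
```

With `demo_pool` left NULL, an address such as 16 is accepted by `in_pool_old()` and rejected by `in_pool_fixed()`, which is the NULL-deref case the commit message describes.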
2021-08-20mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaimJohannes Weiner1-14/+15
We've noticed occasional OOM killing when memory.low settings are in effect for cgroups. This is unexpected and undesirable as memory.low is supposed to express non-OOMing memory priorities between cgroups. The reason for this is proportional memory.low reclaim. When cgroups are below their memory.low threshold, reclaim passes them over in the first round, and then retries if it couldn't find pages anywhere else. But when cgroups are slightly above their memory.low setting, page scan force is scaled down and diminished in proportion to the overage, to the point where it can cause reclaim to fail as well - only in that case we currently don't retry, and instead trigger OOM. To fix this, hook proportional reclaim into the same retry logic we have in place for when cgroups are skipped entirely. This way if reclaim fails and some cgroups were scanned with diminished pressure, we'll try another full-force cycle before giving up and OOMing. [akpm@linux-foundation.org: coding-style fixes] Link: https://lkml.kernel.org/r/20210817180506.220056-1-hannes@cmpxchg.org Fixes: 9783aa9917f8 ("mm, memcg: proportional memory.{low,min} reclaim") Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reported-by: Leon Yang <lnyng@fb.com> Reviewed-by: Rik van Riel <riel@surriel.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Roman Gushchin <guro@fb.com> Acked-by: Chris Down <chris@chrisdown.name> Acked-by: Michal Hocko <mhocko@suse.com> Cc: <stable@vger.kernel.org> [5.4+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
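The retry behaviour can be modelled with a toy reclaim loop. This is a deliberately simplified sketch: `cgroup_model`, `scan_target()` and `reclaim_model()` are hypothetical, the "pages found" figure is just the scan target, and the scaling rule (scan only the overage above memory.low) is an approximation of the proportional logic rather than the kernel's formula.

```c
#include <stdbool.h>
#include <stddef.h>

struct cgroup_model {
	unsigned long usage;	/* pages charged to the group */
	unsigned long low;	/* memory.low protection, in pages */
};

/* Pages offered for scanning in one pass over a single group. */
static unsigned long scan_target(const struct cgroup_model *cg,
				 bool full_force, bool *held_back)
{
	if (full_force || cg->low == 0)
		return cg->usage;
	*held_back = true;		/* protection reduced the pressure */
	if (cg->usage <= cg->low)
		return 0;		/* below memory.low: skipped */
	return cg->usage - cg->low;	/* slightly above: diminished scan */
}

/* Returns true if enough pages were found, false for "would OOM". */
bool reclaim_model(const struct cgroup_model *cgs, size_t n,
		   unsigned long needed)
{
	bool full_force = false;

	for (;;) {
		bool held_back = false;
		unsigned long found = 0;

		for (size_t i = 0; i < n; i++)
			found += scan_target(&cgs[i], full_force, &held_back);

		if (found >= needed)
			return true;
		if (!held_back)
			return false;	/* nothing was protected: give up */
		full_force = true;	/* retry once without protection */
	}
}
```

In this model both the skipped group and the diminished group set `held_back`, so either one triggers the full-force second pass; before the fix only the skipped case would have.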
2021-08-20mmflags.h: add missing __GFP_ZEROTAGS and __GFP_SKIP_KASAN_POISON namesMike Rapoport1-1/+3
printk("%pGg") outputs these two flags as a hexadecimal number rather than as a string, e.g. GFP_KERNEL|0x1800000. Fix this by adding the missing names of the __GFP_ZEROTAGS and __GFP_SKIP_KASAN_POISON flags to __def_gfpflag_names. Link: https://lkml.kernel.org/r/20210816133502.590-1-rppt@kernel.org Fixes: 013bb59dbb7c ("arm64: mte: handle tags zeroing at page allocation time") Fixes: c275c5c6d50a ("kasan: disable freed user page poisoning with HW tags") Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
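Why a missing table entry shows up as a raw hex value can be seen in a small name-table formatter. This is a sketch of the general technique only: `struct flag_name` and `format_flags()` are hypothetical and are not how the kernel's %pGg formatter is written.

```c
#include <stddef.h>
#include <stdio.h>

struct flag_name {
	unsigned long mask;	/* bit(s) this entry names; must be non-zero */
	const char *name;
};

/*
 * Write "NAME|NAME|0x..." into buf: every flag found in the table is
 * printed by name, and any bits left over are printed as one hex value.
 */
void format_flags(unsigned long flags, const struct flag_name *names,
		  size_t n, char *buf, size_t len)
{
	size_t used = 0;

	if (len == 0)
		return;
	buf[0] = '\0';

	for (size_t i = 0; i < n && used < len; i++) {
		int w;

		if (!names[i].mask || (flags & names[i].mask) != names[i].mask)
			continue;
		w = snprintf(buf + used, len - used, "%s%s",
			     used ? "|" : "", names[i].name);
		if (w < 0)
			return;
		used += (size_t)w;
		flags &= ~names[i].mask;	/* this bit is now accounted for */
	}

	/* Bits with no table entry fall through to a hex dump. */
	if (flags && used < len)
		snprintf(buf + used, len - used, "%s%#lx",
			 used ? "|" : "", flags);
}
```

Given a table that names only one flag, two extra set bits come out as a single trailing hex number; once entries for those bits are added to the table, the same input prints entirely by name.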
2021-08-20Merge tag 'mac80211-next-for-net-next-2021-08-20' of ↵Jakub Kicinski5-0/+246
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== Minor updates: * BSS coloring support * MEI commands for Intel platforms * various fixes/cleanups * tag 'mac80211-next-for-net-next-2021-08-20' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next: cfg80211: fix BSS color notify trace enum confusion mac80211: Fix insufficient headroom issue for AMSDU mac80211: add support for BSS color change nl80211: add support for BSS coloring mac80211: Use flex-array for radiotap header bitmap mac80211: radiotap: Use BIT() instead of shifts mac80211: Remove unnecessary variable and label mac80211: include <linux/rbtree.h> mac80211: Fix monitor MTU limit so that A-MSDUs get through mac80211: remove unnecessary NULL check in ieee80211_register_hw() mac80211: Reject zero MAC address in sta_info_insert_check() nl80211: vendor-cmd: add Intel vendor commands for iwlmei usage ==================== Link: https://lore.kernel.org/r/20210820105329.48674-1-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-20net: bridge: vlan: convert mcast router global option to per-vlan entryNikolay Aleksandrov1-1/+1
The per-vlan router option controls the port/vlan and host vlan entries' mcast router config. The global option controlled only the host vlan config, but that is unnecessary and inconsistent as it's not really a global vlan option, but rather a bridge option to control host router config, so convert BRIDGE_VLANDB_GOPTS_MCAST_ROUTER to BRIDGE_VLANDB_ENTRY_MCAST_ROUTER which can be used to control both host vlan and port vlan mcast router config. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-20net: mscc: ocelot: transmit the VLAN filtering restrictions via extackVladimir Oltean1-1/+2
We need to transmit more restrictions in future patches, so convert this one to netlink extack. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-20net: mscc: ocelot: transmit the "native VLAN" error via extackVladimir Oltean1-1/+1
We need to reject some more configurations in future patches, so convert the existing one to netlink extack. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-20Merge tag 'mlx5-updates-2021-08-19' of ↵David S. Miller1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2021-08-19 This series introduces the support for two new mlx5 features: 1) Sample offload for tunneled traffic 2) devlink rate objects support 1) From Chris Mi: Sample offload for tunneled traffic ===================================================== Background and solution ----------------------- Currently the sample offload actions send the encapsulated packet to software. This series de-capsulates the packet before performing the sampling and set the tunnel properties on the skb metadata fields to make the behavior consistent with OVS sFlow. If de-capsulating first, we can't use the same match like before in default table. So instantiate a post action instance to continue processing the action list. If HW can preserve reg_c, also use the post action instance. Post action infrastructure -------------------------- Some tc actions are modeled in hardware using multiple tables causing a tc action list split. For example, CT action is modeled by jumping to a ct table which is controlled by nf flow table. sFlow jumps in hardware to a sample table, which continues to a "default table" where it should continue processing the action list. Multi table actions are modeled in hardware using a unique fte_id. The fte_id is set before jumping to a table. Split actions continue to a post-action table where the matched fte_id value continues the execution the tc action list. This series also introduces post action infrastructure. Both ct and sample use it. 
Sample for tunnel in TC SW -------------------------- tc filter add dev vxlan1 protocol ip parent ffff: prio 3 \ flower src_mac 24:25:d0:e1:00:00 dst_mac 02:25:d0:13:01:02 \ enc_src_ip 192.168.1.14 enc_dst_ip 192.168.1.13 \ enc_dst_port 4789 enc_key_id 4 \ action sample rate 1 group 6 \ action tunnel_key unset \ action mirred egress redirect dev enp4s0f0_1 MLX5 sample HW offload ---------------------- For the following typical flow table: +-------------------------------+ + original flow table + +-------------------------------+ + original match + +-------------------------------+ + sample action + other actions + +-------------------------------+ We translate the tc filter with sample action to the following HW model: +---------------------+ + original flow table + +---------------------+ + original match + +---------------------+ | set fte_id (if reg_c preserve cap) | do decap v +------------------------------------------------+ + Flow Sampler Object + +------------------------------------------------+ + sample ratio + +------------------------------------------------+ + sample table id | default table id + +------------------------------------------------+ | | v v +-----------------------------+ +-------------------+ + sample table + + default table + +-----------------------------+ +-------------------+ + forward to management vport + | +-----------------------------+ | +-------+------+ | |reg_c preserve cap | |or decap action v v +-----------------+ +-------------+ + per vport table + + post action + +-----------------+ +-------------+ + original match + +-----------------+ + other actions + +-----------------+ 2) From Dmytro Linkin: devlink rate object support for mlx5_core driver ======================================================================= HIGH-LEVEL OVERVIEW Devlink leaf rate objects created per vport (VF/SF, and PF on BlueField) in switchdev mode on devlink port registration. 
Implement devlink ops callbacks to create/destroy rate groups, set TX rate values of the vport/group, assign vport to the group. Driver accepts TX rate values as fraction of 1Mbps. Refactor existing eswitch QoS infrastructure to be accessible by legacy NDO rate API and new devlink rate API. NDO rate API is not removed/disabled in switchdev mode to not break existing users. Rate values configured with NDO rate API are not visible for devlink infrastructure, therefore APIs should not be used simultaneously. IMPLEMENTATION DETAILS The driver provides a two-level rate hierarchy to manage bandwidth - group level and vport level. Initially each vport is added to an internal unlimited group created by default. Each rate element (vport or group) receives bandwidth relative to its parent element (for groups the parent is a physical link itself) in a Round Robin manner, where each element gets a bandwidth value according to its weight. Example: Created four rate groups with tx_share limits: $ devlink port function rate add \ pci/0000:06:00.0/group_1 tx_share 30gbit $ devlink port function rate add \ pci/0000:06:00.0/group_2 tx_share 20gbit $ devlink port function rate add \ pci/0000:06:00.0/group_3 tx_share 20gbit $ devlink port function rate add \ pci/0000:06:00.0/group_4 tx_share 10gbit Weights created in HW for each group are relative to the biggest tx_share value, which is 30gbit: <group_1> 1.0 <group_2> 0.67 <group_3> 0.67 <group_4> 0.33 Assuming link speed is 50 Gbit/sec and each group can sustain such amount of traffic, maximum bandwidth is 50 / (1.0 + 0.67 + 0.67 + 0.33) = ~18.75 Gbit/sec. Normalized bandwidth values for groups: <group_1> 18.75 * 1.0 = 18.75 Gbit/sec <group_2> 18.75 * 0.67 = 12.5 Gbit/sec <group_3> 18.75 * 0.67 = 12.5 Gbit/sec <group_4> 18.75 * 0.33 = 6.25 Gbit/sec If in example above group_1 doesn't produce any traffic, then maximum bandwidth becomes 50 / (0.67 + 0.67 + 0.33) = ~30.0 Gbit/sec. 
Normalized values: <group_2> 30.0 * 0.67 = 20.0 Gbit/sec <group_3> 30.0 * 0.67 = 20.0 Gbit/sec <group_4> 30.0 * 0.33 = 10.0 Gbit/sec The same normalization is applied to each vport in the group. Normalized values are internal, so the driver provides QoS tracepoints for the following events: * vport rate element creation/deletion; * vport rate element configuration; * group rate element creation/deletion; * group rate element configuration. PATCHES OVERVIEW 1 - Moving and isolation of eswitch QoS logic in separate file; 2 - Implement devlink leaf rate object support for vports; 3 - Implement rate groups creation/deletion; 4 - Implement TX rate management for the groups; 5 - Implement parent set for vports; 6 - Eswitch QoS tracepoints. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
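The tx_share arithmetic in the example above can be checked with a short C helper. This is only a worked calculation under the assumptions stated in the message (weights relative to the largest tx_share, link bandwidth split among groups that are sending traffic); `normalize_bandwidth()` is a hypothetical name and nothing here reflects how mlx5 programs the hardware.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * For each group i, weight = tx_share[i] / max(tx_share).
 * Each active group gets link / sum(active weights) * weight;
 * inactive groups get 0. Units follow 'link' (e.g. Gbit/sec).
 */
void normalize_bandwidth(const double *tx_share, const bool *active,
			 size_t n, double link, double *out)
{
	double max_share = 0.0;
	double weight_sum = 0.0;

	for (size_t i = 0; i < n; i++) {
		out[i] = 0.0;
		if (tx_share[i] > max_share)
			max_share = tx_share[i];
	}
	if (max_share <= 0.0)
		return;

	for (size_t i = 0; i < n; i++)
		if (active[i])
			weight_sum += tx_share[i] / max_share;
	if (weight_sum <= 0.0)
		return;

	for (size_t i = 0; i < n; i++)
		if (active[i])
			out[i] = link / weight_sum * (tx_share[i] / max_share);
}
```

With tx_share values of 30, 20, 20 and 10 on a 50 Gbit/sec link, unrounded weights give 18.75, 12.5, 12.5 and 6.25 when all groups are active, and 20, 20 and 10 for groups 2 to 4 when group_1 is idle. The "~" in the commit message comes from rounding the weights to two decimals.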
2021-08-20Merge tag 'for-net-next-2021-08-19' of ↵David S. Miller1-2/+19
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Luiz Augusto von Dentz says: ==================== bluetooth-next pull request for net-next: - Add support for Foxconn Mediatek Chip - Add support for LG LGSBWAC92/TWCM-K505D - hci_h5 flow control fixes and suspend support - Switch to use lock_sock for SCO and RFCOMM - Various fixes for extended advertising - Reword Intel's setup on btusb unifying the supported generations ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-19net/mlx5: E-switch, Introduce rate limiting groups APIDmytro Linkin1-1/+2
Extend eswitch API with rate limiting groups: - Define new struct mlx5_esw_rate_group that is used to hold all internal group data. - Implement functions that allow creation, destruction and cleanup of groups. - Assign all vports to internal unlimited zero group by default. This commit lays the groundwork for group rate limiting by implementing devlink_ops->rate_node_{new|del}() callbacks to support creating and deleting groups through devlink rate node objects. APIs that allow setting rates and adding/removing members are implemented in the following patches. Co-developed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Reviewed-by: Huy Nguyen <huyn@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski9-12/+31
drivers/ptp/Kconfig: 55c8fca1dae1 ("ptp_pch: Restore dependency on PCI") e5f31552674e ("ethernet: fix PTP_1588_CLOCK dependencies") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-19Merge tag 'net-5.14-rc7' of ↵Linus Torvalds1-7/+5
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Networking fixes, including fixes from bpf, wireless and mac80211 trees. Current release - regressions: - tipc: call tipc_wait_for_connect only when dlen is not 0 - mac80211: fix locking in ieee80211_restart_work() Current release - new code bugs: - bpf: add rcu_read_lock in bpf_get_current_[ancestor_]cgroup_id() - ethernet: ice: fix perout start time rounding - wwan: iosm: prevent underflow in ipc_chnl_cfg_get() Previous releases - regressions: - bpf: clear zext_dst of dead insns - sch_cake: fix srchost/dsthost hashing mode - vrf: reset skb conntrack connection on VRF rcv - net/rds: dma_map_sg is entitled to merge entries Previous releases - always broken: - ethernet: bnxt: fix Tx path locking and races, add Rx path barriers" * tag 'net-5.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (42 commits) net: dpaa2-switch: disable the control interface on error path Revert "flow_offload: action should not be NULL when it is referenced" iavf: Fix ping is lost after untrusted VF had tried to change MAC i40e: Fix ATR queue selection r8152: fix the maximum number of PLA bp for RTL8153C r8152: fix writing USB_BP2_EN mptcp: full fully established support after ADD_ADDR mptcp: fix memory leak on address flush net/rds: dma_map_sg is entitled to merge entries net: mscc: ocelot: allow forwarding from bridge ports to the tag_8021q CPU port net: asix: fix uninit value bugs ovs: clear skb->tstamp in forwarding path net: mdio-mux: Handle -EPROBE_DEFER correctly net: mdio-mux: Don't ignore memory allocation errors net: mdio-mux: Delete unnecessary devm_kfree net: dsa: sja1105: fix use-after-free after calling of_find_compatible_node, or worse sch_cake: fix srchost/dsthost hashing mode ixgbe, xsk: clean up the resources in ixgbe_xsk_pool_enable error path net: qlcnic: add missed unlock in qlcnic_83xx_flash_read32 mac80211: fix locking in ieee80211_restart_work() ...
2021-08-19Merge tag 'linux-can-next-for-5.15-20210819' of ↵Jakub Kicinski1-0/+8
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== linux-can-next-for-5.15-20210819 The first patch is by me, for the mailmap file and maps the email address of two former ESD employees to a newly created role account. The next 3 patches are by Oleksij Rempel and add support for GPIO based switchable CAN bus termination. The next 3 patches are by Vincent Mailhol. The first one changes the CAN netlink interface to not bail out if the user switched off unsupported features. The next one adds Vincent as the maintainer of the etas_es58x driver and the last one cleans up the documentation of struct es58x_fd_tx_conf_msg. The next patch is by me, for the mcp251xfd driver and marks some instances of struct mcp251xfd_priv as const. Lad Prabhakar contributes 2 patches for the rcar_canfd driver, which add support for the RZ/G2L family. The next 5 patches target the m_can/tcan4x5x driver. 2 are by me and fix trivial checkpatch warnings. The remaining 3 patches are by Matt Kline and improve the performance on the SPI based tcan4x5x chip by batching FIFO reads and writes. The last 7 patches are for the c_can driver. Dario Binacchi's patch converts the DT bindings to yaml, 2 patches by me fix a typo and rename a macro to properly represent the usage. The last 4 patches are again by Dario Binacchi and provide a performance improvement for the TX path by operating the TX mailboxes as a true FIFO. 
* tag 'linux-can-next-for-5.15-20210819' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next: (22 commits) can: c_can: cache frames to operate as a true FIFO can: c_can: support tx ring algorithm can: c_can: exit c_can_do_tx() early if no frames have been sent can: c_can: remove struct c_can_priv::priv field can: c_can: rename IF_RX -> IF_NAPI can: c_can: c_can_do_tx(): fix typo in comment dt-bindings: net: can: c_can: convert to json-schema can: m_can: Batch FIFO writes during CAN transmit can: m_can: Batch FIFO reads during CAN receive can: m_can: Disable IRQs on FIFO bus errors can: m_can: fix block comment style can: tcan4x5x: cdev_to_priv(): remove stray empty line can: rcar_canfd: Add support for RZ/G2L family dt-bindings: net: can: renesas,rcar-canfd: Document RZ/G2L SoC can: mcp251xfd: mark some instances of struct mcp251xfd_priv as const can: etas_es58x: clean-up documentation of struct es58x_fd_tx_conf_msg MAINTAINERS: add Vincent MAILHOL as maintainer for the ETAS ES58X CAN/USB driver can: netlink: allow user to turn off unsupported features can: dev: provide optional GPIO based termination support dt-bindings: can: fsl,flexcan: enable termination-* bindings ... ==================== Link: https://lore.kernel.org/r/20210819133913.657715-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-19Revert "flow_offload: action should not be NULL when it is referenced"Ido Schimmel1-7/+5
This reverts commit 9ea3e52c5bc8bb4a084938dc1e3160643438927a. Cited commit added a check to make sure 'action' is not NULL, but 'action' is already dereferenced before the check, when calling flow_offload_has_one_action(). Therefore, the check does not make any sense and results in a smatch warning: include/net/flow_offload.h:322 flow_action_mixed_hw_stats_check() warn: variable dereferenced before check 'action' (see line 319) Fix by reverting this commit. Cc: gushengxian <gushengxian@yulong.com> Fixes: 9ea3e52c5bc8 ("flow_offload: action should not be NULL when it is referenced") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://lore.kernel.org/r/20210819105842.1315705-1-idosch@idosch.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
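The "dereferenced before check" pattern that smatch flagged can be illustrated with a reduced example. The types and functions below (`action_list`, `has_one_action()`, `stats_uniform()`) are hypothetical simplifications, not the contents of include/net/flow_offload.h.

```c
#include <stdbool.h>

struct entry {
	unsigned char hw_stats;
};

struct action_list {
	unsigned int num_entries;
	struct entry entries[8];
};

/* Dereferences 'a' unconditionally. */
static bool has_one_action(const struct action_list *a)
{
	return a->num_entries == 1;
}

/*
 * Shape after the revert: 'a' is assumed non-NULL throughout.
 * The reverted change wrapped the loop below in "if (a) { ... }",
 * which could never help because has_one_action(a) had already
 * dereferenced the pointer by that point.
 */
bool stats_uniform(const struct action_list *a)
{
	if (has_one_action(a))
		return true;

	for (unsigned int i = 1; i < a->num_entries; i++)
		if (a->entries[i].hw_stats != a->entries[0].hw_stats)
			return false;
	return true;
}
```

A NULL test is only meaningful if it comes before the first dereference; placed after it, the check is dead code and misleads readers about the function's contract.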
2021-08-19can: dev: provide optional GPIO based termination supportOleksij Rempel1-0/+8
For CAN buses to work, a termination resistor has to be present at both ends of the bus. This resistor is usually 120 Ohms, other values may be required for special bus topologies. This patch adds support for a generic GPIO based CAN termination. The resistor value has to be specified via device tree, and it can only be attached to or detached from the bus. By default the termination is not active. Link: https://lore.kernel.org/r/20210818071232.20585-4-o.rempel@pengutronix.de Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-08-19net: Fix offloading indirect devices dependency on qdisc order creationEli Cohen1-0/+1
Currently, when creating an ingress qdisc on an indirect device before the driver registered for callbacks, the driver will not have a chance to register its filter configuration callbacks. To fix that, modify the code such that it keeps track of all the ingress qdiscs that call flow_indr_dev_setup_offload(). When a driver calls flow_indr_dev_register(), go through the list of tracked ingress qdiscs and call the driver callback entry point so as to give it a chance to register its callback. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Eli Cohen <elic@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-19net: mii: make mii_ethtool_gset() return voidPavel Skripkin1-1/+1
mii_ethtool_gset() does not return any errors. Since there are no users of this function that rely on its return value, it can be made void. Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-18pipe: avoid unnecessary EPOLLET wakeups under normal loadsLinus Torvalds1-0/+2
I had forgotten just how sensitive hackbench is to extra pipe wakeups, and commit 3a34b13a88ca ("pipe: make pipe writes always wake up readers") ended up causing a quite noticeable regression on larger machines. Now, hackbench isn't necessarily a hugely meaningful benchmark, and it's not clear that this matters in real life all that much, but as Mel points out, it's used often enough when comparing kernels and so the performance regression shows up like a sore thumb. It's easy enough to fix at least for the common cases where pipes are used purely for data transfer, and you never have any exciting poll usage at all. So set a special 'poll_usage' flag when there is polling activity, and make the ugly "EPOLLET has crazy legacy expectations" semantics explicit to only that case. I would love to limit it to just the broken EPOLLET case, but the pipe code can't see the difference between epoll and regular select/poll, so any non-read/write waiting will trigger the extra wakeup behavior. That is sufficient for at least the hackbench case. Apart from making the odd extra wakeup cases more explicitly about EPOLLET, this also makes the extra wakeup be at the _end_ of the pipe write, not at the first write chunk. That is actually much saner semantics (as much as you can call any of the legacy edge-triggered expectations for EPOLLET "sane") since it means that you know the wakeup will happen once the write is done, rather than possibly in the middle of one. [ For stable people: I'm putting a "Fixes" tag on this, but I leave it up to you to decide whether you actually want to backport it or not. 
It likely has no impact outside of synthetic benchmarks - Linus ] Link: https://lore.kernel.org/lkml/20210802024945.GA8372@xsang-OptiPlex-9020/ Fixes: 3a34b13a88ca ("pipe: make pipe writes always wake up readers") Reported-by: kernel test robot <oliver.sang@intel.com> Tested-by: Sandeep Patil <sspatil@android.com> Tested-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
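The wakeup rule described in this message reduces to a small predicate, sketched here as a toy model. `pipe_model`, `pipe_model_poll()` and `pipe_model_write_done()` are hypothetical; the counter stands in for an actual reader wakeup and no real wait queues are involved.

```c
#include <stdbool.h>

struct pipe_model {
	bool poll_usage;	/* set once anything has polled this pipe */
	unsigned int wakeups;	/* stands in for waking the readers */
};

/* Any poll/select/epoll style wait marks the pipe as "polled". */
void pipe_model_poll(struct pipe_model *p)
{
	p->poll_usage = true;
}

/*
 * Called once at the end of a write. Readers are woken if the pipe
 * was empty before the write, or unconditionally if the pipe has
 * ever been polled (the legacy EPOLLET expectation).
 */
void pipe_model_write_done(struct pipe_model *p, bool was_empty)
{
	if (was_empty || p->poll_usage)
		p->wakeups++;
}
```

A pipe used purely for read/write data transfer never sets `poll_usage`, so writes into a non-empty pipe skip the extra wakeup, which is the hackbench case the change targets.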
2021-08-18net-memcg: pass in gfp_t mask to mem_cgroup_charge_skmem()Wei Wang2-1/+7
Add gfp_t mask as an input parameter to mem_cgroup_charge_skmem(), to give more control to the networking stack and enable it to change memcg charging behavior. In the future, the networking stack may decide to avoid oom-kills when fallbacks are more appropriate. One behavior change in mem_cgroup_charge_skmem() by this patch is to avoid force charging by default and let the caller decide when and if force charging is needed through the presence or absence of __GFP_NOFAIL. Signed-off-by: Wei Wang <weiwan@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
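How a caller-supplied mask can decide between failing and force charging is sketched below. This is a toy model under stated assumptions: `memcg_model`, `charge_skmem_model()` and `DEMO_GFP_NOFAIL` are invented for illustration and do not reproduce the memcg accounting code or the real value of __GFP_NOFAIL.

```c
#include <stdbool.h>

#define DEMO_GFP_NOFAIL 0x1u	/* stand-in bit for __GFP_NOFAIL */

struct memcg_model {
	unsigned long usage;	/* pages currently charged */
	unsigned long limit;	/* pages allowed */
};

/*
 * Try to charge 'pages'. Over the limit, the charge is refused unless
 * the caller passed the no-fail bit, in which case it is forced and
 * usage is allowed to exceed the limit.
 */
bool charge_skmem_model(struct memcg_model *cg, unsigned long pages,
			unsigned int gfp_mask)
{
	if (cg->usage + pages <= cg->limit) {
		cg->usage += pages;
		return true;
	}
	if (gfp_mask & DEMO_GFP_NOFAIL) {
		cg->usage += pages;	/* forced: caller asked for it */
		return true;
	}
	return false;			/* caller may fall back instead */
}
```

The design point is that the policy moves to the call site: a caller that can tolerate failure passes a mask without the no-fail bit and handles `false`, while one that cannot passes the bit explicitly.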
2021-08-18net: dsa: tag_sja1105: be dsa_loop-safeVladimir Oltean1-0/+18
Add support for tag_sja1105 running on non-sja1105 DSA ports, by making sure that every time we dereference dp->priv, we check the switch's dsa_switch_ops (otherwise we access a struct sja1105_port structure that is in fact something else). This adds an unconditional build-time dependency between sja1105 being built as module => tag_sja1105 must also be built as module. This was there only for PTP before. Some sane defaults must also take place when not running on sja1105 hardware. These are: - sja1105_xmit_tpid: the sja1105 driver uses different VLAN protocols depending on VLAN awareness and switch revision (when an encapsulated VLAN must be sent). Default to 0x8100. - sja1105_rcv_meta_state_machine: this aggregates PTP frames with their metadata timestamp frames. When running on non-sja1105 hardware, don't do that and accept all frames unmodified. - sja1105_defer_xmit: calls sja1105_port_deferred_xmit in sja1105_main.c which writes a management route over SPI. When not running on sja1105 hardware, bypass the SPI write and send the frame as-is. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
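The guard described above (check which driver owns the port before interpreting its private data) can be sketched as follows. Every type and name here is a hypothetical stand-in with a `_demo` suffix; only the 0x8100 default is taken from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

struct switch_ops_demo {
	const char *name;
};

struct switch_demo {
	const struct switch_ops_demo *ops;
};

struct port_demo {
	struct switch_demo *ds;
	void *priv;	/* driver-specific; layout depends on ds->ops */
};

struct sja_port_priv_demo {
	uint16_t xmit_tpid;
};

/* The one ops table that identifies the driver owning sja-style priv. */
const struct switch_ops_demo sja1105_ops_demo = { "sja1105" };

static bool port_is_sja1105_demo(const struct port_demo *dp)
{
	return dp->ds->ops == &sja1105_ops_demo;
}

/* Only touch dp->priv once the ops pointer proves what it points to. */
uint16_t xmit_tpid_demo(const struct port_demo *dp)
{
	const struct sja_port_priv_demo *priv;

	if (!port_is_sja1105_demo(dp))
		return 0x8100;	/* default when not on sja1105 hardware */

	priv = dp->priv;
	return priv->xmit_tpid;
}
```

Comparing the ops pointer is what creates the build-time link between tagger and driver mentioned in the message: the tagger has to reference a symbol that the switch driver exports.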