Age | Commit message | Author | Files | Lines
2020-03-17 | net: axienet: let core reject the unsupported coalescing parameters | Jakub Kicinski | 1 | -21/+1
Set ethtool_ops->supported_coalesce_params to let the core reject unsupported coalescing parameters. This driver already correctly rejected all unsupported parameters. No functional changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
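The idea behind this series can be modelled in a few lines: the driver declares a bitmask of the coalescing parameters it supports, and the core rejects any request that sets a parameter outside that mask. The following is a minimal Python sketch of that check, not the kernel's code; the flag names and bit positions are illustrative assumptions, and `core_accepts` is a hypothetical helper.

```python
from enum import IntFlag


class Coalesce(IntFlag):
    """Illustrative parameter flags; bit values are assumptions, not the kernel's."""
    RX_USECS = 1 << 0
    RX_MAX_FRAMES = 1 << 1
    TX_USECS = 1 << 2
    TX_MAX_FRAMES = 1 << 3
    USE_ADAPTIVE_RX = 1 << 4
    USE_ADAPTIVE_TX = 1 << 5


def requested_mask(params: dict) -> int:
    """Build a bitmask of every parameter the user set to a non-zero value."""
    mask = 0
    for name, value in params.items():
        if value:
            mask |= int(Coalesce[name])
    return mask


def core_accepts(supported: Coalesce, params: dict) -> bool:
    """Accept the request only if no non-zero parameter is outside the driver's mask."""
    return (requested_mask(params) & ~int(supported)) == 0
```

With a check like this in the core, a driver that only declares `RX_USECS` no longer needs its own code to refuse, for example, a request that enables adaptive TX coalescing.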
2020-03-17 | net: ll_temac: let core reject the unsupported coalescing parameters | Jakub Kicinski | 1 | -19/+2
Set ethtool_ops->supported_coalesce_params to let the core reject unsupported coalescing parameters. This driver already correctly rejected all unsupported parameters. No functional changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-17 | net: davinci_emac: reject unsupported coalescing params | Jakub Kicinski | 1 | -0/+1
Set ethtool_ops->supported_coalesce_params to let the core reject unsupported coalescing parameters. This driver did not previously reject unsupported parameters. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-17 | net: cpsw: reject unsupported coalescing params | Jakub Kicinski | 2 | -0/+2
Set ethtool_ops->supported_coalesce_params to let the core reject unsupported coalescing parameters. This driver did not previously reject unsupported parameters. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-17 | net: tehuti: reject unsupported coalescing params | Jakub Kicinski | 1 | -0/+2
Set ethtool_ops->supported_coalesce_params to let the core reject unsupported coalescing parameters. This driver did not previously reject unsupported parameters. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-17 | net: dwc-xlgmac: let core reject the unsupported coalescing parameters | Jakub Kicinski | 1 | -15/+2
Set ethtool_ops->supported_coalesce_params to let the core reject unsupported coalescing parameters. This driver already correctly rejected all unsupported parameters. While at it remove unnecessary zeroing on get. No functional changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-17 | net: socionext: reject unsupported coalescing params | Jakub Kicinski | 1 | -0/+2
Set ethtool_ops->supported_coalesce_params to let the core reject unsupported coalescing parameters. This driver did not previously reject unsupported parameters. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-17 | net: sfc: reject unsupported coalescing params | Jakub Kicinski | 2 | -6/+6
Set ethtool_ops->supported_coalesce_params to let the core reject unsupported coalescing parameters. This driver did not previously reject unsupported parameters. The check for use_adaptive_tx_coalesce will now be done by the core. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-18 | netfilter: Introduce egress hook | Lukas Wunner | 7 | -8/+83
Commit e687ad60af09 ("netfilter: add netfilter ingress hook after handle_ing() under unique static key") introduced the ability to classify packets on ingress. Allow the same on egress.

Position the hook immediately before a packet is handed to tc and then sent out on an interface, thereby mirroring the ingress order. This order allows marking packets in the netfilter egress hook and subsequently using the mark in tc. Another benefit of this order is consistency with a lot of existing documentation which says that egress tc is performed after netfilter hooks.

Egress hooks already exist for the most common protocols, such as NF_INET_LOCAL_OUT or NF_ARP_OUT, and those are to be preferred because they are executed earlier during packet processing. However for more exotic protocols, there is currently no provision to apply netfilter on egress. A common workaround is to enslave the interface to a bridge and use ebtables, or to resort to tc. But when the ingress hook was introduced, consensus was that users should be given the choice to use netfilter or tc, whichever tool suits their needs best:
https://lore.kernel.org/netdev/20150430153317.GA3230@salvia/

This hook is also useful for NAT46/NAT64, tunneling and filtering of locally generated af_packet traffic such as dhclient.

There have also been occasional user requests for a netfilter egress hook in the past, e.g.:
https://www.spinics.net/lists/netfilter/msg50038.html

Performance measurements with pktgen surprisingly show a speedup rather than a slowdown with this commit:

* Without this commit:
  Result: OK: 34240933(c34238375+d2558) usec, 100000000 (60byte,0frags)
  2920481pps 1401Mb/sec (1401830880bps) errors: 0

* With this commit:
  Result: OK: 33997299(c33994193+d3106) usec, 100000000 (60byte,0frags)
  2941410pps 1411Mb/sec (1411876800bps) errors: 0

* Without this commit + tc egress:
  Result: OK: 39022386(c39019547+d2839) usec, 100000000 (60byte,0frags)
  2562631pps 1230Mb/sec (1230062880bps) errors: 0

* With this commit + tc egress:
  Result: OK: 37604447(c37601877+d2570) usec, 100000000 (60byte,0frags)
  2659259pps 1276Mb/sec (1276444320bps) errors: 0

* With this commit + nft egress:
  Result: OK: 41436689(c41434088+d2600) usec, 100000000 (60byte,0frags)
  2413320pps 1158Mb/sec (1158393600bps) errors: 0

Tested on a bare-metal Core i7-3615QM, each measurement was performed three times to verify that the numbers are stable.

Commands to perform a measurement:
modprobe pktgen
echo "add_device lo@3" > /proc/net/pktgen/kpktgend_3
samples/pktgen/pktgen_bench_xmit_mode_queue_xmit.sh -i 'lo@3' -n 100000000

Commands for testing tc egress:
tc qdisc add dev lo clsact
tc filter add dev lo egress protocol ip prio 1 u32 match ip dst 4.3.2.1/32

Commands for testing nft egress:
nft add table netdev t
nft add chain netdev t co \{ type filter hook egress device lo priority 0 \; \}
nft add rule netdev t co ip daddr 4.3.2.1/32 drop

All testing was performed on the loopback interface to avoid distorting measurements by the packet handling in the low-level Ethernet driver.

Signed-off-by: Lukas Wunner <lukas@wunner.de> Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
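To put the pktgen figures above in proportion, the relative change in packets per second can be computed directly from the reported numbers. This is only a small arithmetic sketch in Python; the `PPS` dictionary keys and the `relative_change` helper are names made up for illustration, while the values are copied from the results in the commit message.

```python
# Packets per second as reported in the commit message.
PPS = {
    "without commit": 2920481,
    "with commit": 2941410,
    "without commit + tc egress": 2562631,
    "with commit + tc egress": 2659259,
    "with commit + nft egress": 2413320,
}


def relative_change(before: int, after: int) -> float:
    """Percentage change from `before` to `after` (positive means faster)."""
    return (after - before) / before * 100.0


plain = relative_change(PPS["without commit"], PPS["with commit"])
with_tc = relative_change(PPS["without commit + tc egress"], PPS["with commit + tc egress"])
print(f"plain: {plain:+.2f}%  tc egress: {with_tc:+.2f}%")
```

By this calculation the plain case improves by roughly 0.7% and the tc egress case by roughly 3.8%, which is consistent with the message's description of a small speedup rather than a slowdown.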
2020-03-18 | netfilter: Generalize ingress hook | Lukas Wunner | 2 | -15/+32
Prepare for addition of a netfilter egress hook by generalizing the ingress hook introduced by commit e687ad60af09 ("netfilter: add netfilter ingress hook after handle_ing() under unique static key"). In particular, rename and refactor the ingress hook's static inlines such that they can be reused for an egress hook. No functional change intended. Signed-off-by: Lukas Wunner <lukas@wunner.de> Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-18 | netfilter: Rename ingress hook include file | Lukas Wunner | 2 | -1/+1
Prepare for addition of a netfilter egress hook by renaming <linux/netfilter_ingress.h> to <linux/netfilter_netdev.h>. The egress hook also necessitates a refactoring of the include file, but that is done in a separate commit to ease reviewing. No functional change intended. Signed-off-by: Lukas Wunner <lukas@wunner.de> Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-16 | Merge branch 'tcp-fix-stretch-ACK-bugs-in-congestion-control-modules' | David S. Miller | 4 | -65/+51
Pengcheng Yang says: ==================== tcp: fix stretch ACK bugs in congestion control modules "stretch ACKs" (caused by LRO, GRO, delayed ACKs or middleboxes) can cause serious performance shortfalls in common congestion control algorithms. Neal Cardwell submitted a series of patches starting with commit e73ebb0881ea ("tcp: stretch ACK fixes prep") to handle stretch ACKs and fixed stretch ACK bugs in Reno and CUBIC congestion control algorithms. This patch series continues to fix bic, scalable, veno and yeah congestion control algorithms to handle stretch ACKs. Changes in v2: - Provide [PATCH 0/N] to describe the modifications of this patch series ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-16 | tcp: fix stretch ACK bugs in Yeah | Pengcheng Yang | 1 | -30/+11
Change Yeah to properly handle stretch ACKs in additive increase mode by passing in the count of ACKed packets to tcp_cong_avoid_ai(). In addition, we re-implemented the scalable path using tcp_cong_avoid_ai() and removed the pkts_acked variable. Signed-off-by: Pengcheng Yang <yangpc@wangsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-16 | tcp: fix stretch ACK bugs in Veno | Pengcheng Yang | 1 | -4/+5
Change Veno to properly handle stretch ACKs in additive increase mode by passing in the count of ACKed packets to tcp_cong_avoid_ai(). Signed-off-by: Pengcheng Yang <yangpc@wangsu.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-16 | tcp: stretch ACK fixes in Veno prep | Pengcheng Yang | 1 | -21/+23
No code logic has been changed in this patch. Signed-off-by: Pengcheng Yang <yangpc@wangsu.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-16 | tcp: fix stretch ACK bugs in Scalable | Pengcheng Yang | 1 | -8/+9
Change Scalable to properly handle stretch ACKs in additive increase mode by passing in the count of ACKed packets to tcp_cong_avoid_ai(). In addition, because we are now precisely accounting for stretch ACKs, including delayed ACKs, we can now change TCP_SCALABLE_AI_CNT to 100. Signed-off-by: Pengcheng Yang <yangpc@wangsu.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-16 | tcp: fix stretch ACK bugs in BIC | Pengcheng Yang | 1 | -5/+6
Change BIC to properly handle stretch ACKs in additive increase mode by passing in the count of ACKed packets to tcp_cong_avoid_ai(). Signed-off-by: Pengcheng Yang <yangpc@wangsu.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
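Why the count of ACKed packets matters can be seen in a small model of additive increase. The sketch below is a Python approximation modelled on the behaviour these commits describe for tcp_cong_avoid_ai(); the `TcpState` class and the function name `cong_avoid_ai` are illustrative, and the details are an assumption rather than a copy of the kernel source.

```python
from dataclasses import dataclass


@dataclass
class TcpState:
    snd_cwnd: int                      # congestion window, in packets
    snd_cwnd_cnt: int = 0              # ACKed packets credited towards the next increase
    snd_cwnd_clamp: int = 0xFFFFFFFF   # upper bound on the window


def cong_avoid_ai(tp: TcpState, w: int, acked: int) -> None:
    """Grow cwnd by one packet for every `w` packets ACKed."""
    # If the credit already reached w (e.g. w shrank since the last call), pay it out.
    if tp.snd_cwnd_cnt >= w:
        tp.snd_cwnd_cnt = 0
        tp.snd_cwnd += 1
    # Credit every packet this ACK covers, not just "one ACK".
    tp.snd_cwnd_cnt += acked
    if tp.snd_cwnd_cnt >= w:
        delta = tp.snd_cwnd_cnt // w
        tp.snd_cwnd_cnt -= delta * w
        tp.snd_cwnd += delta
    tp.snd_cwnd = min(tp.snd_cwnd, tp.snd_cwnd_clamp)


# Ten ACKs covering one packet each and one stretch ACK covering ten packets
# end up with the same window when the ACKed count is passed in.
per_packet = TcpState(snd_cwnd=10)
for _ in range(10):
    cong_avoid_ai(per_packet, w=10, acked=1)

stretch = TcpState(snd_cwnd=10)
cong_avoid_ai(stretch, w=10, acked=10)
print(per_packet.snd_cwnd, stretch.snd_cwnd)
```

A module that instead counts one unit per ACK would credit the stretch ACK only once and grow the window ten times more slowly in this example, which is the kind of shortfall the series addresses.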
2020-03-16 | sfc: fix XDP-redirect in this driver | Jesper Dangaard Brouer | 4 | -8/+15
XDP-redirect is broken in the sfc driver. XDP_REDIRECT requires tailroom for skb_shared_info when creating an SKB based on the redirected xdp_frame (both in cpumap and veth). The fix requires some initial explaining. The driver uses RX page-split when possible. It reserves the top 64 bytes in the RX-page for storing dma_addr (struct efx_rx_page_state). It also has the XDP recommended headroom of XDP_PACKET_HEADROOM (256 bytes). As it doesn't reserve any tailroom, it can still fit two standard MTU (1500) frames into one page. The sizeof struct skb_shared_info is 320 bytes. Thus drivers like ixgbe and i40e reduce their XDP headroom to 192 bytes, which allows them to fit two frames with max 1536 bytes into a 4K page (192+1536+320=2048). The fix is to reduce this driver's headroom to 128 bytes and add the 320 bytes of tailroom. This accounts for the reserved top 64 bytes in the page, and still fits two frames in a page for normal MTUs. We must never go below 128 bytes of headroom for XDP, as one cacheline is for the xdp_frame area and the next cacheline is reserved for the metadata area. Fixes: eb9a36be7f3e ("sfc: perform XDP processing on received packets") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
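The byte budget in this message can be checked with a short calculation. The Python sketch below assumes a 4096-byte page split evenly into two RX buffers after subtracting any per-page reservation, and it treats a "standard" frame as 1500 bytes of MTU plus a 14-byte Ethernet header; both are simplifying assumptions for illustration, and `room_for_frame` is a hypothetical helper, not driver code.

```python
PAGE_SIZE = 4096
SHINFO_SIZE = 320        # sizeof(struct skb_shared_info), from the commit message
PAGE_STATE_SIZE = 64     # struct efx_rx_page_state reservation, from the commit message
FRAME_1500 = 1500 + 14   # MTU plus Ethernet header: an assumption for this sketch


def room_for_frame(headroom: int, tailroom: int, page_reserved: int) -> int:
    """Bytes left for frame data in each of two buffers carved from one page."""
    per_buffer = (PAGE_SIZE - page_reserved) // 2
    return per_buffer - headroom - tailroom


# Old sfc layout: 256 bytes headroom, no tailroom.
print(room_for_frame(256, 0, PAGE_STATE_SIZE))
# Keeping 256 bytes headroom and adding the tailroom would not fit a 1500 MTU frame.
print(room_for_frame(256, SHINFO_SIZE, PAGE_STATE_SIZE))
# The fix: 128 bytes headroom plus 320 bytes tailroom.
print(room_for_frame(128, SHINFO_SIZE, PAGE_STATE_SIZE))
# ixgbe/i40e style: 192 bytes headroom, no per-page reservation.
print(room_for_frame(192, SHINFO_SIZE, 0))
```

Under these assumptions only the 128-byte headroom variant leaves room for both the tailroom and two standard frames per page, and the ixgbe-style layout reproduces the 1536-byte figure quoted in the message.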
2020-03-16 | remoteproc: clean up notification config | Alex Elder | 2 | -4/+2
Rearrange the config files for remoteproc and IPA to fix their interdependencies. First, have CONFIG_QCOM_Q6V5_MSS select QCOM_Q6V5_IPA_NOTIFY so the notification code is built regardless of whether IPA needs it. Next, represent QCOM_IPA as being dependent on QCOM_Q6V5_MSS rather than setting its value to match QCOM_Q6V5_COMMON (which is selected by QCOM_Q6V5_MSS). Drop all dependencies from QCOM_Q6V5_IPA_NOTIFY. The notification code will be built whenever QCOM_Q6V5_MSS is set, and it has no other dependencies. Signed-off-by: Alex Elder <elder@linaro.org> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-16 | net: kcm: kcmproc.c: Fix RCU list suspicious usage warning | Madhuparna Bhowmik | 1 | -1/+1
This patch fixes the suspicious RCU usage warning reported by kernel test robot. net/kcm/kcmproc.c:#RCU-list_traversed_in_non-reader_section There is no need to use list_for_each_entry_rcu() in kcm_stats_seq_show() as the list is always traversed with knet->mutex held. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-16 | qede: remove some unused code in function qede_selftest_receive_traffic | Zheng Zengkai | 1 | -12/+1
Remove set but not used variables 'sw_comp_cons' and 'hw_comp_cons' to fix gcc '-Wunused-but-set-variable' warning:
drivers/net/ethernet/qlogic/qede/qede_ethtool.c: In function qede_selftest_receive_traffic:
drivers/net/ethernet/qlogic/qede/qede_ethtool.c:1569:20: warning: variable sw_comp_cons set but not used [-Wunused-but-set-variable]
drivers/net/ethernet/qlogic/qede/qede_ethtool.c: In function qede_selftest_receive_traffic:
drivers/net/ethernet/qlogic/qede/qede_ethtool.c:1569:6: warning: variable hw_comp_cons set but not used [-Wunused-but-set-variable]
After removing 'hw_comp_cons', the memory barrier 'rmb()' and its comments become useless, so remove them as well. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-16 | net: sched: set the hw_stats_type in pedit loop | Jiri Pirko | 1 | -0/+1
For a single pedit action, multiple offload entries may be used. Set the hw_stats_type to all of them. Fixes: 44f865801741 ("sched: act: allow user to specify type of HW stats for a filter") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-16 | Merge branch 'net-stmmac-Use-readl_poll_timeout-to-simplify-the-code' | David S. Miller | 2 | -22/+8
Dejin Zheng says: ==================== net: stmmac: Use readl_poll_timeout() to simplify the code This patch set just replaces the open-coded loops with the readl_poll_timeout() helper macro to simplify the code in the stmmac driver. v2 -> v3: - return whatever error code readl_poll_timeout() returned. v1 -> v2: - no changes. I am a newbie and sent this patch a month ago (February 6th). So far, I have not received any comments or suggestions. I think it may have been lost somewhere in the world, so I am resending it. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-16 | net: stmmac: use readl_poll_timeout() function in dwmac4_dma_reset() | Dejin Zheng | 1 | -11/+4
The dwmac4_dma_reset() function uses an open-coded version of readl_poll_timeout(). Replace the open-coded handling with the proper function. Signed-off-by: Dejin Zheng <zhengdejin5@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-16 | net: stmmac: use readl_poll_timeout() function in init_systime() | Dejin Zheng | 1 | -11/+4
The init_systime() function uses an open-coded version of readl_poll_timeout(). Replace the open-coded handling with the proper function. Signed-off-by: Dejin Zheng <zhengdejin5@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
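The pattern being replaced in these two commits is a read, check, sleep loop with a deadline. As a rough illustration of what such a polling helper does, here is a Python model with a simulated clock; `poll_timeout` and `FakeClock` are hypothetical names for this sketch, and the exact loop structure is an assumption about the general technique rather than a transcription of the kernel macro.

```python
import errno


class FakeClock:
    """Deterministic stand-in for a time source; sleeping just advances time."""

    def __init__(self) -> None:
        self.t = 0

    def now_us(self) -> int:
        return self.t

    def sleep_us(self, us: int) -> None:
        self.t += us


def poll_timeout(read, cond, sleep_us: int, timeout_us: int, clock: FakeClock):
    """Poll read() until cond(value) holds or timeout_us elapses.

    Returns (0, value) on success or (-ETIMEDOUT, value) on timeout.
    sleep_us must be > 0 with FakeClock, otherwise time never advances.
    """
    deadline = clock.now_us() + timeout_us
    while True:
        val = read()
        if cond(val):
            break
        if timeout_us and clock.now_us() > deadline:
            val = read()  # one last read in case the condition just became true
            break
        if sleep_us:
            clock.sleep_us(sleep_us)
    return (0 if cond(val) else -errno.ETIMEDOUT), val


# A register whose "reset in progress" bit (bit 0) clears on the fourth read.
reads = iter([1, 1, 1, 0])
clock = FakeClock()
ret, val = poll_timeout(lambda: next(reads), lambda v: (v & 1) == 0, 10, 100, clock)
print(ret, val, clock.now_us())
```

Returning the helper's own error code, as the v3 note in the cover letter describes, keeps callers from inventing their own timeout values.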
2020-03-16 | chcr: remove set but not used variable 'status' | YueHaibing | 1 | -2/+1
drivers/crypto/chelsio/chcr_ktls.c: In function chcr_ktls_cpl_set_tcb_rpl:
drivers/crypto/chelsio/chcr_ktls.c:662:11: warning: variable status set but not used [-Wunused-but-set-variable]
Commit 8a30923e1598 ("cxgb4/chcr: Save tx keys and handle HW response") introduced this unused variable; remove it. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-16 | macsec: Netlink support of XPN cipher suites (IEEE 802.1AEbw) | Era Mayflower | 3 | -15/+157
Netlink support for extended packet number cipher suites allows adding and updating XPN macsec interfaces.
Added support for:
* Creating interfaces with GCM-AES-XPN-128 and GCM-AES-XPN-256 suites.
* Setting and getting 64-bit packet numbers of SAs.
* Setting (only on SA creation) and getting ssci of SAs.
* Setting salt when installing a SAK.
Added 2 cipher suite identifiers according to 802.1AE-2018 table 14-1:
* MACSEC_CIPHER_ID_GCM_AES_XPN_128
* MACSEC_CIPHER_ID_GCM_AES_XPN_256
In addition, added 2 new netlink attribute types:
* MACSEC_SA_ATTR_SSCI
* MACSEC_SA_ATTR_SALT
Depends on: macsec: Support XPN frame handling - IEEE 802.1AEbw. Signed-off-by: Era Mayflower <mayflowerera@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-16 | macsec: Support XPN frame handling - IEEE 802.1AEbw | Era Mayflower | 2 | -39/+136
Support handling of frames that use extended packet number cipher suites (802.1AEbw). This does not include the needed netlink patches.
* Added xpn boolean field to `struct macsec_secy`.
* Added ssci field to `struct macsec_tx_sa` (802.1AE figure 10-5).
* Added ssci field to `struct macsec_rx_sa` (802.1AE figure 10-5).
* Added salt field to `struct macsec_key` (802.1AE 10.7 NOTE 1).
* Created pn_t type for easy access to lower and upper halves.
* Created salt_t type for easy access to the "ssci" and "pn" parts.
* Created `macsec_fill_iv_xpn` function to create IV in XPN mode.
* Added support for PN recovery and the preliminary replay check in XPN mode.
In addition, according to IEEE 802.1AEbw figure 10-5, the PN of an incoming frame can be 0 when an XPN cipher suite is used, so fixed the function `macsec_validate_skb` to fail on PN=0 only if XPN is off. Signed-off-by: Era Mayflower <mayflowerera@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
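To make the pieces listed above more concrete, the following Python sketch shows one plausible reading of them: a 64-bit PN split into upper and lower halves, a 12-byte IV formed by XOR-ing the salt with the 32-bit SSCI followed by the 64-bit PN, and the relaxed PN=0 check. The function names are hypothetical, and the big-endian layout is an assumption made for this illustration, not something stated in the commit message.

```python
def pn_halves(pn: int) -> tuple:
    """Split a 64-bit packet number into (upper, lower) 32-bit halves."""
    return (pn >> 32) & 0xFFFFFFFF, pn & 0xFFFFFFFF


def fill_iv_xpn(ssci: int, pn: int, salt: bytes) -> bytes:
    """Build a 12-byte IV as salt XOR (SSCI || PN); byte order is assumed big-endian."""
    if len(salt) != 12:
        raise ValueError("salt must be 12 bytes")
    plain = ssci.to_bytes(4, "big") + pn.to_bytes(8, "big")
    return bytes(a ^ b for a, b in zip(plain, salt))


def pn_field_is_valid(pn_lower32: int, xpn: bool) -> bool:
    """A PN field of 0 is only rejected when XPN is off."""
    return xpn or pn_lower32 != 0


upper, lower = pn_halves(0x0000000100000002)
iv = fill_iv_xpn(ssci=1, pn=0x0000000100000002, salt=bytes(12))
print(upper, lower, iv.hex())
```

The lower half is what travels in the frame's 32-bit PN field, which is why a received value of 0 is legitimate once the lower half wraps in XPN mode.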
2020-03-15 | Merge branch 'net-dsa-improve-serdes-integration' | David S. Miller | 7 | -535/+667
Russell King says: ==================== net: dsa: improve serdes integration Depends on "net: mii clause 37 helpers". Andrew Lunn mentioned that the Serdes PCS found in Marvell DSA switches does not automatically update the switch MACs with the link parameters. Currently, the DSA code implements a work-around for this. This series improves the Serdes integration, making use of the recent phylink changes to support split MAC/PCS setups. One noticeable improvement for userspace is that ethtool can now report the link partner's advertisement. This repost has no changes compared to the previous posting; however, the regression Andrew had found which exists even without this patch set has now been fixed by Andrew and merged into the net-next tree. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-15 | net: dsa: mv88e6xxx: use PHY_DETECT in mac_link_up/mac_link_down | Russell King | 1 | -24/+29
Use the status of the PHY_DETECT bit to determine whether we need to force the MAC settings in mac_link_up() and mac_link_down(). Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-15 | net: dsa: mv88e6xxx: remove port_link_state functions | Russell King | 4 | -232/+0
The port_link_state method is only used by mv88e6xxx_port_setup_mac(), which is now only called during port setup, rather than also being called via phylink's mac_config method. Remove this now unnecessary optimisation, which allows us to remove the port_link_state methods as well. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-15 | net: dsa: mv88e6xxx: combine port_set_speed and port_set_duplex | Russell King | 4 | -145/+107
Setting the speed independently of duplex makes little sense; the two parameters result from negotiation or fixed setup, and may have interdependencies. Moreover, they are always controlled via the same register - having them split means we have to read-modify-write this register twice. Combine the two operations into a single port_set_speed_duplex() operation. Not only is this more efficient, it reduces the size of the code as well. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-15 | net: dsa: mv88e6xxx: fix Serdes link changes | Russell King | 2 | -21/+13
phylink_mac_change() is supposed to be called with a 'false' argument if the link has gone down since it was last reported up; this is to ensure that link events along with renegotiation events are always correctly reported to userspace. Read the BMSR once when we have an interrupt, and report the link latched status to phylink via phylink_mac_change(). phylink will deal automatically with re-reading the link state once it has processed the link-down event. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-15 | net: dsa: mv88e6xxx: extend phylink to Serdes PHYs | Russell King | 4 | -74/+480
Extend the mv88e6xxx phylink implementation down to Serdes PHYs, which handle the PCS layer of such links.
- Implement phylink PCS link state reading, so that we can provide ethtool with the linkmodes and link speed in the expected manner. Note: this will only be called for in-band negotiation, which is only supported by the serdes interfaces.
- Implement phylink PCS configuration, so that the in-band AN and advertisement can be configured.
- Implement phylink PCS negotiation restart, so that the in-band AN can be restarted.
- Implement phylink PCS link up, so that when operating out-of-band, the Serdes can be configured for the appropriate fixed speed mode.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-15 | net: dsa: mv88e6xxx: configure interface settings in mac_config | Russell King | 1 | -32/+35
Only configure the interface settings in mac_config(), leaving the speed and duplex settings to mac_link_up to deal with. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-15 | net: dsa: mv88e6xxx: use BMCR definitions for serdes control register | Russell King | 2 | -14/+5
The SGMII/1000base-X serdes register set is a clause 22 register set offset at 0x2000 in the PHYXS device. Rather than inventing our own definitions, use those that already exist, and name the register MV88E6390_SGMII_BMCR. Also remove the unused MV88E6390_SGMII_STATUS definitions. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-15 | net: dsa: warn if phylink_mac_link_state returns error | Russell King | 1 | -1/+6
Issue a warning to the kernel log if phylink_mac_link_state() returns an error. This should not occur, but let's make it visible. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-15 | Merge branch 'net-mii-clause-37-helpers' | David S. Miller | 1 | -18/+39
Russell King says: ==================== net: mii clause 37 helpers This is a re-post of two patches that are common to two series that I've sent in recent weeks; I'm re-posting them separately in the hope that they can be merged. No changes from either of the previous postings. These patches:
1. convert the existing (unused) mii_lpa_to_ethtool_lpa_x() function to a linkmode variant.
2. add a helper for clause 37 advertisements, supporting both the 1000baseX and de facto 2500baseX variants.
Note that ethtool does not support half duplex for either of these, and we make no effort to do so. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-15 | net: mii: add linkmode_adv_to_mii_adv_x() | Russell King | 1 | -0/+20
Add a helper to convert a linkmode advertisement to a clause 37 advertisement value for 1000base-x and 2500base-x. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
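The conversion this helper performs can be pictured as mapping a set of link modes onto the bits of a clause 37 advertisement word. The Python sketch below uses the 1000BASE-X advertisement bit values from the MII register definitions; the function name `adv_x_from_linkmodes` and the representation of link modes as a set of strings are assumptions made for illustration, not the kernel's interface.

```python
# Clause 37 (1000BASE-X) advertisement register bits.
ADVERTISE_1000XFULL = 0x0020
ADVERTISE_1000XPAUSE = 0x0080
ADVERTISE_1000XPSE_ASYM = 0x0100


def adv_x_from_linkmodes(linkmodes: set, full_duplex_mode: str) -> int:
    """Translate a set of link mode names into a clause 37 advertisement value.

    full_duplex_mode selects which mode name counts as "full duplex" for this
    link, e.g. a 1000baseX or 2500baseX full-duplex mode. Half duplex is not
    advertised, matching the cover letter's note that ethtool does not support it.
    """
    adv = 0
    if full_duplex_mode in linkmodes:
        adv |= ADVERTISE_1000XFULL
    if "Pause" in linkmodes:
        adv |= ADVERTISE_1000XPAUSE
    if "Asym_Pause" in linkmodes:
        adv |= ADVERTISE_1000XPSE_ASYM
    return adv


print(hex(adv_x_from_linkmodes({"1000baseX_Full", "Pause"}, "1000baseX_Full")))
```

Passing the full-duplex mode as a parameter is what lets one helper serve both the 1000base-x and 2500base-x cases, since the advertisement word layout is the same for both.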
2020-03-15 | net: mii: convert mii_lpa_to_ethtool_lpa_x() to linkmode variant | Russell King | 1 | -18/+19
Add a LPA to linkmode decoder for 1000BASE-X protocols; this decoder only provides the modify semantics similar to other such decoders. This replaces the unused mii_lpa_to_ethtool_lpa_x() helper. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-15 | netfilter: conntrack: re-visit sysctls in unprivileged namespaces | Florian Westphal | 1 | -11/+8
Since commit b884fa46177659 ("netfilter: conntrack: unify sysctl handling"), conntrack no longer exposes most of its sysctls (e.g. tcp timeout settings) to network namespaces that are not owned by the initial user namespace. This patch exposes all sysctls even if the namespace is unprivileged.

Compared to a 4.19 kernel, the newly visible and writeable sysctls are:
net.netfilter.nf_conntrack_acct
net.netfilter.nf_conntrack_timestamp
.. to allow enabling the accounting and timestamp extensions.
net.netfilter.nf_conntrack_events
.. to turn off conntrack event notifications.
net.netfilter.nf_conntrack_checksum
.. to disable checksum validation.
net.netfilter.nf_conntrack_log_invalid
.. to enable logging of packets deemed invalid by conntrack.

Newly visible sysctls that are only exported as read-only:
net.netfilter.nf_conntrack_count
.. current number of conntrack entries living in this netns.
net.netfilter.nf_conntrack_max
.. global upper limit (maximum size of the table).
net.netfilter.nf_conntrack_buckets
.. size of the conntrack table (hash buckets).
net.netfilter.nf_conntrack_expect_max
.. maximum number of permitted expectations in this netns.
net.netfilter.nf_conntrack_helper
.. conntrack helper auto assignment.

Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-15 | netfilter: nft_lookup: update element stateful expression | Pablo Neira Ayuso | 1 | -0/+1
If the set element comes with a stateful expression, update it. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-15 | netfilter: nf_tables: add nft_set_elem_update_expr() helper function | Pablo Neira Ayuso | 2 | -7/+13
This helper function runs the eval path of the stateful expression of an existing set element. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-15 | netfilter: nf_tables: add elements with stateful expressions | Pablo Neira Ayuso | 1 | -1/+20
Update nft_add_set_elem() to handle the NFTA_SET_ELEM_EXPR netlink attribute. This patch allows users to add elements with stateful expressions. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-15 | netfilter: nf_tables: statify nft_expr_init() | Pablo Neira Ayuso | 2 | -4/+2
Not exposed anymore to modules, statify this function. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-15 | netfilter: nf_tables: add nft_set_elem_expr_alloc() | Pablo Neira Ayuso | 3 | -13/+36
Add helper function to create stateful expression. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-15 | nft_set_pipapo: Prepare for single ranged field usage | Stefano Brivio | 3 | -8/+17
A few adjustments in nft_pipapo_init() are needed to allow usage of this set back-end for a single, ranged field. Provide a convenient NFT_PIPAPO_MIN_FIELDS definition that currently makes sure that the rbtree back-end is selected instead, for sets with a single field. This finally allows a fair comparison with rbtree sets, by defining NFT_PIPAPO_MIN_FIELDS as 0 and skipping rbtree back-end initialisation:

---------------.--------------------------.-------------------------.
AMD Epyc 7402  |      baselines, Mpps     |   Mpps, % over rbtree   |
 1 thread      |__________________________|_________________________|
 3.35GHz       |        |        |        |            |            |
 768KiB L1D$   | netdev |  hash  | rbtree |            |   pipapo   |
---------------|  hook  |   no   | single |   pipapo   |single field|
type   entries |  drop  | ranges | field  |single field|    AVX2    |
---------------|--------|--------|--------|------------|------------|
net,port       |        |        |        |            |            |
         1000  |   19.0 |   10.4 |    3.8 | 6.0   +58% | 9.6  +153% |
---------------|--------|--------|--------|------------|------------|
port,net       |        |        |        |            |            |
          100  |   18.8 |   10.3 |    5.8 | 9.1   +57% |11.6  +100% |
---------------|--------|--------|--------|------------|------------|
net6,port      |        |        |        |            |            |
         1000  |   16.4 |    7.6 |    1.8 | 2.8   +55% | 6.5  +261% |
---------------|--------|--------|--------|------------|------------|
port,proto     |        |        |        |    [1]     |    [1]     |
        30000  |   19.6 |   11.6 |    3.9 | 0.9   -77% | 2.7   -31% |
---------------|--------|--------|--------|------------|------------|
port,proto     |        |        |        |            |            |
        10000  |   19.6 |   11.6 |    4.4 | 2.1   -52% | 5.6   +27% |
---------------|--------|--------|--------|------------|------------|
port,proto     |        |        |        |            |            |
4 threads 10000|   77.9 |   45.1 |   17.4 | 8.3   -52% |22.4   +29% |
---------------|--------|--------|--------|------------|------------|
net6,port,mac  |        |        |        |            |            |
           10  |   16.5 |    5.4 |    4.3 | 4.5    +5% | 8.2   +91% |
---------------|--------|--------|--------|------------|------------|
net6,port,mac, |        |        |        |            |            |
proto    1000  |   16.5 |    5.7 |    1.9 | 2.8   +47% | 6.6  +247% |
---------------|--------|--------|--------|------------|------------|
net,mac        |        |        |        |            |            |
         1000  |   19.0 |    8.4 |    3.9 | 6.0   +54% | 9.9  +154% |
---------------'--------'--------'--------'------------'------------'

[1] Causes switch of lookup table buckets for 'port' to 4-bit groups

Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
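The "% over rbtree" columns in this table express each pipapo result relative to the rbtree baseline for the same row. A short Python sketch of that calculation follows; `percent_over` is a hypothetical helper, and because the table's Mpps values are rounded to one decimal, recomputed percentages can differ by a point from the printed ones in some rows.

```python
def percent_over(baseline_mpps: float, mpps: float) -> int:
    """Relative difference of `mpps` versus `baseline_mpps`, as a rounded percentage."""
    return round((mpps - baseline_mpps) / baseline_mpps * 100)


# net,port with 1000 entries: rbtree 3.8, pipapo 6.0, pipapo AVX2 9.6 (Mpps).
print(percent_over(3.8, 6.0), percent_over(3.8, 9.6))
# port,proto with 30000 entries: rbtree 3.9, pipapo 0.9, pipapo AVX2 2.7 (Mpps).
print(percent_over(3.9, 0.9), percent_over(3.9, 2.7))
```

The negative results for the large port,proto set correspond to the rows marked [1], where pipapo falls behind rbtree.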
2020-03-15 | nft_set_pipapo: Introduce AVX2-based lookup implementation | Stefano Brivio | 6 | -0/+1270
If the AVX2 set is available, we can exploit the repetitive characteristic of this algorithm to provide a fast, vectorised version by using 256-bit wide AVX2 operations for bucket loads and bitwise intersections.

In most cases, this implementation consistently outperforms rbtree set instances despite the fact they are configured to use a given, single, ranged data type out of the ones used for performance measurements by the nft_concat_range.sh kselftest.

That script, injecting packets directly on the ingoing device path with pktgen, reports, averaged over five runs on a single AMD Epyc 7402 thread (3.35GHz, 768 KiB L1D$, 12 MiB L2$), the figures below. CONFIG_RETPOLINE was not set here.

Note that this is not a fair comparison over hash and rbtree set types: non-ranged entries (used to have a reference for hash types) would be matched faster than this, and matching on a single field only (which is the case for rbtree) is also significantly faster.

However, it's not possible at the moment to choose this set type for non-ranged entries, and the current implementation also needs a few minor adjustments in order to match on less than two fields.

---------------.-----------------------------------.------------.
AMD Epyc 7402  |          baselines, Mpps          | this patch |
 1 thread      |___________________________________|____________|
 3.35GHz       |        |        |        |        |            |
 768KiB L1D$   | netdev |  hash  | rbtree |        |            |
---------------|  hook  |   no   | single |        |   pipapo   |
type   entries |  drop  | ranges | field  | pipapo |    AVX2    |
---------------|--------|--------|--------|--------|------------|
net,port       |        |        |        |        |            |
         1000  |   19.0 |   10.4 |    3.8 |    4.0 | 7.5   +87% |
---------------|--------|--------|--------|--------|------------|
port,net       |        |        |        |        |            |
          100  |   18.8 |   10.3 |    5.8 |    6.3 | 8.1   +29% |
---------------|--------|--------|--------|--------|------------|
net6,port      |        |        |        |        |            |
         1000  |   16.4 |    7.6 |    1.8 |    2.1 | 4.8  +128% |
---------------|--------|--------|--------|--------|------------|
port,proto     |        |        |        |        |            |
        30000  |   19.6 |   11.6 |    3.9 |    0.5 | 2.6  +420% |
---------------|--------|--------|--------|--------|------------|
net6,port,mac  |        |        |        |        |            |
           10  |   16.5 |    5.4 |    4.3 |    3.4 | 4.7   +38% |
---------------|--------|--------|--------|--------|------------|
net6,port,mac, |        |        |        |        |            |
proto    1000  |   16.5 |    5.7 |    1.9 |    1.4 | 3.6   +26% |
---------------|--------|--------|--------|--------|------------|
net,mac        |        |        |        |        |            |
         1000  |   19.0 |    8.4 |    3.9 |    2.5 | 6.4  +156% |
---------------'--------'--------'--------'--------'------------'

A similar strategy could be easily reused to implement specialised versions for other SIMD sets, and I plan to post at least a NEON version at a later time.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
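The "bucket loads and bitwise intersections" mentioned in this message can be illustrated with a heavily simplified, single-field model: the key is cut into 4-bit groups, each group value selects a bucket holding a bitmap of the rules that accept it, and the bitmaps are ANDed together. This Python sketch is an assumption-laden toy, with hypothetical names, that omits range expansion and the mapping between fields; Python integers stand in for the wide bitmaps that a vectorised version would process 256 bits at a time.

```python
GROUP_BITS = 4
BUCKETS = 1 << GROUP_BITS


def build_table(rules, groups: int):
    """rules[i][g] is the set of 4-bit values rule i accepts in group g."""
    table = [[0] * BUCKETS for _ in range(groups)]
    for rule_index, accepted in enumerate(rules):
        for group in range(groups):
            for value in accepted[group]:
                table[group][value] |= 1 << rule_index
    return table


def lookup(table, key: int) -> int:
    """Return a bitmap of matching rules: one bucket load and one AND per group."""
    groups = len(table)
    result = -1  # all bits set; narrowed by each intersection
    for group in range(groups):
        shift = (groups - 1 - group) * GROUP_BITS
        value = (key >> shift) & (BUCKETS - 1)
        result &= table[group][value]
    return result


ANY = range(BUCKETS)
# Rule 0 matches keys 0x10..0x1f, rule 1 matches exactly 0x1a.
table = build_table([[{1}, ANY], [{1}, {0xA}]], groups=2)
print(bin(lookup(table, 0x1A)), bin(lookup(table, 0x15)), bin(lookup(table, 0x2A)))
```

Because every group performs the same load-and-AND step on a wide bitmap, the loop body is a natural fit for SIMD registers, which is the repetitive characteristic the message refers to.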
2020-03-15 | nft_set_pipapo: Prepare for vectorised implementation: helpers | Stefano Brivio | 2 | -261/+285
Move most macros and helpers to a header file, so that they can be conveniently used by related implementations. No functional changes are intended here. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-15 | nft_set_pipapo: Prepare for vectorised implementation: alignment | Stefano Brivio | 1 | -25/+110
SIMD vector extension sets require stricter alignment than native instruction sets to operate efficiently (AVX, NEON) or for some instructions to work at all (AltiVec). Provide facilities to define arbitrary alignment for lookup tables and scratch maps. By defining byte alignment with NFT_PIPAPO_ALIGN, lt_aligned and scratch_aligned pointers become available. Additional headroom is allocated, and pointers to the possibly unaligned, originally allocated areas are kept so that they can be freed. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
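The allocate-with-headroom approach described here is a common technique: request extra bytes, round the returned address up to the required alignment, and keep the original address for freeing. The Python sketch below models it with plain integers as addresses; `align_up`, `alloc_aligned`, and the worst-case headroom of `align - 1` bytes are assumptions for illustration and may differ from what the patch actually allocates.

```python
def align_up(addr: int, align: int) -> int:
    """Round addr up to the next multiple of align (align must be a power of two)."""
    if align <= 0 or align & (align - 1):
        raise ValueError("align must be a positive power of two")
    return (addr + align - 1) & ~(align - 1)


def alloc_aligned(size: int, align: int, allocator):
    """Allocate size bytes plus headroom; return (raw, aligned) addresses.

    `raw` is what the allocator returned and is what must later be freed;
    `aligned` is the address that is safe to use for aligned vector loads.
    """
    raw = allocator(size + align - 1)
    return raw, align_up(raw, align)


# A pretend allocator that returns an address which is not 32-byte aligned.
raw, aligned = alloc_aligned(100, 32, lambda nbytes: 0x1004)
print(hex(raw), hex(aligned))
```

Keeping both addresses mirrors the message's point that pointers to the possibly unaligned, originally allocated areas are retained so that they can be freed.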