path: root/Documentation/networking
Age  Commit message  Author  Files  Lines
2021-10-25  net/mlx5: SF_DEV Add SF device trace points  Parav Pandit  1  -0/+21
    Add SF device add and delete specific trace points.

        echo mlx5:mlx5_sf_dev_add >> /sys/kernel/debug/tracing/set_event
        echo mlx5:mlx5_sf_dev_del >> /sys/kernel/debug/tracing/set_event
        echo mlx5:mlx5_sf_vhca_event >> /sys/kernel/debug/tracing/set_event

    Signed-off-by: Parav Pandit <parav@nvidia.com>
    Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-25  net/mlx5: SF, Add SF trace points  Parav Pandit  1  -0/+37
    Add support for trace events for SFs to improve debugging. This covers
    (a) port add and free trace points
    (b) device level trace points
    (c) SF hardware context add and free trace points
    (d) SF function activate/deactivate and state trace points

    SF events examples:

        echo mlx5:mlx5_sf_add >> /sys/kernel/debug/tracing/set_event
        echo mlx5:mlx5_sf_free >> /sys/kernel/debug/tracing/set_event
        echo mlx5:mlx5_sf_hwc_alloc >> /sys/kernel/debug/tracing/set_event
        echo mlx5:mlx5_sf_hwc_free >> /sys/kernel/debug/tracing/set_event
        echo mlx5:mlx5_sf_hwc_deferred_free >> /sys/kernel/debug/tracing/set_event
        echo mlx5:mlx5_sf_update_state >> /sys/kernel/debug/tracing/set_event
        echo mlx5:mlx5_sf_activate >> /sys/kernel/debug/tracing/set_event
        echo mlx5:mlx5_sf_deactivate >> /sys/kernel/debug/tracing/set_event

    Signed-off-by: Parav Pandit <parav@nvidia.com>
    Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
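The per-event echo commands in the commit above can be wrapped in a small helper. This is a minimal sketch, not part of the patch: the function name `enable_mlx5_events` is hypothetical, and the path of the `set_event` file is passed as a parameter so the helper can be tried on a scratch file before it is pointed at /sys/kernel/debug/tracing/set_event (which needs root and a mounted debugfs).

```shell
# Hypothetical helper (not part of the patch): append each named mlx5
# trace event to a set_event file, one "mlx5:<event>" entry per line.
enable_mlx5_events() {
    set_event_file=$1
    shift
    for ev in "$@"; do
        echo "mlx5:${ev}" >> "$set_event_file" || return 1
    done
}

# Example use (as root, with debugfs mounted):
#   enable_mlx5_events /sys/kernel/debug/tracing/set_event \
#       mlx5_sf_add mlx5_sf_free mlx5_sf_activate mlx5_sf_deactivate
```

Appending with `>>` mirrors the commands in the commit message, so events that are already enabled stay enabled.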
2021-10-25  net/mlx5: Let user configure max_macs param  Shay Drory  1  -0/+4
    Currently, max_macs is taking 70Kbytes of memory per function. This
    size is not needed in all use cases, and is critical with large scale.
    Hence, allow user to configure the number of max_macs.

    For example, to reduce the number of max_macs to 1, execute::

        $ devlink dev param set pci/0000:00:0b.0 name max_macs value 1 \
              cmode driverinit
        $ devlink dev reload pci/0000:00:0b.0

    Signed-off-by: Shay Drory <shayd@nvidia.com>
    Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
    Reviewed-by: Parav Pandit <parav@nvidia.com>
    Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-25  net/mlx5: Let user configure event_eq_size param  Shay Drory  1  -0/+4
    The event EQ is an EQ which receives notifications of almost all the
    events generated by the NIC. Currently, each event EQ is taking 512KB
    of memory. This size is not needed in most use cases, and is critical
    with large scale. Hence, allow user to configure the size of the event
    EQ.

    For example, to reduce event EQ size to 64, execute::

        $ devlink resource set pci/0000:00:0b.0 path /event_eq_size/ size 64
        $ devlink dev reload pci/0000:00:0b.0

    Signed-off-by: Shay Drory <shayd@nvidia.com>
    Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
    Reviewed-by: Parav Pandit <parav@nvidia.com>
    Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-25  net/mlx5: Let user configure io_eq_size param  Shay Drory  1  -0/+12
    Currently, each I/O EQ is taking 128KB of memory. This size is not
    needed in all use cases, and is critical with large scale. Hence, allow
    user to configure the size of I/O EQs.

    For example, to reduce I/O EQ size to 64, execute:

        $ devlink resource set pci/0000:00:0b.0 path /io_eq_size/ size 64
        $ devlink dev reload pci/0000:00:0b.0

    Signed-off-by: Shay Drory <shayd@nvidia.com>
    Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
    Reviewed-by: Parav Pandit <parav@nvidia.com>
    Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-25  net/mlx5: Print health buffer by log level  Aya Levin  1  -0/+2
    Add log macro which gets log level as a parameter. Use the severity
    read from the health buffer and the new log macro to log the health
    buffer with severity as log level.

    Prior to this patch, health buffer was printed in error log level
    regardless of its severity. Now the user may filter dmesg (--level) or
    change kernel log level to focus on different severity levels of
    firmware errors.

    Signed-off-by: Aya Levin <ayal@nvidia.com>
    Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
    Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-22  Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net  David S. Miller  2  -9/+10
    Lots of simple overlapping additions.

    With a build fix from Stephen Rothwell.

    Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-18  mctp: unify sockaddr_mctp types  Jeremy Kerr  1  -5/+5
    Use the more precise __kernel_sa_family_t for smctp_family, to match
    struct sockaddr.

    Also, use an unsigned int for the network member; negative networks
    don't make much sense. We're already using unsigned for mctp_dev and
    mctp_skb_cb, but need to change mctp_sock to suit.

    Fixes: 60fc63981693 ("mctp: Add sockaddr_mctp to uapi")
    Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
    Acked-by: Eugene Syromiatnikov <esyr@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-14  ice: Print the api_patch as part of the fw.mgmt.api  Brett Creeley  1  -4/+5
    Currently when a user uses "devlink dev info", the fw.mgmt.api will be
    the major.minor numbers as shown below:

    devlink dev info pci/0000:3b:00.0
    pci/0000:3b:00.0:
      driver ice
      serial_number 00-01-00-ff-ff-00-00-00
      versions:
          fixed:
            board.id K91258-000
          running:
            fw.mgmt 6.1.2
            fw.mgmt.api 1.7 <--- No patch number included
            fw.mgmt.build 0xd75e7d06
            fw.mgmt.srev 5
            fw.undi 1.2992.0
            fw.undi.srev 5
            fw.psid.api 3.10
            fw.bundle_id 0x800085cc
            fw.app.name ICE OS Default Package
            fw.app 1.3.27.0
            fw.app.bundle_id 0xc0000001
            fw.netlist 3.10.2000-3.1e.0
            fw.netlist.build 0x2a76e110
          stored:
            fw.mgmt.srev 5
            fw.undi 1.2992.0
            fw.undi.srev 5
            fw.psid.api 3.10
            fw.bundle_id 0x800085cc
            fw.netlist 3.10.2000-3.1e.0
            fw.netlist.build 0x2a76e110

    There are many features in the driver that depend on the major, minor,
    and patch version of the FW. Without the patch number in the output for
    fw.mgmt.api debugging issues related to the FW API version is difficult.
    Also, using major.minor.patch aligns with the existing firmware version
    which uses a 3 digit value.

    Fix this by making the fw.mgmt.api print the major.minor.patch
    versions. Shown below is the result:

    devlink dev info pci/0000:3b:00.0
    pci/0000:3b:00.0:
      driver ice
      serial_number 00-01-00-ff-ff-00-00-00
      versions:
          fixed:
            board.id K91258-000
          running:
            fw.mgmt 6.1.2
            fw.mgmt.api 1.7.9 <--- patch number included
            fw.mgmt.build 0xd75e7d06
            fw.mgmt.srev 5
            fw.undi 1.2992.0
            fw.undi.srev 5
            fw.psid.api 3.10
            fw.bundle_id 0x800085cc
            fw.app.name ICE OS Default Package
            fw.app 1.3.27.0
            fw.app.bundle_id 0xc0000001
            fw.netlist 3.10.2000-3.1e.0
            fw.netlist.build 0x2a76e110
          stored:
            fw.mgmt.srev 5
            fw.undi 1.2992.0
            fw.undi.srev 5
            fw.psid.api 3.10
            fw.bundle_id 0x800085cc
            fw.netlist 3.10.2000-3.1e.0
            fw.netlist.build 0x2a76e110

    Fixes: ff2e5c700e08 ("ice: add basic handler for devlink .info_get")
    Signed-off-by: Brett Creeley <brett.creeley@intel.com>
    Tested-by: Gurucharan G <gurucharanx.g@intel.com>
    Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
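Once fw.mgmt.api carries the patch number, a script can compare the full major.minor.patch triple. The following is a minimal sketch in POSIX shell, assuming the version string has already been extracted from the "devlink dev info" output; the function name `api_at_least` is hypothetical and is not part of the driver or of devlink.

```shell
# Hypothetical helper: succeed if version $1 is at least version $2.
# Both are "major.minor.patch"; a missing component is treated as 0, so
# the older two-part form ("1.7") still parses.
api_at_least() {
    IFS=. read -r h1 h2 h3 <<EOF
$1
EOF
    IFS=. read -r w1 w2 w3 <<EOF
$2
EOF
    h1=${h1:-0}; h2=${h2:-0}; h3=${h3:-0}
    w1=${w1:-0}; w2=${w2:-0}; w3=${w3:-0}
    # Compare component by component, most significant first.
    if [ "$h1" -ne "$w1" ]; then [ "$h1" -gt "$w1" ]; return $?; fi
    if [ "$h2" -ne "$w2" ]; then [ "$h2" -gt "$w2" ]; return $?; fi
    [ "$h3" -ge "$w3" ]
}

# Example: api_at_least 1.7.9 1.7.3 && echo "API is new enough"
```

With the old two-part output, "1.7" parses as 1.7.0, so a check that needs a specific patch level cannot tell 1.7.0 from 1.7.9, which is the debugging gap the commit describes.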
2021-10-07  ipvs: add sysctl_run_estimation to support disable estimation  Dust Li  1  -0/+11
    estimation_timer will iterate the est_list to do estimation for each
    ipvs stats. When there are lots of services, the list can be very
    large. We found that estimation_timer() ran for more than 200ms on a
    machine with 104 CPUs and 50K services.

    yunhong-cgl jiang reported the same phenomenon before:
    https://www.spinics.net/lists/lvs-devel/msg05426.html

    In some cases (for example a large K8S cluster with many ipvs
    services), ipvs estimation may not be needed. So add a sysctl knob to
    allow users to disable this completely.

    Default is: 1 (enable)

    Cc: yunhong-cgl jiang <xintian1976@gmail.com>
    Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
    Acked-by: Julian Anastasov <ja@ssi.bg>
    Acked-by: Simon Horman <horms@verge.net.au>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-10-06  ethtool: Add transceiver module extended state  Ido Schimmel  1  -0/+10
    Add an extended state and sub-state to describe link issues related to
    transceiver modules.

    The 'ETHTOOL_LINK_EXT_SUBSTATE_MODULE_CMIS_NOT_READY' extended sub-state
    tells user space that port is unable to gain a carrier because the CMIS
    Module State Machine did not reach the ModuleReady (Fully Operational)
    state. For example, if the module is stuck at ModuleLowPwr or
    ModuleFault state. In case of the latter, user space can read the fault
    reason from the module's EEPROM and potentially reset it.

    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-06  ethtool: Add ability to control transceiver modules' power mode  Ido Schimmel  1  -2/+69
    Add a pair of new ethtool messages, 'ETHTOOL_MSG_MODULE_SET' and
    'ETHTOOL_MSG_MODULE_GET', that can be used to control transceiver
    modules parameters and retrieve their status.

    The first parameter to control is the power mode of the module. It is
    only relevant for paged memory modules, as flat memory modules always
    operate in low power mode.

    When a paged memory module is in low power mode, its power consumption
    is reduced to the minimum, the management interface towards the host is
    available and the data path is deactivated.

    User space can choose to put modules that are not currently in use in
    low power mode and transition them to high power mode before putting
    the associated ports administratively up. This is useful for user space
    that favors reduced power consumption and lower temperatures over
    reduced link up times. In QSFP-DD modules the transition from low power
    mode to high power mode can take a few seconds and this transition is
    only expected to get longer with future / more complex modules.

    User space can control the power mode of the module via the power mode
    policy attribute ('ETHTOOL_A_MODULE_POWER_MODE_POLICY'). Possible
    values:

    * high: Module is always in high power mode.

    * auto: Module is transitioned by the host to high power mode when the
      first port using it is put administratively up and to low power mode
      when the last port using it is put administratively down.

    The operational power mode of the module is available to user space via
    the 'ETHTOOL_A_MODULE_POWER_MODE' attribute. The attribute is not
    reported to user space when a module is not plugged-in.

    The user API is designed to be generic enough so that it could be used
    for modules with different memory maps (e.g., SFF-8636, CMIS).

    The only implementation of the device driver API in this series is for
    a MAC driver (mlxsw) where the module is controlled by the device's
    firmware, but it is designed to be generic enough so that it could also
    be used by implementations where the module is controlled by the CPU.

    CMIS testing
    ============

     # ethtool -m swp11
     Identifier : 0x18 (QSFP-DD Double Density 8X Pluggable Transceiver (INF-8628))
     ...
     Module State : 0x03 (ModuleReady)
     LowPwrAllowRequestHW : Off
     LowPwrRequestSW : Off

    The module is not in low power mode, as it is not forced by hardware
    (LowPwrAllowRequestHW is off) or by software (LowPwrRequestSW is off).

    The power mode can be queried from the kernel. In case
    LowPwrAllowRequestHW was on, the kernel would need to take into account
    the state of the LowPwrRequestHW signal, which is not visible to user
    space.

     $ ethtool --show-module swp11
     Module parameters for swp11:
     power-mode-policy high
     power-mode high

    Change the power mode policy to 'auto':

     # ethtool --set-module swp11 power-mode-policy auto

    Query the power mode again:

     $ ethtool --show-module swp11
     Module parameters for swp11:
     power-mode-policy auto
     power-mode low

    Verify with the data read from the EEPROM:

     # ethtool -m swp11
     Identifier : 0x18 (QSFP-DD Double Density 8X Pluggable Transceiver (INF-8628))
     ...
     Module State : 0x01 (ModuleLowPwr)
     LowPwrAllowRequestHW : Off
     LowPwrRequestSW : On

    Put the associated port administratively up which will instruct the
    host to transition the module to high power mode:

     # ip link set dev swp11 up

    Query the power mode again:

     $ ethtool --show-module swp11
     Module parameters for swp11:
     power-mode-policy auto
     power-mode high

    Verify with the data read from the EEPROM:

     # ethtool -m swp11
     Identifier : 0x18 (QSFP-DD Double Density 8X Pluggable Transceiver (INF-8628))
     ...
     Module State : 0x03 (ModuleReady)
     LowPwrAllowRequestHW : Off
     LowPwrRequestSW : Off

    Put the associated port administratively down which will instruct the
    host to transition the module to low power mode:

     # ip link set dev swp11 down

    Query the power mode again:

     $ ethtool --show-module swp11
     Module parameters for swp11:
     power-mode-policy auto
     power-mode low

    Verify with the data read from the EEPROM:

     # ethtool -m swp11
     Identifier : 0x18 (QSFP-DD Double Density 8X Pluggable Transceiver (INF-8628))
     ...
     Module State : 0x01 (ModuleLowPwr)
     LowPwrAllowRequestHW : Off
     LowPwrRequestSW : On

    SFF-8636 testing
    ================

     # ethtool -m swp13
     Identifier : 0x11 (QSFP28)
     ...
     Extended identifier description : 5.0W max. Power consumption, High Power Class (> 3.5 W) enabled
     Power set : Off
     Power override : On
     ...
     Transmit avg optical power (Channel 1) : 0.7733 mW / -1.12 dBm
     Transmit avg optical power (Channel 2) : 0.7649 mW / -1.16 dBm
     Transmit avg optical power (Channel 3) : 0.7790 mW / -1.08 dBm
     Transmit avg optical power (Channel 4) : 0.7837 mW / -1.06 dBm
     Rcvr signal avg optical power(Channel 1) : 0.9302 mW / -0.31 dBm
     Rcvr signal avg optical power(Channel 2) : 0.9079 mW / -0.42 dBm
     Rcvr signal avg optical power(Channel 3) : 0.8993 mW / -0.46 dBm
     Rcvr signal avg optical power(Channel 4) : 0.8778 mW / -0.57 dBm

    The module is not in low power mode, as it is not forced by hardware
    (Power override is on) or by software (Power set is off).

    The power mode can be queried from the kernel. In case Power override
    was off, the kernel would need to take into account the state of the
    LPMode signal, which is not visible to user space.

     $ ethtool --show-module swp13
     Module parameters for swp13:
     power-mode-policy high
     power-mode high

    Change the power mode policy to 'auto':

     # ethtool --set-module swp13 power-mode-policy auto

    Query the power mode again:

     $ ethtool --show-module swp13
     Module parameters for swp13:
     power-mode-policy auto
     power-mode low

    Verify with the data read from the EEPROM:

     # ethtool -m swp13
     Identifier : 0x11 (QSFP28)
     Extended identifier description : 5.0W max. Power consumption, High Power Class (> 3.5 W) not enabled
     Power set : On
     Power override : On
     ...
     Transmit avg optical power (Channel 1) : 0.0000 mW / -inf dBm
     Transmit avg optical power (Channel 2) : 0.0000 mW / -inf dBm
     Transmit avg optical power (Channel 3) : 0.0000 mW / -inf dBm
     Transmit avg optical power (Channel 4) : 0.0000 mW / -inf dBm
     Rcvr signal avg optical power(Channel 1) : 0.0000 mW / -inf dBm
     Rcvr signal avg optical power(Channel 2) : 0.0000 mW / -inf dBm
     Rcvr signal avg optical power(Channel 3) : 0.0000 mW / -inf dBm
     Rcvr signal avg optical power(Channel 4) : 0.0000 mW / -inf dBm

    Put the associated port administratively up which will instruct the
    host to transition the module to high power mode:

     # ip link set dev swp13 up

    Query the power mode again:

     $ ethtool --show-module swp13
     Module parameters for swp13:
     power-mode-policy auto
     power-mode high

    Verify with the data read from the EEPROM:

     # ethtool -m swp13
     Identifier : 0x11 (QSFP28)
     ...
     Extended identifier description : 5.0W max. Power consumption, High Power Class (> 3.5 W) enabled
     Power set : Off
     Power override : On
     ...
     Transmit avg optical power (Channel 1) : 0.7934 mW / -1.01 dBm
     Transmit avg optical power (Channel 2) : 0.7859 mW / -1.05 dBm
     Transmit avg optical power (Channel 3) : 0.7885 mW / -1.03 dBm
     Transmit avg optical power (Channel 4) : 0.7985 mW / -0.98 dBm
     Rcvr signal avg optical power(Channel 1) : 0.9325 mW / -0.30 dBm
     Rcvr signal avg optical power(Channel 2) : 0.9034 mW / -0.44 dBm
     Rcvr signal avg optical power(Channel 3) : 0.9086 mW / -0.42 dBm
     Rcvr signal avg optical power(Channel 4) : 0.8885 mW / -0.51 dBm

    Put the associated port administratively down which will instruct the
    host to transition the module to low power mode:

     # ip link set dev swp13 down

    Query the power mode again:

     $ ethtool --show-module swp13
     Module parameters for swp13:
     power-mode-policy auto
     power-mode low

    Verify with the data read from the EEPROM:

     # ethtool -m swp13
     Identifier : 0x11 (QSFP28)
     ...
     Extended identifier description : 5.0W max. Power consumption, High Power Class (> 3.5 W) not enabled
     Power set : On
     Power override : On
     ...
     Transmit avg optical power (Channel 1) : 0.0000 mW / -inf dBm
     Transmit avg optical power (Channel 2) : 0.0000 mW / -inf dBm
     Transmit avg optical power (Channel 3) : 0.0000 mW / -inf dBm
     Transmit avg optical power (Channel 4) : 0.0000 mW / -inf dBm
     Rcvr signal avg optical power(Channel 1) : 0.0000 mW / -inf dBm
     Rcvr signal avg optical power(Channel 2) : 0.0000 mW / -inf dBm
     Rcvr signal avg optical power(Channel 3) : 0.0000 mW / -inf dBm
     Rcvr signal avg optical power(Channel 4) : 0.0000 mW / -inf dBm

    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
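The query steps in the test logs above lend themselves to scripting. The sketch below extracts the operational power mode from `ethtool --show-module` output. It assumes the output has one "key value" pair per line, as in the transcripts of this commit message; the function name `module_power_mode` is hypothetical.

```shell
# Hypothetical helper: read `ethtool --show-module <dev>` output on stdin
# and print the value of the "power-mode" line. Prints nothing when the
# line is absent (the commit notes the attribute is not reported when no
# module is plugged in).
module_power_mode() {
    awk '$1 == "power-mode" { print $2 }'
}

# Example: ethtool --show-module swp11 | module_power_mode
```

Matching on the exact first field keeps the "power-mode-policy" line from being picked up by mistake.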
2021-10-02  net: wwan: iosm: correct devlink extra params  M Chetan Kumar  1  -26/+6
    1. Removed driver specific extra params like download_region,
       address & region_count. The required information is passed
       as part of flash API.
    2. IOSM Devlink documentation updated to reflect the same.

    Signed-off-by: M Chetan Kumar <m.chetan.kumar@linux.intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-01  devlink: report maximum number of snapshots with regions  Jacob Keller  2  -2/+6
    Each region has an independently configurable number of maximum
    snapshots. This information is not reported to userspace, making it not
    very discoverable. Fix this by adding a new
    DEVLINK_ATTR_REGION_MAX_SNAPSHOTS attribute which is used to report
    this maximum.

    Ex:

      $ devlink region
      pci/0000:af:00.0/nvm-flash: size 10485760 snapshot [] max 1
      pci/0000:af:00.0/device-caps: size 4096 snapshot [] max 10
      pci/0000:af:00.1/nvm-flash: size 10485760 snapshot [] max 1
      pci/0000:af:00.1/device-caps: size 4096 snapshot [] max 10

    This information enables users to understand why a new region command
    may fail due to having too many existing snapshots.

    Reported-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
    Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
    Acked-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
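With the maximum now visible, a script can list each region together with its snapshot limit. This is a minimal sketch that assumes the line layout shown in the example output of the commit above ("<region>: size N snapshot [...] max M"); the function name `region_max_snapshots` is hypothetical.

```shell
# Hypothetical helper: read `devlink region` output on stdin and print
# "<region> <max>" for every line that carries a "max" field.
region_max_snapshots() {
    awk '{
        for (i = 2; i < NF; i++) {
            if ($i == "max") {
                sub(/:$/, "", $1)      # drop the trailing colon of the name
                print $1, $(i + 1)
                break
            }
        }
    }'
}

# Example: devlink region | region_max_snapshots
```

Searching for the literal "max" token rather than using a fixed column keeps the helper working when the snapshot list between the brackets is not empty.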
2021-09-29  doc/mctp: Add a little detail about kernel internals  Jeremy Kerr  1  -0/+59
    Describe common flows and refcounting behaviour.

    Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
    Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-23  Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net  Jakub Kicinski  2  -2/+2
    net/mptcp/protocol.c
      977d293e23b4 ("mptcp: ensure tx skbs always have the MPTCP ext")
      efe686ffce01 ("mptcp: ensure tx skbs always have the MPTCP ext")

    same patch merged in both trees, keep net-next.

    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-09-23  tcp: remove sk_{tr}x_skb_cache  Eric Dumazet  1  -8/+0
    This reverts the following patches:

    - commit 2e05fcae83c4 ("tcp: fix compile error if !CONFIG_SYSCTL")
    - commit 4f661542a402 ("tcp: fix zerocopy and notsent_lowat issues")
    - commit 472c2e07eef0 ("tcp: add one skb cache for tx")
    - commit 8b27dae5a2e8 ("tcp: add one skb cache for rx")

    Having a cache of one skb (in each direction) per TCP socket is fragile,
    since it can cause a significant increase of memory needs, and not good
    enough for high speed flows anyway where more than one skb is needed.

    We want instead to add a generic infrastructure, with more flexible
    per-cpu caches, for alien NUMA nodes.

    Acked-by: Paolo Abeni <pabeni@redhat.com>
    Acked-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-21  Doc: networking: Fix a typo in ice.rst  Masanari Iida  1  -1/+1
    This patch fixes a spelling typo in ice.rst.

    Signed-off-by: Masanari Iida <standby24x7@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-20  net: wwan: iosm: devlink fw flashing & cd collection documentation  M Chetan Kumar  2  -0/+183
    Documents devlink params, fw update & cd collection commands and their
    usage.

    Signed-off-by: M Chetan Kumar <m.chetan.kumar@linux.intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-19  docs: net: dsa: sja1105: fix reference to sja1105.txt  Alejandro Concepcion-Rodriguez  1  -1/+1
    The file sja1105.txt was converted to nxp,sja1105.yaml.

    Signed-off-by: Alejandro Concepcion-Rodriguez <asconcepcion@acoro.eu>
    Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-07  Merge tag 'net-5.15-rc1' of ↵  Linus Torvalds  1  -5/+8
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes and stragglers from Jakub Kicinski: "Networking stragglers and fixes, including changes from netfilter, wireless and can. Current release - regressions: - qrtr: revert check in qrtr_endpoint_post(), fixes audio and wifi - ip_gre: validate csum_start only on pull - bnxt_en: fix 64-bit doorbell operation on 32-bit kernels - ionic: fix double use of queue-lock, fix a sleeping in atomic - can: c_can: fix null-ptr-deref on ioctl() - cs89x0: disable compile testing on powerpc Current release - new code bugs: - bridge: mcast: fix vlan port router deadlock, consistently disable BH Previous releases - regressions: - dsa: tag_rtl4_a: fix egress tags, only port 0 was working - mptcp: fix possible divide by zero - netfilter: nft_ct: protect nft_ct_pcpu_template_refcnt with mutex - netfilter: socket: icmp6: fix use-after-scope - stmmac: fix MAC not working when system resume back with WoL active Previous releases - always broken: - ip/ip6_gre: use the same logic as SIT interfaces when computing v6LL address - seg6: set fc_nlinfo in nh_create_ipv4, nh_create_ipv6 - mptcp: only send extra TCP acks in eligible socket states - dsa: lantiq_gswip: fix maximum frame length - stmmac: fix overall budget calculation for rxtx_napi - bnxt_en: fix firmware version reporting via devlink - renesas: sh_eth: add missing barrier to fix freeing wrong tx descriptor Stragglers: - netfilter: conntrack: switch to siphash - netfilter: refuse insertion if chain has grown too large - ncsi: add get MAC address command to get Intel i210 MAC address" * tag 'net-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (76 commits) ieee802154: Remove redundant initialization of variable ret net: stmmac: fix MAC not working when system resume back with WoL active net: phylink: add suspend/resume support net: renesas: sh_eth: Fix freeing wrong tx descriptor bonding: 3ad: pass parameter bond_params by reference cxgb3: 
fix oops on module removal can: c_can: fix null-ptr-deref on ioctl() can: rcar_canfd: add __maybe_unused annotation to silence warning net: wwan: iosm: Unify IO accessors used in the driver net: wwan: iosm: Replace io.*64_lo_hi() with regular accessors net: qcom/emac: Replace strlcpy with strscpy ip6_gre: Revert "ip6_gre: add validation for csum_start" net: hns3: make hclgevf_cmd_caps_bit_map0 and hclge_cmd_caps_bit_map0 static selftests/bpf: Test XDP bonding nest and unwind bonding: Fix negative jump label count on nested bonding MAINTAINERS: add VM SOCKETS (AF_VSOCK) entry stmmac: dwmac-loongson:Fix missing return value iwlwifi: fix printk format warnings in uefi.c net: create netdev->dev_addr assignment helpers bnxt_en: Fix possible unintended driver initiated error recovery ...
2021-09-03  Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf  Jakub Kicinski  1  -5/+8
    Pablo Neira Ayuso says:

    ====================
    Netfilter fixes for net

    1) Protect nft_ct template with global mutex, from Pavel Skripkin.

    2) Two recent commits switched inet rt and nexthop exception hashes
       from jhash to siphash. If those two spots are problematic then
       conntrack is affected as well, so switch over to siphash too.
       While at it, add a hard upper limit on chain lengths and reject
       insertion if this is hit. Patches from Florian Westphal.

    3) Fix use-after-scope in nf_socket_ipv6 reported by KASAN,
       from Benjamin Hesmans.

    * git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf:
      netfilter: socket: icmp6: fix use-after-scope
      netfilter: refuse insertion if chain has grown too large
      netfilter: conntrack: switch to siphash
      netfilter: conntrack: sanitize table size default settings
      netfilter: nft_ct: protect nft_ct_pcpu_template_refcnt with mutex
    ====================

    Link: https://lore.kernel.org/r/20210903163020.13741-1-pablo@netfilter.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-09-01  Merge tag 'docs-5.15' of git://git.lwn.net/linux  Linus Torvalds  1  -0/+1
Pull documentation updates from Jonathan Corbet: "Yet another set of documentation changes: - A reworking of PDF generation to yield better results for documents using CJK fonts in particular. - A new set of translations into traditional Chinese, a dialect for which I am assured there is a community of interested readers. - A lot more regular Chinese translation work as well. ... plus the usual assortment of updates, fixes, typo tweaks, etc" * tag 'docs-5.15' of git://git.lwn.net/linux: (55 commits) docs: sphinx-requirements: Move sphinx_rtd_theme to top docs: pdfdocs: Enable language-specific font choice of zh_TW translations docs: pdfdocs: Teach xeCJK about character classes of quotation marks docs: pdfdocs: Permit AutoFakeSlant for CJK fonts docs: pdfdocs: One-half spacing for CJK translations docs: pdfdocs: Add conf.py local to translations for ascii-art alignment docs: pdfdocs: Preserve inter-phrase space in Korean translations docs: pdfdocs: Choose Serif font as CJK mainfont if possible docs: pdfdocs: Add CJK-language-specific font settings docs: pdfdocs: Refactor config for CJK document scripts/kernel-doc: Override -Werror from KCFLAGS with KDOC_WERROR docs/zh_CN: Add zh_CN/accounting/psi.rst doc: align Italian translation Documentation/features/vm: riscv supports THP now docs/zh_CN: add infiniband user_verbs translation docs/zh_CN: add infiniband user_mad translation docs/zh_CN: add infiniband tag_matching translation docs/zh_CN: add infiniband sysfs translation docs/zh_CN: add infiniband opa_vnic translation docs/zh_CN: add infiniband ipoib translation ...
2021-08-30  Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next  Jakub Kicinski  1  -2/+2
Daniel Borkmann says: ==================== bpf-next 2021-08-31 We've added 116 non-merge commits during the last 17 day(s) which contain a total of 126 files changed, 6813 insertions(+), 4027 deletions(-). The main changes are: 1) Add opaque bpf_cookie to perf link which the program can read out again, to be used in libbpf-based USDT library, from Andrii Nakryiko. 2) Add bpf_task_pt_regs() helper to access userspace pt_regs, from Daniel Xu. 3) Add support for UNIX stream type sockets for BPF sockmap, from Jiang Wang. 4) Allow BPF TCP congestion control progs to call bpf_setsockopt() e.g. to switch to another congestion control algorithm during init, from Martin KaFai Lau. 5) Extend BPF iterator support for UNIX domain sockets, from Kuniyuki Iwashima. 6) Allow bpf_{set,get}sockopt() calls from setsockopt progs, from Prankur Gupta. 7) Add bpf_get_netns_cookie() helper for BPF_PROG_TYPE_{SOCK_OPS,CGROUP_SOCKOPT} progs, from Xu Liu and Stanislav Fomichev. 8) Support for __weak typed ksyms in libbpf, from Hao Luo. 9) Shrink struct cgroup_bpf by 504 bytes through refactoring, from Dave Marchevsky. 10) Fix a smatch complaint in verifier's narrow load handling, from Andrey Ignatov. 11) Fix BPF interpreter's tail call count limit, from Daniel Borkmann. 12) Big batch of improvements to BPF selftests, from Magnus Karlsson, Li Zhijian, Yucong Sun, Yonghong Song, Ilya Leoshkevich, Jussi Maki, Ilya Leoshkevich, others. 13) Another big batch to revamp XDP samples in order to give them consistent look and feel, from Kumar Kartikeya Dwivedi. 
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (116 commits) MAINTAINERS: Remove self from powerpc BPF JIT selftests/bpf: Fix potential unreleased lock samples: bpf: Fix uninitialized variable in xdp_redirect_cpu selftests/bpf: Reduce more flakyness in sockmap_listen bpf: Fix bpf-next builds without CONFIG_BPF_EVENTS bpf: selftests: Add dctcp fallback test bpf: selftests: Add connect_to_fd_opts to network_helpers bpf: selftests: Add sk_state to bpf_tcp_helpers.h bpf: tcp: Allow bpf-tcp-cc to call bpf_(get|set)sockopt selftests: xsk: Preface options with opt selftests: xsk: Make enums lower case selftests: xsk: Generate packets from specification selftests: xsk: Generate packet directly in umem selftests: xsk: Simplify cleanup of ifobjects selftests: xsk: Decrease sending speed selftests: xsk: Validate tx stats on tx thread selftests: xsk: Simplify packet validation in xsk tests selftests: xsk: Rename worker_* functions that are not thread entry points selftests: xsk: Disassociate umem size with packets sent selftests: xsk: Remove end-of-test packet ... ==================== Link: https://lore.kernel.org/r/20210830225618.11634-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-30  Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next  David S. Miller  1  -0/+7
    Pablo Neira Ayuso says:

    ====================
    Netfilter updates for net-next

    The following patchset contains Netfilter updates for net-next:

    1) Clean up and consolidate ct ecache infrastructure by merging ct and
       expect notifiers, from Florian Westphal.

    2) Missing counters and timestamp in nfnetlink_queue and _log conntrack
       information.

    3) Missing error check for xt_register_template() in iptables mangle,
       as an incremental fix for the previous pull request, also from
       Florian Westphal.

    4) Add netfilter hooks for the SRv6 lightweight tunnel driver, from
       Ryoga Sato. The hooks are enabled via nf_hooks_lwtunnel sysctl to
       make sure existing netfilter rulesets do not break. There is a
       static key to disable the hooks by default.

       The pktgen_bench_xmit_mode_netif_receive.sh shows no noticeable
       impact in the seg6_input path for non-netfilter users: similar
       numbers with and without this patch.

       This is a sample of the perf report output:

        11.67%  kpktgend_0  [ipv6]            [k] ipv6_get_saddr_eval
         7.89%  kpktgend_0  [ipv6]            [k] __ipv6_addr_label
         7.52%  kpktgend_0  [ipv6]            [k] __ipv6_dev_get_saddr
         6.63%  kpktgend_0  [kernel.vmlinux]  [k] asm_exc_nmi
         4.74%  kpktgend_0  [ipv6]            [k] fib6_node_lookup_1
         3.48%  kpktgend_0  [kernel.vmlinux]  [k] pskb_expand_head
         3.33%  kpktgend_0  [ipv6]            [k] ip6_rcv_core.isra.29
         3.33%  kpktgend_0  [ipv6]            [k] seg6_do_srh_encap
         2.53%  kpktgend_0  [ipv6]            [k] ipv6_dev_get_saddr
         2.45%  kpktgend_0  [ipv6]            [k] fib6_table_lookup
         2.24%  kpktgend_0  [kernel.vmlinux]  [k] ___cache_free
         2.16%  kpktgend_0  [ipv6]            [k] ip6_pol_route
         2.11%  kpktgend_0  [kernel.vmlinux]  [k] __ipv6_addr_type
    ====================

    Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-30  netfilter: conntrack: sanitize table size default settings  Florian Westphal  1  -5/+8
    conntrack has two distinct table size settings:
    nf_conntrack_max and nf_conntrack_buckets.

    The former limits how many conntrack objects are allowed to exist
    in each namespace.

    The latter sets the size of the hashtable.

    As all entries are inserted twice (once for original direction, once
    for reply), there should be at least twice as many buckets in the table
    as the maximum number of conntrack objects that can exist at the same
    time.

    Change the default multiplier to 1 and increase the chosen bucket
    sizes. This results in the same nf_conntrack_max settings as before but
    reduces the average bucket list length.

    Signed-off-by: Florian Westphal <fw@strlen.de>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
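The sizing argument in the commit above can be checked with quick arithmetic: every connection occupies two hash slots, so at a full table the average bucket list length is 2 * nf_conntrack_max / nf_conntrack_buckets. The sketch below is illustrative only. The function name `avg_chain_len` is hypothetical, the table sizes are made-up values, and the previous multiplier of 4 is an assumption that the commit message does not state.

```shell
# Hypothetical helper: average bucket list length at a full table.
# Each conntrack entry is hashed twice (original and reply direction).
avg_chain_len() {
    max=$1       # nf_conntrack_max
    buckets=$2   # nf_conntrack_buckets
    echo $(( 2 * max / buckets ))
}

# Same nf_conntrack_max (262144, illustrative), two table layouts:
avg_chain_len 262144 65536     # assumed old default, max = 4 * buckets: prints 8
avg_chain_len 262144 262144    # new default, max = 1 * buckets: prints 2
```

Keeping nf_conntrack_max fixed while growing the bucket count is what shortens the lists; the shell arithmetic is integer-only, which is enough for the power-of-two sizes used here.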
2021-08-30  netfilter: add netfilter hooks to SRv6 data plane  Ryoga Saito  1  -0/+7
    This patch introduces netfilter hooks for solving the problem that
    conntrack couldn't record both inner flows and outer flows.

    This patch also introduces a new sysctl toggle for enabling lightweight
    tunnel netfilter hooks.

    Signed-off-by: Ryoga Saito <contact@proelbtn.com>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-08-25  pktgen: document the latest pktgen usage options  Juhee Kang  1  -10/+8
    Currently, the pktgen.rst documentation doesn't cover the latest pktgen
    sample usage options such as count and IPv6. Also, the documentation
    includes old sample scripts that can no longer be used because they
    were removed by commit a4b6ade8359f ("samples/pktgen : remove remaining
    old pktgen sample scripts").

    Thus, this commit documents pktgen sample usage using the latest
    options, removes the old sample scripts, and fixes a minor typo.

    Signed-off-by: Juhee Kang <claudiajkang@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-24ethtool: add two coalesce attributes for CQE modeYufeng Mo1-0/+15
Currently, many drivers support CQE mode configuration: some configure it as a fixed value at initialization, and some provide an interface to change it through ethtool private flags. To make it more generic, add two new coalesce attributes, 'ETHTOOL_A_COALESCE_USE_CQE_TX' and 'ETHTOOL_A_COALESCE_USE_CQE_RX', so that these parameters can be accessed through the ethtool netlink coalesce uAPI. Also add a new structure, kernel_ethtool_coalesce, so that the new parameters can be added to this struct. Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-23doc: Document unexpected tcp_l3mdev_accept=1 behaviorBenjamin Poirier1-0/+13
As suggested by David, document a somewhat unexpected behavior that results from net.ipv4.tcp_l3mdev_accept=1. This behavior was encountered while debugging FRR, a VRF-aware application, on a system which used net.ipv4.tcp_l3mdev_accept=1 and where TCP connections for BGP with MD5 keys were failing to establish. Cc: David Ahern <dsahern@gmail.com> Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-22docs: net: dsa: document the new methods for bridge TX forwarding offloadVladimir Oltean1-0/+16
Two new methods have been introduced, add some verbiage about what they do. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-22docs: net: dsa: remove references to struct dsa_device_ops::filterVladimir Oltean1-13/+0
This function has disappeared in commit edac6f6332d9 ("Revert "net: dsa: Allow drivers to filter packets they can decode source port from""). Also, since commit 4e50025129ef ("net: dsa: generalize overhead for taggers that use both headers and trailers"), the next paragraph is no longer true (it is still discouraged to do that, but it is now supported, so no point in mentioning it). Delete. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-22docs: net: dsa: sja1105: update list of limitationsVladimir Oltean1-217/+1
Remove the paragraphs that talk about the various modes of traffic support, bridging with foreign interfaces, etc etc. There is nothing that the user needs to know now, it should all work out of the box as expected. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-22docs: devlink: remove the references to sja1105Vladimir Oltean2-50/+0
The sja1105 driver has removed its devlink params, so there is nothing to see here. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-20Merge tag 'mlx5-updates-2021-08-19' of ↵David S. Miller1-0/+44
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2021-08-19 This series introduces the support for two new mlx5 features: 1) Sample offload for tunneled traffic 2) devlink rate objects support 1) From Chris Mi: Sample offload for tunneled traffic ===================================================== Background and solution ----------------------- Currently the sample offload actions send the encapsulated packet to software. This series de-capsulates the packet before performing the sampling and sets the tunnel properties on the skb metadata fields to make the behavior consistent with OVS sFlow. If de-capsulating first, we can't use the same match as before in the default table. So instantiate a post action instance to continue processing the action list. If HW can preserve reg_c, also use the post action instance. Post action infrastructure -------------------------- Some tc actions are modeled in hardware using multiple tables, causing a tc action list split. For example, CT action is modeled by jumping to a ct table which is controlled by nf flow table. sFlow jumps in hardware to a sample table, which continues to a "default table" where it should continue processing the action list. Multi table actions are modeled in hardware using a unique fte_id. The fte_id is set before jumping to a table. Split actions continue to a post-action table where the matched fte_id value continues the execution of the tc action list. This series also introduces post action infrastructure. Both ct and sample use it. 
Sample for tunnel in TC SW -------------------------- tc filter add dev vxlan1 protocol ip parent ffff: prio 3 \ flower src_mac 24:25:d0:e1:00:00 dst_mac 02:25:d0:13:01:02 \ enc_src_ip 192.168.1.14 enc_dst_ip 192.168.1.13 \ enc_dst_port 4789 enc_key_id 4 \ action sample rate 1 group 6 \ action tunnel_key unset \ action mirred egress redirect dev enp4s0f0_1 MLX5 sample HW offload ---------------------- For the following typical flow table: +-------------------------------+ + original flow table + +-------------------------------+ + original match + +-------------------------------+ + sample action + other actions + +-------------------------------+ We translate the tc filter with sample action to the following HW model: +---------------------+ + original flow table + +---------------------+ + original match + +---------------------+ | set fte_id (if reg_c preserve cap) | do decap v +------------------------------------------------+ + Flow Sampler Object + +------------------------------------------------+ + sample ratio + +------------------------------------------------+ + sample table id | default table id + +------------------------------------------------+ | | v v +-----------------------------+ +-------------------+ + sample table + + default table + +-----------------------------+ +-------------------+ + forward to management vport + | +-----------------------------+ | +-------+------+ | |reg_c preserve cap | |or decap action v v +-----------------+ +-------------+ + per vport table + + post action + +-----------------+ +-------------+ + original match + +-----------------+ + other actions + +-----------------+ 2) From Dmytro Linkin: devlink rate object support for mlx5_core driver ======================================================================= HIGH-LEVEL OVERVIEW Devlink leaf rate objects created per vport (VF/SF, and PF on BlueField) in switchdev mode on devlink port registration. 
Implement devlink ops callbacks to create/destroy rate groups, set TX rate values of the vport/group, and assign a vport to a group. The driver accepts TX rate values as a fraction of 1Mbps. Refactor existing eswitch QoS infrastructure to be accessible by the legacy NDO rate API and the new devlink rate API. The NDO rate API is not removed/disabled in switchdev mode to not break existing users. Rate values configured with the NDO rate API are not visible to the devlink infrastructure, therefore the APIs should not be used simultaneously. IMPLEMENTATION DETAILS The driver provides a two level rate hierarchy to manage bandwidth - group level and vport level. Initially each vport is added to an internal unlimited group created by default. Each rate element (vport or group) receives bandwidth relative to its parent element (for groups the parent is the physical link itself) in a Round Robin manner, where each element gets a bandwidth value according to its weight. Example: Created four rate groups with tx_share limits: $ devlink port function rate add \ pci/0000:06:00.0/group_1 tx_share 30gbit $ devlink port function rate add \ pci/0000:06:00.0/group_2 tx_share 20gbit $ devlink port function rate add \ pci/0000:06:00.0/group_3 tx_share 20gbit $ devlink port function rate add \ pci/0000:06:00.0/group_4 tx_share 10gbit Weights created in HW for each group are relative to the biggest tx_share value, which is 30gbit: <group_1> 1.0 <group_2> 0.67 <group_3> 0.67 <group_4> 0.33 Assuming link speed is 50 Gbit/sec and each group can sustain such amount of traffic, maximum bandwidth is 50 / (1.0 + 0.67 + 0.67 + 0.33) = ~18.75 Gbit/sec. Normalized bandwidth values for groups: <group_1> 18.75 * 1.0 = 18.75 Gbit/sec <group_2> 18.75 * 0.67 = 12.5 Gbit/sec <group_3> 18.75 * 0.67 = 12.5 Gbit/sec <group_4> 18.75 * 0.33 = 6.25 Gbit/sec If in example above group_1 doesn't produce any traffic, then maximum bandwidth becomes 50 / (0.67 + 0.67 + 0.33) = ~30.0 Gbit/sec. 
Normalized values: <group_2> 30.0 * 0.67 = 20.0 Gbit/sec <group_3> 30.0 * 0.67 = 20.0 Gbit/sec <group_4> 30.0 * 0.33 = 10.0 Gbit/sec The same normalization is applied to each vport in the group. Normalized values are internal, therefore the driver provides QoS tracepoints for the following events: * vport rate element creation/deletion; * vport rate element configuration; * group rate element creation/deletion; * group rate element configuration. PATCHES OVERVIEW 1 - Moving and isolation of eswitch QoS logic in separate file; 2 - Implement devlink leaf rate object support for vports; 3 - Implement rate groups creation/deletion; 4 - Implement TX rate management for the groups; 5 - Implement parent set for vports; 6 - Eswitch QoS tracepoints. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
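The tx_share normalization in the example above can be reproduced with a short calculation. This is a sketch of the arithmetic only, not the driver's implementation: the function name `normalized_bandwidth` and its parameters are hypothetical, and the weights are kept as exact ratios rather than the rounded 0.67 and 0.33 shown in the message.

```python
def normalized_bandwidth(tx_share, link_speed, active=None):
    """Split link_speed among the active groups according to their weights.

    tx_share:   mapping of group name to configured tx_share (Gbit/sec)
    link_speed: link speed in Gbit/sec
    active:     groups currently producing traffic (default: all groups)
    """
    active = set(tx_share) if active is None else set(active)
    # Weights are relative to the biggest configured tx_share value.
    biggest = max(tx_share.values())
    weights = {group: share / biggest for group, share in tx_share.items()}
    # Bandwidth given to an element of weight 1.0.
    unit = link_speed / sum(weights[group] for group in active)
    return {group: unit * weights[group] for group in sorted(active)}


shares = {"group_1": 30, "group_2": 20, "group_3": 20, "group_4": 10}

# All four groups send traffic on a 50 Gbit/sec link.
all_active = normalized_bandwidth(shares, 50)

# group_1 is idle, so the remaining groups share the whole link.
without_group_1 = normalized_bandwidth(
    shares, 50, active=["group_2", "group_3", "group_4"]
)

for name, value in all_active.items():
    print(f"{name}: {value:.2f} Gbit/sec")
for name, value in without_group_1.items():
    print(f"{name}: {value:.2f} Gbit/sec")
```

With all groups active this gives 18.75, 12.5, 12.5 and 6.25 Gbit/sec; with group_1 idle it gives 20, 20 and 10 Gbit/sec, matching the values quoted in the message.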
2021-08-20Merge tag 'batadv-next-pullrequest-20210819' of ↵David S. Miller1-1/+1
git://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== This cleanup patchset includes the following patches: - bump version strings, by Simon Wunderlich - update docs about move IRC channel away from freenode, by Sven Eckelmann - Switch to kstrtox.h for kstrtou64, by Sven Eckelmann - Update NULL checks, by Sven Eckelmann (2 patches) - remove remaining skb-copy calls for broadcast packets, by Linus Lüssing ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-19net/mlx5: E-switch, Add QoS tracepointsDmytro Linkin1-0/+44
Add tracepoints to log QoS enabling/disabling/configuration for vports and rate groups. Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Reviewed-by: Huy Nguyen <huyn@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-17bpf: Refactor BPF_PROG_RUN into a functionAndrii Nakryiko1-2/+2
Turn BPF_PROG_RUN into a proper always inlined function. No functional and performance changes are intended, but it makes it much easier to understand how BPF programs are actually executed. It's more obvious what types and callbacks are expected. Also extra () around input parameters can be dropped, as well as `__` variable prefixes intended to avoid naming collisions, which makes the code simpler to read and write. This refactoring also highlighted one extra issue. BPF_PROG_RUN is both a macro and an enum value (BPF_PROG_RUN == BPF_PROG_TEST_RUN). Turning BPF_PROG_RUN into a function causes a naming conflict compilation error. So rename BPF_PROG_RUN into lower-case bpf_prog_run(), similar to bpf_prog_run_xdp(), bpf_prog_run_pin_on_cpu(), etc. All existing callers of BPF_PROG_RUN, the macro, are switched to bpf_prog_run() explicitly. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210815070609.987780-2-andrii@kernel.org
2021-08-16docs: ethtool: Add two link extended substates of bad signal integrityGuangbin Huang1-0/+8
Add documentation for two bad signal integrity substates: ETHTOOL_LINK_EXT_SUBSTATE_BSI_SERDES_REFERENCE_CLOCK_LOST ETHTOOL_LINK_EXT_SUBSTATE_BSI_SERDES_ALOS. Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-14mptcp: faster active backup recoveryPaolo Abeni1-0/+12
The msk can use backup subflows to transmit in-sequence data only if there are no other active subflows. In an active backup scenario, the MPTCP connection can make forward progress only due to MPTCP retransmissions - rtx can pick backup subflows. This patch introduces a new flag for MPTCP subflows: if the underlying TCP connection made no progress for a long time, and there are other less problematic subflows available, the given subflow becomes stale. Stale subflows are not considered active: if all non backup subflows become stale, the MPTCP scheduler can pick backup subflows for plain transmissions. Stale subflows can return to the active state as soon as any reply from the peer is observed. Active backup scenarios can now leverage the available b/w with no restriction. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/207 Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-10/+0
Conflicts: drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.h 9e26680733d5 ("bnxt_en: Update firmware call to retrieve TX PTP timestamp") 9e518f25802c ("bnxt_en: 1PPS functions to configure TSIO pins") 099fdeda659d ("bnxt_en: Event handler for PPS events") kernel/bpf/helpers.c include/linux/bpf-cgroup.h a2baf4e8bb0f ("bpf: Fix potentially incorrect results with bpf_get_local_storage()") c7603cfa04e7 ("bpf: Add ambient BPF runtime context stored in current") drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c 5957cc557dc5 ("net/mlx5: Set all field of mlx5_irq before inserting it to the xarray") 2d0b41a37679 ("net/mlx5: Refcount mlx5_irq with integer") MAINTAINERS 7b637cd52f02 ("MAINTAINERS: fix Microchip CAN BUS Analyzer Tool entry typo") 7d901a1e878a ("net: phy: add Maxlinear GPY115/21x/24x driver") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-11devlink: Add new "enable_vnet" generic device paramParav Pandit1-0/+4
Add a new generic device parameter to enable/disable creation of the VDPA net auxiliary device and associated device functionality in the devlink instance. A user who prefers to disable such functionality can do so using the example below. $ devlink dev param set pci/0000:06:00.0 \ name enable_vnet value false cmode driverinit $ devlink dev reload pci/0000:06:00.0 At this point the devlink instance does not create an auxiliary device for the VDPA net functionality. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-11devlink: Add new "enable_rdma" generic device paramParav Pandit1-0/+4
Add a new generic device parameter to enable/disable creation of the RDMA auxiliary device and associated device functionality in the devlink instance. A user who prefers to disable such functionality can do so using the example below. $ devlink dev param set pci/0000:06:00.0 \ name enable_rdma value false cmode driverinit $ devlink dev reload pci/0000:06:00.0 At this point the devlink instance does not create an auxiliary device for the RDMA functionality. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-11devlink: Add new "enable_eth" generic device paramParav Pandit1-0/+4
Add a new generic device parameter to enable/disable creation of the Ethernet auxiliary device and associated device functionality in the devlink instance. A user who prefers to disable such functionality can do so using the example below. $ devlink dev param set pci/0000:06:00.0 \ name enable_eth value false cmode driverinit $ devlink dev reload pci/0000:06:00.0 At this point the devlink instance does not create an auxiliary device for the Ethernet functionality. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-08batman-adv: Move IRC channel to hackint.orgSven Eckelmann1-1/+1
Due to recent developments around the Freenode.org IRC network, the opinions about the usage of this service shifted dramatically. The majority of the still active users of the #batman channel prefers a move to the hackint.org network. Signed-off-by: Sven Eckelmann <sven@narfation.org>
2021-08-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfJakub Kicinski1-10/+0
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Restrict range element expansion in ipset to avoid soft lockup, from Jozsef Kadlecsik. 2) Memleak in error path for nf_conntrack_bridge for IPv4 packets, from Yajun Deng. 3) Simplify conntrack garbage collection strategy to avoid frequent wake-ups, from Florian Westphal. 4) Fix NFNLA_HOOK_FUNCTION_NAME string, do not include module name. 5) Missing chain family netlink attribute in chain description in nfnetlink_hook. 6) Incorrect sequence number on nfnetlink_hook dumps. 7) Use netlink request family in reply message for consistency. 8) Remove offload_pickup sysctl, use conntrack for established state instead, from Florian Westphal. 9) Translate NFPROTO_INET/ingress to NFPROTO_NETDEV/ingress, since NFPROTO_INET is not exposed through nfnetlink_hook. * git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf: netfilter: nfnetlink_hook: translate inet ingress to netdev netfilter: conntrack: remove offload_pickup sysctl again netfilter: nfnetlink_hook: Use same family as request message netfilter: nfnetlink_hook: use the sequence number of the request message netfilter: nfnetlink_hook: missing chain family netfilter: nfnetlink_hook: strip off module name from hookfn netfilter: conntrack: collect all entries in one cycle netfilter: nf_conntrack_bridge: Fix memory leak when error netfilter: ipset: Limit the maximal range of consecutive elements to add/delete ==================== Link: https://lore.kernel.org/r/20210806151149.6356-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-06netfilter: conntrack: remove offload_pickup sysctl againFlorian Westphal1-10/+0
These two sysctls were added because the hardcoded defaults (2 minutes, tcp, 30 seconds, udp) turned out to be too low for some setups. They appeared in 5.14-rc1 so it should be fine to remove it again. Marcelo convinced me that there should be no difference between a flow that was offloaded vs. a flow that was not wrt. timeout handling. Thus the default is changed to those for TCP established and UDP stream, 5 days and 120 seconds, respectively. Marcelo also suggested to account for the timeout value used for the offloading, this avoids increase beyond the value in the conntrack-sysctl and will also instantly expire the conntrack entry with altered sysctls. Example: nf_conntrack_udp_timeout_stream=60 nf_flowtable_udp_timeout=60 This will remove offloaded udp flows after one minute, rather than two. An earlier version of this patch also cleared the ASSURED bit to allow nf_conntrack to evict the entry via early_drop (i.e., table full). However, it looks like we can safely assume that connection timed out via HW is still in established state, so this isn't needed. Quoting Oz: [..] the hardware sends all packets with a set FIN flags to sw. [..] Connections that are aged in hardware are expected to be in the established state. In case it turns out that back-to-sw-path transition can occur for 'dodgy' connections too (e.g., one side disappeared while software-path would have been in RETRANS timeout), we can adjust this later. Cc: Oz Shlomo <ozsh@nvidia.com> Cc: Paul Blakey <paulb@nvidia.com> Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-08-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2-2/+21
Build failure in drivers/net/wwan/mhi_wwan_mbim.c: add missing parameter (0, assuming we don't want buffer pre-alloc). Conflict in drivers/net/dsa/sja1105/sja1105_main.c between: 589918df9322 ("net: dsa: sja1105: be stateless with FDB entries on SJA1105P/Q/R/S/SJA1110 too") 0fac6aa098ed ("net: dsa: sja1105: delete the best_effort_vlan_filtering mode") Follow the instructions from the commit message of the former commit - removed the if conditions. When looking at commit 589918df9322 ("net: dsa: sja1105: be stateless with FDB entries on SJA1105P/Q/R/S/SJA1110 too") note that the mask_iotag fields get removed by the following patch. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-04docs: networking: netdevsim rulesJakub Kicinski1-0/+17
There are aspects of netdevsim which are commonly misunderstood and pointed out in review. Cong suggested we document them. Suggested-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-03bonding: add new option lacp_activeHangbin Liu1-0/+12
Add an option lacp_active, which is similar to team's runner.active. This option specifies whether to send LACPDU frames periodically. If set on, the LACPDU frames are sent along with the configured lacp_rate setting. If set off, the LACPDU frames act as "speak when spoken to". Note, the LACPDU state frames will still be sent when initializing or unbinding a port. v2: remove module parameter Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>