summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2018-12-03rhashtable: detect when object movement between tables might have ↵NeilBrown2-11/+31
invalidated a lookup Some users of rhashtables might need to move an object from one table to another - this appears to be the reason for the incomplete usage of NULLS markers. To support these, we store a unique NULLS_MARKER at the end of each chain, and when a search fails to find a match, we check if the NULLS marker found was the expected one. If not, the search may not have examined all objects in the target bucket, so it is repeated. The unique NULLS_MARKER is derived from the address of the head of the chain. As this cannot be derived at load-time the static rhnull in rht_bucket_nested() needs to be initialised at run time. Any caller of a lookup function must still be prepared for the possibility that the object returned is in a different table - it might have been there for some time. Note that this does NOT provide support for other uses of NULLS_MARKERs such as allocating with SLAB_TYPESAFE_BY_RCU or changing the key of an object and re-inserting it in the same table. These could only be done safely if new objects were inserted at the *start* of a hash chain, and that is not currently the case. Signed-off-by: NeilBrown <neilb@suse.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03Merge branch 'hns3-ethtool-dump'David S. Miller4-5/+345
Salil Mehta says: ==================== Adds VF/PF PCIe reg dump(ethtool -d) support to HNS3 driver This patchset adds VF/PF PCIe register dump support to HNS3 VF and PF driver using "ethtool -d" command. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03net: hns3: Adds support to dump(using ethool-d) PCIe regs in HNS3 PF driverJian Shen2-5/+171
This patch adds support to dump PF PCIe registers using ethtool -d for HNS3 PF Driver. Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03net: hns3: Support "ethtool -d" for HNS3 VF driverJian Shen2-0/+174
This patch adds "ethtool -d" support for HNS3 VF Driver. Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03net: phy: improve generic EEE ethtool functionsHeiner Kallweit1-5/+10
So far the two functions consider neither member eee_enabled nor eee_active. Therefore network drivers have to do this in some kind of glue code. I think this can be avoided. Getting EEE parameters: When not advertising any EEE mode, we can't consider EEE to be enabled. Therefore interpret "EEE enabled" as "we advertise at least one EEE mode". It's similar with "EEE active": interpret it as "EEE modes advertised by both link partner have at least one mode in common". Setting EEE parameters: If eee_enabled isn't set, don't advertise any EEE mode and restart aneg if needed to switch off EEE. If eee_enabled is set and data->advertised is empty (e.g. because EEE was disabled), advertise everything we support as default. This way EEE can easily switched on/off by doing ethtool --set-eee <if> eee on/off, w/o any additional parameters. The changes to both functions shouldn't break any existing user. Once the changes have been applied, at least some users can be simplified. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03Merge branch 'VXLAN-underlay-VRF'David S. Miller8-9/+228
Alexis Bauvin says: ==================== net: Add VRF support for VXLAN underlay v6 -> v7: - proper locking for device in udp_tunnel following Sabrina Dubroca's advice v5 -> v6: - remove automatic rebinding patch following Roopa Prabhu's advice v4 -> v5: - move test script to its own patch (6/6) - add schematic for test script - apply David Ahern comments to the test script v3 -> v4: - rename vxlan_is_in_l3mdev_chain to netdev_is_upper master - move it to net/core/dev.c - make it return bool instead of int - check if remote_ifindex is zero before resolving the l3mdev - add testing script v2 -> v3: - fix build when CONFIG_NET_IPV6 is off - fix build "unused l3mdev_master_upper_ifindex_by_index" build error with some configs v1 -> v2: - move vxlan_get_l3mdev from vxlan driver to l3mdev driver as l3mdev_master_upper_ifindex_by_index - vxlan: rename variables named l3mdev_ifindex to ifindex v0 -> v1: - fix typos We are trying to isolate the VXLAN traffic from different VMs with VRF as shown in the schemas below: +-------------------------+ +----------------------------+ | +----------+ | | +------------+ | | | | | | | | | | | tap-red | | | | tap-blue | | | | | | | | | | | +----+-----+ | | +-----+------+ | | | | | | | | | | | | | | +----+---+ | | +----+----+ | | | | | | | | | | | br-red | | | | br-blue | | | | | | | | | | | +----+---+ | | +----+----+ | | | | | | | | | | | | | | | | | | | | +----+--------+ | | +--------------+ | | | | | | | | | | | vxlan-red | | | | vxlan-blue | | | | | | | | | | | +------+------+ | | +-------+------+ | | | | | | | | | VRF | | | VRF | | | red | | | blue | +-------------------------+ +----------------------------+ | | | | +---------------------------------------------------------+ | | | | | | | | | | +--------------+ | | | | | | | | | +---------+ eth0.2030 +---------+ | | | 10.0.0.1/24 | | | +-----+--------+ VRF | | | green| +---------------------------------------------------------+ | | +----+---+ | | | eth0 | | | +--------+ iproute2 commands to reproduce the setup: ip link add green type vrf table 1 ip link set green up ip link add eth0.2030 link eth0 type vlan id 2030 ip link set eth0.2030 master green ip addr add 10.0.0.1/24 dev eth0.2030 ip link set eth0.2030 up ip link add blue type vrf table 2 ip link set blue up ip link add br-blue type bridge ip link set br-blue master blue ip link set br-blue up ip link add vxlan-blue type vxlan id 2 local 10.0.0.1 dev eth0.2030 \ port 4789 ip link set vxlan-blue master br-blue ip link set vxlan-blue up ip link set tap-blue master br-blue ip link set tap-blue up ip link add red type vrf table 3 ip link set red up ip link add br-red type bridge ip link set br-red master red ip link set br-red up ip link add vxlan-red type vxlan id 3 local 10.0.0.1 dev eth0.2030 \ port 4789 ip link set vxlan-red master br-red ip link set vxlan-red up ip link set tap-red master br-red ip link set tap-red up We faced some issue in the datapath, here are the details: * Egress traffic: The vxlan packets are sent directly to the default VRF because it's where the socket is bound, therefore the traffic has a default route via eth0. the workaround is to force this traffic to VRF green with ip rules. * Ingress traffic: When receiving the traffic on eth0.2030 the vxlan socket is unreachable from VRF green. The workaround is to enable *udp_l3mdev_accept* sysctl, but this breaks isolation between overlay and underlay: packets sent from blue or red by e.g. a guest VM will be accepted by the socket, allowing injection of VXLAN packets from the overlay. This patch series fixes the issues describe above by allowing VXLAN socket to be bound to a specific VRF device therefore looking up in the correct table. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03test/net: Add script for VXLAN underlay in a VRFAlexis Bauvin2-1/+130
This script tests the support of a VXLAN underlay in a non-default VRF. It does so by simulating two hypervisors and two VMs, an extended L2 between the VMs with the hypervisors as VTEPs with the underlay in a VRF, and finally by pinging the two VMs. It also tests that moving the underlay from a VRF to another works when down/up the VXLAN interface. Signed-off-by: Alexis Bauvin <abauvin@scaleway.com> Reviewed-by: Amine Kherbouche <akherbouche@scaleway.com> Reviewed-by: David Ahern <dsahern@gmail.com> Tested-by: Amine Kherbouche <akherbouche@scaleway.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03vxlan: add support for underlay in non-default VRFAlexis Bauvin1-8/+24
Creating a VXLAN device with is underlay in the non-default VRF makes egress route lookup fail or incorrect since it will resolve in the default VRF, and ingress fail because the socket listens in the default VRF. This patch binds the underlying UDP tunnel socket to the l3mdev of the lower device of the VXLAN device. This will listen in the proper VRF and output traffic from said l3mdev, matching l3mdev routing rules and looking up the correct routing table. When the VXLAN device does not have a lower device, or the lower device is in the default VRF, the socket will not be bound to any interface, keeping the previous behaviour. The underlay l3mdev is deduced from the VXLAN lower device (IFLA_VXLAN_LINK). +----------+ +---------+ | | | | | vrf-blue | | vrf-red | | | | | +----+-----+ +----+----+ | | | | +----+-----+ +----+----+ | | | | | br-blue | | br-red | | | | | +----+-----+ +---+-+---+ | | | | +-----+ +-----+ | | | +----+-----+ +------+----+ +----+----+ | | lower device | | | | | eth0 | <- - - - - - - | vxlan-red | | tap-red | (... more taps) | | | | | | +----------+ +-----------+ +---------+ Signed-off-by: Alexis Bauvin <abauvin@scaleway.com> Reviewed-by: Amine Kherbouche <akherbouche@scaleway.com> Reviewed-by: David Ahern <dsahern@gmail.com> Tested-by: Amine Kherbouche <akherbouche@scaleway.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03l3mdev: add function to retreive upper masterAlexis Bauvin2-0/+40
Existing functions to retreive the l3mdev of a device did not walk the master chain to find the upper master. This patch adds a function to find the l3mdev, even indirect through e.g. a bridge: +----------+ | | | vrf-blue | | | +----+-----+ | | +----+-----+ | | | br-blue | | | +----+-----+ | | +----+-----+ | | | eth0 | | | +----------+ This will properly resolve the l3mdev of eth0 to vrf-blue. Signed-off-by: Alexis Bauvin <abauvin@scaleway.com> Reviewed-by: Amine Kherbouche <akherbouche@scaleway.com> Reviewed-by: David Ahern <dsahern@gmail.com> Tested-by: Amine Kherbouche <akherbouche@scaleway.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03udp_tunnel: add config option to bind to a deviceAlexis Bauvin3-0/+34
UDP tunnel sockets are always opened unbound to a specific device. This patch allow the socket to be bound on a custom device, which incidentally makes UDP tunnels VRF-aware if binding to an l3mdev. Signed-off-by: Alexis Bauvin <abauvin@scaleway.com> Reviewed-by: Amine Kherbouche <akherbouche@scaleway.com> Tested-by: Amine Kherbouche <akherbouche@scaleway.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03Merge branch 'mlxsw-fw_load_policy'David S. Miller8-14/+127
Ido Schimmel says: ==================== mlxsw: Add 'fw_load_policy' devlink parameter Shalom says: Currently, drivers do not have the ability to control the firmware loading policy and they always use their own fixed policy. This prevents drivers from running the device with a different firmware version for testing and/or debugging purposes. For example, testing a firmware bug fix. For these situations, the new devlink generic parameter, 'fw_load_policy', gives the ability to control this option and allows drivers to run with a different firmware version than required by the driver. Patch #1 adds the new parameter to devlink. The other two patches, #2 and #3, add support for this parameter in the mlxsw driver. Example: # Query the devlink parameters supported by the device $ devlink dev param show pci/0000:03:00.0: name fw_load_policy type generic values: cmode driverinit value driver # Flash new firmware using ethtool $ ethtool -f swp1 mellanox/mlxsw_spectrum-13.1703.4.mfa2 # Toggle parameter $ devlink dev param set pci/0000:03:00.0 name fw_load_policy value flash cmode driverinit # devlink reset $ devlink dev reload pci/0000:03:00.0 # Query firmware version to show changes took affect $ ethtool -i swp1 driver: mlxsw_spectrum version: 1.0 firmware-version: 13.1703.4 expansion-rom-version: bus-info: 0000:03:00.0 supports-statistics: yes supports-test: no supports-eeprom-access: no supports-register-dump: no supports-priv-flags: no iproute2 patches available here: https://github.com/tshalom/iproute2-next v2: * Change 'fw_version_check' to 'fw_load_policy' with values 'driver' and 'flash' (Jakub) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03mlxsw: spectrum: Load firmware version based on devlink parameterShalom Toledo3-0/+75
Load firmware version based on 'fw_load_policy' devlink parameter. The driver supports these two options: * DEVLINK_PARAM_FW_LOAD_POLICY_VALUE_DRIVER (0) Default, load firmware version preferred by the driver * DEVLINK_PARAM_FW_LOAD_POLICY_VALUE_FLASH (1) Load firmware currently stored in flash The second option, 'flash', allow the device to run with different firmware version than preferred by the driver for testing and/or debugging purposes. For example, testing a firmware bug fix. Signed-off-by: Shalom Toledo <shalomt@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03mlxsw: core: Reset firmware after flash during driver initializationShalom Toledo2-14/+29
After flashing new firmware during the driver initialization flow (reload or not), the driver should do a firmware reset when it gets -EAGAIN in order to load the new one. Signed-off-by: Shalom Toledo <shalomt@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03devlink: Add 'fw_load_policy' generic parameterShalom Toledo4-0/+23
Many drivers load the device's firmware image during the initialization flow either from the flash or from the disk. Currently this option is not controlled by the user and the driver decides from where to load the firmware image. 'fw_load_policy' gives the ability to control this option which allows the user to choose between different loading policies supported by the driver. This parameter can be useful while testing and/or debugging the device. For example, testing a firmware bug fix. Signed-off-by: Shalom Toledo <shalomt@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03net: phy: don't allow __set_phy_supported to add unsupported modesHeiner Kallweit1-28/+14
Currently __set_phy_supported allows to add modes w/o checking whether the PHY supports them. This is wrong, it should never add modes but only remove modes we don't want to support. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30net: usb: aqc111: Initialize wol_cfg with memset in aqc111_suspendNathan Chancellor1-1/+3
Clang warns: drivers/net/usb/aqc111.c:1326:37: warning: suggest braces around initialization of subobject [-Wmissing-braces] struct aqc111_wol_cfg wol_cfg = { 0 }; ^ {} 1 warning generated. Use memset to initialize the object to take compiler instrumentation out of the equation. Fixes: e58ba4544c77 ("net: usb: aqc111: Add support for wake on LAN by MAGIC packet") Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30net: qualcomm: rmnet: Remove set but not used variable 'cmd'YueHaibing1-2/+0
Fixes gcc '-Wunused-but-set-variable' warning: drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c: In function 'rmnet_map_do_flow_control': drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c:23:36: warning: variable 'cmd' set but not used [-Wunused-but-set-variable] struct rmnet_map_control_command *cmd; 'cmd' not used anymore now, should also be removed. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30tun: implement carrier changeNicolas Dichtel2-1/+27
The userspace may need to control the carrier state. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Didier Pallard <didier.pallard@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30net: reorder flowi_common fields to avoid holesPaolo Abeni1-1/+1
the flowi* structures are used and memsetted by server functions in critical path. Currently flowi_common has a couple of holes that we can eliminate reordering the struct fields. As a side effect, both flowi4 and flowi6 shrink by 8 bytes. Before: pahole -EC flowi_common struct flowi_common { // ... /* size: 40, cachelines: 1, members: 10 */ /* sum members: 32, holes: 1, sum holes: 4 */ /* padding: 4 */ /* last cacheline: 40 bytes */ }; pahole -EC flowi6 struct flowi6 { // ... /* size: 88, cachelines: 2, members: 6 */ /* padding: 4 */ /* last cacheline: 24 bytes */ }; pahole -EC flowi4 struct flowi4 { // ... /* size: 56, cachelines: 1, members: 4 */ /* padding: 4 */ /* last cacheline: 56 bytes */ }; After: struct flowi_common { // ... /* size: 32, cachelines: 1, members: 10 */ /* last cacheline: 32 bytes */ }; struct flowi6 { // ... /* size: 80, cachelines: 2, members: 6 */ /* padding: 4 */ /* last cacheline: 16 bytes */ }; struct flowi4 { // ... /* size: 48, cachelines: 1, members: 4 */ /* padding: 4 */ /* last cacheline: 48 bytes */ }; Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30Merge branch 'mlxsw-Add-VxLAN-support-with-VLAN-aware-bridges'David S. Miller8-63/+1441
Ido Schimmel says: ==================== mlxsw: Add VxLAN support with VLAN-aware bridges Commit 53e50a6ec24d ("Merge branch 'mlxsw-Add-VxLAN-support'") added mlxsw support for VxLAN when the VxLAN device was enslaved to VLAN-unaware bridges. This patchset extends mlxsw to also support VxLAN with VLAN-aware bridges. With VLAN-aware bridges, the VxLAN device's VNI is mapped to the VLAN that is configured as 'pvid untagged' on the corresponding bridge port. To prevent ambiguity, mlxsw forbids configurations in which the same VLAN is configured as 'pvid untagged' on multiple VxLAN devices. Patches #1-#2 add the necessary APIs in mlxsw and the bridge driver. Patches #3-#4 perform small refactoring in order to prepare mlxsw for VLAN-aware support. Patch #5 finally enables the enslavement of VxLAN devices to a VLAN-aware bridge. Among other things, it extends mlxsw to handle switchdev notifications about VLAN add / delete on a VxLAN device enslaved to an offloaded VLAN-aware bridge. Patches #6-#8 add selftests to test the new functionality. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30selftests: forwarding: Add VxLAN test with a VLAN-aware bridgeIdo Schimmel2-0/+800
The test is very similar to its VLAN-unaware counterpart (vxlan_bridge_1d.sh), but instead of using multiple VLAN-unaware bridges, a single VLAN-aware bridge is used with multiple VLANs. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30selftests: mlxsw: Add a test for VxLAN configuration with a VLAN-aware bridgeIdo Schimmel1-1/+203
Extend the existing VLAN-unaware tests with their VLAN-aware counterparts. This includes sanitization of invalid configuration and offload indication on the local route performing decapsulation and the FDB entries perform encapsulation. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30selftests: mlxsw: Consider VLAN-aware bridges as validIdo Schimmel1-1/+1
Previous patches add the ability to work with VLAN-aware bridges and VxLAN devices, so make sure such configuration no longer fails. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30mlxsw: spectrum: Enable VxLAN enslavement to VLAN-aware bridgesIdo Schimmel3-14/+404
Commit 1c30d1836aeb ("mlxsw: spectrum: Enable VxLAN enslavement to bridges") enabled the enslavement of VxLAN devices to bridges that have mlxsw ports (or their upper) as slaves. This patch extends mlxsw to also support VLAN-aware bridges. The patch is similar in nature to mentioned commit, but there is one major difference. With VLAN-aware bridges, the VxLAN device's VNI is mapped to the VLAN that is configured as PVID and egress untagged on the bridge port. Therefore, the driver is extended to listen to VLAN configuration on VxLAN devices of interest and enable / disable NVE encapsulation on the corresponding 802.1Q FIDs. To prevent ambiguity, the driver makes sure that a given VLAN is not configured as PVID and egress untagged on multiple VxLAN devices. This sanitization takes place both when a port is enslaved to a bridge with existing VxLAN devices and when a VLAN is added to / removed from a VxLAN device of interest. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30mlxsw: spectrum_switchdev: Prepare function for VLAN-aware bridgesIdo Schimmel3-9/+11
The vxlan_join() function resolves the FID on which the VNI should be set and then sets the VNI. Currently, the FID is simply resolved according to the ifindex of the bridge device to which the VxLAN device is enslaved. This works because only VLAN-unaware bridges are supported. With VLAN-aware bridges the FID would need to be resolved based on the VLAN to which the VNI is mapped to. Add the VLAN ID to the argument list of the function. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30mlxsw: spectrum_switchdev: Unify VxLAN leave functionIdo Schimmel3-38/+9
The function mlxsw_sp_bridge_vxlan_leave() is currently split between VLAN-aware and VLAN-unaware bridges, but actually both types can use the same function. The function needs to resolve the FID that corresponds to the VxLAN device and disable NVE encapsulation on it. Instead of looking up the FID differently for VLAN-aware and VLAN-unaware bridges, we can always use the VxLAN's device VNI. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30mlxsw: spectrum_fid: Add API to lookup 802.1Q FIDs without creating themIdo Schimmel3-2/+11
In a similar fashion to commit 564c6d727aca ("mlxsw: spectrum_fid: Add APIs to lookup FID without creating it"), add a corresponding API to lookup 802.1Q FIDs. This is a prerequisite to VxLAN support with VLAN-aware bridges and will allow us to resolve a 802.1Q FID by its VLAN when an FDB entry is added on the bridge port of the VxLAN device. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30net: bridge: Extend br_vlan_get_pvid() for bridge portsIdo Schimmel1-1/+5
Currently, the function only works for the bridge device itself, but subsequent patches will need to be able to query the PVID of a given bridge port, so extend the function. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30Merge branch 'qed-Doorbell-overflow-recovery'David S. Miller13-47/+756
Ariel Elior says: ==================== qed*: Doorbell overflow recovery Doorbell Overflow If sufficient CPU cores will send doorbells at a sufficiently high rate, they can cause an overflow in the doorbell queue block message fifo. When fill level reaches maximum, the device stops accepting all doorbells from that PF until a recovery procedure has taken place. Doorbell Overflow Recovery The recovery procedure basically means resending the last doorbell for every doorbelling entity. A doorbelling entity is anything which may send doorbells: L2 tx ring, rdma sq/rq/cq, light l2, vf l2 tx ring, spq, etc. This relies on the design assumption that all doorbells are aggregative, so last doorbell carries the information of all previous doorbells. APIs All doorbelling entities need to register with the mechanism before sending doorbells. The registration entails providing the doorbell address the entity would be using, and a virtual address where last doorbell data can be found. Typically fastpath structures already have this construct. Executing the recovery procedure Handling the attentions, iterating over all the registered entities and resending their doorbells, is all handled within qed core module. Relevance All doorbelling entities in all protocols need to register with the mechanism, via the new APIs. Technically this is quite simple (just call the API). Some protocol fastpath implementation may not have the doorbell data stored anywhere (compute it from scratch every time) and will have to add such a place. This is rare and is also better practice (save some cycles on the fastpath). Performance Penalty No performance penalty should incur as a result of this feature. If anything performance can improve by avoiding recalcualtion of doorbell data everytime doorbell is sent (in some flows). Add the database used to register doorbelling entities, and APIs for adding and deleting entries, and logic for traversing the database and doorbelling once on behalf of all entities. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30qede: Register l2 queues with doorbell overflow recovery mechanismAriel Elior1-0/+9
All L2 queues funnel through this flow, so this would cover the regular RSS queues, as well queues created for VFs, mqos queues, xdp queues, etc. Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30qed: Expose the doorbell overflow recovery mechanism to the protocol driversAriel Elior2-0/+29
Most of the doorbelling entities are outside of the core module. L2 queues, Roce queues, iscsi and fcoe all need to register. Make the APIs available for these drivers. Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30qed: Register light L2 queues with doorbell overflow recovery mechanismAriel Elior2-10/+21
Light L2 queues are doorbelling entities. Modify the implementation to keep the doorbell data necessary for doorbelling in well known location instead of recomputing every time. Register the LL2 queue with doorbell recovery mechanism. Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30qed: Register slowpath queue doorbell with doorbell overflow recovery mechanismAriel Elior2-13/+38
Slow path queue is a doorbelling entity. Register it with the overflow mechanism. Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30qed: Use the doorbell overflow recovery mechanism in case of doorbell overflowAriel Elior6-24/+280
In case of an attention from the doorbell queue block, analyze the HW indications. In case of a doorbell overflow, execute a doorbell recovery. Since there can be spurious indications (race conditions between multiple PFs), schedule a periodic task for checking whether a doorbell overflow may have been missed. After a set time with no indications, terminate the periodic task. Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30qed: Add doorbell overflow recovery mechanismAriel Elior4-0/+379
Add the database used to register doorbelling entities, and APIs for adding and deleting entries, and logic for traversing the database and doorbelling once on behalf of all entities. Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30Merge branch 'rtnetlink-avoid-a-warning-in-rtnl_newlink'David S. Miller1-161/+170
Jakub Kicinski says: ==================== rtnetlink: avoid a warning in rtnl_newlink() I've been hoping for some time that someone more competent would fix the stack frame size warning in rtnl_newlink(), but looks like I'll have to take a stab at it myself :) That's the only warning I see in most of my builds. First patch refactors away a somewhat surprising if (1) code block. Reindentation will most likely cause cherry-pick problems but OTOH rtnl_newlink() doesn't seem to be changed often, so perhaps we can risk it in the name of cleaner code? Second patch fixes the warning in simplest possible way. I was pondering if there is any more clever solution, but I can't see it.. rtnl_newlink() is quite long with a lot of possible execution paths so doing memory allocations half way through leads to very ugly results. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30rtnetlink: avoid frame size warning in rtnl_newlink()Jakub Kicinski1-3/+17
Standard kernel compilation produces the following warning: net/core/rtnetlink.c: In function ‘rtnl_newlink’: net/core/rtnetlink.c:3232:1: warning: the frame size of 1288 bytes is larger than 1024 bytes [-Wframe-larger-than=] } ^ This should not really be an issue, as rtnl_newlink() stack is generally quite shallow. Fix the warning by allocating attributes with kmalloc() in a wrapper and passing it down to rtnl_newlink(), avoiding complexities on error paths. Alternatively we could kmalloc() some structure within rtnl_newlink(), slave attributes look like a good candidate. In practice it adds to already rather high complexity and length of the function. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30rtnetlink: remove a level of indentation in rtnl_newlink()Jakub Kicinski1-159/+154
rtnl_newlink() used to create VLAs based on link kind. Since commit ccf8dbcd062a ("rtnetlink: Remove VLA usage") statically sized array is created on the stack, so there is no more use for a separate code block that used to be the VLA's live range. While at it christmas tree the variables. Note that there is a goto-based retry so to be on the safe side the variables can no longer be initialized in place. It doesn't seem to matter, logically, but why make the code harder to read.. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30Merge branch 'nfp-update-TX-path-to-enable-repr-offloads'David S. Miller8-37/+213
Jakub Kicinski says: ==================== nfp: update TX path to enable repr offloads This set starts with three micro optimizations to the TX path. The improvement is measurable, but below 1% of CPU utilization. Patches 4 - 9 add basic TX offloads to representor devices, like checksum offload or TSO, and remove the unnecessary TX lock and Qdisc (our representors are software constructs on top of the PF). The last 2 patches add more info to error messages - id of command which failed and exact location of incorrect TLVs, very useful for debugging. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30nfp: report more info when reconfiguration failsJakub Kicinski2-2/+9
FW reconfiguration timeouts are a common indicator of FW trouble. To make debugging easier print requested update and control word when reconfiguration fails. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30nfp: add offset to all TLV parsing errorsJakub Kicinski1-8/+8
When troubleshooting incorrect FW capabilities it's useful to know where the faulty TLV is located. Add offset to all errors messages. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30nfp: add offloads on representorsJakub Kicinski5-0/+143
FW/HW can generally support the standard networking offloads on representors without any trouble. Add the ability for FW to advertise which features should be available on representors. Because representors are muxed on top of the vNIC we need to listen on feature changes of their lower devices, and update their features appropriately. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30nfp: add locking around representor changesJakub Kicinski3-0/+8
Up until now we never needed to keep a networking locks around representors accesses, we only accessed them when device was reconfigured (under nfp pf->lock) or on fast path (under RCU). Now we want to be able to iterate over all representors during notifications, so make sure representor assignment is done under RTNL lock. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30nfp: run don't require Qdiscs on representor netdevsJakub Kicinski1-0/+1
Our representors are software devices built on top of the PF vNIC, the queuing should only happen at the vNIC netdevice. Allow representors to run qdisc-less. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30nfp: run representor TX locklesslyJakub Kicinski1-0/+2
Our representors are software devices built on top of the PF vNIC, the only state they have are per-cpu stats, so make the TX run locklessly. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30nfp: avoid oversized TSO headers with metadata prependJakub Kicinski1-1/+4
In preparation for TSO over representors make sure the port id prepend will always fit in the frame. The current max header length is 255, which is ample, so assume worst case scenario of 8 byte prepend and save ourselves the conditionals. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30nfp: correct descriptor offsets in presence of metadataJakub Kicinski1-8/+12
The TSO-related offsets in the descriptor should not include the length of the prepended metadata. Adjust them. Note that this could not have caused issues in the past as we don't support TSO with metadata prepend as of this patch. Signed-off-by: Michael Rapson <michael.rapson@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30nfp: move queue variable initJakub Kicinski1-1/+3
nd_q is only used at the very end of nfp_net_tx(), there is no need to initialize it early. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30nfp: move temporary variables in nfp_net_tx_complete()Jakub Kicinski1-14/+17
Move temporary variables in scope of the loop in nfp_net_tx_complete(), and add a temp for txbuf software structure. This saves us 0.2% of CPU. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30nfp: copy only the relevant part of the TX descriptor for fragsJakub Kicinski2-5/+8
Chained descriptors for fragments need to duplicate all the descriptor fields of the skb head, so we copy the descriptor and then modify the relevant fields. This is wasteful, because the top half of the descriptor will get overwritten entirely while the bottom half is not modified at all. Copy only the bottom half. This saves us 0.3% of CPU in a GSO test. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>