path: root/drivers/net/ethernet/mellanox
Age | Commit message | Author | Files | Lines
2020-05-28 | Merge tag 'mlx5-updates-2020-05-26' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux | David S. Miller | 27 | -1011/+1965
Saeed Mahameed says:

====================
mlx5-updates-2020-05-26

Updates highlights:

1) From Vu Pham (8): Support VM traffic failover with bonded VF representors and e-switch egress/ingress ACLs

This series introduces support for a Virtual Machine running I/O traffic over the direct/fast VF path and failing over to the slower paravirtualized path, using the following features:

     ___________________________________
    |  VM       ________________        |
    |          |FAILOVER device |       |
    |          |________________|       |
    |                  |                |
    |              ____|_____           |
    |             |          |          |
    |       ______|__     ___|_______   |
    |      |  VF PT  |   |VIRTIO-NET |  |
    |      | device  |   |  device   |  |
    |      |_________|   |___________|  |
    |___________|______________|________|
                |              |
                | HYPERVISOR   |
                |         _____|_____
                |        |  macvtap  |
                |        |virtio BE  |
                |        |___________|
                |              |
                |          ____|____
                |         | host VF |
                |         |_________|
                |              |
           _____|_____    _____|_____
          |   PT VF   |  |  host VF  |
          |representor|  |representor|
          |___________|  |___________|
                \               /
                 \             /
                  \           /
                   \         /                     ________________
                    \_______/                     |                |
                 _______|________                 |    V-SWITCH    |
                |VF representors |________________|     (OVS)      |
                |      bond      |                |________________|
                |________________|                         |
                                                   ________|________
                                                  |     Uplink      |
                                                  |   representor   |
                                                  |_________________|

Summary:
--------
Problem statement:
------------------
Currently, in the above topology, when a netfailover device is configured using VFs and eswitch VF representors, and traffic fails over to the stand-by VF which is exposed to the guest VM through a macvtap device, the eswitch fails to switch the traffic to the stand-by VF representor. This occurs because the eswitch level has no knowledge of the stand-by representor device.

Solution:
---------
Using the standard bonding driver, a bond netdevice is created over the VF representor devices and is used for offloading tc rules. Two VF representors are bonded together, one for the passthrough VF device and another one for the stand-by VF device.

With this solution, the mlx5 driver listens to the failover events occurring at the bond device level and fails traffic over to whichever VF representor of the bond is active.

a. VM with a netfailover device made of a VF pass-thru (PT) device and a virtio-net paravirtualized device with the same MAC address, to handle failover traffic at VM level.

b. Host bond in active-standby mode, with the lower devices being the VM VF PT representor and the representor of the 2nd VF, to handle failover traffic at Hypervisor/V-Switch OVS level.
   - During the steady state (fast datapath): set the bond active device to be the VM PT VF representor.
   - During failover: apply bond failover to the second VF representor device, which connects to the VM non-accelerated path.

c. E-Switch ingress/egress ACL tables to support failover traffic at E-Switch level.

   I. E-Switch egress ACL with forward-to-vport rule:
      - By default, the eswitch vport egress acl forwards packets to its counterpart NIC vport.
      - During port failover, the egress acl forward-to-vport rule is added to the e-switch vport of the passive/inactive slave VF representor to forward packets to the other e-switch vport, i.e. the active slave representor's e-switch vport, to handle egress "failover" traffic.
      - Use the lower change netdev event to detect that a representor is a lower dev (slave) of a bond and becomes active, then add the egress acl forward-to-vport rule to all other slave netdevs so that they forward to this representor's vport.
      - Use the upper change netdev event to detect a representor being unslaved from the bond device, then delete its vport's egress acl forward-to-vport rule.

   II. E-Switch ingress ACL metadata reg_c for match:
      - Bonded representors' vports sharing a tc block have the same root ingress acl table and a unique metadata for match.
      - Traffic from both representors' vports is tagged with the same unique metadata reg_c.
      - Use the upper change netdev event to detect a representor being enslaved to or unslaved from the bond device, then set up the shared root ingress acl and unique metadata.

2) From Alex Vesker (2): Split RX and TX lock for parallel rule insertion in software steering

3) Eli Britstein (2): Optimize performance for IPv4/IPv6 ethertype: use the HW ip_version register rather than parsing eth frames for ethertype.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-27 | net/mlx5: DR, Split RX and TX lock for parallel insertion | Alex Vesker | 5 | -35/+56
Change the locking flow to support RX and TX locks. Splitting the single lock in two allows inserting rules in parallel for the RX and TX parts of the FDB. Locking the dr_domain is done by taking both the RX domain and the TX domain locks; this is mostly used for control operations on the dr_domain. When inserting rules for RX or TX, only the nic_domain RX or TX lock is used. Splitting the lock is safe since the RX and TX domains are logically separated from each other, and shared objects such as the send-ring and memory pool are protected by locks. Signed-off-by: Alex Vesker <valex@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
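The locking scheme described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `SteeringDomain`, `insert_rule`, `control_op` and `insert_in_parallel` are hypothetical names, not the driver's API, and the fixed RX-then-TX acquisition order is an assumption made here to keep the model deadlock-free.

```python
import threading


class SteeringDomain:
    """Hypothetical model of a steering domain with one lock per direction."""

    def __init__(self):
        self.locks = {"rx": threading.Lock(), "tx": threading.Lock()}
        self.rules = {"rx": [], "tx": []}

    def insert_rule(self, direction, rule):
        # Rule insertion takes only the lock of the side it modifies,
        # so RX and TX insertions do not serialize against each other.
        with self.locks[direction]:
            self.rules[direction].append(rule)

    def control_op(self, fn):
        # Control operations lock the whole domain: both locks,
        # always in the same order (RX, then TX) to avoid deadlock.
        with self.locks["rx"], self.locks["tx"]:
            return fn(self.rules)


def insert_in_parallel(domain, count):
    """Insert `count` rules on each side from two threads, then count them."""
    def worker(direction):
        for i in range(count):
            domain.insert_rule(direction, (direction, i))

    threads = [threading.Thread(target=worker, args=(d,)) for d in ("rx", "tx")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return domain.control_op(lambda rules: (len(rules["rx"]), len(rules["tx"])))
```

Calling `insert_in_parallel(SteeringDomain(), 500)` runs one RX and one TX inserter concurrently and then reads both rule lists under the full domain lock.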
2020-05-27 | net/mlx5: DR, Add a spinlock to protect the send ring | Alex Vesker | 2 | -4/+10
Adding this lock will allow writing steering entries without locking the dr_domain and allow parallel insertion. Signed-off-by: Alex Vesker <valex@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-27 | net/mlx5e: Optimize performance for IPv4/IPv6 ethertype | Eli Britstein | 5 | -60/+85
The HW is optimized for IPv4/IPv6. For such cases, pending capability, avoid matching on ethertype, and use ip_version field instead. Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
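The selection described above could look roughly like the following. This is a sketch, not the driver's code: `build_l3_match` and the `hw_has_ip_version` flag are hypothetical, and the match is modelled as a plain dict. The EtherType values 0x0800 (IPv4) and 0x86DD (IPv6) are the standard ones.

```python
ETH_P_IP = 0x0800    # standard EtherType for IPv4
ETH_P_IPV6 = 0x86DD  # standard EtherType for IPv6


def build_l3_match(ethertype, hw_has_ip_version):
    """Return a match key: ip_version when possible, ethertype otherwise.

    hw_has_ip_version is a hypothetical stand-in for the device capability.
    """
    ip_version = {ETH_P_IP: 4, ETH_P_IPV6: 6}.get(ethertype)
    if hw_has_ip_version and ip_version is not None:
        # IPv4/IPv6 with the capability present: match on ip_version.
        return {"ip_version": ip_version}
    # Any other protocol, or no capability: fall back to ethertype.
    return {"ethertype": ethertype}
```

For example, `build_l3_match(0x0800, True)` yields an `ip_version` match, while ARP (0x0806) keeps the ethertype match regardless of the capability.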
2020-05-27 | net/mlx5e: Helper function to set ethertype | Eli Britstein | 4 | -16/+27
Set ethertype match in a helper function as a pre-step towards optimizing it. Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-27 | net/mlx5: Add missing mutex destroy | Parav Pandit | 1 | -2/+14
Add mutex destroy calls to balance with mutex_init() done in the init path. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-27 | net/mlx5e: Use change upper event to setup representors' bond_metadata | Vu Pham | 1 | -8/+14
Use the change upper event to detect a slave representor being enslaved to or unslaved from a lag device. On an enslaving event, call the mlx5_enslave_rep() API to create a shadow entry for this slave representor, add it to the slaves list of the bond_metadata structure representing the master lag device, and use its metadata to set up the ingress acl metadata header. On an unslaving event, reset the vport of the unslaved representor to use its default ingress/egress acls and rx rules with its default_metadata. The last slave will free the shared bond_metadata and its unique metadata. Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-27 | net/mlx5e: Slave representors sharing unique metadata for match | Vu Pham | 3 | -8/+80
Bonded slave representors' vports must share a unique metadata for match. On enslaving event of slave representor to lag device, allocate new unique "bond_metadata" for match if this is the first slave. The subsequent enslaved representors will share the same unique "bond_metadata". On unslaving event of slave representor, reset the slave representor's vport to use its own default metadata. Replace ingress acl and rx rules of the slave representors' vports using new vport->bond_metadata. Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-27 | net/mlx5: E-Switch, Alloc and free unique metadata for match | Vu Pham | 5 | -32/+103
Introduce infrastructure to create unique metadata for match for a vport without depending on vport_num. A vport uses its default metadata for match in standalone configuration but will share a different unique "bond_metadata" for match with other vports in bond configuration. Use ida to generate unique metadata for match for vports in default and bond configurations. Introduce APIs to generate and free metadata for match. Introduce APIs to set a vport's bond_metadata and replace its ingress acl rules with bond_metadata. Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
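The idea of handing out metadata values from an ID allocator, independent of the vport number, can be sketched as follows. `MetadataAllocator` is a hypothetical user-space model; reserving 0 as "no metadata" and reusing the lowest free ID are assumptions of this sketch, not statements about the driver or the kernel's ida implementation.

```python
class MetadataAllocator:
    """Hypothetical allocator of unique match-metadata values."""

    def __init__(self, max_id):
        self.max_id = max_id
        self.used = set()

    def alloc(self):
        # Assumption of this sketch: 0 means "no metadata", so IDs start at 1
        # and the lowest free value is handed out first.
        for candidate in range(1, self.max_id + 1):
            if candidate not in self.used:
                self.used.add(candidate)
                return candidate
        raise RuntimeError("no free metadata IDs")

    def free(self, metadata):
        # Freed values become available for later allocations.
        self.used.discard(metadata)
```

A vport would take one value as its default metadata, and a bond would take another that all of its slave vports share.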
2020-05-27 | net/mlx5e: Add bond_metadata and its slave entries | Vu Pham | 2 | -0/+133
Add bond_metadata and its slave entries to represent a lag device and its slave VF representors. The bond_metadata structure includes a unique metadata shared by the slave VF representors, and a list of slave entries, one per slave representor. On an enslaving event, create a bond_metadata structure representing the upper lag device of this slave representor if it has not been created yet. Create and add an entry for the slave representor to the slaves list. On an unslaving event, free the slave entry of the slave representor. On the last unslave event, free the bond_metadata structure and its resources. Introduce APIs to create and remove bond_metadata and its resources, and to enslave and unslave VF representor slave entries. Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
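The lifecycle described above (create on first enslave, share among slaves, free on last unslave) can be modelled in a few lines. `BondRegistry`, its methods, and the simple counter used for metadata values are hypothetical and only illustrate the bookkeeping.

```python
class BondRegistry:
    """Hypothetical model of per-lag bond_metadata bookkeeping."""

    def __init__(self):
        self.next_metadata = 1
        self.bonds = {}  # lag device name -> {"metadata": int, "slaves": set}

    def enslave(self, lag, rep):
        bond = self.bonds.get(lag)
        if bond is None:
            # First slave: create the bond entry with a fresh unique metadata.
            bond = {"metadata": self.next_metadata, "slaves": set()}
            self.next_metadata += 1
            self.bonds[lag] = bond
        bond["slaves"].add(rep)
        # Every slave of the same lag device sees the same metadata.
        return bond["metadata"]

    def unslave(self, lag, rep):
        bond = self.bonds[lag]
        bond["slaves"].discard(rep)
        if not bond["slaves"]:
            # Last slave gone: release the bond entry and its metadata.
            del self.bonds[lag]
```

Enslaving two representors to the same lag device returns the same metadata twice, and the entry disappears only after both have been unslaved.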
2020-05-27 | net/mlx5e: Offload flow rules to active lower representor | Or Gerlitz | 1 | -9/+26
When a bond device is created over one or more non uplink representors, and when a flow rule is offloaded to such bond device, offload a rule to the active lower device. Assuming that this is active-backup lag, the rules should be offloaded to the active lower device which is the representor of the direct path (not the failover). Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-27 | net/mlx5e: Support tc block sharing for representors | Vu Pham | 1 | -0/+23
Currently offloading a rule over a tc block shared by multiple representors fails because an e-switch global hashtable to keep the mapping from tc cookies to mlx5e flow instances is used, and tc block sharing offloads the same rule/cookie multiple times, each time for different representor sharing the tc block. Changing the implementation and behavior by acknowledging and returning success if the same rule/cookie is offloaded again to other slave representor sharing the tc block by setting, checking and comparing the netdev that added the rule first. Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
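The behavior change can be sketched with a cookie table that remembers which netdev offloaded a rule first. `FlowTable` and `configure_flower` are hypothetical names for this model; returning 0 for success and `-EEXIST` for a true duplicate mirrors common kernel convention but is an assumption here, not a quote of the driver.

```python
import errno


class FlowTable:
    """Hypothetical e-switch-wide table mapping tc cookies to flows."""

    def __init__(self):
        self.owner_by_cookie = {}  # cookie -> netdev that added the rule first

    def configure_flower(self, cookie, netdev):
        owner = self.owner_by_cookie.get(cookie)
        if owner is None:
            # First time this cookie is seen: offload and remember the netdev.
            self.owner_by_cookie[cookie] = netdev
            return 0
        if owner != netdev:
            # Same rule arriving via another representor sharing the tc block:
            # acknowledge it as success without offloading a second time.
            return 0
        # Same netdev adding the same cookie again is still a duplicate.
        return -errno.EEXIST
```

With two representors sharing a block, the second callback for the same cookie returns success instead of failing the whole offload.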
2020-05-27 | net/mlx5e: Use netdev events to set/del egress acl forward-to-vport rule | Or Gerlitz | 4 | -4/+175
Register a notifier block to handle netdev events for a bond device of non-uplink representors to support eswitch vports bonding. When a non-uplink representor is a lower dev (slave) of a bond and becomes active, add the egress acl forward-to-vport rule to all slave netdevs (active + standby) so that they forward to this representor's vport. Use the change lower netdev event to do this. Use the change upper event to detect a slave representor being unslaved from the lag device and delete its vport egress acl forward rule, if any. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-27 | net/mlx5: E-Switch, Introduce APIs to enable egress acl forward-to-vport rule | Vu Pham | 3 | -24/+187
By default, an e-switch vport's egress acl just forwards packets to its counterpart NIC vport using the existing egress acl table. During port failover in a bonding scenario where two VF representors are bonded, the egress acl forward-to-vport rule will be added to the existing egress acl table of the e-switch vport of the passive/inactive slave representor to forward packets to the other NIC vport, i.e. the active slave representor's NIC vport, to handle egress "failover" traffic. Enable egress acl and add APIs to create and destroy the egress acl forward-to-vport rule and group. Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
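The failover behavior can be modelled as a table of per-vport forward overrides. This is a sketch under assumptions: `EswitchModel` and its method names are hypothetical, and the model only places the override on the inactive slaves while the active vport keeps its default egress behavior.

```python
class EswitchModel:
    """Hypothetical model of egress ACL forward-to-vport overrides."""

    def __init__(self):
        self.forward_to = {}  # vport -> vport its egress ACL forwards to

    def egress_target(self, vport):
        # Default: a vport's egress ACL delivers to its own NIC vport.
        return self.forward_to.get(vport, vport)

    def on_active_change(self, slave_vports, active_vport):
        # Bond failover: inactive slaves forward to the new active vport.
        for vport in slave_vports:
            if vport == active_vport:
                self.forward_to.pop(vport, None)
            else:
                self.forward_to[vport] = active_vport

    def on_unslave(self, vport):
        # Leaving the bond restores the default egress behavior.
        self.forward_to.pop(vport, None)
```

After `on_active_change([1, 2], 2)`, traffic destined to vport 1 is delivered to vport 2, and a later `on_unslave(1)` removes that override.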
2020-05-27 | net/mlx5: E-Switch, Refactor eswitch ingress acl codes | Vu Pham | 10 | -583/+619
Restructure the eswitch ingress acl codes into eswitch directory and different files: . Acl ingress helper functions to acl_helper.c/h . Acl ingress functions used in offloads mode to acl_ingress_ofld.c . Acl ingress functions used in legacy mode to acl_ingress_lgy.c This patch does not change any functionality. Signed-off-by: Vu Pham <vuhuong@mellanox.com>
2020-05-27 | net/mlx5: E-Switch, Refactor eswitch egress acl codes | Vu Pham | 10 | -275/+462
Refactor the egress acl codes so that offloads and legacy modes can configure specifically their own needs of egress acl table, groups and rules. While at it, restructure the eswitch egress acl codes into eswitch directory and different files: . Acl egress helper functions to acl_helper.c/h . Acl egress functions used in offloads mode to acl_egress_ofld.c . Acl egress functions used in legacy mode to acl_egress_lgy.c This patch does not change any functionality. Signed-off-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-27 | mlxsw: spectrum_router: remove redundant initialization of pointer br_dev | Colin Ian King | 1 | -1/+1
The pointer br_dev is being initialized with a value that is never read and is being updated with a new value later on. The initialization is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26 | mlxsw: spectrum_router: Allow programming link-local prefix routes | Ido Schimmel | 1 | -2/+4
The device has a trap for IPv6 packets that need be routed and have a unicast link-local destination IP (i.e., fe80::/10). This allows mlxsw to ignore link-local routes, as the packets will be trapped to the CPU in any case. However, since link-local routes are not programmed, it is possible for routed packets to hit the default route which might also be programmed to trap packets. This means that packets with a link-local destination IP might be trapped for the wrong reason. To overcome this, allow programming link-local prefix routes (usually one fe80::/64 per-table), so that the packets will be forwarded until reaching the link-local trap. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
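The effect of programming the link-local prefix route can be seen with a toy longest-prefix-match lookup. `lookup` and the route labels are hypothetical; the sketch uses Python's standard `ipaddress` module and does not model the device's trap logic itself.

```python
import ipaddress


def lookup(routes, dst):
    """Longest-prefix match over (network, label) pairs; None if no match."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, label) for net, label in routes if addr in net]
    if not matches:
        return None
    # The most specific (longest) prefix wins.
    return max(matches, key=lambda m: m[0].prefixlen)[1]


DEFAULT_ONLY = [(ipaddress.ip_network("::/0"), "default-route")]
WITH_LINK_LOCAL = DEFAULT_ONLY + [
    (ipaddress.ip_network("fe80::/64"), "link-local-route"),
]
```

With only the default route programmed, `lookup(DEFAULT_ONLY, "fe80::1")` lands on the default route; once the fe80::/64 prefix route is present, the same packet matches the link-local route instead and can go on to reach the link-local trap.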
2020-05-26 | mlxsw: spectrum: Add packet traps for BFD packets | Ido Schimmel | 3 | -0/+10
Bidirectional Forwarding Detection (BFD) provides "low-overhead, short-duration detection of failures in the path between adjacent forwarding engines" (RFC 5880). This is accomplished by exchanging BFD packets between the two forwarding engines. Up until now these packets were trapped via the general local delivery (i.e., IP2ME) trap which also traps a lot of other packets that are not as time-sensitive as BFD packets. Expose dedicated traps for BFD packets so that user space could configure a dedicated policer for them. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26 | mlxsw: spectrum: Treat IPv6 link-local SIP as an exception | Ido Schimmel | 1 | -1/+1
IPv6 packets that need to be forwarded and have a link-local source IP are dropped by the kernel and an ICMPv6 "Destination unreachable" is sent to the sending host. As such, change the trap group of such packets so that they do not interfere with IPv6 management packets. In the future this trap will be exposed as an exception via devlink-trap. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26 | mlxsw: spectrum: Share one group for all locally delivered packets | Ido Schimmel | 1 | -2/+2
Routed IP packets with the Router Alert option need to be trapped to the CPU as they might need to be locally delivered to raw sockets with the IP_ROUTER_ALERT / IPV6_ROUTER_ALERT socket option. Move them to the same group with other packets that might need to be trapped following route lookup. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26 | mlxsw: reg: Move all trap groups under the same enum | Ido Schimmel | 1 | -7/+3
After the previous patch the split is no longer necessary and all the trap groups can be moved under the same enum. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26 | mlxsw: spectrum_trap: Do not hard code "thin" policer identifier | Ido Schimmel | 2 | -6/+13
As explained in commit e612523041ab ("mlxsw: spectrum_trap: Introduce dummy group with thin policer"), the purpose of the "thin" policer is to pass as few packets as possible to the CPU. The identifier of this policer is currently set according to the maximum number of used trap groups, but this is fragile: On Spectrum-1 the maximum number of policers is less than the maximum number of trap groups, which might result in an invalid policer identifier in case the number of used trap groups grows beyond the policer limit. Solve this by dynamically allocating the policer identifier. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26 | mlxsw: switchx2: Move SwitchX-2 trap groups out of main enum | Ido Schimmel | 2 | -2/+5
The number of Spectrum trap groups is not infinite, but two identifiers are occupied by SwitchX-2 specific trap groups. Free these identifiers by moving them out of the main enum. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26 | mlxsw: spectrum: Reduce priority of locally delivered packets | Ido Schimmel | 1 | -1/+1
To align with recent recommended values. Will be configurable by future patches. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26 | mlxsw: spectrum: Use same trap group for local routes and link-local destination | Ido Schimmel | 1 | -1/+1
Packets with an IPv6 link-local destination (i.e., fe80::/10) should not be forwarded and are therefore trapped to the CPU for local delivery. Since these packets are trapped for the same logical reason as packets hitting local routes, associate both traps with the same group. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26 | mlxsw: spectrum: Use separate trap group for FID miss | Ido Schimmel | 2 | -1/+4
When a packet enters the device it is classified to a filtering identifier (FID) based on the ingress port and VLAN. The FID miss trap is used to trap packets for which a FID could not be found. In mlxsw this trap should only be triggered when a port is enslaved to an OVS bridge and a matching ACL rule could not be found, so as to trigger learning. These packets are therefore completely unrelated to packets hitting local routes and should be in a different group. Move them. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26 | mlxsw: spectrum: Use same trap group for various IPv6 packets | Ido Schimmel | 1 | -3/+3
Group these various IPv6 packets (e.g., router solicitations, router advertisement) together and subject them to the same policer. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26 | mlxsw: spectrum: Rename IPv6 ND trap group | Ido Schimmel | 2 | -6/+6
The IPv6 Neighbour Discovery (ND) group will be used for various IPv6 packets, not all of which fall under the definition of ND, so rename it to "IPV6" which is more appropriate. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26 | mlxsw: spectrum: Use same switch case for identical groups | Ido Schimmel | 1 | -3/+0
Trap groups that use the same policer settings can share the same switch case. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26 | mlxsw: spectrum: Use dedicated trap group for ACL trap | Ido Schimmel | 2 | -1/+4
Packets that are trapped via tc's trap action are currently subject to the same policer as packets hitting local routes. The latter are critical to the correct functioning of the control plane, while the former are mainly used for traffic inspection. Split the ACL trap to a separate group with its own policer. Use a higher priority for these traps than for traps using mirror action (e.g., ARP, IGMP). Otherwise, packets matching both traps will not be forwarded in hardware (because of trap action) and also not forwarded in software because they will be marked with 'offload_fwd_mark'. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26 | flow_dissector: Parse multiple MPLS Label Stack Entries | Guillaume Nault | 1 | -8/+19
The current MPLS dissector only parses the first MPLS Label Stack Entry (second LSE can be parsed too, but only to set a key_id). This patch adds the possibility to parse several LSEs by making __skb_flow_dissect_mpls() return FLOW_DISSECT_RET_PROTO_AGAIN as long as the Bottom Of Stack bit hasn't been seen, up to a maximum of FLOW_DIS_MPLS_MAX entries. FLOW_DIS_MPLS_MAX is arbitrarily set to 7. This should be enough for many practical purposes, without wasting too much space. To record the parsed values, flow_dissector_key_mpls is modified to store an array of stack entries, instead of just the values of the first one. A bit field, "used_lses", is also added to keep track of the LSEs that have been set. The objective is to avoid defining a new FLOW_DISSECTOR_KEY_MPLS_XX for each level of the MPLS stack. TC flower is adapted for the new struct flow_dissector_key_mpls layout. Matching on several MPLS Label Stack Entries will be added in the next patch. The NFP and MLX5 drivers are also adapted: nfp_flower_compile_mac() and mlx5's parse_tunnel() now verify that the rule only uses the first LSE and fail if it doesn't. Finally, the behaviour of the FLOW_DISSECTOR_KEY_MPLS_ENTROPY key is slightly modified. Instead of recording the first Entropy Label, it now records the last one. This shouldn't have any consequences since there doesn't seem to be any user of FLOW_DISSECTOR_KEY_MPLS_ENTROPY in the tree. We'd probably better do a hash of all parsed MPLS labels instead (excluding reserved labels) anyway. That'd give better entropy and would probably also simplify the code. But that's not the purpose of this patch, so I'm keeping that as a future possible improvement. Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
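The parsing loop described above can be illustrated in user space. This is a sketch, not the kernel's dissector: `parse_mpls_stack` and `pack_lse` are hypothetical helpers, and setting bit `i` of `used_lses` for the i-th parsed entry is an assumption of this model. The 32-bit LSE layout (20-bit label, 3-bit TC, 1-bit Bottom of Stack, 8-bit TTL) is the standard MPLS encoding.

```python
import struct

FLOW_DIS_MPLS_MAX = 7  # maximum number of LSEs recorded, as in the commit


def pack_lse(label, tc, bos, ttl):
    """Encode one MPLS Label Stack Entry (label:20, TC:3, S:1, TTL:8)."""
    return struct.pack("!I", (label << 12) | (tc << 9) | (bos << 8) | ttl)


def parse_mpls_stack(data):
    """Parse LSEs until Bottom of Stack or FLOW_DIS_MPLS_MAX entries."""
    lses = []
    used_lses = 0
    for index in range(FLOW_DIS_MPLS_MAX):
        offset = index * 4
        if offset + 4 > len(data):
            break  # truncated stack: stop with what was parsed so far
        (word,) = struct.unpack_from("!I", data, offset)
        entry = {
            "label": word >> 12,
            "tc": (word >> 9) & 0x7,
            "bos": (word >> 8) & 0x1,
            "ttl": word & 0xFF,
        }
        lses.append(entry)
        used_lses |= 1 << index  # remember which array slots are valid
        if entry["bos"]:
            break  # Bottom of Stack seen: no more LSEs follow
    return lses, used_lses
```

A two-label stack yields two entries and `used_lses == 0b11`; a stack deeper than seven labels is cut off at `FLOW_DIS_MPLS_MAX` entries.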
2020-05-24 | mlxsw: spectrum: Fix spelling mistake in trap's name | Ido Schimmel | 2 | -4/+4
Fix incorrect spelling of "advertisement". Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-24 | mlxsw: spectrum: Use dedicated trap group for sampled packets | Ido Schimmel | 2 | -1/+3
The rate with which packets are sampled is determined by user space, so there is no need to associate such packets with a policer. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-24 | mlxsw: spectrum: Use same trap group for IPv6 ND and ARP packets | Ido Schimmel | 1 | -4/+4
Both packet types are needed for the same reason (neighbour discovery), so associate them with the same trap group. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-24 | mlxsw: spectrum: Rename ARP trap group | Ido Schimmel | 2 | -7/+8
The ARP trap group will be used for IPv6 ND traps in the next patch, so rename it to "NEIGH_DISCOVERY" which is more appropriate. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-24 | mlxsw: spectrum_trap: Remove unnecessary field | Ido Schimmel | 1 | -6/+1
Now that traffic class (TC) and priority are set to the same value, there is no need to store both. Remove the first. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-24 | mlxsw: spectrum: Align TC and trap priority | Ido Schimmel | 2 | -5/+5
The traffic class (TC) attribute of packet traps determines through which TC a packet trap will be scheduled through the CPU port. The priority attribute determines which trap will be triggered in case several packet traps match a packet. We try to configure these attributes to the same value for all packet traps as there is little reason not to. Some packet traps did not use the same value, so rectify that now. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-24 | mlxsw: spectrum_buffers: Assign non-zero quotas to TC 0 of the CPU port | Ido Schimmel | 1 | -1/+1
As explained in commit 9ffcc3725f09 ("mlxsw: spectrum: Allow packets to be trapped from any PG"), incoming packets can be admitted to the shared buffer and forwarded / trapped, if:

    (Ingress{Port}.Usage < Thres && Ingress{Port,PG}.Usage < Thres &&
     Egress{Port}.Usage < Thres && Egress{Port,TC}.Usage < Thres) ||
    (Ingress{Port}.Usage < Min || Ingress{Port,PG}.Usage < Min ||
     Egress{Port}.Usage < Min || Egress{Port,TC}.Usage < Min)

Trapped packets are scheduled for transmission through the CPU port. Currently, the minimum and maximum quotas of traffic class (TC) 0 of the CPU port are 0, which means it is not usable. Assign non-zero quotas to TC 0 of the CPU port, so that it could be utilized by subsequent patches. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
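The admission condition quoted above can be written as a small predicate, which makes the effect of zero quotas easy to see. `is_admitted` and the dictionary keys are hypothetical names for the four usage counters; the numbers in the usage note are made up for illustration.

```python
POOLS = ("ingress_port", "ingress_pg", "egress_port", "egress_tc")


def is_admitted(usage, thres, minimum):
    """Admit if every usage is below its threshold, or any is below its minimum."""
    below_all_thresholds = all(usage[p] < thres[p] for p in POOLS)
    below_any_minimum = any(usage[p] < minimum[p] for p in POOLS)
    return below_all_thresholds or below_any_minimum
```

With all quotas of a TC set to 0, neither `usage < thres` nor `usage < minimum` can hold for that TC, so the packet depends entirely on the other counters. Raising the TC's minimum above its current usage is enough to admit the packet even when another threshold is exceeded.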
2020-05-24 | mlxsw: spectrum: Change default rate and priority of DHCP packets | Ido Schimmel | 1 | -2/+2
Reduce the default acceptable rate of DHCP packets to 128 packets per second and reduce their priority. This is reasonable given the Spectrum ASICs are limited to 128 ports at the moment. These are only the default values. Users will be able to modify them via devlink-trap. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-24 | mlxsw: spectrum: Trap IPv4 DHCP packets in router | Ido Schimmel | 2 | -1/+2
Currently, IPv4 DHCP packets are trapped during L2 forwarding, which means that packets might be trapped unnecessarily. Instead, only trap the DHCP packets that reach the router, either because they were flooded to the router port or forwarded to it by the FDB. This is consistent with the corresponding IPv6 trap. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-24 | mlxsw: spectrum: Use same trap group for MLD and IGMP packets | Ido Schimmel | 2 | -10/+7
Both packet types are needed for the same reason (multicast snooping), so associate them with the same trap group. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-24 | mlxsw: spectrum: Rename IGMP trap group | Ido Schimmel | 2 | -8/+8
The IGMP trap group will be used for MLD traps in the next patch, so rename it to "MC_SNOOPING" which is more appropriate. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-24 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net | David S. Miller | 18 | -49/+173
The MSCC bug fix in 'net' had to be slightly adjusted because the register accesses are done slightly differently in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-23 | Merge tag 'mlx5-fixes-2020-05-22' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux | David S. Miller | 15 | -48/+152
Saeed Mahameed says:

====================
mlx5 fixes 2020-05-22

This series introduces some fixes to mlx5 driver. Please pull and let me know if there is any problem.

For -stable v4.13
 ('net/mlx5: Add command entry handling completion')

For -stable v5.2
 ('net/mlx5: Fix error flow in case of function_setup failure')
 ('net/mlx5: Fix memory leak in mlx5_events_init')

For -stable v5.3
 ('net/mlx5e: Update netdev txq on completions during closure')
 ('net/mlx5e: kTLS, Destroy key object after destroying the TIS')
 ('net/mlx5e: Fix inner tirs handling')

For -stable v5.6
 ('net/mlx5: Fix cleaning unmanaged flow tables')
 ('net/mlx5: Fix a race when moving command interface to events mode')
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-23 | Merge tag 'mlx5-updates-2020-05-22' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux | David S. Miller | 20 | -1121/+1809
Saeed Mahameed says:

====================
mlx5-updates-2020-05-22

This series includes two updates and one cleanup patch.

1) Tang Bim, clean-up with IS_ERR() usage

2) Vlad introduces a new mlx5 kconfig flag for TC support. This is required due to the high volume of current and upcoming development in the eswitch and representors areas, where some of the features are TC based, such as the downstream patches of MPLSoUDP and the following representor bonding support for VF live migration and uplink representor dynamic loading. For this, Vlad kept TC specific code in tc.c and rep/tc.c and organized non TC code in representor specific files.

3) Eli Cohen adds support for MPLS over UDP encap and decap TC offloads.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-23 | net/mlx4_core: fix a memory leak bug. | Qiushi Wu | 1 | -1/+1
In function mlx4_opreq_action(), pointer "mailbox" is not released when mlx4_cmd_box() returns an error, causing a memory leak. Fix this by going to the "out" label, where mlx4_free_cmd_mailbox() frees the pointer. Fixes: fe6f700d6cbb ("net/mlx4_core: Respond to operation request by firmware") Signed-off-by: Qiushi Wu <wu000273@umn.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next | David S. Miller | 11 | -214/+100
Daniel Borkmann says:

====================
pull-request: bpf-next 2020-05-23

The following pull-request contains BPF updates for your *net-next* tree.

We've added 50 non-merge commits during the last 8 day(s) which contain a total of 109 files changed, 2776 insertions(+), 2887 deletions(-).

The main changes are:

1) Add a new AF_XDP buffer allocation API to the core in order to help lowering the bar for drivers adopting AF_XDP support. i40e, ice, ixgbe as well as mlx5 have been moved over to the new API and also gained a small improvement in performance, from Björn Töpel and Magnus Karlsson.

2) Add getpeername()/getsockname() attach types for BPF sock_addr programs in order to allow for e.g. reverse translation of load-balancer backend to service address/port tuple from a connected peer, from Daniel Borkmann.

3) Improve the BPF verifier is_branch_taken() logic to evaluate pointers being non-NULL, e.g. if after an initial test another non-NULL test on that pointer follows in a given path, then it can be pruned right away, from John Fastabend.

4) Larger rework of BPF sockmap selftests to make output easier to understand and to reduce overall runtime as well as adding new BPF kTLS selftests that run in combination with sockmap, also from John Fastabend.

5) Batch of misc updates to BPF selftests including fixing up test_align to match verifier output again and moving it under test_progs, allowing bpf_iter selftest to compile on machines with older vmlinux.h, and updating config options for lirc and v6 segment routing helpers, from Stanislav Fomichev, Andrii Nakryiko and Alan Maguire.

6) Conversion of BPF tracing samples outdated internal BPF loader to use libbpf API instead, from Daniel T. Lee.

7) Follow-up to BPF kernel test infrastructure in order to fix a flake in the XDP selftests, from Jesper Dangaard Brouer.

8) Minor improvements to libbpf's internal hashmap implementation, from Ian Rogers.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22 | net/mlx5: Fix error flow in case of function_setup failure | Shay Drory | 1 | -1/+2
Currently, if an error occurred during mlx5_function_setup(), we keep dev->state as DEVICE_STATE_UP. Fixing it by adding a goto label. Fixes: e161105e58da ("net/mlx5: Function setup/teardown procedures") Signed-off-by: Shay Drory <shayd@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-22 | net/mlx5e: CT: Correctly get flow rule | Roi Dayan | 2 | -3/+6
The correct way is to use the flow_cls_offload_flow_rule() wrapper instead of f->rule directly. Fixes: 4c3844d9e97e ("net/mlx5e: CT: Introduce connection tracking") Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>