linux - Linux Kernel (branches are rebased on master from time to time)

Age	Commit message (Collapse)	Author	Files	Lines
2021-07-26	net: dsa: sja1105: add bridge TX data plane offload based on tag_8021q	Vladimir Oltean	1	-0/+4
	The main desire for having this feature in sja1105 is to support network stack termination for traffic coming from a VLAN-aware bridge. For sja1105, offloading the bridge data plane means sending packets as-is, with the proper VLAN tag, to the chip. The chip will look up its FDB and forward them to the correct destination port. But we support bridge data plane offload even for VLAN-unaware bridges, and the implementation there is different. In fact, VLAN-unaware bridging is governed by tag_8021q, so it makes sense to have the .bridge_fwd_offload_add() implementation fully within tag_8021q. The key difference is that we only support 1 VLAN-aware bridge, but we support multiple VLAN-unaware bridges. So we need to make sure that the forwarding domain is not crossed by packets injected from the stack. For this, we introduce the concept of a tag_8021q TX VLAN for bridge forwarding offload. As opposed to the regular TX VLANs which contain only 2 ports (the user port and the CPU port), a bridge data plane TX VLAN is "multicast" (or "imprecise"): it contains all the ports that are part of a certain bridge, and the hardware will select where the packet goes within this "imprecise" forwarding domain. Each VLAN-unaware bridge has its own "imprecise" TX VLAN, so we make use of the unique "bridge_num" provided by DSA for the data plane offload. We use the same 3 bits from the tag_8021q VLAN ID format to encode this bridge number. Note that these 3 bit positions have been used before for sub-VLANs in best-effort VLAN filtering mode. The difference is that for best-effort, the sub-VLANs were only valid on RX (and it was documented that the sub-VLAN field needed to be transmitted as zero). Whereas for the bridge data plane offload, these 3 bits are only valid on TX. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-26	net: dsa: sja1105: add support for imprecise RX	Vladimir Oltean	1	-1/+7
	This is already common knowledge by now, but the sja1105 does not have hardware support for DSA tagging for data plane packets, and tag_8021q sets up a unique pvid per port, transmitted as VLAN-tagged towards the CPU, for the source port to be decoded nonetheless. When the port is part of a VLAN-aware bridge, the pvid committed to hardware is taken from the bridge and not from tag_8021q, so we need to work with that the best we can. Configure the switches to send all packets to the CPU as VLAN-tagged (even ones that were originally untagged on the wire) and make use of dsa_untag_bridge_pvid() to get rid of it before we send those packets up the network stack. With the classified VLAN used by hardware known to the tagger, we first peek at the VID in an attempt to figure out if the packet was received from a VLAN-unaware port (standalone or under a VLAN-unaware bridge), case in which we can continue to call dsa_8021q_rcv(). If that is not the case, the packet probably came from a VLAN-aware bridge. So we call the DSA helper that finds for us a "designated bridge port" - one that is a member of the VLAN ID from the packet, and is in the proper STP state - basically these are all checks performed by br_handle_frame() in the software RX data path. The bridge will accept the packet as valid even if the source port was maybe wrong. So it will maybe learn the MAC SA of the packet on the wrong port, and its software FDB will be out of sync with the hardware FDB. So replies towards this same MAC DA will not work, because the bridge will send towards a different netdev. This is where the bridge data plane offload ("imprecise TX") added by the next patch comes in handy. The software FDB is wrong, true, but the hardware FDB isn't, and by offloading the bridge forwarding plane we have a chance to right a wrong, and have the hardware look up the FDB for us for the reply packet. So it all cancels out. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-26	net: dsa: sja1105: deny more than one VLAN-aware bridge	Vladimir Oltean	1	-0/+13
	With tag_sja1105.c's only ability being to perform an imprecise RX procedure and identify whether a packet comes from a VLAN-aware bridge or not, we have no way to determine whether a packet with VLAN ID 5 comes from, say, br0 or br1. Actually we could, but it would mean that we need to restrict all VLANs from br0 to be different from all VLANs from br1, and this includes the default_pvid, which makes a setup with 2 VLAN-aware bridges highly imprectical. The fact of the matter is that this isn't even that big of a practical limitation, since even with a single VLAN-aware bridge we can pretty much enforce forwarding isolation based on the VLAN port membership. So in the end, tell the user that they need to model their setup using a single VLAN-aware bridge. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-26	net: dsa: sja1105: deny 8021q uppers on ports	Vladimir Oltean	1	-0/+15
	Now that best-effort VLAN filtering is gone and we are left with the imprecise RX and imprecise TX based in VLAN-aware mode, where the tagger just guesses the source port based on plausibility of the VLAN ID, 8021q uppers installed on top of a standalone port, while other ports of that switch are under a VLAN-aware bridge don't quite "just work". In fact it could be possible to restrict the VLAN IDs used by the 8021q uppers to not be shared with VLAN IDs used by that VLAN-aware bridge, but then the tagger needs to be patched to search for 8021q uppers too, not just for the "designated bridge port" which will be introduced in a later patch. I haven't given a possible implementation full thought, it seems maybe possible but not worth the effort right now. The only certain thing is that currently the tagger won't be able to figure out the source port for these packets because they will come with the VLAN ID of the 8021q upper and are no longer retagged to a tag_8021q sub-VLAN like the best effort VLAN filtering code used to do. So just deny these for the moment. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-26	net: dsa: sja1105: delete vlan delta save/restore logic	Vladimir Oltean	2	-300/+114
	With the best_effort_vlan_filtering mode now gone, the driver does not have 3 operating modes anymore (VLAN-unaware, VLAN-aware and best effort), but only 2. The idea is that we will gain support for network stack I/O through a VLAN-aware bridge, using the data plane offload framework (imprecise RX, imprecise TX). So the VLAN-aware use case will be more functional. But standalone ports that are part of the same switch when some other ports are under a VLAN-aware bridge should work too. Termination on those should work through the tag_8021q RX VLAN and TX VLAN. This was not possible using the old logic, because: - in VLAN-unaware mode, only the tag_8021q VLANs were committed to hw - in VLAN-aware mode, only the bridge VLANs were committed to hw - in best-effort VLAN mode, both the tag_8021q and bridge VLANs were committed to hw The strategy for the new VLAN-aware mode is to allow the bridge and the tag_8021q VLANs to coexist in the VLAN table at the same time. [ yes, we need to make sure that the bridge cannot install a tag_8021q VLAN, but ] This means that the save/restore logic introduced by commit ec5ae61076d0 ("net: dsa: sja1105: save/restore VLANs using a delta commit method") does not serve a purpose any longer. We can delete it and restore the old code that simply adds a VLAN to the VLAN table and calls it a day. Note that we keep the sja1105_commit_pvid() function from those days, but adapt it slightly. Ports that are under a VLAN-aware bridge use the bridge's pvid, ports that are standalone or under a VLAN-unaware bridge use the tag_8021q pvid, for local termination or VLAN-unaware forwarding. Now, when the vlan_filtering property is toggled for the bridge, the pvid of the ports beneath it is the only thing that's changing, we no longer delete some VLANs and restore others. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-26	net: dsa: sja1105: remove redundant re-assignment of pointer table	Colin Ian King	1	-2/+0
	The pointer table is being re-assigned with a value that is never read. The assignment is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-23	net: dsa: mv88e6xxx: map virtual bridges with forwarding offload in the PVT	Vladimir Oltean	1	-4/+74
	The mv88e6xxx switches have the ability to receive FORWARD (data plane) frames from the CPU port and route them according to the FDB. We can use this to offload the forwarding process of packets sent by the software bridge. Because DSA supports bridge domain isolation between user ports, just sending FORWARD frames is not enough, as they might leak the intended broadcast domain of the bridge on behalf of which the packets are sent. It should be noted that FORWARD frames are also (and typically) used to forward data plane packets on DSA links in cross-chip topologies. The FORWARD frame header contains the source port and switch ID, and switches receiving this frame header forward the packet according to their cross-chip port-based VLAN table (PVT). To address the bridging domain isolation in the context of offloading the forwarding on TX, the idea is that we can reuse the parts of the PVT that don't have any physical switch mapped to them, one entry for each software bridge. The switches will therefore think that behind their upstream port lie many switches, all in fact backed up by software bridges through tag_dsa.c, which constructs FORWARD packets with the right switch ID corresponding to each bridge. The mapping we use is absolutely trivial: DSA gives us a unique bridge number, and we add the number of the physical switches in the DSA switch tree to that, to obtain a unique virtual bridge device number to use in the PVT. Co-developed-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-23	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	David S. Miller	4	-1/+10
	Conflicts are simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-21	net: dsa: sja1105: make VID 4095 a bridge VLAN too	Vladimir Oltean	1	-0/+6
	This simple series of commands: ip link add br0 type bridge vlan_filtering 1 ip link set swp0 master br0 fails on sja1105 with the following error: [ 33.439103] sja1105 spi0.1: vlan-lookup-table needs to have at least the default untagged VLAN [ 33.447710] sja1105 spi0.1: Invalid config, cannot upload Warning: sja1105: Failed to change VLAN Ethertype. For context, sja1105 has 3 operating modes: - SJA1105_VLAN_UNAWARE: the dsa_8021q_vlans are committed to hardware - SJA1105_VLAN_FILTERING_FULL: the bridge_vlans are committed to hardware - SJA1105_VLAN_FILTERING_BEST_EFFORT: both the dsa_8021q_vlans and the bridge_vlans are committed to hardware Swapping out a VLAN list and another in happens in sja1105_build_vlan_table(), which performs a delta update procedure. That function is called from a few places, notably from sja1105_vlan_filtering() which is called from the SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING handler. The above set of 2 commands fails when run on a kernel pre-commit 8841f6e63f2c ("net: dsa: sja1105: make devlink property best_effort_vlan_filtering true by default"). So the priv->vlan_state transition that takes place is between VLAN-unaware and full VLAN filtering. So the dsa_8021q_vlans are swapped out and the bridge_vlans are swapped in. So why does it fail? Well, the bridge driver, through nbp_vlan_init(), first sets up the SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING attribute, and only then proceeds to call nbp_vlan_add for the default_pvid. So when we swap out the dsa_8021q_vlans and swap in the bridge_vlans in the SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING handler, there are no bridge VLANs (yet). So we have wiped the VLAN table clean, and the low-level static config checker complains of an invalid configuration. We _will_ add the bridge VLANs using the dynamic config interface, albeit later, when nbp_vlan_add() calls us. So it is natural that it fails. So why did it ever work? Surprisingly, it looks like I only tested this configuration with 2 things set up in a particular way: - a network manager that brings all ports up - a kernel with CONFIG_VLAN_8021Q=y It is widely known that commit ad1afb003939 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)") installs VID 0 to every net device that comes up. DSA treats these VLANs as bridge VLANs, and therefore, in my testing, the list of bridge_vlans was never empty. However, if CONFIG_VLAN_8021Q is not enabled, or the port is not up when it joins a VLAN-aware bridge, the bridge_vlans list will be temporarily empty, and the sja1105_static_config_reload() call from sja1105_vlan_filtering() will fail. To fix this, the simplest thing is to keep VID 4095, the one used for CPU-injected control packets since commit ed040abca4c1 ("net: dsa: sja1105: use 4095 as the private VLAN for untagged traffic"), in the list of bridge VLANs too, not just the list of tag_8021q VLANs. This ensures that the list of bridge VLANs will never be empty. Fixes: ec5ae61076d0 ("net: dsa: sja1105: save/restore VLANs using a delta commit method") Reported-by: Radu Pirea (NXP OSS) <radu-nicolae.pirea@oss.nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-20	mt7530 mt7530_fdb_write only set ivl bit vid larger than 1	Eric Woudstra	1	-1/+2
	Fixes my earlier patch which broke vlan unaware bridges. The IVL bit now only gets set for vid's larger than 1. Fixes: 11d8d98cbeef ("mt7530 fix mt7530_fdb_write vid missing ivl bit") Signed-off-by: Eric Woudstra <ericwouds@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-20	net: dsa: tag_8021q: add proper cross-chip notifier support	Vladimir Oltean	1	-126/+6
	The big problem which mandates cross-chip notifiers for tag_8021q is this: \| sw0p0 sw0p1 sw0p2 sw0p3 sw0p4 [ user ] [ user ] [ user ] [ dsa ] [ cpu ] \| +---------+ \| sw1p0 sw1p1 sw1p2 sw1p3 sw1p4 [ user ] [ user ] [ user ] [ dsa ] [ dsa ] \| +---------+ \| sw2p0 sw2p1 sw2p2 sw2p3 sw2p4 [ user ] [ user ] [ user ] [ dsa ] [ dsa ] When the user runs: ip link add br0 type bridge ip link set sw0p0 master br0 ip link set sw2p0 master br0 It doesn't work. This is because dsa_8021q_crosschip_bridge_join() assumes that "ds" and "other_ds" are at most 1 hop away from each other, so it is sufficient to add the RX VLAN of {ds, port} into {other_ds, other_port} and vice versa and presto, the cross-chip link works. When there is another switch in the middle, such as in this case switch 1 with its DSA links sw1p3 and sw1p4, somebody needs to tell it about these VLANs too. Which is exactly why the problem is quadratic: when a port joins a bridge, for each port in the tree that's already in that same bridge we notify a tag_8021q VLAN addition of that port's RX VLAN to the entire tree. It is a very complicated web of VLANs. It must be mentioned that currently we install tag_8021q VLANs on too many ports (DSA links - to be precise, on all of them). For example, when sw2p0 joins br0, and assuming sw1p0 was part of br0 too, we add the RX VLAN of sw2p0 on the DSA links of switch 0 too, even though there isn't any port of switch 0 that is a member of br0 (at least yet). In theory we could notify only the switches which sit in between the port joining the bridge and the port reacting to that bridge_join event. But in practice that is impossible, because of the way 'link' properties are described in the device tree. The DSA bindings require DT writers to list out not only the real/physical DSA links, but in fact the entire routing table, like for example switch 0 above will have: sw0p3: port@3 { link = <&sw1p4 &sw2p4>; }; This was done because: /* TODO: ideally DSA ports would have a single dp->link_dp member, * and no dst->rtable nor this struct dsa_link would be needed, * but this would require some more complex tree walking, * so keep it stupid at the moment and list them all. */ but it is a perfect example of a situation where too much information is actively detrimential, because we are now in the position where we cannot distinguish a real DSA link from one that is put there to avoid the 'complex tree walking'. And because DT is ABI, there is not much we can change. And because we do not know which DSA links are real and which ones aren't, we can't really know if DSA switch A is in the data path between switches B and C, in the general case. So this is why tag_8021q RX VLANs are added on all DSA links, and probably why it will never change. On the other hand, at least the number of additions/deletions is well balanced, and this means that once we implement reference counting at the cross-chip notifier level a la fdb/mdb, there is absolutely zero need for a struct dsa_8021q_crosschip_link, it's all self-managing. In fact, with the tag_8021q notifiers emitted from the bridge join notifiers, it becomes so generic that sja1105 does not need to do anything anymore, we can just delete its implementation of the .crosschip_bridge_{join,leave} methods. Among other things we can simply delete is the home-grown implementation of sja1105_notify_crosschip_switches(). The reason why that is wrong is because it is not quadratic - it only covers remote switches to which we have a cross-chip bridging link and that does not cover in-between switches. This deletion is part of the same patch because sja1105 used to poke deep inside the guts of the tag_8021q context in order to do that. Because the cross-chip links went away, so needs the sja1105 code. Last but not least, dsa_8021q_setup_port() is simplified (and also renamed). Because our TAG_8021Q_VLAN_ADD notifier is designed to react on the CPU port too, the four dsa_8021q_vid_apply() calls: - 1 for RX VLAN on user port - 1 for the user port's RX VLAN on the CPU port - 1 for TX VLAN on user port - 1 for the user port's TX VLAN on the CPU port now get squashed into only 2 notifier calls via dsa_port_tag_8021q_vlan_add. And because the notifiers to add and to delete a tag_8021q VLAN are distinct, now we finally break up the port setup and teardown into separate functions instead of relying on a "bool enabled" flag which tells us what to do. Arguably it should have been this way from the get go. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-20	net: dsa: tag_8021q: absorb dsa_8021q_setup into dsa_tag_8021q_{,un}register	Vladimir Oltean	2	-37/+7
	Right now, setting up tag_8021q is a 2-step operation for a driver, first the context structure needs to be created, then the VLANs need to be installed on the ports. A similar thing is true for teardown. Merge the 2 steps into the register/unregister methods, to be as transparent as possible for the driver as to what tag_8021q does behind the scenes. This also gets rid of the funny "bool setup == true means setup, == false means teardown" API that tag_8021q used to expose. Note that dsa_tag_8021q_register() must be called at least in the .setup() driver method and never earlier (like in the driver probe function). This is because the DSA switch tree is not initialized at probe time, and the cross-chip notifiers will not work. For symmetry with .setup(), the unregister method should be put in .teardown(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-20	net: dsa: make tag_8021q operations part of the core	Vladimir Oltean	2	-14/+6
	Make tag_8021q a more central element of DSA and move the 2 driver specific operations outside of struct dsa_8021q_context (which is supposed to hold dynamic data and not really constant function pointers). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-20	net: dsa: let the core manage the tag_8021q context	Vladimir Oltean	4	-38/+26
	The basic problem description is as follows: Be there 3 switches in a daisy chain topology: \| sw0p0 sw0p1 sw0p2 sw0p3 sw0p4 [ user ] [ user ] [ user ] [ dsa ] [ cpu ] \| +---------+ \| sw1p0 sw1p1 sw1p2 sw1p3 sw1p4 [ user ] [ user ] [ user ] [ dsa ] [ dsa ] \| +---------+ \| sw2p0 sw2p1 sw2p2 sw2p3 sw2p4 [ user ] [ user ] [ user ] [ user ] [ dsa ] The CPU will not be able to ping through the user ports of the bottom-most switch (like for example sw2p0), simply because tag_8021q was not coded up for this scenario - it has always assumed DSA switch trees with a single switch. To add support for the topology above, we must admit that the RX VLAN of sw2p0 must be added on some ports of switches 0 and 1 as well. This is in fact a textbook example of thing that can use the cross-chip notifier framework that DSA has set up in switch.c. There is only one problem: core DSA (switch.c) is not able right now to make the connection between a struct dsa_switch ds and a struct dsa_8021q_context ctx. Right now, it is drivers who call into tag_8021q.c and always provide a struct dsa_8021q_context *ctx pointer, and tag_8021q.c calls them back with the .tag_8021q_vlan_{add,del} methods. But with cross-chip notifiers, it is possible for tag_8021q to call drivers without drivers having ever asked for anything. A good example is right above: when sw2p0 wants to set itself up for tag_8021q, the .tag_8021q_vlan_add method needs to be called for switches 1 and 0, so that they transport sw2p0's VLANs towards the CPU without dropping them. So instead of letting drivers manage the tag_8021q context, add a tag_8021q_ctx pointer inside of struct dsa_switch, which will be populated when dsa_tag_8021q_register() returns success. The patch is fairly long-winded because we are partly reverting commit 5899ee367ab3 ("net: dsa: tag_8021q: add a context structure") which made the driver-facing tag_8021q API use "ctx" instead of "ds". Now that we can access "ctx" directly from "ds", this is no longer needed. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-20	net: dsa: tag_8021q: create dsa_tag_8021q_{register,unregister} helpers	Vladimir Oltean	2	-17/+13
	In preparation of moving tag_8021q to core DSA, move all initialization and teardown related to tag_8021q which is currently done by drivers in 2 functions called "register" and "unregister". These will gather more functionality in future patches, which will better justify the chosen naming scheme. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-20	net: dsa: sja1105: delete the best_effort_vlan_filtering mode	Vladimir Oltean	4	-600/+23
	Simply put, the best-effort VLAN filtering mode relied on VLAN retagging from a bridge VLAN towards a tag_8021q sub-VLAN in order to be able to decode the source port in the tagger, but the VLAN retagging implementation inside the sja1105 chips is not the best and we were relying on marginal operating conditions. The most notable limitation of the best-effort VLAN filtering mode is its incapacity to treat this case properly: ip link add br0 type bridge vlan_filtering 1 ip link set swp2 master br0 ip link set swp4 master br0 bridge vlan del dev swp4 vid 1 bridge vlan add dev swp4 vid 1 pvid When sending an untagged packet through swp2, the expectation is for it to be forwarded to swp4 as egress-tagged (so it will contain VLAN ID 1 on egress). But the switch will send it as egress-untagged. There was an attempt to fix this here: https://patchwork.kernel.org/project/netdevbpf/patch/20210407201452.1703261-2-olteanv@gmail.com/ but it failed miserably because it broke PTP RX timestamping, in a way that cannot be corrected due to hardware issues related to VLAN retagging. So with either PTP broken or pushing VLAN headers on egress for untagged packets being broken, the sad reality is that the best-effort VLAN filtering code is broken. Delete it. Note that this means there will be a temporary loss of functionality in this driver until it is replaced with something better (network stack RX/TX capability for "mode 2" as described in Documentation/networking/dsa/sja1105.rst, the "port under VLAN-aware bridge" case). We simply cannot keep this code until that driver rework is done, it is super bloated and tangled with tag_8021q. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-16	mt7530 fix mt7530_fdb_write vid missing ivl bit	Eric Woudstra	2	-0/+2
	According to reference guides mt7530 (mt7620) and mt7531: NOTE: When IVL is reset, MAC[47:0] and FID[2:0] will be used to read/write the address table. When IVL is set, MAC[47:0] and CVID[11:0] will be used to read/write the address table. Since the function only fills in CVID and no FID, we need to set the IVL bit. The existing code does not set it. This is a fix for the issue I dropped here earlier: http://lists.infradead.org/pipermail/linux-mediatek/2021-June/025697.html With this patch, it is now possible to delete the 'self' fdb entry manually. However, wifi roaming still has the same issue, the entry does not get deleted automatically. Wifi roaming also needs a fix somewhere else to function correctly in combination with vlan. Signed-off-by: Eric Woudstra <ericwouds@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-15	net: dsa: mv88e6xxx: NET_DSA_MV88E6XXX_PTP should depend on NET_DSA_MV88E6XXX	Geert Uytterhoeven	1	-1/+1
	Making global2 support mandatory removed the Kconfig symbol NET_DSA_MV88E6XXX_GLOBAL2. This symbol also served as an intermediate symbol to make NET_DSA_MV88E6XXX_PTP depend on NET_DSA_MV88E6XXX. With the symbol removed, the user is always asked about PTP support for Marvell 88E6xxx switches, even if the latter support is not enabled. Fix this by reinstating the dependency. Fixes: 63368a7416df144b ("net: dsa: mv88e6xxx: Make global2 support mandatory") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-13	net: dsa: sja1105: fix address learning getting disabled on the CPU port	Vladimir Oltean	1	-8/+6
	In May 2019 when commit 640f763f98c2 ("net: dsa: sja1105: Add support for Spanning Tree Protocol") was introduced, the comment that "STP does not get called for the CPU port" was true. This changed after commit 0394a63acfe2 ("net: dsa: enable and disable all ports") in August 2019 and went largely unnoticed, because the sja1105_bridge_stp_state_set() method did nothing different compared to the static setup done by sja1105_init_mac_settings(). With the ability to turn address learning off introduced by the blamed commit, there is a new priv->learn_ena port mask in the driver. When sja1105_bridge_stp_state_set() gets called and we are in BR_STATE_LEARNING or later, address learning is enabled or not depending on priv->learn_ena & BIT(port). So what happens is that priv->learn_ena is not being set from anywhere for the CPU port, and the static configuration done by sja1105_init_mac_settings() is being overwritten. To solve this, acknowledge that the static configuration of STP state is no longer necessary because the STP state is being set by the DSA core now, but what is necessary is to set priv->learn_ena for the CPU port. Fixes: 4d9423549501 ("net: dsa: sja1105: offload bridge port flags to device") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-11	dsa: fix for_each_child.cocci warnings	kernel test robot	1	-1/+3
	For_each_available_child_of_node should have of_node_put() before return around line 423. Generated by: scripts/coccinelle/iterators/for_each_child.cocci CC: Alexander Lobakin <alobakin@pm.me> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: kernel test robot <lkp@intel.com> Signed-off-by: Julia Lawall <julia.lawall@inria.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-01	net: dsa: mv88e6xxx: enable SerDes PCS register dump via ethtool -d on Topaz	Marek Behún	1	-0/+4
	Commit bf3504cea7d7e ("net: dsa: mv88e6xxx: Add 6390 family PCS registers to ethtool -d") added support for dumping SerDes PCS registers via ethtool -d for Peridot. The same implementation is also valid for Topaz, but was not enabled at the time. Signed-off-by: Marek Behún <kabel@kernel.org> Fixes: bf3504cea7d7e ("net: dsa: mv88e6xxx: Add 6390 family PCS registers to ethtool -d") Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-01	net: dsa: mv88e6xxx: enable SerDes RX stats for Topaz	Marek Behún	2	-3/+9
	Commit 0df952873636a ("mv88e6xxx: Add serdes Rx statistics") added support for RX statistics on SerDes ports for Peridot. This same implementation is also valid for Topaz, but was not enabled at the time. We need to use the generic .serdes_get_lane() method instead of the Peridot specific one in the stats methods so that on Topaz the proper one is used. Signed-off-by: Marek Behún <kabel@kernel.org> Fixes: 0df952873636a ("mv88e6xxx: Add serdes Rx statistics") Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-01	net: dsa: mv88e6xxx: enable devlink ATU hash param for Topaz	Marek Behún	1	-0/+4
	Commit 23e8b470c7788 ("net: dsa: mv88e6xxx: Add devlink param for ATU hash algorithm.") introduced ATU hash algorithm access via devlink, but did not enable it for Topaz. Enable this feature also for Topaz. Signed-off-by: Marek Behún <kabel@kernel.org> Fixes: 23e8b470c7788 ("net: dsa: mv88e6xxx: Add devlink param for ATU hash algorithm.") Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-01	net: dsa: mv88e6xxx: enable .rmu_disable() on Topaz	Marek Behún	1	-0/+2
	Commit 9e5baf9b36367 ("net: dsa: mv88e6xxx: add RMU disable op") introduced .rmu_disable() method with implementation for several models, but forgot to add Topaz, which can use the Peridot implementation. Use the Peridot implementation of .rmu_disable() on Topaz. Signed-off-by: Marek Behún <kabel@kernel.org> Fixes: 9e5baf9b36367 ("net: dsa: mv88e6xxx: add RMU disable op") Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-01	net: dsa: mv88e6xxx: use correct .stats_set_histogram() on Topaz	Marek Behún	1	-2/+2
	Commit 40cff8fca9e3 ("net: dsa: mv88e6xxx: Fix stats histogram mode") introduced wrong .stats_set_histogram() method for Topaz family. The Peridot method should be used instead. Signed-off-by: Marek Behún <kabel@kernel.org> Fixes: 40cff8fca9e3 ("net: dsa: mv88e6xxx: Fix stats histogram mode") Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-01	net: dsa: mv88e6xxx: enable .port_set_policy() on Topaz	Marek Behún	1	-0/+2
	Commit f3a2cd326e44 ("net: dsa: mv88e6xxx: introduce .port_set_policy") introduced .port_set_policy() method with implementation for several models, but forgot to add Topaz, which can use the 6352 implementation. Use the 6352 implementation of .port_set_policy() on Topaz. Signed-off-by: Marek Behún <kabel@kernel.org> Fixes: f3a2cd326e44 ("net: dsa: mv88e6xxx: introduce .port_set_policy") Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-29	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski	2	-3/+9
	Trivial conflict in net/netfilter/nf_tables_api.c. Duplicate fix in tools/testing/selftests/net/devlink_port_split.py - take the net-next version. skmsg, and L4 bpf - keep the bpf code but remove the flags and err params. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-06-28	net: dsa: sja1105: fix dynamic access to L2 Address Lookup table for SJA1110	Vladimir Oltean	1	-4/+22
	The SJA1105P/Q/R/S and SJA1110 may have the same layout for the command to read/write/search for L2 Address Lookup entries, but as explained in the comments at the beginning of the sja1105_dynamic_config.c file, the command portion of the buffer is at the end, and we need to obtain a pointer to it by adding the length of the entry to the buffer. Alas, the length of an L2 Address Lookup entry is larger in SJA1110 than it is for SJA1105P/Q/R/S, so we need to create a common helper to access the command buffer, and this receives as argument the length of the entry buffer. Fixes: 3e77e59bf8cf ("net: dsa: sja1105: add support for the SJA1110 switch family") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	net: dsa: sja1105: fix NULL pointer dereference in sja1105_reload_cbs()	Vladimir Oltean	1	-0/+6
	priv->cbs is an array of priv->info->num_cbs_shapers elements of type struct sja1105_cbs_entry which only get allocated if CONFIG_NET_SCH_CBS is enabled. However, sja1105_reload_cbs() is called from sja1105_static_config_reload() which in turn is called for any of the items in sja1105_reset_reasons, therefore during the normal runtime of the driver and not just from a code path which can be triggered by the tc-cbs offload. The sja1105_reload_cbs() function does not contain a check whether the priv->cbs array is NULL or not, it just assumes it isn't and proceeds to iterate through the credit-based shaper elements. This leads to a NULL pointer dereference. The solution is to return success if the priv->cbs array has not been allocated, since sja1105_reload_cbs() has nothing to do. Fixes: 4d7525085a9b ("net: dsa: sja1105: offload the Credit-Based Shaper qdisc") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	net: dsa: sja1105: document the SJA1110 in the Kconfig	Vladimir Oltean	1	-2/+6
	Mention support for the SJA1110 in menuconfig. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22	net: dsa: b53: Create default VLAN entry explicitly	Florian Fainelli	1	-8/+19
	In case CONFIG_VLAN_8021Q is not set, there will be no call down to the b53 driver to ensure that the default PVID VLAN entry will be configured with the appropriate untagged attribute towards the CPU port. We were implicitly relying on dsa_slave_vlan_rx_add_vid() to do that for us, instead make it explicit. Reported-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-21	net: dsa: mv88e6xxx: Fix adding vlan 0	Eldar Gasanov	1	-3/+3
	8021q module adds vlan 0 to all interfaces when it starts. When 8021q module is loaded it isn't possible to create bond with mv88e6xxx interfaces, bonding module dipslay error "Couldn't add bond vlan ids", because it tries to add vlan 0 to slave interfaces. There is unexpected behavior in the switch. When a PVID is assigned to a port the switch changes VID to PVID in ingress frames with VID 0 on the port. Expected that the switch doesn't assign PVID to tagged frames with VID 0. But there isn't a way to change this behavior in the switch. Fixes: 57e661aae6a8 ("net: dsa: mv88e6xxx: Link aggregation support") Signed-off-by: Eldar Gasanov <eldargasanov2@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-18	net: dsa: sja1105: completely error out in sja1105_static_config_reload if ↵	Vladimir Oltean	1	-7/+12
	something fails If reloading the static config fails for whatever reason, for example if sja1105_static_config_check_valid() fails, then we "goto out_unlock_ptp" but we print anyway that "Reset switch and programmed static config.", which is confusing because we didn't. We also do a bunch of other stuff like reprogram the XPCS and reload the credit-based shapers, as if a switch reset took place, which didn't. So just unlock the PTP lock and goto out, skipping all of that. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-18	net: dsa: sja1105: allow the TTEthernet configuration in the static config ↵	Vladimir Oltean	1	-2/+1
	for SJA1110 Currently sja1105_static_config_check_valid() is coded up to detect whether TTEthernet is supported based on device ID, and this check was not updated to cover SJA1110. However, it is desirable to have as few checks for the device ID as possible, so the driver core is more generic. So what we can do is look at the static config table operations implemented by that specific switch family (populated by sja1105_static_config_init) whether the schedule table has a non-zero maximum entry count (meaning that it is supported) or not. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-18	net: dsa: sja1105: properly power down the microcontroller clock for SJA1110	Vladimir Oltean	4	-16/+53
	It turns out that powering down the BASE_TIMER_CLK does not turn off the microcontroller, just its timers, including the one for the watchdog. So the embedded microcontroller is still running, and potentially still doing things. To prevent unwanted interference, we should power down the BASE_MCSS_CLK as well (MCSS = microcontroller subsystem). The trouble is that currently we turn off the BASE_TIMER_CLK for SJA1110 from the .clocking_setup() method, mostly because this is a Clock Generation Unit (CGU) setting which was traditionally configured in that method for SJA1105. But in SJA1105, the CGU was used for bringing up the port clocks at the proper speeds, and in SJA1110 it's not (but rather for initial configuration), so it's best that we rebrand the sja1110_clocking_setup() method into what it really is - an implementation of the .disable_microcontroller() method. Since disabling the microcontroller only needs to be done once, at probe time, we can choose the best place to do that as being in sja1105_setup(), before we upload the static config to the device. This guarantees that the static config being used by the switch afterwards is really ours. Note that the procedure to upload a static config necessarily resets the switch. This already did not reset the microcontroller, only the switch core, so since the .disable_microcontroller() method is guaranteed to be called by that point, if it's disabled, it remains disabled. Add a comment to make that clear. With the code movement for SJA1110 from .clocking_setup() to .disable_microcontroller(), both methods are optional and are guarded by "if" conditions. Tested by enabling in the device tree the rev-mii switch port 0 that goes towards the microcontroller, and flashing a firmware that would have networking. Without this patch, the microcontroller can be pinged, with this patch it cannot. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16	net: dsa: xrs700x: forward HSR supervision frames	George McCollister	1	-8/+19
	Forward supervision frames between redunant HSR ports. This was broken in the last commit. Fixes: 1a42624aecba ("net: dsa: xrs700x: allow HSR/PRP supervision dupes for node_table") Signed-off-by: George McCollister <george.mccollister@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-15	net: dsa: b53: remove redundant null check on dev	Colin Ian King	1	-2/+1
	The pointer dev can never be null, the null check is redundant and can be removed. Cleans up a static analysis warning that pointer priv is dereferencing dev before dev is being null checked. Addresses-Coverity: ("Dereference before null check") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-14	net: dsa: sja1105: constify the sja1105_regs structures	Vladimir Oltean	1	-3/+3
	The struct sja1105_regs tables are not modified during the runtime of the driver, so they can be made constant. In fact, struct sja1105_info already holds a const pointer to these. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-14	net: phy: micrel: ksz886x/ksz8081: add cabletest support	Oleksij Rempel	1	-0/+13
	This patch support for cable test for the ksz886x switches and the ksz8081 PHY. The patch was tested on a KSZ8873RLL switch with following results: - port 1: - provides invalid values, thus return -ENOTSUPP (Errata: DS80000830A: "LinkMD does not work on Port 1", http://ww1.microchip.com/downloads/en/DeviceDoc/KSZ8873-Errata-DS80000830A.pdf) - port 2: - can detect distance - can detect open on each wire of pair A (wire 1 and 2) - can detect open only on one wire of pair B (only wire 3) - can detect short between wires of a pair (wires 1 + 2 or 3 + 6) - short between pairs is detected as open. For example short between wires 2 + 3 is detected as open. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-14	net: dsa: microchip: ksz8795: add LINK_MD register support	Oleksij Rempel	2	-2/+25
	Add mapping for LINK_MD register to enable cable testing functionality. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-14	net: phy/dsa micrel/ksz886x add MDI-X support	Oleksij Rempel	1	-0/+5
	Add support for MDI-X status and configuration Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-14	net: dsa: microchip: ksz8795: add phylink support	Michael Grzeschik	1	-0/+55
	This patch adds the phylink support to the ksz8795 driver to provide configuration exceptions on quirky KSZ8863 and KSZ8873 ports. Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-14	net: phy: micrel: move phy reg offsets to common header	Michael Grzeschik	2	-121/+60
	Some micrel devices share the same PHY register defines. This patch moves them to one common header so other drivers can reuse them. And reuse generic MII_* defines where possible. Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11	net: dsa: sja1105: plug in support for 2500base-x	Vladimir Oltean	3	-2/+16
	The MAC treats 2500base-x same as SGMII (yay for that) except that it must be set to a different speed. Extend all places that check for SGMII to also check for 2500base-x. Also add the missing 2500base-x compatibility matrix entry for SJA1110D. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11	net: dsa: sja1105: SGMII and 2500base-x on the SJA1110 are 'special'	Vladimir Oltean	1	-0/+2
	For the xMII Mode Parameters Table to be properly configured for SGMII mode on SJA1110, we need to set the "special" bit, since SGMII is officially bitwise coded as 0b0011 in SJA1105 (decimal 3, equal to XMII_MODE_SGMII), and as 0b1011 in SJA1110 (decimal 11). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11	net: dsa: sja1105: register the PCS MDIO bus for SJA1110	Vladimir Oltean	3	-0/+109
	On the SJA1110, the PCS of each SERDES-capable port is accessed through a different memory window which is 0x100 bytes in size, denoted by "pcs_base". In each PCS register access window, the XPCS MMDs are accessed in an indirect way: in pages/banks of up to 0x100 addresses each. Changing the page/bank is done by writing to a special register at the end of the access window. The MDIO register map accessed indirectly through the indirect banked method described above is similar to what SJA1105 has: upper 5 bits are the MMD, lower 16 bits are the MDIO address within that MMD. Since the PHY ID reported by the XPCS inside SJA1110 is also all zeroes (like SJA1105), we need to trap those reads and return a fake PHY ID so that the xpcs driver can apply some specific fixups for our integration. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11	net: dsa: sja1105: migrate to xpcs for SGMII	Vladimir Oltean	6	-199/+195
	There is a desire to use the generic driver for the Synopsys XPCS located in drivers/net/pcs, and to achieve that, the sja1105 driver must expose an MDIO bus for the SGMII PCS, because the XPCS probes as an mdio_device. In preparation of the SJA1110 which in fact has a different access procedure for the SJA1105, we register this PCS MDIO bus once in the common code, but we implement function pointers for the read and write methods. In this patch there is a single implementation for them. There is exactly one MDIO bus for the PCS, this will contain all PCSes at MDIO addresses equal to the port number. We delete a bunch of hardware support code because the xpcs driver already does what we need. We need to hack up the MDIO reads for the PHY ID, since our XPCS instantiation returns zeroes and there are some specific fixups which need to be applied by the xpcs driver. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11	net: dsa: sja1105: implement TX timestamping for SJA1110	Vladimir Oltean	4	-0/+81
	The TX timestamping procedure for SJA1105 is a bit unconventional because the transmit procedure itself is unconventional. Control packets (and therefore PTP as well) are transmitted to a specific port in SJA1105 using "management routes" which must be written over SPI to the switch. These are one-shot rules that match by destination MAC address on traffic coming from the CPU port, and select the precise destination port for that packet. So to transmit a packet from NET_TX softirq context, we actually need to defer to a process context so that we can perform that SPI write before we send the packet. The DSA master dev_queue_xmit() runs in process context, and we poll until the switch confirms it took the TX timestamp, then we annotate the skb clone with that TX timestamp. This is why the sja1105 driver does not need an skb queue for TX timestamping. But the SJA1110 is a bit (not much!) more conventional, and you can request 2-step TX timestamping through the DSA header, as well as give the switch a cookie (timestamp ID) which it will give back to you when it has the timestamp. So now we do need a queue for keeping the skb clones until their TX timestamps become available. The interesting part is that the metadata frames from SJA1105 haven't disappeared completely. On SJA1105 they were used as follow-ups which contained RX timestamps, but on SJA1110 they are actually TX completion packets, which contain a variable (up to 32) array of timestamps. Why an array? Because: - not only is the TX timestamp on the egress port being communicated, but also the RX timestamp on the CPU port. Nice, but we don't care about that, so we ignore it. - because a packet could be multicast to multiple egress ports, each port takes its own timestamp, and the TX completion packet contains the individual timestamps on each port. This is unconventional because switches typically have a timestamping FIFO and raise an interrupt, but this one doesn't. So the tagger needs to detect and parse meta frames, and call into the main switch driver, which pairs the timestamps with the skbs in the TX timestamping queue which are waiting for one. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11	net: dsa: sja1105: add the RX timestamping procedure for SJA1110	Vladimir Oltean	4	-3/+40
	This is really easy, since the full RX timestamp is in the DSA trailer and the tagger code transfers it to SJA1105_SKB_CB(skb)->tstamp, we just need to move it to the skb shared info region. This is as opposed to SJA1105, where the RX timestamp was received in a meta frame (so there needed to be a state machine to pair the 2 packets) and the timestamp was partial (so the packet, once matched with its timestamp, needed to be added to an RX timestamping queue where the PTP aux worker would reconstruct that timestamp). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11	net: dsa: add support for the SJA1110 native tagging protocol	Vladimir Oltean	5	-1/+18
	The SJA1110 has improved a few things compared to SJA1105: - To send a control packet from the host port with SJA1105, one needed to program a one-shot "management route" over SPI. This is no longer true with SJA1110, you can actually send "in-band control extensions" in the packets sent by DSA, these are in fact DSA tags which contain the destination port and switch ID. - When receiving a control packet from the switch with SJA1105, the source port and switch ID were written in bytes 3 and 4 of the destination MAC address of the frame (which was a very poor shot at a DSA header). If the control packet also had an RX timestamp, that timestamp was sent in an actual follow-up packet, so there were reordering concerns on multi-core/multi-queue DSA masters, where the metadata frame with the RX timestamp might get processed before the actual packet to which that timestamp belonged (there is no way to pair a packet to its timestamp other than the order in which they were received). On SJA1110, this is no longer true, control packets have the source port, switch ID and timestamp all in the DSA tags. - Timestamps from the switch were partial: to get a 64-bit timestamp as required by PTP stacks, one would need to take the partial 24-bit or 32-bit timestamp from the packet, then read the current PTP time very quickly, and then patch in the high bits of the current PTP time into the captured partial timestamp, to reconstruct what the full 64-bit timestamp must have been. That is awful because packet processing is done in NAPI context, but reading the current PTP time is done over SPI and therefore needs sleepable context. But it also aggravated a few things: - Not only is there a DSA header in SJA1110, but there is a DSA trailer in fact, too. So DSA needs to be extended to support taggers which have both a header and a trailer. Very unconventional - my understanding is that the trailer exists because the timestamps couldn't be prepared in time for putting them in the header area. - Like SJA1105, not all packets sent to the CPU have the DSA tag added to them, only control packets do: * the ones which match the destination MAC filters/traps in MAC_FLTRES1 and MAC_FLTRES0 * the ones which match FDB entries which have TRAP or TAKETS bits set So we could in theory hack something up to request the switch to take timestamps for all packets that reach the CPU, and those would be DSA-tagged and contain the source port / switch ID by virtue of the fact that there needs to be a timestamp trailer provided. BUT: - The SJA1110 does not parse its own DSA tags in a way that is useful for routing in cross-chip topologies, a la Marvell. And the sja1105 driver already supports cross-chip bridging from the SJA1105 days. It does that by automatically setting up the DSA links as VLAN trunks which contain all the necessary tag_8021q RX VLANs that must be communicated between the switches that span the same bridge. So when using tag_8021q on sja1105, it is possible to have 2 switches with ports sw0p0, sw0p1, sw1p0, sw1p1, and 2 VLAN-unaware bridges br0 and br1, and br0 can take sw0p0 and sw1p0, and br1 can take sw0p1 and sw1p1, and forwarding will happen according to the expected rules of the Linux bridge. We like that, and we don't want that to go away, so as a matter of fact, the SJA1110 tagger still needs to support tag_8021q. So the sja1110 tagger is a hybrid between tag_8021q for data packets, and the native hardware support for control packets. On RX, packets have a 13-byte trailer if they contain an RX timestamp. That trailer is padded in such a way that its byte 8 (the start of the "residence time" field - not parsed by Linux because we don't care) is aligned on a 16 byte boundary. So the padding has a variable length between 0 and 15 bytes. The DSA header contains the offset of the beginning of the padding relative to the beginning of the frame (and the end of the padding is obviously the end of the packet minus 13 bytes, the length of the trailer). So we discard it. Packets which don't have a trailer contain the source port and switch ID information in the header (they are "trap-to-host" packets). Packets which have a trailer contain the source port and switch ID in the trailer. On TX, the destination port mask and switch ID is always in the trailer, so we always need to say in the header that a trailer is present. The header needs a custom EtherType and this was chosen as 0xdadc, after 0xdada which is for Marvell and 0xdadb which is for VLANs in VLAN-unaware mode on SJA1105 (and SJA1110 in fact too). Because we use tag_8021q in concert with the native tagging protocol, control packets will have 2 DSA tags. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>