summaryrefslogtreecommitdiffstats
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2021-07-29mctp: Add neighbour implementationMatt Johnston3-0/+30
Add an initial neighbour table implementation, to be used in the route output path. Signed-off-by: Matt Johnston <matt@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-29mctp: Add netlink route managementMatt Johnston1-0/+2
This change adds RTM_GETROUTE, RTM_NEWROUTE & RTM_DELROUTE handlers, allowing management of the MCTP route table. Includes changes from Jeremy Kerr <jk@codeconstruct.com.au>. Signed-off-by: Matt Johnston <matt@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-29mctp: Add initial routing frameworkJeremy Kerr3-0/+95
Add a simple routing table, and a couple of route output handlers, and the mctp packet_type & handler. Includes changes from Matt Johnston <matt@codeconstruct.com.au>. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-29mctp: Add device handling and netlink interfaceJeremy Kerr6-0/+67
This change adds the infrastructure for managing MCTP netdevices; we add a pointer to the AF_MCTP-specific data to struct netdevice, and hook up the rtnetlink operations for adding and removing addresses. Includes changes from Matt Johnston <matt@codeconstruct.com.au>. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-29mctp: Add initial driver infrastructureJeremy Kerr1-0/+1
Add an empty drivers/net/mctp/, for future interface drivers. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-29mctp: Add sockaddr_mctp to uapiJeremy Kerr1-0/+21
This change introduces the user-visible MCTP header, containing the protocol-specific addressing definitions. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-29mctp: Add base packet definitionsJeremy Kerr1-0/+35
Simple packet header format as defined by DMTF DSP0236. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-29mctp: Add MCTP baseJeremy Kerr2-1/+20
Add basic Kconfig, an initial (empty) af_mctp source object, and {AF,PF}_MCTP definitions, and the required definitions for a new protocol type. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-29nfc: constify passed nfc_devKrzysztof Kozlowski1-2/+2
The struct nfc_dev is not modified by nfc_get_drvdata() and nfc_device_name() so it can be made a const. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-29skbuff: allow 'slow_gro' for skb carring sock referencePaolo Abeni1-0/+9
This change leverages the infrastructure introduced by the previous patches to allow soft devices passing to the GRO engine owned skbs without impacting the fast-path. It's up to the GRO caller ensuring the slow_gro bit validity before invoking the GRO engine. The new helper skb_prepare_for_gro() is introduced for that goal. On slow_gro, skbs are aggregated only with equal sk. Additionally, skb truesize on GRO recycle and free is correctly updated so that sk wmem is not changed by the GRO processing. rfc-> v1: - fixed bad truesize on dev_gro_receive NAPI_FREE - use the existing state bit Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-29sk_buff: track dst status in slow_groPaolo Abeni2-0/+4
Similar to the previous patch, but covering the dst field: the slow_gro flag is additionally set when a dst is attached to the skb RFC -> v1: - use the existing flag instead of adding a new one Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-29sk_buff: introduce 'slow_gro' flagsPaolo Abeni1-0/+4
The new flag tracks if any state field is set, so that GRO requires 'unusual'/slow prepare steps. Set such flag when a ct entry is attached to the skb, and never clear it. The new bit uses an existing hole into the sk_buff struct RFC -> v1: - use a single state bit, never clear it - avoid moving the _nfct field Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-28net/sched: act_skbmod: Add SKBMOD_F_ECN option supportPeilin Ye1-0/+1
Currently, when doing rate limiting using the tc-police(8) action, the easiest way is to simply drop the packets which exceed or conform the configured bandwidth limit. Add a new option to tc-skbmod(8), so that users may use the ECN [1] extension to explicitly inform the receiver about the congestion instead of dropping packets "on the floor". The 2 least significant bits of the Traffic Class field in IPv4 and IPv6 headers are used to represent different ECN states [2]: 0b00: "Non ECN-Capable Transport", Non-ECT 0b10: "ECN Capable Transport", ECT(0) 0b01: "ECN Capable Transport", ECT(1) 0b11: "Congestion Encountered", CE As an example: $ tc filter add dev eth0 parent 1: protocol ip prio 10 \ matchall action skbmod ecn Doing the above marks all ECT(0) and ECT(1) packets as CE. It does NOT affect Non-ECT or non-IP packets. In the tc-police scenario mentioned above, users may pipe a tc-police action and a tc-skbmod "ecn" action together to achieve ECN-based rate limiting. For TCP connections, upon receiving a CE packet, the receiver will respond with an ECE packet, asking the sender to reduce their congestion window. However ECN also works with other L4 protocols e.g. DCCP and SCTP [2], and our implementation does not touch or care about L4 headers. The updated tc-skbmod SYNOPSIS looks like the following: tc ... action skbmod { set SETTABLE | swap SWAPPABLE | ecn } ... Only one of "set", "swap" or "ecn" shall be used in a single tc-skbmod command. Trying to use more than one of them at a time is considered undefined behavior; pipe multiple tc-skbmod commands together instead. "set" and "swap" only affect Ethernet packets, while "ecn" only affects IPv{4,6} packets. It is also worth mentioning that, in theory, the same effect could be achieved by piping a "police" action and a "bpf" action using the bpf_skb_ecn_set_ce() helper, but this requires eBPF programming from the user, thus impractical. Depends on patch "net/sched: act_skbmod: Skip non-Ethernet packets". [1] https://datatracker.ietf.org/doc/html/rfc3168 [2] https://en.wikipedia.org/wiki/Explicit_Congestion_Notification Reviewed-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-28devlink: Remove duplicated registration checkLeon Romanovsky1-3/+1
Both registered flag and devlink pointer are set at the same time and indicate the same thing - devlink/devlink_port are ready. Instead of checking ->registered use devlink pointer as an indication. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-27net: bonding: move ioctl handling to private ndo operationArnd Bergmann1-3/+10
All other user triggered operations are gone from ndo_ioctl, so move the SIOCBOND family into a custom operation as well. The .ndo_ioctl() helper is no longer called by the dev_ioctl.c code now, but there are still a few definitions in obsolete wireless drivers as well as the appletalk and ieee802154 layers to call SIOCSIFADDR/SIOCGIFADDR helpers from inside the kernel. Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-27net: bridge: move bridge ioctls out of .ndo_do_ioctlArnd Bergmann1-1/+6
Working towards obsoleting the .ndo_do_ioctl operation entirely, stop passing the SIOCBRADDIF/SIOCBRDELIF device ioctl commands into this callback. My first attempt was to add another ndo_siocbr() callback, but as there is only a single driver that takes these commands and there is already a hook mechanism to call directly into this driver, extend this hook instead, and use it for both the deviceless and the device specific ioctl commands. Cc: Roopa Prabhu <roopa@nvidia.com> Cc: Nikolay Aleksandrov <nikolay@nvidia.com> Cc: bridge@lists.linux-foundation.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-27net: split out ndo_siowandev ioctlArnd Bergmann2-2/+4
In order to further reduce the scope of ndo_do_ioctl(), move out the SIOCWANDEV handling into a new network device operation function. Adjust the prototype to only pass the if_settings sub-structure in place of the ifreq, and remove the redundant 'cmd' argument in the process. Cc: Krzysztof Halasa <khc@pm.waw.pl> Cc: "Jan \"Yenya\" Kasprzak" <kas@fi.muni.cz> Cc: Kevin Curtis <kevin.curtis@farsite.co.uk> Cc: Zhao Qiang <qiang.zhao@nxp.com> Cc: Martin Schiller <ms@dev.tdt.de> Cc: Jiri Slaby <jirislaby@kernel.org> Cc: linux-x25@vger.kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-27dev_ioctl: split out ndo_eth_ioctlArnd Bergmann2-7/+13
Most users of ndo_do_ioctl are ethernet drivers that implement the MII commands SIOCGMIIPHY/SIOCGMIIREG/SIOCSMIIREG, or hardware timestamping with SIOCSHWTSTAMP/SIOCGHWTSTAMP. Separate these from the few drivers that use ndo_do_ioctl to implement SIOCBOND, SIOCBR and SIOCWANDEV commands. This is a purely cosmetic change intended to help readers find their way through the implementation. Cc: Doug Ledford <dledford@redhat.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Vivien Didelot <vivien.didelot@gmail.com> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Vladimir Oltean <olteanv@gmail.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: linux-rdma@vger.kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-27dev_ioctl: pass SIOCDEVPRIVATE data separatelyArnd Bergmann1-2/+2
The compat handlers for SIOCDEVPRIVATE are incorrect for any driver that passes data as part of struct ifreq rather than as an ifr_data pointer, or that passes data back this way, since the compat_ifr_data_ioctl() helper overwrites the ifr_data pointer and does not copy anything back out. Since all drivers using devprivate commands are now converted to the new .ndo_siocdevprivate callback, fix this by adding the missing piece and passing the pointer separately the whole way. This further unifies the native and compat logic for socket ioctls, as the new code now passes the correct pointer as well as the correct data for both native and compat ioctls. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-27ip_tunnel: use ndo_siocdevprivateArnd Bergmann1-1/+2
The various ipv4 and ipv6 tunnel drivers each implement a set of 12 SIOCDEVPRIVATE commands for managing tunnels. These all work correctly in compat mode. Move them over to the new .ndo_siocdevprivate operation. Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: David Ahern <dsahern@kernel.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-27hamradio: use ndo_siocdevprivateArnd Bergmann1-1/+1
hamradio uses a set of private ioctls that do seem to work correctly in compat mode, as they only rely on the ifr_data pointer. Move them over to the ndo_siocdevprivate callback as a cleanup. Cc: Thomas Sailer <t.sailer@alumni.ethz.ch> Cc: Joerg Reuter <jreuter@yaina.de> Cc: Jean-Paul Roubelat <jpr@f6fbb.org> Cc: linux-hams@vger.kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-27net: split out SIOCDEVPRIVATE handling from dev_ioctlArnd Bergmann1-0/+3
SIOCDEVPRIVATE ioctl commands are mainly used in really old drivers, and they have a number of problems: - They hide behind the normal .ndo_do_ioctl function that is also used for other things in modern drivers, so it's hard to spot a driver that actually uses one of these - Since drivers use a number different calling conventions, it is impossible to support compat mode for them in a generic way. - With all drivers using the same 16 commands codes, there is no way to introspect the data being passed through things like strace. Add a new net_device_ops callback pointer, to address the first two of these. Separating them from .ndo_do_ioctl makes it easy to grep for drivers with a .ndo_siocdevprivate callback, and the unwieldy name hopefully makes it easier to spot in code review. By passing the ifreq structure and the ifr_data pointer separately, it is no longer necessary to overload these, and the driver can use either one for a given command. Cc: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-27qdisc: add new field for qdisc_enqueue tracepointTonghao Zhang1-0/+2
qdisc_enqueue tracepoint can work with qdisc:qdisc_dequeue to measure packets latency in qdisc queues. Add a new field txq for it, then we can retrieve more info. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-27openvswitch: fix alignment issuesMark Gray1-2/+2
Signed-off-by: Mark Gray <mark.d.gray@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-27openvswitch: update kdoc OVS_DP_ATTR_PER_CPU_PIDSMark Gray1-1/+1
Signed-off-by: Mark Gray <mark.d.gray@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-27net: netlink: add the case when nlh is NULLYajun Deng1-1/+1
Add the case when nlh is NULL in nlmsg_report(), so that the caller doesn't need to deal with this case. Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-26Revert "net: dsa: Allow drivers to filter packets they can decode source ↵Vladimir Oltean1-15/+0
port from" This reverts commit cc1939e4b3aaf534fb2f3706820012036825731c. Currently 2 classes of DSA drivers are able to send/receive packets directly through the DSA master: - drivers with DSA_TAG_PROTO_NONE - sja1105 Now that sja1105 has gained the ability to perform traffic termination even under the tricky case (VLAN-aware bridge), and that is much more functional (we can perform VLAN-aware bridging with foreign interfaces), there is no reason to keep this code in the receive path of the network core. So delete it. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-26net: dsa: sja1105: add bridge TX data plane offload based on tag_8021qVladimir Oltean1-0/+10
The main desire for having this feature in sja1105 is to support network stack termination for traffic coming from a VLAN-aware bridge. For sja1105, offloading the bridge data plane means sending packets as-is, with the proper VLAN tag, to the chip. The chip will look up its FDB and forward them to the correct destination port. But we support bridge data plane offload even for VLAN-unaware bridges, and the implementation there is different. In fact, VLAN-unaware bridging is governed by tag_8021q, so it makes sense to have the .bridge_fwd_offload_add() implementation fully within tag_8021q. The key difference is that we only support 1 VLAN-aware bridge, but we support multiple VLAN-unaware bridges. So we need to make sure that the forwarding domain is not crossed by packets injected from the stack. For this, we introduce the concept of a tag_8021q TX VLAN for bridge forwarding offload. As opposed to the regular TX VLANs which contain only 2 ports (the user port and the CPU port), a bridge data plane TX VLAN is "multicast" (or "imprecise"): it contains all the ports that are part of a certain bridge, and the hardware will select where the packet goes within this "imprecise" forwarding domain. Each VLAN-unaware bridge has its own "imprecise" TX VLAN, so we make use of the unique "bridge_num" provided by DSA for the data plane offload. We use the same 3 bits from the tag_8021q VLAN ID format to encode this bridge number. Note that these 3 bit positions have been used before for sub-VLANs in best-effort VLAN filtering mode. The difference is that for best-effort, the sub-VLANs were only valid on RX (and it was documented that the sub-VLAN field needed to be transmitted as zero). Whereas for the bridge data plane offload, these 3 bits are only valid on TX. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-26net: bridge: add a helper for retrieving port VLANs from the data pathVladimir Oltean1-0/+8
Introduce a brother of br_vlan_get_info() which is protected by the RCU mechanism, as opposed to br_vlan_get_info() which relies on taking the write-side rtnl_mutex. This is needed for drivers which need to find out whether a bridge port has a VLAN configured or not. For example, certain DSA switches might not offer complete source port identification to the CPU on RX, just the VLAN in which the packet was received. Based on this VLAN, we cannot set an accurate skb->dev ingress port, but at least we can configure one that behaves the same as the correct one would (this is possible because DSA sets skb->offload_fwd_mark = 1). When we look at the bridge RX handler (br_handle_frame), we see that what matters regarding skb->dev is the VLAN ID and the port STP state. So we need to select an skb->dev that has the same bridge VLAN as the packet we're receiving, and is in the LEARNING or FORWARDING STP state. The latter is easy, but for the former, we should somehow keep a shadow list of the bridge VLANs on each port, and a lookup table between VLAN ID and the 'designated port for imprecise RX'. That is rather complicated to keep in sync properly (the designated port per VLAN needs to be updated on the addition and removal of a VLAN, as well as on the join/leave events of the bridge on that port). So, to avoid all that complexity, let's just iterate through our finite number of ports and ask the bridge, for each packet: "do you have this VLAN configured on this port?". Cc: Roopa Prabhu <roopa@nvidia.com> Cc: Nikolay Aleksandrov <nikolay@nvidia.com> Cc: Ido Schimmel <idosch@nvidia.com> Cc: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-26Merge tag 'mlx5-updates-2021-07-24' of ↵David S. Miller1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux mlx5-updates-2021-07-24 This series aims to reduce coupling in mlx5e, particularly between RX resources (TIRs, RQTs) and numerous code units that use them. This refactoring is required for upcoming features, ADQ and TX lag hashing. The issue with the current code is that TIRs and RQTs are unmanaged, different places all over the driver create, destroy, track and configure them, often in an uncoordinated way. The responsibilities of different units become vague, leading to a lot of hidden dependencies between numerous units and tight coupling between them, which is prone to bugs and hard to maintain. The result of this refactoring is: 1. Creating a manager for RX resources, that controls their lifecycle and provides a clear API, which restricts the set of actions that other units can do. 2. Using object-oriented approach for TIRs, RQTs and RX resource manager (struct mlx5e_rx_res). 3. Fixing a few bugs and misbehaviors found during the refactoring. 4. Reducing the amount of dependencies, removing hidden dependencies, making them one-directional and organizing the code in clear abstraction layers. 5. Explicitly exposing the remaining weird dependencies. 6. Simplifying and organizing code that creates and modifies TIRs and RQTs. Saeed Mahameed says: ==================== mlx5 updates 2021-07-24 This series provides some refactoring to mlx5e RX resource management, it is required for upcoming ADQ and TX lag hashing features. The first two patches in this series : net/mlx5e: Prohibit inner indir TIRs in IPoIB net/mlx5e: Block LRO if firmware asks for tunneled LRO Were supposed to go to net, but due to dependency and timing they were included here. I would appreciate it if you'd apply them to net and mark for -stable. For more information please see tag log below. Please pull and let me know if there is any problem. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-26net/mlx5e: Block LRO if firmware asks for tunneled LROMaxim Mikityanskiy1-1/+2
This commit does a cleanup in LRO configuration. LRO is a parameter of an RQ, but its state is changed by modifying a TIR related to the RQ. The current status: LRO for tunneled packets is not supported in the driver, inner TIRs may enable LRO on creation, but LRO status of inner TIRs isn't changed in mlx5e_modify_tirs_lro(). This is inconsistent, but as long as the firmware doesn't declare support for tunneled LRO, it works, because the same RQs are shared between the inner and outer TIRs. This commit does two fixes: 1. If the firmware has the tunneled LRO capability, LRO is blocked altogether, because it's not possible to block it for inner TIRs only, when the same RQs are shared between inner and outer TIRs, and the driver won't be able to handle tunneled LRO traffic. 2. mlx5e_modify_tirs_lro() is patched to modify LRO state for all TIRs, including inner ones, because all TIRs related to an RQ should agree on their LRO state. Fixes: 7b3722fa9ef6 ("net/mlx5e: Support RSS for GRE tunneled packets") Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-07-25can: flexcan: add platform data headerAngelo Dureghello1-0/+23
Add platform data header for flexcan. Link: https://lore.kernel.org/r/20210702094841.327679-1-angelo@kernel-space.org Signed-off-by: Angelo Dureghello <angelo@kernel-space.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-07-25can: bittiming: fix documentation for struct can_tdcMarc Kleine-Budde1-2/+2
This patch fixes a typo in the documentation for struct can_tdc::tdcv. The number "0" refers to automatic mode not the letter "O". Further two grammar errors in the documentation for struct can_tdc are fixed. First grammar error: add a missing third person 's'. Second grammar error: replace "such as" by "such that". The intent is to give a condition, not an example. Fixes: 289ea9e4ae59 ("can: add new CAN FD bittiming parameters: Transmitter Delay Compensation (TDC)") Link: https://lore.kernel.org/r/20210616095922.2430415-1-mkl@pengutronix.de Link: https://lore.kernel.org/r/20210616124057.60723-1-mailhol.vincent@wanadoo.fr Co-developed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Acked-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-07-25can: rx-offload: can_rx_offload_threaded_irq_finish(): add new function to ↵Marc Kleine-Budde1-0/+1
be called from threaded interrupt After reading all CAN frames from the controller in the IRQ handler and storing them into a skb_queue, the driver calls napi_schedule(). In the napi poll function the skb from the skb_queue are then pushed into the networking stack. However if napi_schedule() is called from a threaded IRQ handler this triggers the following error: | NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #08!!! To avoid this, create a new rx-offload function (can_rx_offload_threaded_irq_finish()) with a call to local_bh_disable()/local_bh_enable() around the napi_schedule() call. Convert all drivers that call can_rx_offload_irq_finish() from threaded IRQ context to can_rx_offload_threaded_irq_finish(). Link: https://lore.kernel.org/r/20210724204745.736053-4-mkl@pengutronix.de Suggested-by: Daniel Glöckner <dg@emlix.com> Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-07-25can: rx-offload: can_rx_offload_irq_finish(): directly call napi_schedule()Marc Kleine-Budde1-5/+0
Instead of calling can_rx_offload_schedule() call napi_schedule() directly. As this was the last use of can_rx_offload_schedule() remove this helper function. Link: https://lore.kernel.org/r/20210724204745.736053-3-mkl@pengutronix.de Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-07-25can: rx-offload: add skb queue for use during ISRMarc Kleine-Budde1-0/+2
Adding a skb to the skb_queue in rx-offload requires to take a lock. This commit avoids this by adding an unlocked skb queue that is appended at the end of the ISR. Having one lock at the end of the ISR should be OK as the HW is empty, not about to overflow. Link: https://lore.kernel.org/r/20210724204745.736053-2-mkl@pengutronix.de Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> Co-developed-by: Kurt Van Dijck <dev.kurt@vandijck-laurijssen.be> Signed-off-by: Kurt Van Dijck <dev.kurt@vandijck-laurijssen.be> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-07-25nfc: constify nfc_digital_opsKrzysztof Kozlowski1-2/+2
Neither the core nor the drivers modify the passed pointer to struct nfc_digital_ops, so make it a pointer to const for correctness and safety. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-25nfc: constify nfc_hci_opsKrzysztof Kozlowski1-2/+2
Neither the core nor the drivers modify the passed pointer to struct nfc_hci_ops, so make it a pointer to const for correctness and safety. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-25nfc: constify nfc_opsKrzysztof Kozlowski1-2/+2
Neither the core nor the drivers modify the passed pointer to struct nfc_ops, so make it a pointer to const for correctness and safety. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-25nfc: constify pointer to nfc_vendor_cmdKrzysztof Kozlowski3-4/+4
Neither the core nor the drivers modify the passed pointer to struct nfc_vendor_cmd, so make it a pointer to const for correctness and safety. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-25nfc: constify nci_driver_ops (prop_ops and core_ops)Krzysztof Kozlowski1-2/+2
Neither the core nor the drivers modify the passed pointer to struct nci_driver_ops (consisting of function pointers), so make it a pointer to const for correctness and safety. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-25nfc: constify nci_opsKrzysztof Kozlowski1-2/+2
The struct nci_ops is modified by NFC core in only one case: nci_allocate_device() receives too many proprietary commands (prop_ops) to configure. This is a build time known constrain, so a graceful handling of such case is not necessary. Instead, fail the nci_allocate_device() and add BUILD_BUG_ON() to places which set these. This allows to constify the struct nci_ops (consisting of function pointers) for correctness and safety. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-25nfc: constify payload argument in nci_send_cmd()Krzysztof Kozlowski1-1/+1
The nci_send_cmd() payload argument is passed directly to skb_put_data() which already accepts a pointer to const, so make it const as well for correctness and safety. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-23net: dsa: add support for bridge TX forwarding offloadVladimir Oltean1-0/+18
For a DSA switch, to offload the forwarding process of a bridge device means to send the packets coming from the software bridge as data plane packets. This is contrary to everything that DSA has done so far, because the current taggers only know to send control packets (ones that target a specific destination port), whereas data plane packets are supposed to be forwarded according to the FDB lookup, much like packets ingressing on any regular ingress port. If the FDB lookup process returns multiple destination ports (flooding, multicast), then replication is also handled by the switch hardware - the bridge only sends a single packet and avoids the skb_clone(). DSA keeps for each bridge port a zero-based index (the number of the bridge). Multiple ports performing TX forwarding offload to the same bridge have the same dp->bridge_num value, and ports not offloading the TX data plane of a bridge have dp->bridge_num = -1. The tagger can check if the packet that is being transmitted on has skb->offload_fwd_mark = true or not. If it does, it can be sure that the packet belongs to the data plane of a bridge, further information about which can be obtained based on dp->bridge_dev and dp->bridge_num. It can then compose a DSA tag for injecting a data plane packet into that bridge number. For the switch driver side, we offer two new dsa_switch_ops methods, called .port_bridge_fwd_offload_{add,del}, which are modeled after .port_bridge_{join,leave}. These methods are provided in case the driver needs to configure the hardware to treat packets coming from that bridge software interface as data plane packets. The switchdev <-> bridge interaction happens during the netdev_master_upper_dev_link() call, so to switch drivers, the effect is that the .port_bridge_fwd_offload_add() method is called immediately after .port_bridge_join(). If the bridge number exceeds the number of bridges for which the switch driver can offload the TX data plane (and this includes the case where the driver can offload none), DSA falls back to simply returning tx_fwd_offload = false in the switchdev_bridge_port_offload() call. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-23net: dsa: track the number of switches in a treeVladimir Oltean1-0/+3
In preparation of supporting data plane forwarding on behalf of a software bridge, some drivers might need to view bridges as virtual switches behind the CPU port in a cross-chip topology. Give them some help and let them know how many physical switches there are in the tree, so that they can count the virtual switches starting from that number on. Note that the first dsa_switch_ops method where this information is reliably available is .setup(). This is because of how DSA works: in a tree with 3 switches, each calling dsa_register_switch(), the first 2 will advance until dsa_tree_setup() -> dsa_tree_setup_routing_table() and exit with error code 0 because the topology is not complete. Since probing is parallel at this point, one switch does not know about the existence of the other. Then the third switch comes, and for it, dsa_tree_setup_routing_table() returns complete = true. This switch goes ahead and calls dsa_tree_setup_switches() for everybody else, calling their .setup() methods too. This acts as the synchronization point. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-23net: bridge: switchdev: allow the TX data plane forwarding to be offloadedTobias Waldekranz1-0/+3
Allow switchdevs to forward frames from the CPU in accordance with the bridge configuration in the same way as is done between bridge ports. This means that the bridge will only send a single skb towards one of the ports under the switchdev's control, and expects the driver to deliver the packet to all eligible ports in its domain. Primarily this improves the performance of multicast flows with multiple subscribers, as it allows the hardware to perform the frame replication. The basic flow between the driver and the bridge is as follows: - When joining a bridge port, the switchdev driver calls switchdev_bridge_port_offload() with tx_fwd_offload = true. - The bridge sends offloadable skbs to one of the ports under the switchdev's control using skb->offload_fwd_mark = true. - The switchdev driver checks the skb->offload_fwd_mark field and lets its FDB lookup select the destination port mask for this packet. v1->v2: - convert br_input_skb_cb::fwd_hwdoms to a plain unsigned long - introduce a static key "br_switchdev_fwd_offload_used" to minimize the impact of the newly introduced feature on all the setups which don't have hardware that can make use of it - introduce a check for nbp->flags & BR_FWD_OFFLOAD to optimize cache line access - reorder nbp_switchdev_frame_mark_accel() and br_handle_vlan() in __br_forward() - do not strip VLAN on egress if forwarding offload on VLAN-aware bridge is being used - propagate errors from .ndo_dfwd_add_station() if not EOPNOTSUPP v2->v3: - replace the solution based on .ndo_dfwd_add_station with a solution based on switchdev_bridge_port_offload - rename BR_FWD_OFFLOAD to BR_TX_FWD_OFFLOAD v3->v4: rebase v4->v5: - make sure the static key is decremented on bridge port unoffload - more function and variable renaming and comments for them: br_switchdev_fwd_offload_used to br_switchdev_tx_fwd_offload br_switchdev_accels_skb to br_switchdev_frame_uses_tx_fwd_offload nbp_switchdev_frame_mark_tx_fwd to nbp_switchdev_frame_mark_tx_fwd_to_hwdom nbp_switchdev_frame_mark_accel to nbp_switchdev_frame_mark_tx_fwd_offload fwd_accel to tx_fwd_offload Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller13-97/+305
Conflicts are simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-23net: socket: rework compat_ifreq_ioctl()Arnd Bergmann1-0/+2
compat_ifreq_ioctl() is one of the last users of copy_in_user() and compat_alloc_user_space(), as it attempts to convert the 'struct ifreq' arguments from 32-bit to 64-bit format as used by dev_ioctl() and a couple of socket family specific interpretations. The current implementation works correctly when calling dev_ioctl(), inet_ioctl(), ieee802154_sock_ioctl(), atalk_ioctl(), qrtr_ioctl() and packet_ioctl(). The ioctl handlers for x25, netrom, rose and x25 do not interpret the arguments and only block the corresponding commands, so they do not care. For af_inet6 and af_decnet however, the compat conversion is slightly incorrect, as it will copy more data than the native handler accesses, both of them use a structure that is shorter than ifreq. Replace the copy_in_user() conversion with a pair of accessor functions to read and write the ifreq data in place with the correct length where needed, while leaving the other ones to copy the (already compatible) structures directly. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-23net: socket: simplify dev_ifconf handlingArnd Bergmann1-1/+1
The dev_ifconf() calling conventions make compat handling more complicated than necessary, simplify this by moving the in_compat_syscall() check into the function. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-23net: socket: remove register_gifconfArnd Bergmann2-8/+9
Since dynamic registration of the gifconf() helper is only used for IPv4, and this can not be in a loadable module, this can be simplified noticeably by turning it into a direct function call as a preparation for cleaning up the compat handling. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>