summaryrefslogtreecommitdiffstats
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2021-08-20net: mscc: ocelot: transmit the VLAN filtering restrictions via extackVladimir Oltean1-1/+2
We need to transmit more restrictions in future patches, convert this one to netlink extack. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-20net: mscc: ocelot: transmit the "native VLAN" error via extackVladimir Oltean1-1/+1
We need to reject some more configurations in future patches, convert the existing one to netlink extack. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-20Merge tag 'mlx5-updates-2021-08-19' of ↵David S. Miller1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2021-08-19 This series introduces the support for two new mlx5 features: 1) Sample offload for tunneled traffic 2) devlink rate objects support 1) From Chris Mi: Sample offload for tunneled traffic ===================================================== Background and solution ----------------------- Currently the sample offload actions send the encapsulated packet to software. This series de-capsulates the packet before performing the sampling and set the tunnel properties on the skb metadata fields to make the behavior consistent with OVS sFlow. If de-capsulating first, we can't use the same match like before in default table. So instantiate a post action instance to continue processing the action list. If HW can preserve reg_c, also use the post action instance. Post action infrastructure -------------------------- Some tc actions are modeled in hardware using multiple tables causing a tc action list split. For example, CT action is modeled by jumping to a ct table which is controlled by nf flow table. sFlow jumps in hardware to a sample table, which continues to a "default table" where it should continue processing the action list. Multi table actions are modeled in hardware using a unique fte_id. The fte_id is set before jumping to a table. Split actions continue to a post-action table where the matched fte_id value continues the execution the tc action list. This series also introduces post action infrastructure. Both ct and sample use it. Sample for tunnel in TC SW -------------------------- tc filter add dev vxlan1 protocol ip parent ffff: prio 3 \ flower src_mac 24:25:d0:e1:00:00 dst_mac 02:25:d0:13:01:02 \ enc_src_ip 192.168.1.14 enc_dst_ip 192.168.1.13 \ enc_dst_port 4789 enc_key_id 4 \ action sample rate 1 group 6 \ action tunnel_key unset \ action mirred egress redirect dev enp4s0f0_1 MLX5 sample HW offload ---------------------- For the following typical flow table: +-------------------------------+ + original flow table + +-------------------------------+ + original match + +-------------------------------+ + sample action + other actions + +-------------------------------+ We translate the tc filter with sample action to the following HW model: +---------------------+ + original flow table + +---------------------+ + original match + +---------------------+ | set fte_id (if reg_c preserve cap) | do decap v +------------------------------------------------+ + Flow Sampler Object + +------------------------------------------------+ + sample ratio + +------------------------------------------------+ + sample table id | default table id + +------------------------------------------------+ | | v v +-----------------------------+ +-------------------+ + sample table + + default table + +-----------------------------+ +-------------------+ + forward to management vport + | +-----------------------------+ | +-------+------+ | |reg_c preserve cap | |or decap action v v +-----------------+ +-------------+ + per vport table + + post action + +-----------------+ +-------------+ + original match + +-----------------+ + other actions + +-----------------+ 2) From Dmytro Linkin: devlink rate object support for mlx5_core driver ======================================================================= HIGH-LEVEL OVERVIEW Devlink leaf rate objects created per vport (VF/SF, and PF on BlueField) in switchdev mode on devlink port registration. Implement devlink ops callbacks to create/destroy rate groups, set TX rate values of the vport/group, assign vport to the group. Driver accepts TX rate values as fraction of 1Mbps. Refactor existing eswitch QoS infrastructure to be accessible by legacy NDO rate API and new devlink rate API. NDO rate API is not removed/disabled in switchdev mode to not break existing users. Rate values configured with NDO rate API are not visible for devlink infrastructure, therefore APIs should not be used simultaneously. IMPLEMENTATION DETAILS Driver provide two level rate hierarchy to manage bandwidth - group level and vport level. Initially each vport added to internal unlimited group created by default. Each rate element (vport or group) receive bandwidth relative to its parent element (for groups the parent is a physical link itself) in a Round Robin manner, where element get bandwidth value according to its weight. Example: Created four rate groups with tx_share limits: $ devlink port function rate add \ pci/0000:06:00.0/group_1 tx_share 30gbit $ devlink port function rate add \ pci/0000:06:00.0/group_2 tx_share 20gbit $ devlink port function rate add \ pci/0000:06:00.0/group_3 tx_share 20gbit $ devlink port function rate add \ pci/0000:06:00.0/group_4 tx_share 10gbit Weights created in HW for each group are relative to the bigest tx_share value, which is 30gbit: <group_1> 1.0 <group_2> 0.67 <group_3> 0.67 <group_4> 0.33 Assuming link speed is 50 Gbit/sec and each group can sustain such amount of traffic, maximum bandwidth is 50 / (1.0 + 0.67 + 0.67 + 0.33) = ~18.75 Gbit/sec. Normilized bandwidth values for groups: <group_1> 18.75 * 1.0 = 18.75 Gbit/sec <group_2> 18.75 * 0.67 = 12.5 Gbit/sec <group_3> 18.75 * 0.67 = 12.5 Gbit/sec <group_4> 18.75 * 0.33 = 6.25 Gbit/sec If in example above group_1 doesn't produce any traffic, then maximum bandwidth becomes 50 / (0.67 + 0.67 + 0.33) = ~30.0 Gbit/sec. Normalized values: <group_2> 30.0 * 0.67 = 20.0 Gbit/sec <group_3> 30.0 * 0.67 = 20.0 Gbit/sec <group_4> 30.0 * 0.33 = 10.0 Gbit/sec Same normalization applied to each vport in the group. Normalized values are internal, therefore driver provides QoS tracepoints for next events: * vport rate element creation/deletion: * vport rate element configuration; * group rate element creation/deletion; * group rate element configuration. PATCHES OVERVIEW 1 - Moving and isolation of eswitch QoS logic in separate file; 2 - Implement devlink leaf rate object support for vports; 3 - Implement rate groups creation/deletion; 4 - Implement TX rate management for the groups; 5 - Implement parent set for vports; 6 - Eswitch QoS tracepoints. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-20Merge tag 'for-net-next-2021-08-19' of ↵David S. Miller1-2/+19
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Luiz Augusto von Dentz says: ==================== bluetooth-next pull request for net-next: - Add support for Foxconn Mediatek Chip - Add support for LG LGSBWAC92/TWCM-K505D - hci_h5 flow control fixes and suspend support - Switch to use lock_sock for SCO and RFCOMM - Various fixes for extended advertising - Reword Intel's setup on btusb unifying the supported generations ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-19net/mlx5: E-switch, Introduce rate limiting groups APIDmytro Linkin1-1/+2
Extend eswitch API with rate limiting groups: - Define new struct mlx5_esw_rate_group that is used to hold all internal group data. - Implement functions that allow creation, destruction and cleanup of groups. - Assign all vports to internal unlimited zero group by default. This commit lays the groundwork for group rate limiting by implementing devlink_ops->rate_node_{new|del}() callbacks to support creating and deleting groups through devlink rate node objects. APIs that allows setting rates and adding/removing members are implemented in following patches. Co-developed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Reviewed-by: Huy Nguyen <huyn@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski9-12/+31
drivers/ptp/Kconfig: 55c8fca1dae1 ("ptp_pch: Restore dependency on PCI") e5f31552674e ("ethernet: fix PTP_1588_CLOCK dependencies") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-19Merge tag 'net-5.14-rc7' of ↵Linus Torvalds1-7/+5
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Networking fixes, including fixes from bpf, wireless and mac80211 trees. Current release - regressions: - tipc: call tipc_wait_for_connect only when dlen is not 0 - mac80211: fix locking in ieee80211_restart_work() Current release - new code bugs: - bpf: add rcu_read_lock in bpf_get_current_[ancestor_]cgroup_id() - ethernet: ice: fix perout start time rounding - wwan: iosm: prevent underflow in ipc_chnl_cfg_get() Previous releases - regressions: - bpf: clear zext_dst of dead insns - sch_cake: fix srchost/dsthost hashing mode - vrf: reset skb conntrack connection on VRF rcv - net/rds: dma_map_sg is entitled to merge entries Previous releases - always broken: - ethernet: bnxt: fix Tx path locking and races, add Rx path barriers" * tag 'net-5.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (42 commits) net: dpaa2-switch: disable the control interface on error path Revert "flow_offload: action should not be NULL when it is referenced" iavf: Fix ping is lost after untrusted VF had tried to change MAC i40e: Fix ATR queue selection r8152: fix the maximum number of PLA bp for RTL8153C r8152: fix writing USB_BP2_EN mptcp: full fully established support after ADD_ADDR mptcp: fix memory leak on address flush net/rds: dma_map_sg is entitled to merge entries net: mscc: ocelot: allow forwarding from bridge ports to the tag_8021q CPU port net: asix: fix uninit value bugs ovs: clear skb->tstamp in forwarding path net: mdio-mux: Handle -EPROBE_DEFER correctly net: mdio-mux: Don't ignore memory allocation errors net: mdio-mux: Delete unnecessary devm_kfree net: dsa: sja1105: fix use-after-free after calling of_find_compatible_node, or worse sch_cake: fix srchost/dsthost hashing mode ixgbe, xsk: clean up the resources in ixgbe_xsk_pool_enable error path net: qlcnic: add missed unlock in qlcnic_83xx_flash_read32 mac80211: fix locking in ieee80211_restart_work() ...
2021-08-19Merge tag 'linux-can-next-for-5.15-20210819' of ↵Jakub Kicinski1-0/+8
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== linux-can-next-for-5.15-20210819 The first patch is by me, for the mailmap file and maps the email address of two former ESD employees to a newly created role account. The next 3 patches are by Oleksij Rempel and add support for GPIO based switchable CAN bus termination. The next 3 patches are by Vincent Mailhol. The first one changes the CAN netlink interface to not bail out if the user switched off unsupported features. The next one adds Vincent as the maintainer of the etas_es58x driver and the last one cleans up the documentation of struct es58x_fd_tx_conf_msg. The next patch is by me, for the mcp251xfd driver and marks some instances of struct mcp251xfd_priv as const. Lad Prabhakar contributes 2 patches for the rcar_canfd driver, that add support for RZ/G2L family. The next 5 patches target the m_can/tcan45x5 driver. 2 are by me an fix trivial checkpatch warnings. The remaining 3 patches are by Matt Kline and improve the performance on the SPI based tcan4x5x chip by batching FIFO reads and writes. The last 7 patches are for the c_can driver. Dario Binacchi's patch converts the DT bindings to yaml, 2 patches by me fix a typo and rename a macro to properly represent the usage. The last 4 patches are again by Dario Binacchi and provide a performance improvement for the TX path by operating the TX mailboxes as a true FIFO. * tag 'linux-can-next-for-5.15-20210819' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next: (22 commits) can: c_can: cache frames to operate as a true FIFO can: c_can: support tx ring algorithm can: c_can: exit c_can_do_tx() early if no frames have been sent can: c_can: remove struct c_can_priv::priv field can: c_can: rename IF_RX -> IF_NAPI can: c_can: c_can_do_tx(): fix typo in comment dt-bindings: net: can: c_can: convert to json-schema can: m_can: Batch FIFO writes during CAN transmit can: m_can: Batch FIFO reads during CAN receive can: m_can: Disable IRQs on FIFO bus errors can: m_can: fix block comment style can: tcan4x5x: cdev_to_priv(): remove stray empty line can: rcar_canfd: Add support for RZ/G2L family dt-bindings: net: can: renesas,rcar-canfd: Document RZ/G2L SoC can: mcp251xfd: mark some instances of struct mcp251xfd_priv as const can: etas_es58x: clean-up documentation of struct es58x_fd_tx_conf_msg MAINTAINERS: add Vincent MAILHOL as maintainer for the ETAS ES58X CAN/USB driver can: netlink: allow user to turn off unsupported features can: dev: provide optional GPIO based termination support dt-bindings: can: fsl,flexcan: enable termination-* bindings ... ==================== Link: https://lore.kernel.org/r/20210819133913.657715-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-19Revert "flow_offload: action should not be NULL when it is referenced"Ido Schimmel1-7/+5
This reverts commit 9ea3e52c5bc8bb4a084938dc1e3160643438927a. Cited commit added a check to make sure 'action' is not NULL, but 'action' is already dereferenced before the check, when calling flow_offload_has_one_action(). Therefore, the check does not make any sense and results in a smatch warning: include/net/flow_offload.h:322 flow_action_mixed_hw_stats_check() warn: variable dereferenced before check 'action' (see line 319) Fix by reverting this commit. Cc: gushengxian <gushengxian@yulong.com> Fixes: 9ea3e52c5bc8 ("flow_offload: action should not be NULL when it is referenced") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://lore.kernel.org/r/20210819105842.1315705-1-idosch@idosch.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-19can: dev: provide optional GPIO based termination supportOleksij Rempel1-0/+8
For CAN buses to work, a termination resistor has to be present at both ends of the bus. This resistor is usually 120 Ohms, other values may be required for special bus topologies. This patch adds support for a generic GPIO based CAN termination. The resistor value has to be specified via device tree, and it can only be attached to or detached from the bus. By default the termination is not active. Link: https://lore.kernel.org/r/20210818071232.20585-4-o.rempel@pengutronix.de Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-08-19net: Fix offloading indirect devices dependency on qdisc order creationEli Cohen1-0/+1
Currently, when creating an ingress qdisc on an indirect device before the driver registered for callbacks, the driver will not have a chance to register its filter configuration callbacks. To fix that, modify the code such that it keeps track of all the ingress qdiscs that call flow_indr_dev_setup_offload(). When a driver calls flow_indr_dev_register(), go through the list of tracked ingress qdiscs and call the driver callback entry point so as to give it a chance to register its callback. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Eli Cohen <elic@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-19net: mii: make mii_ethtool_gset() return voidPavel Skripkin1-1/+1
mii_ethtool_gset() does not return any errors. Since there are no users of this function that rely on its return value, it can be made void. Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-18pipe: avoid unnecessary EPOLLET wakeups under normal loadsLinus Torvalds1-0/+2
I had forgotten just how sensitive hackbench is to extra pipe wakeups, and commit 3a34b13a88ca ("pipe: make pipe writes always wake up readers") ended up causing a quite noticeable regression on larger machines. Now, hackbench isn't necessarily a hugely meaningful benchmark, and it's not clear that this matters in real life all that much, but as Mel points out, it's used often enough when comparing kernels and so the performance regression shows up like a sore thumb. It's easy enough to fix at least for the common cases where pipes are used purely for data transfer, and you never have any exciting poll usage at all. So set a special 'poll_usage' flag when there is polling activity, and make the ugly "EPOLLET has crazy legacy expectations" semantics explicit to only that case. I would love to limit it to just the broken EPOLLET case, but the pipe code can't see the difference between epoll and regular select/poll, so any non-read/write waiting will trigger the extra wakeup behavior. That is sufficient for at least the hackbench case. Apart from making the odd extra wakeup cases more explicitly about EPOLLET, this also makes the extra wakeup be at the _end_ of the pipe write, not at the first write chunk. That is actually much saner semantics (as much as you can call any of the legacy edge-triggered expectations for EPOLLET "sane") since it means that you know the wakeup will happen once the write is done, rather than possibly in the middle of one. [ For stable people: I'm putting a "Fixes" tag on this, but I leave it up to you to decide whether you actually want to backport it or not. It likely has no impact outside of synthetic benchmarks - Linus ] Link: https://lore.kernel.org/lkml/20210802024945.GA8372@xsang-OptiPlex-9020/ Fixes: 3a34b13a88ca ("pipe: make pipe writes always wake up readers") Reported-by: kernel test robot <oliver.sang@intel.com> Tested-by: Sandeep Patil <sspatil@android.com> Tested-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-08-18net-memcg: pass in gfp_t mask to mem_cgroup_charge_skmem()Wei Wang2-1/+7
Add gfp_t mask as an input parameter to mem_cgroup_charge_skmem(), to give more control to the networking stack and enable it to change memcg charging behavior. In the future, the networking stack may decide to avoid oom-kills when fallbacks are more appropriate. One behavior change in mem_cgroup_charge_skmem() by this patch is to avoid force charging by default and let the caller decide when and if force charging is needed through the presence or absence of __GFP_NOFAIL. Signed-off-by: Wei Wang <weiwan@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-18net: dsa: tag_sja1105: be dsa_loop-safeVladimir Oltean1-0/+18
Add support for tag_sja1105 running on non-sja1105 DSA ports, by making sure that every time we dereference dp->priv, we check the switch's dsa_switch_ops (otherwise we access a struct sja1105_port structure that is in fact something else). This adds an unconditional build-time dependency between sja1105 being built as module => tag_sja1105 must also be built as module. This was there only for PTP before. Some sane defaults must also take place when not running on sja1105 hardware. These are: - sja1105_xmit_tpid: the sja1105 driver uses different VLAN protocols depending on VLAN awareness and switch revision (when an encapsulated VLAN must be sent). Default to 0x8100. - sja1105_rcv_meta_state_machine: this aggregates PTP frames with their metadata timestamp frames. When running on non-sja1105 hardware, don't do that and accept all frames unmodified. - sja1105_defer_xmit: calls sja1105_port_deferred_xmit in sja1105_main.c which writes a management route over SPI. When not running on sja1105 hardware, bypass the SPI write and send the frame as-is. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-18mptcp: remote addresses fullmeshGeliang Tang1-0/+1
This patch added and managed a new per endpoint flag, named MPTCP_PM_ADDR_FLAG_FULLMESH. In mptcp_pm_create_subflow_or_signal_addr(), if such flag is set, instead of: remote_address((struct sock_common *)sk, &remote); fill a temporary allocated array of all known remote address. After releaseing the pm lock loop on such array and create a subflow for each remote address from the given local. Note that the we could still use an array even for non 'fullmesh' endpoint: with a single entry corresponding to the primary MPC subflow remote address. Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliangtang@xiaomi.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-16ethtool: add two link extended substates of bad signal integrityGuangbin Huang1-0/+2
ETHTOOL_LINK_EXT_SUBSTATE_BSI_SERDES_REFERENCE_CLOCK_LOST means the input external clock signal for SerDes is too weak or lost. ETHTOOL_LINK_EXT_SUBSTATE_BSI_SERDES_ALOS means the received signal for SerDes is too weak because analog loss of signal. Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-16Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds4-4/+19
Pull virtio fixes from Michael Tsirkin: "Fixes in virtio, vhost, and vdpa drivers" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vdpa/mlx5: Fix queue type selection logic vdpa/mlx5: Avoid destroying MR on empty iotlb tools/virtio: fix build virtio_ring: pull in spinlock header vringh: pull in spinlock header virtio-blk: Add validation for block size in config space vringh: Use wiov->used to check for read/write desc order virtio_vdpa: reject invalid vq indices vdpa: Add documentation for vdpa_alloc_device() macro vDPA/ifcvf: Fix return value check for vdpa_alloc_device() vp_vdpa: Fix return value check for vdpa_alloc_device() vdpa_sim: Fix return value check for vdpa_alloc_device() vhost: Fix the calculation in vhost_overflow() vhost-vdpa: Fix integer overflow in vhost_vdpa_process_iotlb_update() virtio_pci: Support surprise removal of virtio pci device virtio: Protect vqs list access virtio: Keep vring_del_virtqueue() mirror of VQ create virtio: Improve vq->broken access to avoid any compiler optimization
2021-08-16Bluetooth: Store advertising handle so it can be re-enabledLuiz Augusto von Dentz1-0/+1
This stores the advertising handle/instance into hci_conn so it is accessible when re-enabling the advertising once disconnected. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-08-16net: mscc: ocelot: convert to phylinkVladimir Oltean1-4/+15
The felix DSA driver, which is a wrapper over the same hardware class as ocelot, is integrated with phylink, but ocelot is using the plain PHY library. It makes sense to bring together the two implementations, which is what this patch achieves. This is a large patch and hard to break up, but it does the following: The existing ocelot_adjust_link writes some registers, and felix_phylink_mac_link_up writes some registers, some of them are common, but both functions write to some registers to which the other doesn't. The main reasons for this are: - Felix switches so far have used an NXP PCS so they had no need to write the PCS1G registers that ocelot_adjust_link writes - Felix switches have the MAC fixed at 1G, so some of the MAC speed changes actually break the link and must be avoided. The naming conventions for the functions introduced in this patch are: - vsc7514_phylink_{mac_config,validate} are specific to the Ocelot instantiations and placed in ocelot_net.c which is built only for the ocelot switchdev driver. - ocelot_phylink_mac_link_{up,down} are shared between the ocelot switchdev driver and the felix DSA driver (they are put in the common lib). One by one, the registers written by ocelot_adjust_link are: DEV_MAC_MODE_CFG - felix_phylink_mac_link_up had no need to write this register since its out-of-reset value was fine and did not need changing. The write is moved to the common ocelot_phylink_mac_link_up and on felix it is guarded by a quirk bit that makes the written value identical with the out-of-reset one DEV_PORT_MISC - runtime invariant, was moved to vsc7514_phylink_mac_config PCS1G_MODE_CFG - same as above PCS1G_SD_CFG - same as above PCS1G_CFG - same as above PCS1G_ANEG_CFG - same as above PCS1G_LB_CFG - same as above DEV_MAC_ENA_CFG - both ocelot_adjust_link and ocelot_port_disable touched this. felix_phylink_mac_link_{up,down} also do. We go with what felix does and put it in ocelot_phylink_mac_link_up. DEV_CLOCK_CFG - ocelot_adjust_link and felix_phylink_mac_link_up both write this, but to different values. Move to the common ocelot_phylink_mac_link_up and make sure via the quirk that the old values are preserved for both. ANA_PFC_PFC_CFG - ocelot_adjust_link wrote this, felix_phylink_mac_link_up did not. Runtime invariant, speed does not matter since PFC is disabled via the RX_PFC_ENA bits which are cleared. Move to vsc7514_phylink_mac_config. QSYS_SWITCH_PORT_MODE_PORT_ENA - both ocelot_adjust_link and felix_phylink_mac_link_{up,down} wrote this. Ocelot also wrote this register from ocelot_port_disable. Keep what felix did, move in ocelot_phylink_mac_link_{up,down} and delete ocelot_port_disable. ANA_POL_FLOWC - same as above SYS_MAC_FC_CFG - same as above, except slight behavior change. Whereas ocelot always enabled RX and TX flow control, felix listened to phylink (for the most part, at least - see the 2500base-X comment). The registers which only felix_phylink_mac_link_up wrote are: SYS_PAUSE_CFG_PAUSE_ENA - this is why I am not sure that flow control worked on ocelot. Not it should, since the code is shared with felix where it does. ANA_PORT_PORT_CFG - this is a Frame Analyzer block register, phylink should be the one touching them, deleted. Other changes: - The old phylib registration code was in mscc_ocelot_init_ports. It is hard to work with 2 levels of indentation already in, and with hard to follow teardown logic. The new phylink registration code was moved inside ocelot_probe_port(), right between alloc_etherdev() and register_netdev(). It could not be done before (=> outside of) ocelot_probe_port() because ocelot_probe_port() allocates the struct ocelot_port which we then use to assign ocelot_port->phy_mode to. It is more preferable to me to have all PHY handling logic inside the same function. - On the same topic: struct ocelot_port_private :: serdes is only used in ocelot_port_open to set the SERDES protocol to Ethernet. This is logically a runtime invariant and can be done just once, when the port registers with phylink. We therefore don't even need to keep the serdes reference inside struct ocelot_port_private, or to use the devm variant of of_phy_get(). - Phylink needs a valid phy-mode for phylink_create() to succeed, and the existing device tree bindings in arch/mips/boot/dts/mscc/ocelot_pcb120.dts don't define one for the internal PHY ports. So we patch PHY_INTERFACE_MODE_NA into PHY_INTERFACE_MODE_INTERNAL. - There was a strategically placed: switch (priv->phy_mode) { case PHY_INTERFACE_MODE_NA: continue; which made the code skip the serdes initialization for the internal PHY ports. Frankly that is not all that obvious, so now we explicitly initialize the serdes under an "if" condition and not rely on code jumps, so everything is clearer. - There was a write of OCELOT_SPEED_1000 to DEV_CLOCK_CFG for QSGMII ports. Since that is in fact the default value for the register field DEV_CLOCK_CFG_LINK_SPEED, I can only guess the intention was to clear the adjacent fields, MAC_TX_RST and MAC_RX_RST, aka take the port out of reset, which does match the comment. I don't even want to know why this code is placed there, but if there is indeed an issue that all ports that share a QSGMII lane must all be up, then this logic is already buggy, since mscc_ocelot_init_ports iterates using for_each_available_child_of_node, so nobody prevents the user from putting a 'status = "disabled";' for some QSGMII ports which would break the driver's assumption. In any case, in the eventuality that I'm right, we would have yet another issue if ocelot_phylink_mac_link_down would reset those ports and that would be forbidden, so since the ocelot_adjust_link logic did not do that (maybe for a reason), add another quirk to preserve the old logic. The ocelot driver teardown goes through all ports in one fell swoop. When initialization of one port fails, the ocelot->ports[port] pointer for that is reset to NULL, and teardown is done only for non-NULL ports, so there is no reason to do partial teardowns, let the central mscc_ocelot_release_ports() do its job. Tested bind, unbind, rebind, link up, link down, speed change on mock-up hardware (modified the driver to probe on Felix VSC9959). Also regression tested the felix DSA driver. Could not test the Ocelot specific bits (PCS1G, SERDES, device tree bindings). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-16net: dsa: felix: stop calling ocelot_port_{enable,disable}Vladimir Oltean1-2/+0
ocelot_port_enable touches ANA_PORT_PORT_CFG, which has the following fields: - LOCKED_PORTMOVE_CPU, LEARNDROP, LEARNCPU, LEARNAUTO, RECV_ENA, all of which are written with their hardware default values, also runtime invariants. So it makes no sense to write these during every .ndo_open. - PORTID_VAL: this field has an out-of-reset value of zero for all ports and must be initialized by software. Additionally, the ocelot_setup_logical_port_ids() code path sets up different logical port IDs for the ports in a hardware LAG, and we absolutely don't want .ndo_open to interfere there and reset those values. So in fact the write from ocelot_port_enable can better be moved to ocelot_init_port, and the .ndo_open hook deleted. ocelot_port_disable touches DEV_MAC_ENA_CFG and QSYS_SWITCH_PORT_MODE_PORT_ENA, in an attempt to undo what ocelot_adjust_link did. But since .ndo_stop does not get called each time the link falls (i.e. this isn't a substitute for .phylink_mac_link_down), felix already does better at this by writing those registers already in felix_phylink_mac_link_down. So keep ocelot_port_disable (for now, until ocelot is converted to phylink too), and just delete the felix call to it, which is not necessary. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-15Merge tag 'irq-urgent-2021-08-15' of ↵Linus Torvalds3-1/+5
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Thomas Gleixner: "A set of fixes for PCI/MSI and x86 interrupt startup: - Mask all MSI-X entries when enabling MSI-X otherwise stale unmasked entries stay around e.g. when a crashkernel is booted. - Enforce masking of a MSI-X table entry when updating it, which mandatory according to speification - Ensure that writes to MSI[-X} tables are flushed. - Prevent invalid bits being set in the MSI mask register - Properly serialize modifications to the mask cache and the mask register for multi-MSI. - Cure the violation of the affinity setting rules on X86 during interrupt startup which can cause lost and stale interrupts. Move the initial affinity setting ahead of actualy enabling the interrupt. - Ensure that MSI interrupts are completely torn down before freeing them in the error handling case. - Prevent an array out of bounds access in the irq timings code" * tag 'irq-urgent-2021-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: driver core: Add missing kernel doc for device::msi_lock genirq/msi: Ensure deactivation on teardown genirq/timings: Prevent potential array overflow in __irq_timings_store() x86/msi: Force affinity setup before startup x86/ioapic: Force affinity setup before startup genirq: Provide IRQCHIP_AFFINITY_PRE_STARTUP PCI/MSI: Protect msi_desc::masked for multi-MSI PCI/MSI: Use msi_mask_irq() in pci_msi_shutdown() PCI/MSI: Correct misleading comments PCI/MSI: Do not set invalid bits in MSI mask PCI/MSI: Enforce MSI[X] entry updates to be visible PCI/MSI: Enforce that MSI-X table entry is masked for update PCI/MSI: Mask all unused MSI-X entries PCI/MSI: Enable and mask MSI-X early
2021-08-14net: Remove net/ipx.h and uapi/linux/ipx.h header filesCai Huoqing2-258/+0
commit <47595e32869f> ("<MAINTAINERS: Mark some staging directories>") indicated the ipx network layer as obsolete in Jan 2018, updated in the MAINTAINERS file now, after being exposed for 3 years to refactoring, so to delete uapi/linux/ipx.h and net/ipx.h header files for good. additionally, there is no module that depends on ipx.h except a broken staging driver(r8188eu) Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-14net: bridge: vlan: dump mcast ctx querier stateNikolay Aleksandrov1-0/+1
Use the new mcast querier state dump infrastructure and export vlans' mcast context querier state embedded in attribute BRIDGE_VLANDB_GOPTS_MCAST_QUERIER_STATE. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-14net: bridge: mcast: dump ipv6 querier stateNikolay Aleksandrov1-0/+3
Add support for dumping global IPv6 querier state, we dump the state only if our own querier is enabled or there has been another external querier which has won the election. For the bridge global state we use a new attribute IFLA_BR_MCAST_QUERIER_STATE and embed the state inside. The structure is: [IFLA_BR_MCAST_QUERIER_STATE] `[BRIDGE_QUERIER_IPV6_ADDRESS] - ip address of the querier `[BRIDGE_QUERIER_IPV6_PORT] - bridge port ifindex where the querier was seen (set only if external querier) `[BRIDGE_QUERIER_IPV6_OTHER_TIMER] - other querier timeout IPv4 and IPv6 attributes are embedded at the same level of IFLA_BR_MCAST_QUERIER_STATE. If we didn't dump anything we cancel the nest and return. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-14net: bridge: mcast: dump ipv4 querier stateNikolay Aleksandrov2-0/+11
Add support for dumping global IPv4 querier state, we dump the state only if our own querier is enabled or there has been another external querier which has won the election. For the bridge global state we use a new attribute IFLA_BR_MCAST_QUERIER_STATE and embed the state inside. The structure is: [IFLA_BR_MCAST_QUERIER_STATE] `[BRIDGE_QUERIER_IP_ADDRESS] - ip address of the querier `[BRIDGE_QUERIER_IP_PORT] - bridge port ifindex where the querier was seen (set only if external querier) `[BRIDGE_QUERIER_IP_OTHER_TIMER] - other querier timeout Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-14devlink: Use xarray to store devlink instancesLeon Romanovsky1-1/+1
We can use xarray instead of linearly organized linked lists for the devlink instances. This will let us revise the locking scheme in favour of internal xarray locking that protects database. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-14devlink: Count struct devlink consumersLeon Romanovsky1-0/+2
The struct devlink itself is protected by internal lock and doesn't need global lock during operation. That global lock is used to protect addition/removal new devlink instances from the global list in use by all devlink consumers in the system. The future conversion of linked list to be xarray will allow us to actually delete that lock, but first we need to count all struct devlink users. The reference counting provides us a way to ensure that no new user space commands success to grab devlink instance which is going to be destroyed makes it is safe to access it without lock. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-13ethernet: fix PTP_1588_CLOCK dependenciesArnd Bergmann1-20/+28
The 'imply' keyword does not do what most people think it does, it only politely asks Kconfig to turn on another symbol, but does not prevent it from being disabled manually or built as a loadable module when the user is built-in. In the ICE driver, the latter now causes a link failure: aarch64-linux-ld: drivers/net/ethernet/intel/ice/ice_main.o: in function `ice_eth_ioctl': ice_main.c:(.text+0x13b0): undefined reference to `ice_ptp_get_ts_config' ice_main.c:(.text+0x13b0): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ice_ptp_get_ts_config' aarch64-linux-ld: ice_main.c:(.text+0x13bc): undefined reference to `ice_ptp_set_ts_config' ice_main.c:(.text+0x13bc): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ice_ptp_set_ts_config' aarch64-linux-ld: drivers/net/ethernet/intel/ice/ice_main.o: in function `ice_prepare_for_reset': ice_main.c:(.text+0x31fc): undefined reference to `ice_ptp_release' ice_main.c:(.text+0x31fc): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ice_ptp_release' aarch64-linux-ld: drivers/net/ethernet/intel/ice/ice_main.o: in function `ice_rebuild': This is a recurring problem in many drivers, and we have discussed it several times befores, without reaching a consensus. I'm providing a link to the previous email thread for reference, which discusses some related problems. To solve the dependency issue better than the 'imply' keyword, introduce a separate Kconfig symbol "CONFIG_PTP_1588_CLOCK_OPTIONAL" that any driver can depend on if it is able to use PTP support when available, but works fine without it. Whenever CONFIG_PTP_1588_CLOCK=m, those drivers are then prevented from being built-in, the same way as with a 'depends on PTP_1588_CLOCK || !PTP_1588_CLOCK' dependency that does the same trick, but that can be rather confusing when you first see it. Since this should cover the dependencies correctly, the IS_REACHABLE() hack in the header is no longer needed now, and can be turned back into a normal IS_ENABLED() check. Any driver that gets the dependency wrong will now cause a link time failure rather than being unable to use PTP support when that is in a loadable module. However, the two recently added ptp_get_vclocks_index() and ptp_convert_timestamp() interfaces are only called from builtin code with ethtool and socket timestamps, so keep the current behavior by stubbing those out completely when PTP is in a loadable module. This should be addressed properly in a follow-up. As Richard suggested, we may want to actually turn PTP support into a 'bool' option later on, preventing it from being a loadable module altogether, which would be one way to solve the problem with the ethtool interface. Fixes: 06c16d89d2cb ("ice: register 1588 PTP clock device object for E810 devices") Link: https://lore.kernel.org/netdev/20210804121318.337276-1-arnd@kernel.org/ Link: https://lore.kernel.org/netdev/CAK8P3a06enZOf=XyZ+zcAwBczv41UuCTz+=0FMf2gBz1_cOnZQ@mail.gmail.com/ Link: https://lore.kernel.org/netdev/CAK8P3a3=eOxE-K25754+fB_-i_0BZzf9a9RfPTX3ppSwu9WZXw@mail.gmail.com/ Link: https://lore.kernel.org/netdev/20210726084540.3282344-1-arnd@kernel.org/ Acked-by: Shannon Nelson <snelson@pensando.io> Acked-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20210812183509.1362782-1-arnd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-13net: in_irq() cleanupChangbin Du1-1/+1
Replace the obsolete and ambiguos macro in_irq() with new macro in_hardirq(). Signed-off-by: Changbin Du <changbin.du@gmail.com> Link: https://lore.kernel.org/r/20210813145749.86512-1-changbin.du@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski13-10/+53
Conflicts: drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.h 9e26680733d5 ("bnxt_en: Update firmware call to retrieve TX PTP timestamp") 9e518f25802c ("bnxt_en: 1PPS functions to configure TSIO pins") 099fdeda659d ("bnxt_en: Event handler for PPS events") kernel/bpf/helpers.c include/linux/bpf-cgroup.h a2baf4e8bb0f ("bpf: Fix potentially incorrect results with bpf_get_local_storage()") c7603cfa04e7 ("bpf: Add ambient BPF runtime context stored in current") drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c 5957cc557dc5 ("net/mlx5: Set all field of mlx5_irq before inserting it to the xarray") 2d0b41a37679 ("net/mlx5: Refcount mlx5_irq with integer") MAINTAINERS 7b637cd52f02 ("MAINTAINERS: fix Microchip CAN BUS Analyzer Tool entry typo") 7d901a1e878a ("net: phy: add Maxlinear GPY115/21x/24x driver") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-13driver core: Add missing kernel doc for device::msi_lockThomas Gleixner1-0/+1
Fixes: 77e89afc25f3 ("PCI/MSI: Protect msi_desc::masked for multi-MSI") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2021-08-12Merge tag 'net-5.14-rc6' of ↵Linus Torvalds10-12/+27
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Networking fixes, including fixes from netfilter, bpf, can and ieee802154. The size of this is pretty normal, but we got more fixes for 5.14 changes this week than last week. Nothing major but the trend is the opposite of what we like. We'll see how the next week goes.. Current release - regressions: - r8169: fix ASPM-related link-up regressions - bridge: fix flags interpretation for extern learn fdb entries - phy: micrel: fix link detection on ksz87xx switch - Revert "tipc: Return the correct errno code" - ptp: fix possible memory leak caused by invalid cast Current release - new code bugs: - bpf: add missing bpf_read_[un]lock_trace() for syscall program - bpf: fix potentially incorrect results with bpf_get_local_storage() - page_pool: mask the page->signature before the checking, avoid dma mapping leaks - netfilter: nfnetlink_hook: 5 fixes to information in netlink dumps - bnxt_en: fix firmware interface issues with PTP - mlx5: Bridge, fix ageing time Previous releases - regressions: - linkwatch: fix failure to restore device state across suspend/resume - bareudp: fix invalid read beyond skb's linear data Previous releases - always broken: - bpf: fix integer overflow involving bucket_size - ppp: fix issues when desired interface name is specified via netlink - wwan: mhi_wwan_ctrl: fix possible deadlock - dsa: microchip: ksz8795: fix number of VLAN related bugs - dsa: drivers: fix broken backpressure in .port_fdb_dump - dsa: qca: ar9331: make proper initial port defaults Misc: - bpf: add lockdown check for probe_write_user helper - netfilter: conntrack: remove offload_pickup sysctl before 5.14 is out - netfilter: conntrack: collect all entries in one cycle, heuristically slow down garbage collection scans on idle systems to prevent frequent wake ups" * tag 'net-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (87 commits) vsock/virtio: avoid potential deadlock when vsock device remove wwan: core: Avoid returning NULL from wwan_create_dev() net: dsa: sja1105: unregister the MDIO buses during teardown Revert "tipc: Return the correct errno code" net: mscc: Fix non-GPL export of regmap APIs net: igmp: increase size of mr_ifc_count MAINTAINERS: switch to my OMP email for Renesas Ethernet drivers tcp_bbr: fix u32 wrap bug in round logic if bbr_init() called after 2B packets net: pcs: xpcs: fix error handling on failed to allocate memory net: linkwatch: fix failure to restore device state across suspend/resume net: bridge: fix memleak in br_add_if() net: switchdev: zero-initialize struct switchdev_notifier_fdb_info emitted by drivers towards the bridge net: bridge: fix flags interpretation for extern learn fdb entries net: dsa: sja1105: fix broken backpressure in .port_fdb_dump net: dsa: lantiq: fix broken backpressure in .port_fdb_dump net: dsa: lan9303: fix broken backpressure in .port_fdb_dump net: dsa: hellcreek: fix broken backpressure in .port_fdb_dump bpf, core: Fix kernel-doc notation net: igmp: fix data-race in igmp_ifc_timer_expire() net: Fix memory leak in ieee802154_raw_deliver ...
2021-08-12Merge tag 'mlx5-updates-2021-08-11' of ↵David S. Miller2-40/+46
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5 updates 2021-08-11 This series provides misc updates to mlx5. For more information please see tag log below. Please pull and let me know if there is any problem. mlx5-updates-2021-08-11 Misc. cleanup for mlx5. 1) Typos and use of netdev_warn() 2) smatch cleanup 3) Minor fix to inner TTC table creation 4) Dynamic capability cache allocation ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-11mctp: Specify route types, require rtm_type in RTM_*ROUTE messagesJeremy Kerr1-0/+1
This change adds a 'type' attribute to routes, which can be parsed from a RTM_NEWROUTE message. This will help to distinguish local vs. peer routes in a future change. This means userspace will need to set a correct rtm_type in RTM_NEWROUTE and RTM_DELROUTE messages; we currently only accept RTN_UNICAST. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Link: https://lore.kernel.org/r/20210810023834.2231088-1-jk@codeconstruct.com.au Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-11net: igmp: increase size of mr_ifc_countEric Dumazet1-1/+1
Some arches support cmpxchg() on 4-byte and 8-byte only. Increase mr_ifc_count width to 32bit to fix this problem. Fixes: 4a2b285e7e10 ("net: igmp: fix data-race in igmp_ifc_timer_expire()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20210811195715.3684218-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-11bonding: remove extraneous definitions from bonding.hJonathan Toppins1-12/+0
All of the symbols either only exist in bond_options.c or nowhere at all. These symbols were verified to not exist in the code base by using `git grep` and their removal was verified by compiling bonding.ko. Signed-off-by: Jonathan Toppins <jtoppins@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-11vmlinux.lds.h: Handle clang's module.{c,d}tor sectionsNathan Chancellor1-0/+1
A recent change in LLVM causes module_{c,d}tor sections to appear when CONFIG_K{A,C}SAN are enabled, which results in orphan section warnings because these are not handled anywhere: ld.lld: warning: arch/x86/pci/built-in.a(legacy.o):(.text.asan.module_ctor) is being placed in '.text.asan.module_ctor' ld.lld: warning: arch/x86/pci/built-in.a(legacy.o):(.text.asan.module_dtor) is being placed in '.text.asan.module_dtor' ld.lld: warning: arch/x86/pci/built-in.a(legacy.o):(.text.tsan.module_ctor) is being placed in '.text.tsan.module_ctor' Fangrui explains: "the function asan.module_ctor has the SHF_GNU_RETAIN flag, so it is in a separate section even with -fno-function-sections (default)". Place them in the TEXT_TEXT section so that these technologies continue to work with the newer compiler versions. All of the KASAN and KCSAN KUnit tests continue to pass after this change. Cc: stable@vger.kernel.org Link: https://github.com/ClangBuiltLinux/linux/issues/1432 Link: https://github.com/llvm/llvm-project/commit/7b789562244ee941b7bf2cefeb3fc08a59a01865 Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Fangrui Song <maskray@google.com> Acked-by: Marco Elver <elver@google.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210731023107.1932981-1-nathan@kernel.org
2021-08-11net/mlx5: Allocate individual capabilityParav Pandit2-34/+37
Currently mlx5_core_dev contains array of capabilities. It contains 19 valid capabilities of the device, 2 reserved entries and 12 holes. Due to this for 14 unused entries, mlx5_core_dev allocates 14 * 8K = 112K bytes of memory which is never used. Due to this mlx5_core_dev structure size is 270Kbytes odd. This allocation further aligns to next power of 2 to 512Kbytes. By skipping non-existent entries, (a) 112Kbyte is saved, (b) mlx5_core_dev reduces to 8KB with alignment (c) 350KB saved in alignment In future individual capability allocation can be used to skip its allocation when such capability is disabled at the device level. This patch prepares mlx5_core_dev to hold capability using a pointer instead of inline array. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-11net/mlx5: Reorganize current and maximal capabilities to be per-typeParav Pandit2-35/+39
In the current code, the current and maximal capabilities are maintained in separate arrays which are both per type. In order to allow the creation of such a basic structure as a dynamically allocated array, we move curr and max fields to a unified structure so that specific capabilities can be allocated as one unit. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-11net/mlx5: Delete impossible dev->state checksLeon Romanovsky1-2/+1
New mlx5_core device structure is allocated through devlink_alloc with\ kzalloc and that ensures that all fields are equal to zero and it includes ->state too. That means that checks of that field in the mlx5_init_one() is completely redundant, because that function is called only once in the begging of mlx5_core_dev lifetime. PCI: .probe() -> probe_one() -> mlx5_init_one() The recovery flow can't run at that time or before it, because relevant work initialized later in mlx5_init_once(). Such initialization flow ensures that dev->state can't be MLX5_DEVICE_STATE_UNINITIALIZED at all, so remove such impossible checks. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-11net/mlx5: Fix typo in commentsCai Huoqing2-3/+3
Fix typo: *vectores ==> vectors *realeased ==> released *erros ==> errors *namepsace ==> namespace *trafic ==> traffic *proccessed ==> processed *retore ==> restore *Currenlty ==> Currently *crated ==> created *chane ==> change *cannnot ==> cannot *usuallly ==> usually *failes ==> fails *importent ==> important *reenabled ==> re-enabled *alocation ==> allocation *recived ==> received *tanslation ==> translation Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-11devlink: Add APIs to publish, unpublish individual parameterParav Pandit1-0/+4
Enable drivers to publish/unpublish individual parameter. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-11devlink: Add API to register and unregister single parameterParav Pandit1-0/+4
Currently device configuration parameters can be registered as an array. Due to this a constant array must be registered. A single driver supporting multiple devices each with different device capabilities end up registering all parameters even if it doesn't support it. One possible workaround a driver can do is, it registers multiple single entry arrays to overcome such limitation. Better is to provide a API that enables driver to register/unregister a single parameter. This also further helps in two ways. (1) to reduce the memory of devlink_param_entry by avoiding in registering parameters which are not supported by the device. (2) avoid generating multiple parameter add, delete, publish, unpublish, init value notifications for such unsupported parameters Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-11devlink: Add new "enable_vnet" generic device paramParav Pandit1-0/+4
Add new device generic parameter to enable/disable creation of VDPA net auxiliary device and associated device functionality in the devlink instance. User who prefers to disable such functionality can disable it using below example. $ devlink dev param set pci/0000:06:00.0 \ name enable_vnet value false cmode driverinit $ devlink dev reload pci/0000:06:00.0 At this point devlink instance do not create auxiliary device for the VDPA net functionality. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-11devlink: Add new "enable_rdma" generic device paramParav Pandit1-0/+4
Add new device generic parameter to enable/disable creation of RDMA auxiliary device and associated device functionality in the devlink instance. User who prefers to disable such functionality can disable it using below example. $ devlink dev param set pci/0000:06:00.0 \ name enable_rdma value false cmode driverinit $ devlink dev reload pci/0000:06:00.0 At this point devlink instance do not create auxiliary device for the RDMA functionality. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-11devlink: Add new "enable_eth" generic device paramParav Pandit1-0/+4
Add new device generic parameter to enable/disable creation of Ethernet auxiliary device and associated device functionality in the devlink instance. User who prefers to disable such functionality can disable it using below example. $ devlink dev param set pci/0000:06:00.0 \ name enable_eth value false cmode driverinit $ devlink dev reload pci/0000:06:00.0 At this point devlink instance do not create auxiliary device for the Ethernet functionality. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-11net: bridge: vlan: use br_rports_fill_info() to export mcast router portsNikolay Aleksandrov1-0/+1
Embed the standard multicast router port export by br_rports_fill_info() into a new global vlan attribute BRIDGE_VLANDB_GOPTS_MCAST_ROUTER_PORTS. In order to have the same format for the global bridge mcast context and the per-vlan mcast context we need a double-nesting: - BRIDGE_VLANDB_GOPTS_MCAST_ROUTER_PORTS - MDBA_ROUTER Currently we don't compare router lists, if any router port exists in the bridge mcast contexts we consider their option sets as different and export them separately. In addition we export the router port vlan id when dumping similar to the router port notification format. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-11net: bridge: vlan: add support for mcast router global optionNikolay Aleksandrov1-0/+1
Add support to change and retrieve global vlan multicast router state which is used for the bridge itself. We just need to pass multicast context to br_multicast_set_router instead of bridge device and the rest of the logic remains the same. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-11net: bridge: vlan: add support for mcast querier global optionNikolay Aleksandrov1-0/+1
Add support to change and retrieve global vlan multicast querier state. We just need to pass multicast context to br_multicast_set_querier instead of bridge device and the rest of the logic remains the same. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>