linux - Linux Kernel (branches are rebased on master from time to time)

Age	Commit message	Author	Files	Lines
2021-08-13	mac80211: Fix monitor MTU limit so that A-MSDUs get through	Johan Almbladh	1	-2/+9
	The maximum MTU was set to 2304, which is the maximum MSDU size. While this is valid for normal WLAN interfaces, it is too low for monitor interfaces. A monitor interface may receive and inject MPDU frames, and the maximum MPDU frame size is larger than 2304. The MPDU may also contain an A-MSDU frame, in which case the size may be much larger than the MTU limit. Since the maximum size of an A-MSDU depends on the PHY mode of the transmitting STA, it is not possible to set an exact MTU limit for a monitor interface. Now the maximum MTU for a monitor interface is unrestricted. Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com> Link: https://lore.kernel.org/r/20210628123246.2070558-1-johan.almbladh@anyfinetworks.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-08-13	mac80211: remove unnecessary NULL check in ieee80211_register_hw()	Dan Carpenter	1	-1/+1
	The address "&sband->iftype_data[i]" points to an array at the end of struct. It can't be NULL and so the check can be removed. Fixes: bac2fd3d7534 ("mac80211: remove use of ieee80211_get_he_sta_cap()") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/YNmgHi7Rh3SISdog@mwanda Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-08-13	mac80211: Reject zero MAC address in sta_info_insert_check()	YueHaibing	1	-1/+1
	As commit 52dba8d7d5ab ("mac80211: reject zero MAC address in add station") said, we don't consider all-zeroes to be a valid MAC address in most places, so also reject it here. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Link: https://lore.kernel.org/r/20210626130334.13624-1-yuehaibing@huawei.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-08-13	nl80211: vendor-cmd: add Intel vendor commands for iwlmei usage	Emmanuel Grumbach	1	-0/+77
	iwlmei allows integration with the CSME firmware. There are flows that are proprietary for this purpose: * Get the information of the AP the CSME firmware is connected to. This is useful when we need to speed up the connection process in case the CSME firmware has a TCP connection that must be kept alive across the ownership transition. * Forbid roaming, which will happen when the CSME firmware wants to tell the user space not to disrupt the connection. * Request ownership, upon driver boot when the CSME firmware owns the device. This is a notification sent by the kernel. All those commands are expected to be used by any software managing the connection (mainly NetworkManager). Those commands are expected to be used only in case the CSME firmware owns the device and doesn't want to release the device unless the host made sure that it can keep the connectivity. Here are the steps of the expected flow: 1) The machine boots while AMT has an active TCP connection 2) iwlwifi starts and tries to access the device 3) The device is not available because of the active TCP connection. (If there are no active connections, the CSME firmware would have allowed iwlwifi to use the device) Note that all the steps up to here don't involve iwlmei. All this happens in iwlwifi (in iwl_pcie_prepare_card_hw). 4) iwlmei establishes a connection to the CSME firmware (through SAP) Here iwlwifi uses iwlmei to access the device's capabilities (since it can't touch the device), but this is not relevant for the vendor commands. 5) The CSME firmware tells iwlmei that it uses the NIC and that there is an active TCP connection, and hence, the host needs to think twice before asking the CSME firmware to release the device 6) iwlmei tells iwlwifi to report HW RFKILL with a special reason Up to here, there was no user space involved.
7) The user space (NetworkManager) boots and sees that the device is in RFKILL because the host doesn't own the device 8) The user space asks the kernel what AP the CSME firmware is connected to (with the first vendor command mentioned above) 9) The user space checks if it has a profile that matches the reply from the CSME firmware 10) The user space installs a network to the wpa_supplicant with a specific BSSID and a specific frequency 11) The user space prevents any type of full scan 12) The user space asks iwlmei to request ownership on the device (with the third vendor command) 13) iwlmei requests ownership from the CSME firmware 14) The CSME firmware grants ownership 15) iwlmei tells iwlwifi to lift the RFKILL 16) RFKILL OFF is reported to userspace 17) The host boots the device, loads the firmware, and connects to a specific BSSID without scanning, including IP, in less than 600ms (this is what I measured, of course it depends on many factors) 18) The host reports to the CSME firmware that there is a connection 19) The TCP connection is preserved and the host has now connectivity 20) Later, the TCP connection to the CSME firmware is terminated 21) The CSME firmware tells iwlmei that it is now free to do whatever it likes 22) iwlwifi sends the second vendor command to tell the user space that it can remove the special network configuration and pick any SSID / BSSID it likes. Co-Developed-by: Ayala Beker <ayala.beker@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Link: https://lore.kernel.org/r/20210625081717.7680-4-emmanuel.grumbach@intel.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-08-12	dt-bindings: net: qcom,ipa: make imem interconnect optional	Alex Elder	1	-8/+16
	On some newer SoCs, the interconnect between IPA and SoC internal memory (imem) is not used. Update the binding to indicate that having just the memory and config interconnects is another allowed configuration. Signed-off-by: Alex Elder <elder@linaro.org> Link: https://lore.kernel.org/r/20210811141802.2635424-1-elder@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-12	net: ipa: always inline ipa_aggr_granularity_val()	Alex Elder	1	-4/+3
	It isn't required, but all callers of ipa_aggr_granularity_val() pass a constant value (IPA_AGGR_GRANULARITY) as the usec argument. Two of those callers are in ipa_validate_build(), with the result being passed to BUILD_BUG_ON(). Evidently the "sparc64-linux-gcc" compiler (at least) doesn't always inline ipa_aggr_granularity_val(), so the result of the function is not constant at compile time, and that leads to build errors. Define the function with the __always_inline attribute to avoid the errors. We can see by inspection that the value passed is never zero, so we can just remove its WARN_ON() call. Fixes: 5bc5588466a1f ("net: ipa: use WARN_ON() rather than assertions") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Alex Elder <elder@linaro.org> Link: https://lore.kernel.org/r/20210811135948.2634264-1-elder@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-12	Merge tag 'mlx5-updates-2021-08-11' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux	David S. Miller	20	-107/+190
	Saeed Mahameed says: ==================== mlx5 updates 2021-08-11 This series provides misc updates to mlx5. For more information please see tag log below. Please pull and let me know if there is any problem. mlx5-updates-2021-08-11 Misc. cleanup for mlx5. 1) Typos and use of netdev_warn() 2) smatch cleanup 3) Minor fix to inner TTC table creation 4) Dynamic capability cache allocation ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-12	Merge branch 'dsa-cross-chip-notifiers'	David S. Miller	4	-22/+38
	Vladimir Oltean says: ==================== Improvements to the DSA tag_8021q cross-chip notifiers This series improves cross-chip notifier error messages and addresses a benign error message seen during reboot on a system with disjoint DSA trees. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-12	net: dsa: tag_8021q: don't broadcast during setup/teardown	Vladimir Oltean	4	-16/+26
	Currently, on my board with multiple sja1105 switches in disjoint trees described in commit f66a6a69f97a ("net: dsa: permit cross-chip bridging between all trees in the system"), rebooting the board triggers the following benign warnings: [ 12.345566] sja1105 spi2.0: port 0 failed to notify tag_8021q VLAN 1088 deletion: -ENOENT [ 12.353804] sja1105 spi2.0: port 0 failed to notify tag_8021q VLAN 2112 deletion: -ENOENT [ 12.362019] sja1105 spi2.0: port 1 failed to notify tag_8021q VLAN 1089 deletion: -ENOENT [ 12.370246] sja1105 spi2.0: port 1 failed to notify tag_8021q VLAN 2113 deletion: -ENOENT [ 12.378466] sja1105 spi2.0: port 2 failed to notify tag_8021q VLAN 1090 deletion: -ENOENT [ 12.386683] sja1105 spi2.0: port 2 failed to notify tag_8021q VLAN 2114 deletion: -ENOENT Basically switch 1 calls dsa_tag_8021q_unregister, and switch 1's TX and RX VLANs cannot be found on switch 2's CPU port. But why would switch 2 even attempt to delete switch 1's TX and RX tag_8021q VLANs from its CPU port? Well, because we use dsa_broadcast, and it is supposed that it had added those VLANs in the first place (because in dsa_port_tag_8021q_vlan_match, all CPU ports match regardless of their tree index or switch index). The two trees probe asynchronously, and when switch 1 probed, it called dsa_broadcast which did not notify the tree of switch 2, because that didn't probe yet. But during unbind, switch 2's tree _is_ probed, so it _is_ notified of the deletion. Before jumping to introduce a synchronization mechanism between the probing across disjoint switch trees, let's take a step back and see whether we _need_ to do that in the first place. The RX and TX VLANs of switch 1 would be needed on switch 2's CPU port only if switch 1 and 2 were part of a cross-chip bridge. And dsa_tag_8021q_bridge_join takes care precisely of that (but if probing was synchronous, the bridge_join would just end up bumping the VLANs' refcount, because they are already installed by the setup path). 
Since by the time the ports are bridged, all DSA trees are already set up, and we don't need the tag_8021q VLANs of one switch installed on the other switches during probe time, the answer is that we don't need to fix the synchronization issue. So make the setup and teardown code paths call dsa_port_notify, which notifies only the local tree, and the bridge code paths call dsa_broadcast, which lets the other trees know as well. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-12	net: dsa: print more information when a cross-chip notifier fails	Vladimir Oltean	1	-6/+12
	Currently this error message does not say a lot: [ 32.693498] DSA: failed to notify tag_8021q VLAN deletion: -ENOENT [ 32.699725] DSA: failed to notify tag_8021q VLAN deletion: -ENOENT [ 32.705931] DSA: failed to notify tag_8021q VLAN deletion: -ENOENT [ 32.712139] DSA: failed to notify tag_8021q VLAN deletion: -ENOENT [ 32.718347] DSA: failed to notify tag_8021q VLAN deletion: -ENOENT [ 32.724554] DSA: failed to notify tag_8021q VLAN deletion: -ENOENT but in this form, it is immediately obvious (at least to me) what the problem is, even without further looking at the code: [ 12.345566] sja1105 spi2.0: port 0 failed to notify tag_8021q VLAN 1088 deletion: -ENOENT [ 12.353804] sja1105 spi2.0: port 0 failed to notify tag_8021q VLAN 2112 deletion: -ENOENT [ 12.362019] sja1105 spi2.0: port 1 failed to notify tag_8021q VLAN 1089 deletion: -ENOENT [ 12.370246] sja1105 spi2.0: port 1 failed to notify tag_8021q VLAN 2113 deletion: -ENOENT [ 12.378466] sja1105 spi2.0: port 2 failed to notify tag_8021q VLAN 1090 deletion: -ENOENT [ 12.386683] sja1105 spi2.0: port 2 failed to notify tag_8021q VLAN 2114 deletion: -ENOENT Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-12	wwan: core: Unshadow error code returned by ida_alloc_range()	Andy Shevchenko	1	-2/+5
	ida_alloc_range() may return an error code other than -ENOMEM. Unshadow it in wwan_create_port(). Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com> Reviewed-by: Loic Poulain <loic.poulain@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-12	net: dsa: mt7530: fix VLAN traffic leaks again	DENG Qingfang	1	-4/+1
	When a port leaves a VLAN-aware bridge, the current code does not clear other ports' matrix field bit. If the bridge is later set to VLAN-unaware mode, traffic in the bridge may leak to that port. Remove the VLAN filtering check in mt7530_port_bridge_leave. Fixes: 474a2ddaa192 ("net: dsa: mt7530: fix VLAN traffic leaks") Fixes: 83163f7dca56 ("net: dsa: mediatek: add VLAN support for MT7530") Signed-off-by: DENG Qingfang <dqfext@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-12	net: phy: nxp-tja11xx: log critical health state	Oleksij Rempel	1	-2/+11
	TJA1102 provides interrupt notification for the critical health states like overtemperature and undervoltage. The overtemperature bit is set if the package temperature is beyond 155 °C. This functionality was tested by heating the package up to 200 °C. The undervoltage bit is set if the supply voltage drops below some critical threshold. Currently not tested. In a typical use case, both of these events should be logged and stored (or sent to some remote system) for further investigation. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-12	Merge branch 'pktgen-imix'	David S. Miller	1	-1/+162
	Nick Richardson says: ==================== pktgen: Add IMIX mode Adds internet mix (IMIX) mode to pktgen. Internet mix is included in many user-space network perf testing tools. It allows for the user to specify a distribution of discrete packet sizes to be generated. This type of test is common among vendors when perf testing their devices. [RFC link: https://datatracker.ietf.org/doc/html/rfc2544#section-9.1] This allows users to get a more complete picture of how their device will perform in the real-world. This feature adds a command that allows users to specify an imix distribution in the following format: imix_weights size_1,weight_1 size_2,weight_2 ... size_n,weight_n The distribution of packets with size_i will be (weight_i / total_weights) where total_weights = weight_1 + weight_2 + ... + weight_n For example: imix_weights 40,7 576,4 1500,1 The pkt_size "40" will account for 7 / (7 + 4 + 1) = ~58% of the total packets sent. This patch was tested with the following: 1. imix_weights = 40,7 576,4 1500,1 2. imix_weights = 0,7 576,4 1500,1 - Packet size of 0 is resized to the minimum, 42 3. imix_weights = 40,7 576,4 1500,1 count = 0 - Zero count. - Runs until user stops pktgen. Invalid Configurations 1. clone_skb = 200 imix_weights = 40,7 576,4 1500,1 - Returns error code -524 (-ENOTSUPP) when setting imix_weights 2. len(imix_weights) > MAX_IMIX_ENTRIES - Returns -7 (-E2BIG) This patch is split into three parts, each provide different aspects of required functionality: 1. Parse internet mix input. 2. Add IMIX Distribution representation. 3. Process and output IMIX results. Changes in v2: * Remove __ prefix outside of uAPI. * Use seq_puts instead of seq_printf where necessary. * Reorder variable declaration. * Return -EINVAL instead of -ENOTSUPP when using IMIX with clone_skb > 0 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-12	pktgen: Add output for imix results	Nick Richardson	1	-1/+25
	The bps for imix mode is calculated by: sum(imix_entry.size) / time_elapsed The actual counts of each imix_entry are displayed under the "Current:" section of the interface output in the following format: imix_size_counts: size_1,count_1 size_2,count_2 ... size_n,count_n Example (count = 200000): imix_weights: 256,1 859,3 205,2 imix_size_counts: 256,32082 859,99796 205,68122 Result: OK: 17992362(c17964678+d27684) usec, 200000 (859byte,0frags) 11115pps 47Mb/sec (47977140bps) errors: 0 Summary of changes: Calculate bps based on imix counters when in IMIX mode. Add output for IMIX counters. Signed-off-by: Nick Richardson <richardsonnick@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
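The rate calculation described above can be sketched as follows. This is a minimal illustration, not the kernel implementation: it assumes the byte total is each entry's size multiplied by its count, and the function name `imix_rates` is hypothetical.

```python
def imix_rates(size_counts, elapsed_usec):
    """Return (pps, bps) for a list of (size, count) pairs.

    Assumption of this sketch: bytes sent = sum(size * count).
    The kernel does its own integer arithmetic on a finer time base,
    so the last digits of its output may differ slightly.
    """
    packets = sum(count for _, count in size_counts)
    total_bytes = sum(size * count for size, count in size_counts)
    seconds = elapsed_usec / 1_000_000
    pps = int(packets / seconds)
    bps = int(total_bytes * 8 / seconds)
    return pps, bps


# Counters taken from the example output in the commit message.
counts = [(256, 32082), (859, 99796), (205, 68122)]
pps, bps = imix_rates(counts, 17992362)
print(pps, bps)
```

With the counters from the example, the sketch lands within a few bits per second of the reported 47977140bps.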
2021-08-12	pktgen: Add imix distribution bins	Nick Richardson	1	-0/+41
	In order to represent the distribution of imix packet sizes, a pre-computed data structure is used. It features 100 (IMIX_PRECISION) "bins". Contiguous ranges of these bins represent the respective packet size of each imix entry. This is done to avoid the overhead of selecting the correct imix packet size based on the corresponding weights. Example: imix_weights 40,7 576,4 1500,1 total_weight = 7 + 4 + 1 = 12 pkt_size 40 occurs 7/total_weight = 58% of the time pkt_size 576 occurs 4/total_weight = 33% of the time pkt_size 1500 occurs 1/total_weight = 9% of the time We generate a random number in the range 0-99 and select the corresponding packet size based on the specified weights. E.g. random number = 358723895 % 100 = 95 selects the packet size corresponding to index 95 in the pre-computed imix_distribution array. The imix_distribution will look like the following: 0 -> 0 (index of imix_entry.size == 40) 1 -> 0 (index of imix_entry.size == 40) 2 -> 0 (index of imix_entry.size == 40) [...] -> 0 (index of imix_entry.size == 40) 57 -> 0 (index of imix_entry.size == 40) 58 -> 1 (index of imix_entry.size == 576) [...] -> 1 (index of imix_entry.size == 576) 90 -> 1 (index of imix_entry.size == 576) 91 -> 2 (index of imix_entry.size == 1500) [...] -> 2 (index of imix_entry.size == 1500) 99 -> 2 (index of imix_entry.size == 1500) Create and use "bin" representation of the imix distribution. Signed-off-by: Nick Richardson <richardsonnick@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
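The bin table described above could be built roughly like this. It is a sketch under stated assumptions rather than the kernel code: it uses integer division for each entry's share and lets the last entry absorb the rounding remainder, which reproduces the 0-57 / 58-90 / 91-99 layout shown in the message. The helper `pick_size` is hypothetical.

```python
IMIX_PRECISION = 100  # number of bins, as stated in the commit message


def fill_imix_distribution(entries):
    """Map each bin 0..99 to the index of an imix entry.

    entries: non-empty list of (size, weight) pairs.
    """
    total_weight = sum(weight for _, weight in entries)
    # Exclusive upper bin bound for every entry (assumed rounding scheme).
    bounds = []
    cumulative = 0
    for _, weight in entries[:-1]:
        cumulative += weight * IMIX_PRECISION // total_weight
        bounds.append(cumulative)
    bounds.append(IMIX_PRECISION)  # last entry takes the remaining bins

    distribution = []
    j = 0
    for i in range(IMIX_PRECISION):
        while i >= bounds[j]:
            j += 1
        distribution.append(j)
    return distribution


def pick_size(entries, distribution, random_number):
    """Select a packet size with one modulo and one array lookup."""
    return entries[distribution[random_number % IMIX_PRECISION]][0]


entries = [(40, 7), (576, 4), (1500, 1)]
dist = fill_imix_distribution(entries)
print(dist[57], dist[58], dist[90], dist[91], dist[99])
```

The point of the table is that per-packet selection costs a single lookup instead of a walk over the cumulative weights.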
2021-08-12	pktgen: Parse internet mix (imix) input	Nick Richardson	1	-0/+96
	Adds "imix_weights" command for specifying internet mix distribution. The command is in this format: "imix_weights size_1,weight_1 size_2,weight_2 ... size_n,weight_n" where the probability that packet size_i is picked is: weight_i / (weight_1 + weight_2 + .. + weight_n) The user may provide up to 100 imix entries (size_i,weight_i) in this command. The user specified imix entries will be displayed in the "Params" section of the interface output. Values of clone_skb > 0 are not supported in IMIX mode. Summary of changes: Add flag for enabling internet mix mode. Add command (imix_weights) for internet mix input. Return -ENOTSUPP when clone_skb > 0 in IMIX mode. Display imix_weights in Params. Create data structures to store imix entries and distribution. Signed-off-by: Nick Richardson <richardsonnick@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
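As an illustration of the input format, the following sketch parses an `imix_weights` argument string and derives the per-size probabilities. The function names are hypothetical, and rejecting non-positive weights is an assumption of this sketch rather than something the message states.

```python
MAX_IMIX_ENTRIES = 100  # limit stated in the commit message


def parse_imix_weights(text):
    """Parse 'size_1,weight_1 size_2,weight_2 ...' into (size, weight) pairs."""
    entries = []
    for token in text.split():
        size_str, weight_str = token.split(",")
        size, weight = int(size_str), int(weight_str)
        if size < 0 or weight <= 0:  # assumption: such input is invalid
            raise ValueError(f"invalid imix entry: {token}")
        entries.append((size, weight))
    if len(entries) > MAX_IMIX_ENTRIES:
        raise ValueError("too many imix entries")
    return entries


def probabilities(entries):
    """weight_i / (weight_1 + ... + weight_n) for every entry."""
    total = sum(weight for _, weight in entries)
    return [weight / total for _, weight in entries]


parsed = parse_imix_weights("40,7 576,4 1500,1")
probs = probabilities(parsed)
print(parsed, [round(p, 2) for p in probs])
```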
2021-08-11	net: bridge: vlan: fix global vlan option range dumping	Nikolay Aleksandrov	1	-1/+2
	When global vlan options are equal sequentially we compress them in a range to save space and reduce processing time. In order to have the proper range end id we need to update range_end if the options are equal otherwise we get ranges with the same end vlan id as the start. Fixes: 743a53d9636a ("net: bridge: vlan: add support for dumping global vlan options") Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Link: https://lore.kernel.org/r/20210810092139.11700-1-razor@blackwall.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
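The range compression can be pictured with the sketch below. It is not the bridge code: the function name is hypothetical, and it assumes a range only grows when the next VLAN id is contiguous and carries equal options. The line marked as the fix is the update of the range end that the message says was missing.

```python
def compress_vlan_ranges(vlans):
    """Collapse (vid, options) pairs, sorted by vid, into (start, end, options)."""
    ranges = []
    range_start = range_end = None
    for vid, opts in vlans:
        if range_start is None:
            range_start = range_end = (vid, opts)
            continue
        if vid - range_end[0] == 1 and opts == range_start[1]:
            # The fix: advance the range end. Without this update every
            # emitted range would end at the same id it started with.
            range_end = (vid, opts)
            continue
        ranges.append((range_start[0], range_end[0], range_start[1]))
        range_start = range_end = (vid, opts)
    if range_start is not None:
        ranges.append((range_start[0], range_end[0], range_start[1]))
    return ranges


result = compress_vlan_ranges([(10, "a"), (11, "a"), (12, "a"), (20, "b")])
print(result)
```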
2021-08-11	mctp: Specify route types, require rtm_type in RTM_*ROUTE messages	Jeremy Kerr	2	-5/+23
	This change adds a 'type' attribute to routes, which can be parsed from a RTM_NEWROUTE message. This will help to distinguish local vs. peer routes in a future change. This means userspace will need to set a correct rtm_type in RTM_NEWROUTE and RTM_DELROUTE messages; we currently only accept RTN_UNICAST. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Link: https://lore.kernel.org/r/20210810023834.2231088-1-jk@codeconstruct.com.au Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-11	net: hns3: add support for triggering reset by ethtool	Yufeng Mo	4	-0/+68
	Currently, four reset types are supported for the HNS3 ethernet driver: IMP reset, global reset, function reset, and FLR. Only FLR can now be triggered by the user. To restore the device when an exception occurs, add support for triggering reset by ethtool. Run "ethtool --reset DEVNAME mgmt | all | dedicated" to trigger the IMP | global | function reset manually. In addition, VF can only trigger function reset. Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Link: https://lore.kernel.org/r/1628602128-15640-1-git-send-email-huangguangbin2@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
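The mapping between the ethtool reset flags and the driver reset types, as described in the message, can be summarised in a small sketch. The names `RESET_BY_FLAG` and `select_reset` are hypothetical and do not come from the driver, and the exception types are arbitrary choices for illustration.

```python
# ethtool flag -> reset type, in the order given in the commit message.
RESET_BY_FLAG = {
    "mgmt": "IMP reset",
    "all": "global reset",
    "dedicated": "function reset",
}


def select_reset(flag, is_vf=False):
    """Return the reset type for an ethtool --reset flag."""
    if flag not in RESET_BY_FLAG:
        raise ValueError(f"unsupported reset flag: {flag}")
    if is_vf and flag != "dedicated":
        # A VF can only trigger a function reset.
        raise PermissionError("VF may only request a function reset")
    return RESET_BY_FLAG[flag]


print(select_reset("mgmt"), "/", select_reset("dedicated", is_vf=True))
```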
2021-08-11	Merge branch 'bonding-cleanup-header-file-and-error-msgs'	Jakub Kicinski	2	-44/+37
	Jonathan Toppins says: ==================== bonding: cleanup header file and error msgs Two small patches removing unreferenced symbols and unifying error messages across netlink and printk. ==================== Link: https://lore.kernel.org/r/cover.1628650079.git.jtoppins@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-11	bonding: combine netlink and console error messages	Jonathan Toppins	1	-32/+37
	There seems to be no reason to have different error messages between netlink and printk. It also cleans up the function slightly. Signed-off-by: Jonathan Toppins <jtoppins@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-11	bonding: remove extraneous definitions from bonding.h	Jonathan Toppins	1	-12/+0
	All of the symbols either only exist in bond_options.c or nowhere at all. These symbols were verified to not exist in the code base by using `git grep` and their removal was verified by compiling bonding.ko. Signed-off-by: Jonathan Toppins <jtoppins@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-11	net: mscc: Fix non-GPL export of regmap APIs	Mark Brown	1	-8/+8
	The ocelot driver makes use of regmap, wrapping it with driver specific operations that are thin wrappers around the core regmap APIs. These are exported with EXPORT_SYMBOL, dropping the _GPL from the core regmap exports which is frowned upon. Add _GPL suffixes to at least the APIs that are doing register I/O. Signed-off-by: Mark Brown <broonie@kernel.org> Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Link: https://lore.kernel.org/r/20210810123748.47871-1-broonie@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-11	net/mlx5e: Make use of netdev_warn()	Cai Huoqing	1	-3/+6
	Replace printk(KERN_WARNING ...) with netdev_warn(). Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-11	net/mlx5: Fix variable type to match 64bit	Eran Ben Elisha	1	-4/+4
	Fix the following smatch warning: wait_func_handle_exec_timeout() warn: should '1 << ent->idx' be a 64 bit type? Use 1ULL, to have a 64 bit type variable. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-11	net/mlx5: Initialize numa node for all core devices	Parav Pandit	1	-2/+1
	Subsequent patches make use of numa node affinity for memory allocations. Initialize it for PCI PF, VF and SF devices. Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-11	net/mlx5: Allocate individual capability	Parav Pandit	4	-40/+104
	Currently mlx5_core_dev contains array of capabilities. It contains 19 valid capabilities of the device, 2 reserved entries and 12 holes. Due to this for 14 unused entries, mlx5_core_dev allocates 14 * 8K = 112K bytes of memory which is never used. Due to this mlx5_core_dev structure size is 270Kbytes odd. This allocation further aligns to next power of 2 to 512Kbytes. By skipping non-existent entries, (a) 112Kbyte is saved, (b) mlx5_core_dev reduces to 8KB with alignment (c) 350KB saved in alignment In future individual capability allocation can be used to skip its allocation when such capability is disabled at the device level. This patch prepares mlx5_core_dev to hold capability using a pointer instead of inline array. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-11	net/mlx5: Reorganize current and maximal capabilities to be per-type	Parav Pandit	4	-41/+45
	In the current code, the current and maximal capabilities are maintained in separate arrays which are both per type. In order to allow the creation of such a basic structure as a dynamically allocated array, we move curr and max fields to a unified structure so that specific capabilities can be allocated as one unit. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-11	net/mlx5: SF, use recent sysfs api	Parav Pandit	1	-1/+1
	Use sysfs_emit() which is aware of PAGE_SIZE buffer. Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-11	net/mlx5: Refcount mlx5_irq with integer	Shay Drory	1	-21/+44
	Currently, all accesses to mlx5 IRQs are done under a lock. Hence, there isn't a reason to have kref in struct mlx5_irq. Switch it to integer. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-11	net/mlx5: Change SF missing dedicated MSI-X err message to dbg	Shay Drory	1	-1/+1
	When the MSI-X vectors allocated are not enough for SFs to have dedicated MSI-X vectors, the kernel log buffer has too many entries. Hence only enable such log with debug level. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-11	net/mlx5: Align mlx5_irq structure	Shay Drory	1	-2/+2
	mlx5_irq structure has holes due to incorrect position of fields in it. Make them naturally align. pahole output after alignment: struct mlx5_irq { struct atomic_notifier_head nh; /* 0 72 */ /* --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- */ cpumask_var_t mask; /* 72 8 */ char name[32]; /* 80 32 */ struct mlx5_irq_pool *pool; /* 112 8 */ struct kref kref; /* 120 4 */ u32 index; /* 124 4 */ /* --- cacheline 2 boundary (128 bytes) --- */ int irqn; /* 128 4 */ /* size: 136, cachelines: 3, members: 7 */ /* padding: 4 */ /* last cacheline: 8 bytes */ }; Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-11	net/mlx5: Delete impossible dev->state checks	Leon Romanovsky	3	-12/+1
	New mlx5_core device structure is allocated through devlink_alloc with kzalloc and that ensures that all fields are equal to zero and it includes ->state too. That means that checks of that field in mlx5_init_one() are completely redundant, because that function is called only once in the beginning of mlx5_core_dev lifetime. PCI: .probe() -> probe_one() -> mlx5_init_one() The recovery flow can't run at that time or before it, because the relevant work is initialized later in mlx5_init_once(). Such initialization flow ensures that dev->state can't be MLX5_DEVICE_STATE_UNINITIALIZED at all, so remove such impossible checks. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-11	net/mlx5: Fix inner TTC table creation	Maor Gottlieb	1	-1/+2
	Fix a typo in the cited commit, which calls mlx5_create_ttc_table instead of mlx5_create_inner_ttc_table. Fixes: f4b45940e9b9 ("net/mlx5: Embed mlx5_ttc_table") Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-11	net/mlx5: Fix typo in comments	Cai Huoqing	16	-19/+19
	Fix typo: vectores ==> vectors realeased ==> released erros ==> errors namepsace ==> namespace trafic ==> traffic proccessed ==> processed retore ==> restore Currenlty ==> Currently crated ==> created chane ==> change cannnot ==> cannot usuallly ==> usually failes ==> fails importent ==> important reenabled ==> re-enabled alocation ==> allocation recived ==> received tanslation ==> translation Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-11	Merge branch 'dsa-tagger-helpers'	David S. Miller	8	-81/+119
	Vladimir Oltean says: ==================== DSA tagger helpers The goal of this series is to minimize the use of memmove and skb->data in the DSA tagging protocol drivers. Unfiltered access to this level of information is not very friendly to drive-by contributors, and sometimes is also not the easiest to review. For starters, I have converted the most common form of DSA tagging protocols: the DSA headers which are placed where the EtherType is. The helper functions introduced by this series are: - dsa_alloc_etype_header - dsa_strip_etype_header - dsa_etype_header_pos_rx - dsa_etype_header_pos_tx This series is just a resend as non-RFC of v1. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-11	net: dsa: create a helper for locating EtherType DSA headers on TX	Vladimir Oltean	7	-17/+23
	Create a similar helper for locating the offset to the DSA header relative to skb->data, and make the existing EtherType header taggers use it. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-11	net: dsa: create a helper for locating EtherType DSA headers on RX	Vladimir Oltean	8	-26/+21
	It seems that protocol tagging driver writers are always surprised about the formula they use to reach their EtherType header on RX, which becomes apparent from the fact that there are comments in multiple drivers that mention the same information. Create a helper that returns a void pointer to skb->data - 2, as well as centralize the explanation why that is the case. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-11	net: dsa: create a helper which allocates space for EtherType DSA headers	Vladimir Oltean	8	-10/+38
	Hide away the memmove used by DSA EtherType header taggers to shift the MAC SA and DA to the left to make room for the header, after they've called skb_push(). The call to skb_push() is still left explicit in drivers, to be symmetric with dsa_strip_etype_header, and because not all callers can be refactored to do it (for example, brcm_tag_xmit_ll has common code for a pre-Ethernet DSA tag and an EtherType DSA tag). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-11	net: dsa: create a helper that strips EtherType DSA headers on RX	Vladimir Oltean	8	-28/+37
	All header taggers open-code a memmove that is not all that obvious, and we can hide the details behind a helper function, since the only thing specific to the driver is the length of the header tag. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
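To show what the hidden memmove does to a frame, here is a byte-level model of both directions. It is only a sketch: the Python functions borrow the helper names from the series for readability but operate on plain byte buffers instead of an skb, and the sample addresses and 4-byte tag are made up.

```python
ETH_ALEN = 6  # length of one MAC address in bytes


def alloc_etype_header(frame, tag_len):
    """TX: open a tag_len gap between the MAC addresses and the EtherType."""
    buf = bytearray(tag_len) + bytearray(frame)  # models skb_push(tag_len)
    # Models the memmove: shift MAC DA and SA to the left by tag_len.
    buf[0:2 * ETH_ALEN] = buf[tag_len:tag_len + 2 * ETH_ALEN]
    return buf  # bytes [12, 12 + tag_len) are now free for the tag


def strip_etype_header(frame, tag_len):
    """RX: remove a tag_len header sitting where the EtherType normally is."""
    buf = bytearray(frame)
    # Models the memmove: shift MAC DA and SA to the right, over the tag.
    buf[tag_len:tag_len + 2 * ETH_ALEN] = buf[0:2 * ETH_ALEN]
    return buf[tag_len:]  # the leading tag_len bytes are no longer used


da, sa = bytes(range(1, 7)), bytes(range(11, 17))
plain = da + sa + b"\x08\x00" + b"payload"
tag = b"\xaa\xbb\xcc\xdd"  # made-up 4-byte tag

tagged = alloc_etype_header(plain, len(tag))
tagged[2 * ETH_ALEN:2 * ETH_ALEN + len(tag)] = tag
print(tagged.hex())
print(strip_etype_header(tagged, len(tag)) == plain)
```

As the message notes, the only driver-specific input in either direction is the length of the tag.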
2021-08-11	Merge branch 'devlink-aux-devices'	David S. Miller	6	-12/+382
	Parav Pandit says:
	====================
	devlink: Control auxiliary devices
	Currently, for an mlx5 multi-function device, a user cannot control which functionality to enable or disable. For example, each PCI PF, VF, SF function by default has netdevice, RDMA and vdpa-net devices always enabled. Hence, enable the user to control which device functionality to enable or disable. This is achieved by using existing devlink params [1] as control knobs to enable/disable eth, rdma and vdpa net functionality.
	For example, a user interested in only the vdpa device function runs:
	$ devlink dev param set pci/0000:06:00.0 name enable_rdma value false cmode driverinit
	$ devlink dev param set pci/0000:06:00.0 name enable_eth value false cmode driverinit
	$ devlink dev param set pci/0000:06:00.0 name enable_vnet value true cmode driverinit
	$ devlink dev reload pci/0000:06:00.0
	The reload command honors the parameters set and initializes the device that the user has composed using devlink dev params and resources.
	Devices before reload:
	mlx5_core.sf.4 (subfunction device)
	  |-- mlx5_core.eth.4 (SF eth aux dev)
	  |     `-- enp6s0f0s88 (SF netdev)
	  |-- mlx5_core.rdma.4 (SF rdma aux dev)
	  |     `-- mlx5_0 (SF rdma device)
	  `-- mlx5_core.vnet.4 (SF vnet aux dev)
	        `-- auxiliary/mlx5_core.sf.4 (vdpa net mgmt device)
	The above example reconfigures the device with only VDPA functionality.
	Devices after reload:
	mlx5_core.sf.4 (subfunction device)
	  `-- mlx5_core.vnet.4 (SF vnet aux dev)
	      (no eth, no rdma aux devices)
	The above parameters enable the user to compose the device as needed based on the use case. Since devlink params are set on the devlink instance, these knobs are uniformly usable for PCI PF, VF and SF devices.
	====================
	Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-11	net/mlx5: Support enable_vnet devlink dev param	Parav Pandit	3	-2/+57
	Enable the user to disable the VDPA net auxiliary device when it is not required. For example:
	$ devlink dev param set pci/0000:06:00.0 name enable_vnet value false cmode driverinit
	$ devlink dev reload pci/0000:06:00.0
	At this point the devlink instance does not create the auxiliary device mlx5_core.vnet.2 for the VDPA net functionality. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-11	net/mlx5: Support enable_rdma devlink dev param	Parav Pandit	3	-3/+79
	Enable the user to disable the RDMA auxiliary device when it is not required. For example:
	$ devlink dev param set pci/0000:06:00.0 name enable_rdma value false cmode driverinit
	$ devlink dev reload pci/0000:06:00.0
	At this point the devlink instance does not create the auxiliary device mlx5_core.rdma.2 for the RDMA functionality. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-11	net/mlx5: Support enable_eth devlink dev param	Parav Pandit	3	-2/+96
	Enable the user to disable the Ethernet auxiliary device when it is not required. For example:
	$ devlink dev param set pci/0000:06:00.0 name enable_eth value false cmode driverinit
	$ devlink dev reload pci/0000:06:00.0
	At this point the devlink instance does not create the mlx5_core.eth.2 auxiliary device for the Ethernet functionality. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-11	net/mlx5: Fix unpublish devlink parameters	Parav Pandit	1	-0/+1
	The cleanup routine did not unpublish the parameters. Add the missing call. Fixes: e890acd5ff18 ("net/mlx5: Add devlink flow_steering_mode parameter") Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-11	devlink: Add APIs to publish, unpublish individual parameter	Parav Pandit	2	-0/+52
	Enable drivers to publish/unpublish an individual parameter. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-11	devlink: Add API to register and unregister single parameter	Parav Pandit	2	-0/+41
	Currently, device configuration parameters can only be registered as an array, so a constant array must be registered. A single driver supporting multiple devices, each with different capabilities, ends up registering all parameters even if a given device doesn't support some of them. One possible workaround is for the driver to register multiple single-entry arrays. It is better to provide an API that enables a driver to register/unregister a single parameter. This also helps in two further ways: (1) it reduces the memory used for devlink_param_entry by not registering parameters that the device does not support; (2) it avoids generating multiple parameter add, delete, publish, unpublish, and init value notifications for such unsupported parameters. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
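The difference between array registration and per-parameter registration can be illustrated with a toy model. The Python sketch below is purely hypothetical: `ToyDevlink`, its methods, and the capability set are invented for illustration and do not mirror the kernel's devlink API. It only shows how registering parameters one at a time lets a driver skip those a device does not support, and so avoid the corresponding notifications.

```python
class ToyDevlink:
    """Hypothetical stand-in for a devlink instance; not the kernel API."""

    def __init__(self):
        self.params = {}          # registered parameter name -> value
        self.notifications = []   # (event, name) pairs sent to listeners

    def param_register(self, name):
        """Register one parameter and emit one 'add' notification."""
        if name in self.params:
            raise ValueError(f"{name} already registered")
        self.params[name] = None
        self.notifications.append(("add", name))

    def param_unregister(self, name):
        """Unregister one parameter and emit one 'del' notification."""
        del self.params[name]
        self.notifications.append(("del", name))

    def params_register(self, names):
        """Array-style registration: every entry is registered."""
        for name in names:
            self.param_register(name)


ALL_PARAMS = ("enable_eth", "enable_rdma", "enable_vnet")


def probe(capabilities):
    """Register only the parameters this (toy) device supports."""
    devlink = ToyDevlink()
    for name in ALL_PARAMS:
        if name in capabilities:
            devlink.param_register(name)
    return devlink


array_style = ToyDevlink()
array_style.params_register(ALL_PARAMS)  # all three, supported or not
vnet_only = probe({"enable_vnet"})       # one entry, one notification
```

In this model the array-style instance carries three entries and three notifications, while the per-parameter instance carries one of each.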
2021-08-11	devlink: Create a helper function for one parameter registration	Parav Pandit	1	-6/+18
	Create and use a helper function for one parameter registration. A subsequent patch will also reuse this for the driver-facing routine to register a single parameter. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-11	devlink: Add new "enable_vnet" generic device param	Parav Pandit	3	-0/+13
	Add a new generic device parameter to enable/disable creation of the VDPA net auxiliary device and associated device functionality in the devlink instance. A user who prefers to disable such functionality can do so as in the example below:
	$ devlink dev param set pci/0000:06:00.0 name enable_vnet value false cmode driverinit
	$ devlink dev reload pci/0000:06:00.0
	At this point the devlink instance does not create the auxiliary device for the VDPA net functionality. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>