path: root/drivers
Age | Commit message | Author | Files | Lines
2021-08-11 | net: ipa: kill ipa_clock_get_additional() | Alex Elder | 3 | -20/+4
Now that ipa_clock_get_additional() is a trivial wrapper around pm_runtime_get_if_active(), just open-code it in its only caller and delete the function. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-11 | net: ipa: kill IPA clock reference count | Alex Elder | 1 | -71/+6
The runtime power management core code maintains a usage count. This count mirrors the IPA clock reference count, and there's no need to maintain both. So get rid of the IPA clock reference count and just rely on the runtime PM usage count to determine when the hardware should be suspended or resumed. Use pm_runtime_get_if_active() in ipa_clock_get_additional(). We care whether power is active, regardless of whether it's in use, so pass true for its ign_usage_count argument. The IPA clock mutex is just used to make enabling/disabling the clock and updating the reference count occur atomically. Without the reference count, there's no need for the mutex, so get rid of that too. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
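As a rough illustration, the check this leaves in the caller looks something like the sketch below (a generic device pointer is assumed; this is not the actual IPA code):

    #include <linux/pm_runtime.h>

    /*
     * Sketch: take a runtime PM reference only if the hardware is already
     * powered, ignoring the usage count (ign_usage_count == true), as the
     * commit message describes.
     */
    static bool example_power_get_if_active(struct device *dev)
    {
            int ret;

            ret = pm_runtime_get_if_active(dev, true);

            /*
             * ret < 0: error, ret == 0: not active; in both cases no
             * reference was taken.  ret == 1: the usage count was
             * incremented and the hardware may be touched until the
             * reference is dropped again.
             */
            return ret > 0;
    }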
2021-08-11 | net: ipa: get rid of extra clock reference | Alex Elder | 1 | -11/+0
Suspending the IPA hardware is now managed by the runtime PM core code. The ->runtime_idle callback returns a non-zero value, so it will never suspend except when forced. As a result, there's no need to take an extra "do not suspend" clock reference. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-11 | net: ipa: use runtime PM core | Alex Elder | 1 | -37/+38
Use the runtime power management core to cause hardware suspend and resume to occur. Enable it in ipa_clock_init() (without autosuspend), and disable it in ipa_clock_exit(). Use ipa_runtime_suspend() as the ->runtime_suspend power operation, and arrange for it to be called by having ipa_clock_get() call pm_runtime_get_sync() when the first clock reference is taken. Similarly, use ipa_runtime_resume() as the ->runtime_resume power operation, and pm_runtime_put() when the last IPA clock reference is dropped. Introduce ipa_runtime_idle() as the ->runtime_idle power operation, and have it return a non-zero value; this way suspend will never occur except when forced. Use pm_runtime_force_suspend() and pm_runtime_force_resume() as the system suspend and resume callbacks, and remove ipa_suspend() and ipa_resume(). Store a pointer to the device structure passed to ipa_clock_init(), so it can be used by ipa_clock_exit() to disable runtime power management. For now we preserve IPA clock reference counting. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
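A minimal sketch of the dev_pm_ops wiring this describes follows; the callback names come from the commit text, while the stub bodies and struct name are assumptions rather than the actual driver code:

    #include <linux/pm.h>
    #include <linux/pm_runtime.h>

    /* Stubs standing in for the real callbacks named in the commit message. */
    static int ipa_runtime_suspend(struct device *dev) { return 0; }
    static int ipa_runtime_resume(struct device *dev) { return 0; }
    /* Returning non-zero from ->runtime_idle means suspend only when forced. */
    static int ipa_runtime_idle(struct device *dev) { return 1; }

    static const struct dev_pm_ops ipa_pm_ops_sketch = {
            /* System sleep is funneled through forced runtime suspend/resume. */
            SET_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend,
                                    pm_runtime_force_resume)
            SET_RUNTIME_PM_OPS(ipa_runtime_suspend, ipa_runtime_resume,
                               ipa_runtime_idle)
    };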
2021-08-11 | net: ipa: resume in ipa_clock_get() | Alex Elder | 1 | -26/+37
Introduce ipa_runtime_suspend() and ipa_runtime_resume(), which encapsulate the activities necessary for suspending and resuming the IPA hardware. Call these functions from ipa_clock_get() and ipa_clock_put() when the first reference is taken or last one is dropped. When the very first clock reference is taken (for ipa_config()), setup isn't complete yet, so (as before) only the core clock gets enabled. When the last clock reference is dropped (after ipa_deconfig()), ipa_teardown() will have made the setup_complete flag false, so there too, the core clock will be stopped without affecting GSI or the endpoints. Otherwise these new functions will perform the desired suspend and resume actions once setup is complete. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-11 | net: ipa: disable clock in suspend | Alex Elder | 1 | -8/+3
Disable the IPA clock rather than dropping a reference to it in the system suspend callback. This forces the suspend to occur without affecting existing references. Similarly, enable the clock rather than taking a reference in ipa_resume(), forcing a resume without changing the reference count. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-11 | net: ipa: have ipa_clock_get() return a value | Alex Elder | 7 | -59/+99
We currently assume no errors occur when enabling or disabling the IPA core clock and interconnects. And although this commit exposes errors that could occur, we generally assume this won't happen in practice. This commit changes ipa_clock_get() and ipa_clock_put() so each returns a value. The values returned are meant to mimic what the runtime power management functions return, so we can set up error handling here before we make the switch. Have ipa_clock_get() increment the reference count even if it returns an error, to match the behavior of pm_runtime_get(). More details follow. When taking a reference in ipa_clock_get(), return 0 for the first reference, 1 for subsequent references, or a negative error code if an error occurs. Note that if ipa_clock_get() returns an error, we must not touch hardware; in some cases such errors now cause entire blocks of code to be skipped. When dropping a reference in ipa_clock_put(), we return 0 or an error code. The error would come from ipa_clock_disable(), which now returns what ipa_interconnect_disable() returns (either 0 or a negative error code). For now, callers ignore the return value; if an error occurs, a message will have already been logged, and little more can actually be done to improve the situation. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
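A hypothetical caller following the convention described above might look like this (a sketch only, not the actual IPA code; the function name is invented):

    /*
     * ipa_clock_get() takes a reference even when it returns an error,
     * mirroring pm_runtime_get(), so the error path must still drop the
     * reference but must not touch the hardware.
     */
    static int example_config_step(struct ipa *ipa)
    {
            int ret;

            ret = ipa_clock_get(ipa);
            if (ret < 0)
                    goto out_clock_put;     /* error: skip hardware access */

            /* ... safe to access hardware here ... */

    out_clock_put:
            (void)ipa_clock_put(ipa);       /* return value currently ignored */

            return ret < 0 ? ret : 0;
    }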
2021-08-10 | Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux | Jakub Kicinski | 28 | -103/+1034
Saeed Mahameed says:

====================
pull-request: mlx5-next 2020-08-9

This pulls mlx5-next branch which includes patches already reviewed on net-next and rdma mailing lists.

1) mlx5 single E-Switch FDB for lag
2) IB/mlx5: Rename is_apu_thread_cq function to is_apu_cq
3) Add DCS caps & fields support

[1] https://patchwork.kernel.org/project/netdevbpf/cover/20210803231959.26513-1-saeed@kernel.org/
[2] https://patchwork.kernel.org/project/netdevbpf/patch/0e3364dab7e0e4eea5423878b01aa42470be8d36.1626609184.git.leonro@nvidia.com/
[3] https://patchwork.kernel.org/project/netdevbpf/patch/55e1d69bef1fbfa5cf195c0bfcbe35c8019de35e.1624258894.git.leonro@nvidia.com/

* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  net/mlx5: Lag, Create shared FDB when in switchdev mode
  net/mlx5: E-Switch, add logic to enable shared FDB
  net/mlx5: Lag, move lag destruction to a workqueue
  net/mlx5: Lag, properly lock eswitch if needed
  net/mlx5: Add send to vport rules on paired device
  net/mlx5: E-Switch, Add event callback for representors
  net/mlx5e: Use shared mappings for restoring from metadata
  net/mlx5e: Add an option to create a shared mapping
  net/mlx5: E-Switch, set flow source for send to uplink rule
  RDMA/mlx5: Add shared FDB support
  {net, RDMA}/mlx5: Extend send to vport rules
  RDMA/mlx5: Fill port info based on the relevant eswitch
  net/mlx5: Lag, add initial logic for shared FDB
  net/mlx5: Return mdev from eswitch
  IB/mlx5: Rename is_apu_thread_cq function to is_apu_cq
  net/mlx5: Add DCS caps & fields support
====================

Link: https://lore.kernel.org/r/20210809202522.316930-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-10 | Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next | Jakub Kicinski | 1 | -57/+397
Daniel Borkmann says:

====================
bpf-next 2021-08-10

We've added 31 non-merge commits during the last 8 day(s) which contain a total of 28 files changed, 3644 insertions(+), 519 deletions(-).

1) Native XDP support for bonding driver & related BPF selftests, from Jussi Maki.
2) Large batch of new BPF JIT tests for test_bpf.ko that came out as a result from 32-bit MIPS JIT development, from Johan Almbladh.
3) Rewrite of netcnt BPF selftest and merge into test_progs, from Stanislav Fomichev.
4) Fix XDP bpf_prog_test_run infra after net to net-next merge, from Andrii Nakryiko.
5) Follow-up fix in unix_bpf_update_proto() to enforce socket type, from Cong Wang.
6) Fix bpf-iter-tcp4 selftest to print the correct dest IP, from Jose Blanquicet.
7) Various misc BPF XDP sample improvements, from Niklas Söderlund, Matthew Cover, and Muhammad Falak R Wani.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (31 commits)
  bpf, tests: Add tail call test suite
  bpf, tests: Add tests for BPF_CMPXCHG
  bpf, tests: Add tests for atomic operations
  bpf, tests: Add test for 32-bit context pointer argument passing
  bpf, tests: Add branch conversion JIT test
  bpf, tests: Add word-order tests for load/store of double words
  bpf, tests: Add tests for ALU operations implemented with function calls
  bpf, tests: Add more ALU64 BPF_MUL tests
  bpf, tests: Add more BPF_LSH/RSH/ARSH tests for ALU64
  bpf, tests: Add more ALU32 tests for BPF_LSH/RSH/ARSH
  bpf, tests: Add more tests of ALU32 and ALU64 bitwise operations
  bpf, tests: Fix typos in test case descriptions
  bpf, tests: Add BPF_MOV tests for zero and sign extension
  bpf, tests: Add BPF_JMP32 test cases
  samples, bpf: Add an explict comment to handle nested vlan tagging.
  selftests/bpf: Add tests for XDP bonding
  selftests/bpf: Fix xdp_tx.c prog section name
  net, core: Allow netdev_lower_get_next_private_rcu in bh context
  bpf, devmap: Exclude XDP broadcast to master device
  net, bonding: Add XDP support to the bonding driver
  ...
====================

Link: https://lore.kernel.org/r/20210810130038.16927-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-09 | net: hns3: support skb's frag page recycling based on page pool | Yunsheng Lin | 3 | -5/+78
This patch adds skb frag page recycling support based on the frag page support in page pool. Performance improves by about 10-20% for a single-thread iperf TCP flow with the IOMMU disabled, when the iperf server and irq/NAPI run on different CPUs. Performance improves by about 135% (14Gbit to 33Gbit) for a single-thread iperf TCP flow when the IOMMU is in strict mode and the iperf server shares the same CPU with irq/NAPI. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-09 | page_pool: keep pp info as long as page pool owns the page | Yunsheng Lin | 4 | -8/+4
Currently, page->pp is cleared and set every time the page is recycled, which is unnecessary. So only set page->pp when the page is added to the page pool, and only clear it when the page is released from the page pool. This is also a preparation to support allocating frag pages in the page pool. Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-09 | net, bonding: Add XDP support to the bonding driver | Jussi Maki | 1 | -1/+308
XDP is implemented in the bonding driver by transparently delegating the XDP program loading, removal and xmit operations to the bonding slave devices. The overall goal of this work is that XDP programs can be attached to a bond device *without* any further changes (or awareness) necessary to the program itself, meaning the same XDP program can be attached to a native device but also a bonding device.

Semantics of XDP_TX when attached to a bond are equivalent in such setting to the case when a tc/BPF program would be attached to the bond, meaning transmitting the packet out of the bond itself using one of the bond's configured xmit methods to select a slave device (rather than XDP_TX on the slave itself). Handling of XDP_TX to transmit using the configured bonding mechanism is therefore implemented by rewriting the BPF program return value in bpf_prog_run_xdp. To avoid performance impact this check is guarded by a static key, which is incremented when an XDP program is loaded onto a bond device. This approach was chosen to avoid changes to drivers implementing XDP. If the slave device does not match the receive device, then XDP_REDIRECT is transparently used to perform the redirection in order to have the network driver release the packet from its RX ring. The bonding driver hashing functions have been refactored to allow reuse with xdp_buff's to avoid code duplication.

The motivation for this change is to enable use of bonding (and 802.3ad) in hairpinning L4 load-balancers such as [1] implemented with XDP and also to transparently support bond devices for projects that use XDP given most modern NICs have dual port adapters. An alternative to this approach would be to implement 802.3ad in user-space and implement the bonding load-balancing in the XDP program itself, but this is rather a cumbersome endeavor in terms of slave device management (e.g. by watching netlink) and requires separate programs for native vs bond cases for the orchestrator. A native in-kernel implementation overcomes these issues and provides more flexibility.

Below are benchmark results done on two machines with 100Gbit Intel E810 (ice) NIC and with 32-core 3970X on sending machine, and 16-core 3950X on receiving machine. 64 byte packets were sent with pktgen-dpdk at full rate. Two issues [2, 3] were identified with the ice driver, so the tests were performed with iommu=off and patch [2] applied. Additionally the bonding round robin algorithm was modified to use per-cpu tx counters as high CPU load (50% vs 10%) and high rate of cache misses were caused by the shared rr_tx_counter (see patch 2/3). The statistics were collected using "sar -n dev -u 1 10". On top of that, for ice, further work is in progress on improving the XDP_TX numbers [4].

  ---------------------------|  CPU   |--| rxpck/s  |--| txpck/s |----
  without patch (1 dev):
    XDP_DROP:                  3.15%      48.6Mpps
    XDP_TX:                    3.12%      18.3Mpps     18.3Mpps
    XDP_DROP (RSS):            9.47%     116.5Mpps
    XDP_TX (RSS):              9.67%      25.3Mpps     24.2Mpps
  -----------------------
  with patch, bond (1 dev):
    XDP_DROP:                  3.14%      46.7Mpps
    XDP_TX:                    3.15%      13.9Mpps     13.9Mpps
    XDP_DROP (RSS):           10.33%     117.2Mpps
    XDP_TX (RSS):             10.64%      25.1Mpps     24.0Mpps
  -----------------------
  with patch, bond (2 devs):
    XDP_DROP:                  6.27%      92.7Mpps
    XDP_TX:                    6.26%      17.6Mpps     17.5Mpps
    XDP_DROP (RSS):           11.38%     117.2Mpps
    XDP_TX (RSS):             14.30%      28.7Mpps     27.4Mpps
  --------------------------------------------------------------

RSS: Receive Side Scaling, e.g. the packets were sent to a range of destination IPs.
[1]: https://cilium.io/blog/2021/05/20/cilium-110#standalonelb [2]: https://lore.kernel.org/bpf/20210601113236.42651-1-maciej.fijalkowski@intel.com/T/#t [3]: https://lore.kernel.org/bpf/CAHn8xckNXci+X_Eb2WMv4uVYjO2331UWB2JLtXr_58z0Av8+8A@mail.gmail.com/ [4]: https://lore.kernel.org/bpf/20210805230046.28715-1-maciej.fijalkowski@intel.com/T/#t Signed-off-by: Jussi Maki <joamaki@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Cc: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20210731055738.16820-4-joamaki@gmail.com
2021-08-09 | net, bonding: Refactor bond_xmit_hash for use with xdp_buff | Jussi Maki | 1 | -57/+90
In preparation for adding XDP support to the bonding driver refactor the packet hashing functions to be able to work with any linear data buffer without an skb. Signed-off-by: Jussi Maki <joamaki@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Link: https://lore.kernel.org/bpf/20210731055738.16820-2-joamaki@gmail.com
2021-08-09 | net: fec: fix build error for ARCH m68k | Joakim Zhang | 1 | -0/+2
reproduce:

  wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
  chmod +x ~/bin/make.cross
  make.cross ARCH=m68k m5272c3_defconfig
  make.cross ARCH=m68k

   drivers/net/ethernet/freescale/fec_main.c: In function 'fec_enet_eee_mode_set':
>> drivers/net/ethernet/freescale/fec_main.c:2758:33: error: 'FEC_LPI_SLEEP' undeclared (first use in this function); did you mean 'FEC_ECR_SLEEP'?
    2758 |  writel(sleep_cycle, fep->hwp + FEC_LPI_SLEEP);
         |                                 ^~~~~~~~~~~~~
   arch/m68k/include/asm/io_no.h:25:66: note: in definition of macro '__raw_writel'
      25 | #define __raw_writel(b, addr) (void)((*(__force volatile u32 *) (addr)) = (b))
         |                                                                  ^~~~
   drivers/net/ethernet/freescale/fec_main.c:2758:2: note: in expansion of macro 'writel'
    2758 |  writel(sleep_cycle, fep->hwp + FEC_LPI_SLEEP);
         |  ^~~~~~
   drivers/net/ethernet/freescale/fec_main.c:2758:33: note: each undeclared identifier is reported only once for each function it appears in
    2758 |  writel(sleep_cycle, fep->hwp + FEC_LPI_SLEEP);
         |                                 ^~~~~~~~~~~~~
   arch/m68k/include/asm/io_no.h:25:66: note: in definition of macro '__raw_writel'
      25 | #define __raw_writel(b, addr) (void)((*(__force volatile u32 *) (addr)) = (b))
         |                                                                  ^~~~
   drivers/net/ethernet/freescale/fec_main.c:2758:2: note: in expansion of macro 'writel'
    2758 |  writel(sleep_cycle, fep->hwp + FEC_LPI_SLEEP);
         |  ^~~~~~
>> drivers/net/ethernet/freescale/fec_main.c:2759:32: error: 'FEC_LPI_WAKE' undeclared (first use in this function)
    2759 |  writel(wake_cycle, fep->hwp + FEC_LPI_WAKE);
         |                                ^~~~~~~~~~~~
   arch/m68k/include/asm/io_no.h:25:66: note: in definition of macro '__raw_writel'
      25 | #define __raw_writel(b, addr) (void)((*(__force volatile u32 *) (addr)) = (b))
         |                                                                  ^~~~
   drivers/net/ethernet/freescale/fec_main.c:2759:2: note: in expansion of macro 'writel'
    2759 |  writel(wake_cycle, fep->hwp + FEC_LPI_WAKE);
         |  ^~~~~~

This patch adds register definitions for the M5272 platform so the build passes.

Fixes: b82f8c3f1409 ("net: fec: add eee mode tx lpi support")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
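The shape of the fix is roughly the following sketch; the 0x0 offsets are placeholders, not the values from the actual patch:

    /* Sketch only: provide FEC_LPI_SLEEP/FEC_LPI_WAKE for ColdFire M5272 so
     * fec_enet_eee_mode_set() compiles.  Offsets below are placeholders.
     */
    #if defined(CONFIG_M5272)
    #ifndef FEC_LPI_SLEEP
    #define FEC_LPI_SLEEP   0x0     /* placeholder offset */
    #endif
    #ifndef FEC_LPI_WAKE
    #define FEC_LPI_WAKE    0x0     /* placeholder offset */
    #endif
    #endif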
2021-08-09 | devlink: Set device as early as possible | Leon Romanovsky | 30 | -80/+75
All kernel devlink implementations call devlink_alloc() during the initialization routine of a specific device, which is later used as the parent device for devlink_register(). Such late device assignment leads to a situation that requires us to call device_register() before setting other parameters, but that call opens devlink to the world and makes it accessible to netlink users. Any attempt to move devlink_register() to be the last call generates the following error due to access to the devlink->dev pointer:

  [    8.758862] devlink_nl_param_fill+0x2e8/0xe50
  [    8.760305] devlink_param_notify+0x6d/0x180
  [    8.760435] __devlink_params_register+0x2f1/0x670
  [    8.760558] devlink_params_register+0x1e/0x20

The simple API change of setting the devlink device in devlink_alloc() instead of devlink_register() fixes all of the above and ensures that everything is already set prior to the call to devlink_register(). Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
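With this change, a driver's init path ends up looking roughly like the sketch below (the example_* names and the empty ops are placeholders, and return-value handling for devlink_register() is omitted):

    #include <net/devlink.h>

    struct example_priv { int placeholder; };

    static const struct devlink_ops example_devlink_ops; /* no ops needed for the sketch */

    static struct devlink *example_devlink_init(struct device *dev)
    {
            struct devlink *devlink;

            /* The parent device is now supplied at allocation time ... */
            devlink = devlink_alloc(&example_devlink_ops,
                                    sizeof(struct example_priv), dev);
            if (!devlink)
                    return NULL;

            /* ... so params, ports, etc. can be set up here, and ... */

            /* ... registration can safely be the last step. */
            devlink_register(devlink);

            return devlink;
    }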
2021-08-09 | wwan: mhi: Fix missing spin_lock_init() in mhi_mbim_probe() | Wei Yongjun | 1 | -0/+1
The driver allocates the spinlock but does not initialize it. Use spin_lock_init() to initialize it correctly. Fixes: aa730a9905b7 ("net: wwan: Add MHI MBIM network driver") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
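The fix boils down to a one-line initialization in probe, along these lines (the structure and field names here are illustrative, not the exact driver layout):

    #include <linux/slab.h>
    #include <linux/spinlock.h>

    struct example_mbim_ctx {
            spinlock_t tx_lock;     /* illustrative field name */
    };

    static struct example_mbim_ctx *example_mbim_probe_alloc(void)
    {
            struct example_mbim_ctx *ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);

            if (!ctx)
                    return NULL;

            /* Allocation (even zeroed) is not enough: a spinlock must be
             * explicitly initialized before its first use.
             */
            spin_lock_init(&ctx->tx_lock);

            return ctx;
    }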
2021-08-08 | net: dsa: sja1105: add FDB fast ageing support | Vladimir Oltean | 1 | -0/+41
Delete the dynamically learned FDB entries when the STP state changes and when address learning is disabled. On sja1105 there is no shorthand SPI command for this, so we need to walk through the entire FDB to delete them. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-08 | net: dsa: sja1105: rely on DSA core tracking of port learning state | Vladimir Oltean | 2 | -20/+13
Now that DSA keeps track of the port learning state, it becomes superfluous to keep an additional variable with this information in the sja1105 driver. Remove it. The DSA core's learning state is present in struct dsa_port *dp. To avoid the antipattern where we iterate through a DSA switch's ports and then call dsa_to_port to obtain the "dp" reference (which is bad because dsa_to_port iterates through the DSA switch tree once again), just iterate through the dst->ports and operate on those directly. The sja1105 had an extra use of priv->learn_ena on non-user ports. DSA does not touch the learning state of those ports - drivers are free to do what they wish on them. Mark that information with a comment in struct dsa_port and let sja1105 set dp->learning for cascade ports. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-08 | net: dsa: centralize fast ageing when address learning is turned off | Vladimir Oltean | 1 | -7/+0
Currently DSA leaves it down to device drivers to fast age the FDB on a port when address learning is disabled on it. There are 2 reasons for doing that in the first place: - when address learning is disabled by user space, through IFLA_BRPORT_LEARNING or the brport_attr_learning sysfs, what user space typically wants to achieve is to operate in a mode with no dynamic FDB entry on that port. But if the port is already up, some addresses might have been already learned on it, and it seems silly to wait for 5 minutes for them to expire until something useful can be done. - when a port leaves a bridge and becomes standalone, DSA turns off address learning on it. This also has the nice side effect of flushing the dynamically learned bridge FDB entries on it, which is a good idea because standalone ports should not have bridge FDB entries on them. We let drivers manage fast ageing under this condition because if DSA were to do it, it would need to track each port's learning state, and act upon the transition, which it currently doesn't. But there are 2 reasons why doing it is better after all: - drivers might get it wrong and not do it (see b53_port_set_learning) - we would like to flush the dynamic entries from the software bridge too, and letting drivers do that would be another pain point So track the port learning state and trigger a fast age process automatically within DSA. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
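A rough sketch of the core-side behaviour being described (the helper name is an assumption, not necessarily what the patch uses; dp->learning and the port_fast_age op are referenced by this series):

    #include <net/dsa.h>

    /*
     * Sketch: when the learning flag transitions from on to off, flush the
     * dynamically learned entries on that port through the driver operation.
     */
    static void example_port_set_learning(struct dsa_port *dp, bool learning)
    {
            struct dsa_switch *ds = dp->ds;
            bool was_learning = dp->learning;

            dp->learning = learning;

            if (was_learning && !learning && ds->ops->port_fast_age)
                    ds->ops->port_fast_age(ds, dp->index);
    }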
2021-08-08 | atm: horizon: Fix spelling mistakes in TX comment | Jun Miao | 1 | -3/+3
It's "must not", not "musn't", meaning "shall not". Let's fix that. Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Jun Miao <jun.miao@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-08 | devlink: Simplify devlink port API calls | Leon Romanovsky | 4 | -16/+12
Devlink port already has a pointer to the devlink instance, and all API calls that forward these devlink ports to the drivers perform the same "devlink_port->devlink" assignment before the actual call. This patch removes the useless parameter and allows us in the future to create a specific devlink_port_ops to manage user space access with reliable ops assignment. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-07 | net: bonding: bond_alb: Remove the dependency on ipx network layer | Cai Huoqing | 1 | -32/+0
Commit 47595e32869f ("MAINTAINERS: Mark some staging directories") marked the ipx network layer as obsolete in Jan 2018, and the MAINTAINERS file has since been updated accordingly. After being exposed to refactoring for 3 years, delete the ipx network layer related code for good. Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-07 | net: ethernet: stmmac: Do not use unreachable() in ipq806x_gmac_probe() | Nathan Chancellor | 1 | -10/+8
When compiling with clang in certain configurations, an objtool warning appears: drivers/net/ethernet/stmicro/stmmac/dwmac-ipq806x.o: warning: objtool: ipq806x_gmac_probe() falls through to next function phy_modes() This happens because the unreachable annotation in the third switch statement is not eliminated. The compiler should know that the first default case would prevent the second and third from being reached as the comment notes but sanitizer options can make it harder for the compiler to reason this out. Help the compiler out by eliminating the unreachable() annotation and unifying the default case error handling so that there is no objtool warning, the meaning of the code stays the same, and there is less duplication. Reported-by: Sami Tolvanen <samitolvanen@google.com> Tested-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
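The shape of the change is roughly the following (an illustrative sketch, not the exact ipq806x code):

    #include <linux/device.h>
    #include <linux/errno.h>
    #include <linux/phy.h>

    /* Instead of annotating an "impossible" default case with unreachable(),
     * report the error and fail, which objtool and the sanitizers can follow.
     */
    static int example_setup_phy_mode(struct device *dev, phy_interface_t mode)
    {
            switch (mode) {
            case PHY_INTERFACE_MODE_RGMII:
                    /* ... RGMII configuration ... */
                    break;
            case PHY_INTERFACE_MODE_SGMII:
                    /* ... SGMII configuration ... */
                    break;
            default:
                    dev_err(dev, "unsupported PHY mode %d\n", mode);
                    return -EINVAL;
            }

            return 0;
    }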
2021-08-07 | s390/qeth: Update MACs of LEARNING_SYNC device | Alexandra Winter | 1 | -4/+127
Update the MAC addresses that are registered with a LEARNING_SYNC qeth device with the events announced by the attached software bridge. Typically the LEARNING_SYNC qeth bridge port has an isolated sibling (the default interface of a 'HiperSockets Converged Interface' (HSCI)). Update the MACs of isolated siblings as well, to avoid unnecessary flooding in the attached virtualized switches. Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-07 | s390/qeth: Switchdev event handler | Alexandra Winter | 2 | -5/+79
QETH HiperSockets devices with LEARNING_SYNC capability can be used to construct a linux bridge with: 2 isolated southbound interfaces: a) a default network interface b) a LEARNING-SYNC HiperSockets interface and 1 non-isolated northbound interface. This is called a 'HiperSockets Converged Interface' (HSCI). The existing LEARNING_SYNC functionality is used to update the bridge fdb with MAC addresses that should be sent-out via the HiperSockets interface, instead of the default network interface. Add handling of switchdev events SWITCHDEV_FDB_ADD_TO_DEVICE and SWITCHDEV_FDB_DEL_TO_DEVICE to the qeth LEARNING_SYNC functionality. Thus if the northbound bridgeport of an HSCI doesn't only have a single static MAC address, but instead is a learning bridgeport, work is enqueued, so the HiperSockets virtual switch (that is external to this Linux instance) can update its fdb. When BRIDGE is a loadable module, QETH_L2 mustn't be built-in: drivers/s390/net/qeth_l2_main.o: in function 'qeth_l2_switchdev_event': drivers/s390/net/qeth_l2_main.c:927: undefined reference to 'br_port_flag_is_set' Add Kconfig dependency to enforce usable configurations. Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-07 | s390/qeth: Register switchdev event handler | Alexandra Winter | 1 | -3/+40
Conditionally register a qeth_l2 switchdev_event handler to handle bridge to device switchdev events, when at least one qeth interface has the bridgeport attribute LEARNING_SYNC enabled. Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-07 | tulip: Remove deadcode on startup true condition | Colin Ian King | 1 | -1/+1
The true branch of the ternary operator on the variable startable can never be taken, because the preceding if statement already handles the case where startable is true. Hence the ternary check is dead code and can be removed. Addresses-Coverity: ("Logically dead code") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-06 | net: ethernet: ti: davinci_cpdma: revert "drop frame padding" | Grygorii Strashko | 4 | -0/+8
This reverts commit 9ffc513f95ee ("net: ethernet: ti: davinci_cpdma: drop frame padding"), which depends on the not-yet-merged patch [1] and so breaks the cpsw_new driver. [1] https://patchwork.kernel.org/project/netdevbpf/patch/20210805145511.12016-1-grygorii.strashko@ti.com/ Fixes: 9ffc513f95ee ("net: ethernet: ti: davinci_cpdma: drop frame padding") Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Link: https://lore.kernel.org/r/20210806142809.15069-1-grygorii.strashko@ti.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-06 | vrf: fix NULL dereference in vrf_finish_output() | Dan Carpenter | 1 | -1/+1
The "skb" pointer is NULL on this error path so we can't dereference it. Use "dev" instead. Fixes: 14ee70ca89e6 ("vrf: use skb_expand_head in vrf_finish_output") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/20210806150435.GB15586@kili Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-06 | net: dsa: mt7530: drop untagged frames on VLAN-aware ports without PVID | DENG Qingfang | 2 | -2/+37
The driver currently still accepts untagged frames on VLAN-aware ports without PVID. Use PVC.ACC_FRM to drop untagged frames in that case. Signed-off-by: DENG Qingfang <dqfext@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-06 | net: dsa: don't disable multicast flooding to the CPU even without an IGMP querier | Vladimir Oltean | 4 | -31/+0
Commit 08cc83cc7fd8 ("net: dsa: add support for BRIDGE_MROUTER attribute") added an option for users to turn off multicast flooding towards the CPU if they turn off the IGMP querier on a bridge which already has enslaved ports (echo 0 > /sys/class/net/br0/bridge/multicast_router). And commit a8b659e7ff75 ("net: dsa: act as passthrough for bridge port flags") simply papered over that issue, because it moved the decision to flood the CPU with multicast (or not) from the DSA core down to individual drivers, instead of taking a more radical position then. The truth is that disabling multicast flooding to the CPU is simply something we are not prepared to do now, if at all. Some reasons:

- ICMP6 neighbor solicitation messages are unregistered multicast packets as far as the bridge is concerned. So if we stop flooding multicast, the outside world cannot ping the bridge device's IPv6 link-local address.
- There might be foreign interfaces bridged with our DSA switch ports (sending a packet towards the host does not necessarily equal termination, but maybe software forwarding). So if there is no one interested in that multicast traffic in the local network stack, that doesn't mean nobody is.
- PTP over L4 (IPv4, IPv6) is multicast, but is unregistered as far as the bridge is concerned. This should reach the CPU port.
- The switch driver might not do FDB partitioning. And since we don't even bother to do more fine-grained flood disabling (such as "disable flooding _from_port_N_ towards the CPU port" as opposed to "disable flooding _from_any_port_ towards the CPU port"), this breaks standalone ports, or even multiple bridges where one has an IGMP querier and one doesn't.

Reverting the logic makes all of the above work.

Fixes: a8b659e7ff75 ("net: dsa: act as passthrough for bridge port flags")
Fixes: 08cc83cc7fd8 ("net: dsa: add support for BRIDGE_MROUTER attribute")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-06 | net: dsa: mt7530: remove the .port_set_mrouter implementation | Vladimir Oltean | 1 | -13/+0
DSA's idea of optimizing out multicast flooding to the CPU port leaves quite a few holes open, so it should be reverted. The mt7530 driver is the only new driver which added a .port_set_mrouter implementation after the reorg from commit a8b659e7ff75 ("net: dsa: act as passthrough for bridge port flags"), so it needs to be reverted separately so that the other revert commit can go a bit further down the git history. Fixes: 5a30833b9a16 ("net: dsa: mt7530: support MDB and bridge flag operations") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-06 | net: ethernet: ti: am65-cpsw: use napi_complete_done() in TX completion | Grygorii Strashko | 1 | -5/+5
This patch enables support for the hard irqs deferral feature from Eric Dumazet [1] for the TI K3 CPSW driver by using napi_complete_done() in the TX completion path. Depending on gro_flush_timeout and napi_defer_hard_irqs it gives up to 30% CPU utilization reduction:

  gro_flush_timeout=50000 napi_defer_hard_irqs=2
  netperf -l 10 -H 192.168.1.1 -t UDP_STREAM -c -C -- -m 1470

  MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.1.1 () port 0 AF_INET
  Socket  Message  Elapsed      Messages                   CPU      Service
  Size    Size     Time         Okay Errors   Throughput   Util     Demand
  bytes   bytes    secs            #      #   10^6bits/sec % SS     us/KB

  before:
  212992    1470   10.00      809632      0      952.0     42.98    14.792
  212992           10.00      809630             952.0     50.66     8.719

  after:
  212992    1470   10.00      813686      0      956.8     32.14    11.009
  212992           10.00      813686             956.8     50.05     8.570

[1] https://lore.kernel.org/netdev/20200422161329.56026-1-edumazet@google.com/

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
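The relevant TX-completion pattern looks roughly like this (a sketch; the example_* names are assumptions, not the actual driver code):

    #include <linux/interrupt.h>
    #include <linux/netdevice.h>

    static int example_tx_irq;                  /* assumed IRQ number */

    /* Stub standing in for the driver's real TX completion routine. */
    static int example_tx_complete(struct napi_struct *napi, int budget)
    {
            return 0;
    }

    /*
     * Re-enable the TX IRQ only if the NAPI core really completed the poll.
     * With gro_flush_timeout/napi_defer_hard_irqs set, the core may keep
     * polling instead, deferring hard interrupts.
     */
    static int example_tx_poll(struct napi_struct *napi, int budget)
    {
            int num_tx = example_tx_complete(napi, budget);

            if (num_tx < budget && napi_complete_done(napi, num_tx))
                    enable_irq(example_tx_irq);

            return num_tx;
    }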
2021-08-06 | net: ti: am65-cpsw-nuss: fix RX IRQ state after .ndo_stop() | Vignesh Raghavendra | 2 | -2/+13
On the TI K3 am64x platform an issue with the RX IRQ is observed: it becomes disabled forever after .ndo_stop(). The K3 CPSW driver manipulates the RX IRQ by using the standard Linux enable_irq()/disable_irq_nosync() API, as there are no IRQ enable/disable options in the CPSW HW itself. As a result, the following sequence happens during .ndo_stop():

  phy_stop()
  teardown TX/RX channels
  wait for TX tdown complete
  napi_disable(TX)
  clean up TX channels
  (a) napi_disable(RX)

At point (a) it is not possible to predict whether the RX IRQ was triggered or not. If the RX IRQ was triggered, it is also not possible to definitely say whether the RX NAPI was run or only scheduled and immediately canceled by napi_disable(RX). The latter case causes the RX IRQ to be permanently disabled.

Another observed issue is that the RX IRQ enable counter becomes unbalanced if (gro_flush_timeout != 0) while (napi_defer_hard_irqs == 0):

  Unbalanced enable for IRQ 44
  WARNING: CPU: 0 PID: 10 at ../kernel/irq/manage.c:776 __enable_irq+0x38/0x80
   __enable_irq+0x38/0x80
   enable_irq+0x54/0xb0
   am65_cpsw_nuss_rx_poll+0x2f4/0x368
   __napi_poll+0x34/0x1b8
   net_rx_action+0xe4/0x220
   _stext+0x11c/0x284
   run_ksoftirqd+0x4c/0x60

To avoid the above issues, introduce a flag indicating whether RX was actually disabled before enabling it in am65_cpsw_nuss_rx_poll(), and restore the RX IRQ state in .ndo_open().

Fixes: 4f7cce272403 ("net: ethernet: ti: am65-cpsw: add support for am64x cpsw3g")
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
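A minimal sketch of the flag-based approach described above (structure and field names are assumptions, not the actual driver code):

    #include <linux/interrupt.h>
    #include <linux/netdevice.h>

    /* Remember whether the RX IRQ was really left disabled, so that
     * .ndo_open() can later restore a balanced enable/disable state.
     */
    struct example_cpsw_common {
            struct napi_struct napi_rx;
            int rx_irq;
            bool rx_irq_disabled;
    };

    static int example_rx_poll(struct example_cpsw_common *c, int done, int budget)
    {
            if (done < budget && napi_complete_done(&c->napi_rx, done) &&
                c->rx_irq_disabled) {
                    c->rx_irq_disabled = false;
                    enable_irq(c->rx_irq);
            }

            return done;
    }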
2021-08-06 | ptp: ocp: Remove pending_image indicator from devlink | Jonathan Lemon | 1 | -10/+0
After writing an image blob to the flash memory, a reboot is required to reload the FPGA. There is no versioning present in the FPGA image file, so only a running version is available. The 'stored version' was set to 'pending' in order to indicate a reboot was needed. This isn't reliable, as the module could be unloaded/loaded, losing the "reboot needed" indicator. Also, the devlink 'stored version' information is designed to refer to the actual image version. Unfortunately, there is no method to determine the flash image version other than booting it, so remove the devlink stored version setting. Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-06 | ptp: ocp: Rename version string shown by devlink. | Jonathan Lemon | 1 | -2/+2
The TimeCard has two FPGA images in the flash: the actual firmware, and a manufacturing fallback version which is intended to act as a loader in case the flash update failed. Name these "fw" and "loader", which are reflected in devlink:

  [root@timecard drv]# devlink dev info
  pci/0000:04:00.0:
    driver ptp_ocp
    serial_number fc:c2:3d:2e:d7:c0
    versions:
        running:
          fw 5

Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-06 | ptp: ocp: Use 'gnss' naming instead of 'gps' | Jonathan Lemon | 1 | -21/+21
GPS is not the only available positioning system. Use the generic naming of "GNSS" instead. Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-06 | ptp: ocp: Remove devlink health and unused parameters. | Jonathan Lemon | 1 | -80/+0
"devlink health" was used as a way to monitor the GNSS signal status. This isn't really the intended use, and the same functionality can be achived by monitoring the status file. Remove the devlink heath support entirely, and also remove the currently unused devlink parameters. Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-06 | ptp: ocp: Add the mapping for the external PPS registers. | Jonathan Lemon | 1 | -4/+11
There are two PPS blocks: one handles the external PPS signal output, with the other handling the PPS signal input to the internal clock. Add controls for the external PPS block. Rename the fields so they match their function. Add cable_delay to the register definitions. Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-06 | ptp: ocp: Fix the error handling path for the class device. | Jonathan Lemon | 1 | -1/+1
Move the put_device() call to the error handling path, so the device is released after the .release callback, avoiding a use-after-free. Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-06 | netdevsim: Protect both reload_down and reload_up paths | Leon Romanovsky | 3 | -1/+22
Don't progress with adding and deleting ports as long as devlink reload is running. Fixes: 23809a726c0d ("netdevsim: Forbid devlink reload when adding or deleting ports") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-06 | net: ethernet: ti: davinci_cpdma: drop frame padding | Grygorii Strashko | 4 | -8/+0
Now that all users of davinci_cpdma have switched to skb_put_padto(), the frame padding can be removed from it. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-06 | net: ethernet: ti: davinci_emac: switch to use skb_put_padto() | Grygorii Strashko | 1 | -1/+1
Use skb_put_padto() instead of skb_padto() so that skb->len also gets updated, as preparation for removing frame padding from cpdma. It also makes the xmit path clearer and more linear. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
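The xmit-path change amounts to something like the sketch below (using ETH_ZLEN as the minimum length is an assumption; the driver may use its own minimum):

    #include <linux/etherdevice.h>
    #include <linux/netdevice.h>
    #include <linux/skbuff.h>

    /*
     * skb_put_padto() pads the frame to the requested minimum length and
     * updates skb->len; on failure it frees the skb and returns an error,
     * so the xmit handler simply reports the packet as consumed.
     */
    static netdev_tx_t example_xmit(struct sk_buff *skb, struct net_device *ndev)
    {
            if (skb_put_padto(skb, ETH_ZLEN)) {
                    ndev->stats.tx_dropped++;
                    return NETDEV_TX_OK;
            }

            /* ... queue the (now padded) skb to the DMA channel ... */

            return NETDEV_TX_OK;
    }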
2021-08-06 | net: ethernet: ti: cpsw: switch to use skb_put_padto() | Grygorii Strashko | 1 | -1/+1
Use skb_put_padto() instead of skb_padto() so that skb->len also gets updated, as preparation for removing frame padding from cpdma. It also makes the xmit path clearer and more linear. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-05 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net | Jakub Kicinski | 47 | -196/+392
Build failure in drivers/net/wwan/mhi_wwan_mbim.c: add missing parameter (0, assuming we don't want buffer pre-alloc).

Conflict in drivers/net/dsa/sja1105/sja1105_main.c between:
  589918df9322 ("net: dsa: sja1105: be stateless with FDB entries on SJA1105P/Q/R/S/SJA1110 too")
  0fac6aa098ed ("net: dsa: sja1105: delete the best_effort_vlan_filtering mode")

Follow the instructions from the commit message of the former commit - removed the if conditions. When looking at commit 589918df9322 ("net: dsa: sja1105: be stateless with FDB entries on SJA1105P/Q/R/S/SJA1110 too") note that the mask_iotag fields get removed by the following patch.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-05 | net/mlx5: Lag, Create shared FDB when in switchdev mode | Mark Bloch | 3 | -18/+105
If both eswitches are in switchdev mode and the uplink representors are enslaved to the same bond device, create a shared FDB configuration. When moving to shared FDB mode, not only does the hardware need to be configured, but the RDMA driver needs to reconfigure itself as well. When such a change is done, unload the RDMA devices, configure the hardware and load the RDMA representors. When destroying the lag (which can happen if a PCI function is unbound, the driver is unloaded, or a netdev is simply removed from the bond), make sure to restore the system to the previous state only if possible. For example, if a PCI function is unbound there is no need to load the representors, as the device is going away. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-05 | net/mlx5: E-Switch, add logic to enable shared FDB | Mark Bloch | 6 | -2/+394
Shared FDB allows directing traffic from all the vports in the HCA to a single eswitch. In order to do that, three things are needed:

1) Point the ingress ACL of the slave uplink to that of the master. With this, wire traffic from both uplinks will reach the same eswitch with the same metadata, where a single steering rule can catch traffic from both ports.
2) Set the FDB root flow table of the slave's eswitch to that of the master. As this flow table can change dynamically, make sure to sync it on any set root flow table FDB command. This will make sure traffic from SFs, VFs, ECPFs and PFs reaches the master eswitch.
3) Split wire traffic at the eswitch manager egress ACL so that it's directed to the native eswitch manager. We only treat wire traffic from both ports the same at the eswitch level. If such traffic wasn't handled in the eswitch it needs to reach the right representor to be processed by software. For example LACP packets should *always* reach the right uplink representor for correct operation.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-05 | net/mlx5: Lag, move lag destruction to a workqueue | Mark Bloch | 1 | -8/+9
If a netdev is removed from the lag, the lag should be destroyed. With downstream patches this might trigger a reconfiguration of representors on a different eswitch, and we don't have the proper locking to do so from this path. Move the destruction to be done by the workqueue. As the destruction won't affect the netdev side, it is okay to do so. The RDMA side will be reconfigured, and it is already coded to handle such reconfiguration. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-05 | net/mlx5: Lag, properly lock eswitch if needed | Mark Bloch | 7 | -18/+107
Currently when doing hardware lag we check the eswitch mode, but as this isn't done under a lock the check isn't valid. As the code needs to sync between two different devices, extra care is needed:

- When going to change eswitch mode, if hardware lag is active, destroy it.
- While changing eswitch modes, block any hardware bond creation.
- Delay handling bonding events until there are no mode changes in progress.
- When attaching a new mdev to lag, block until there is no mode change in progress. In order for the mode change to finish, the interface lock will have to be taken. Release the lock and sleep for 100ms to allow forward progress.

As this is a very rare condition (it can happen if the user unbinds and binds a PCI function while also changing the eswitch mode of the other PCI function), it has no real world impact.

As taking multiple eswitch mode locks is now required, lockdep will complain about a possible deadlock. Register a key per eswitch to make lockdep happy.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-05 | net/mlx5: Add send to vport rules on paired device | Mark Bloch | 3 | -3/+101
When two mlx5 devices are paired in switchdev mode, always offload the send-to-vport rule to the peer E-Switch. This makes it possible to abstract away when this is really necessary (single FDB) and to combine the logic of both cases into one. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>