Age | Commit message | Author | Files | Lines
2020-04-24 | net: phylink, dsa: eliminate phylink_fixed_state_cb() | Russell King | 3 | -43/+29
Move the callback into the phylink_config structure, rather than providing a callback to set this up. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-24 | Merge branch 'qdisc-noop' | David S. Miller | 2 | -3/+6
Jesper Dangaard Brouer says: ==================== Fix qdisc noop issue caused by driver and identify future bugs I've been very puzzled why networking on my NXP development board, using driver dpaa2-eth, stopped working when I updated the kernel version >= 5.3. The observable issue was that the interface would drop all TX packets, because it had been assigned the qdisc noop. This turned out to be a NIC driver bug that would only get triggered when using sysctl net/core/default_qdisc=fq_codel. It was non-trivial to find out[1] this was driver related. Thus, this patchset, besides fixing the driver bug, also helps end-users identify the issue. [1]: https://github.com/xdp-project/xdp-project/blob/master/areas/arm64/board_nxp_ls1088/nxp-board04-troubleshoot-qdisc.org ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-24 | dpaa2-eth: fix return codes used in ndo_setup_tc | Jesper Dangaard Brouer | 1 | -2/+2
A driver's ndo_setup_tc call should return -EOPNOTSUPP when it cannot support the qdisc type. Other return values will result in failing the qdisc setup. This led to qdisc noop getting assigned, which will drop all TX packets on the interface. Fixes: ab1e6de2bd49 ("dpaa2-eth: Add mqprio support") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
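The return-code convention described above can be sketched in plain C. This is a minimal userspace model, not the dpaa2-eth code: the enum, the handler, and every `demo_` name are hypothetical stand-ins for the kernel's `enum tc_setup_type` and a driver's callbacks.

```c
#include <errno.h>

/* Hypothetical stand-ins for the kernel's enum tc_setup_type values. */
enum demo_tc_setup_type {
	DEMO_TC_SETUP_QDISC_MQPRIO,
	DEMO_TC_SETUP_QDISC_FQ_CODEL,
};

/* Hypothetical handler for the single offload this model supports. */
static int demo_setup_mqprio(void *type_data)
{
	(void)type_data;	/* a real driver would program hardware here */
	return 0;
}

/*
 * Model of a driver's ndo_setup_tc: any qdisc type the driver cannot
 * offload is answered with -EOPNOTSUPP, so the caller can tell
 * "no offload available" apart from a genuine setup failure.
 */
static int demo_ndo_setup_tc(enum demo_tc_setup_type type, void *type_data)
{
	switch (type) {
	case DEMO_TC_SETUP_QDISC_MQPRIO:
		return demo_setup_mqprio(type_data);
	default:
		return -EOPNOTSUPP;
	}
}
```

With this shape, a request for an unsupported qdisc type reports -EOPNOTSUPP while the supported type returns 0.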
2020-04-24 | net: sched: report ndo_setup_tc failures via extack | Jesper Dangaard Brouer | 1 | -1/+4
Help end-users of the 'tc' command to see if the driver's ndo_setup_tc function call fails. Troubleshooting when this happens is non-trivial (see full process here[1]), and results in the net_device getting assigned the 'qdisc noop', which will drop all TX packets on the interface. [1]: https://github.com/xdp-project/xdp-project/blob/master/areas/arm64/board_nxp_ls1088/nxp-board04-troubleshoot-qdisc.org Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-24 | Merge branch 'mlxsw-Mirroring-cleanups' | David S. Miller | 2 | -34/+39
Ido Schimmel says: ==================== mlxsw: Mirroring cleanups This patch set contains various cleanups in SPAN (mirroring) code noticed by Amit and me while working on future enhancements in this area. No functional changes intended. Tested by current mirroring selftests. Patches #1-#2 from Amit reduce nesting in a certain function and rename a callback to a more meaningful name. Patch #3 removes debug prints that have little value. Patch #4 converts a reference count to 'refcount_t' in order to catch over/under flows. Patch #5 replaces a zero-length array with a flexible-array member in order to get a compiler warning in case the flexible array does not occur last in the structure. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-24 | mlxsw: spectrum_span: Replace zero-length array with flexible-array member | Ido Schimmel | 1 | -1/+1
In a similar fashion to commit e99f8e7f88b5 ("mlxsw: Replace zero-length array with flexible-array member"), use a flexible-array member to get a compiler warning in case the flexible array does not occur last in the structure. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
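As a minimal illustration of the flexible-array member form (C99), the sketch below uses a hypothetical structure; `demo_span_table` and its fields are not taken from the mlxsw sources.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical structure; the field names are illustrative only. */
struct demo_span_table {
	size_t count;
	int entries[];	/* flexible-array member: must be the last field */
};

/* Allocate the header plus room for 'count' trailing elements. */
static struct demo_span_table *demo_table_alloc(size_t count)
{
	struct demo_span_table *t;

	t = calloc(1, sizeof(*t) + count * sizeof(t->entries[0]));
	if (t)
		t->count = count;
	return t;
}
```

Placing another member after `entries[]` is a constraint violation that compilers diagnose, whereas the older `entries[0]` GNU extension does not reliably trigger a diagnostic in that situation.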
2020-04-24 | mlxsw: spectrum_span: Use 'refcount_t' for reference counting | Ido Schimmel | 2 | -9/+10
'refcount_t' is very useful for catching over/under flows. Convert the SPAN agent objects to use it instead of 'int' for their reference count. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
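The kind of misuse a checked reference counter can catch, compared with a bare 'int', can be sketched as follows. This is a simplified, non-atomic userspace model with hypothetical `demo_` names; it is not the kernel's refcount_t implementation, which is atomic and reports misuse through warnings rather than return values.

```c
#include <limits.h>

/* Hypothetical result codes for this model. */
enum demo_ref_status { DEMO_REF_OK, DEMO_REF_LAST, DEMO_REF_MISUSE };

struct demo_ref {
	unsigned int count;
};

/* Take a reference; refuse increment-from-zero and overflow. */
static enum demo_ref_status demo_ref_get(struct demo_ref *r)
{
	if (r->count == 0 || r->count == UINT_MAX)
		return DEMO_REF_MISUSE;
	r->count++;
	return DEMO_REF_OK;
}

/* Drop a reference; refuse underflow, report when the last one is gone. */
static enum demo_ref_status demo_ref_put(struct demo_ref *r)
{
	if (r->count == 0)
		return DEMO_REF_MISUSE;
	if (--r->count == 0)
		return DEMO_REF_LAST;
	return DEMO_REF_OK;
}
```

A plain 'int' counter would silently go negative or wrap in the cases this model flags as misuse.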
2020-04-24 | mlxsw: spectrum_span: Remove unnecessary debug prints | Ido Schimmel | 1 | -5/+0
To the best of my knowledge, these debug prints were never used. Remove them. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-24 | mlxsw: spectrum_span: Rename parms() to parms_set() | Amit Cohen | 2 | -9/+9
Use a more meaningful name for parms() function. Signed-off-by: Amit Cohen <amitc@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-24 | mlxsw: spectrum_span: Reduce nesting in mlxsw_sp_span_entry_configure() | Amit Cohen | 1 | -10/+19
Use early return to avoid unnecessary nesting. Signed-off-by: Amit Cohen <amitc@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-23 | Merge branch 'ovs-meter-tables' | David S. Miller | 3 | -78/+247
Tonghao Zhang says: ==================== openvswitch: expand meter tables and fix bug This patch set expands or shrinks the meter table when necessary. The other patches fix bugs or improve the code. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-23 | net: openvswitch: use u64 for meter bucket | Tonghao Zhang | 2 | -2/+2
When setting the meter rate to 4+Gbps, there is an overflow and the meters don't work as expected. Cc: Pravin B Shelar <pshelar@ovn.org> Cc: Andy Zhou <azhou@ovn.org> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
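The arithmetic behind such an overflow can be illustrated with a small sketch. It assumes, purely for illustration, that a rate given in kbit/s is scaled by 1000 to obtain a bucket value; the exact expression used in the Open vSwitch meter code is not shown in this log, and the `demo_` helpers are hypothetical.

```c
#include <stdint.h>

/* 32-bit bucket: the product wraps modulo 2^32 above ~4.29 Gbit/s. */
static uint32_t demo_bucket_u32(uint32_t rate_kbps)
{
	return rate_kbps * 1000u;
}

/* 64-bit bucket: widen before multiplying so the product fits. */
static uint64_t demo_bucket_u64(uint32_t rate_kbps)
{
	return (uint64_t)rate_kbps * 1000u;
}
```

For a 5 Gbit/s rate (5000000 kbit/s), the 32-bit version yields 705032704 instead of 5000000000, which is why a meter configured that way would limit traffic far below the requested rate.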
2020-04-23 | net: openvswitch: make EINVAL return value more obvious | Tonghao Zhang | 1 | -3/+2
Cc: Pravin B Shelar <pshelar@ovn.org> Cc: Andy Zhou <azhou@ovn.org> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-23 | net: openvswitch: remove the unnecessary check | Tonghao Zhang | 1 | -5/+4
Before invoking ovs_meter_cmd_reply_stats, "meter" has already been checked, so don't check it again in that function. Cc: Pravin B Shelar <pshelar@ovn.org> Cc: Andy Zhou <azhou@ovn.org> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-23 | net: openvswitch: set max limitation to meters | Tonghao Zhang | 2 | -10/+49
Don't allow users to create an unlimited number of meters, which may consume a large amount of kernel memory. The maximum number supported is decided by physical memory, with 20K meters as the default. Cc: Pravin B Shelar <pshelar@ovn.org> Cc: Andy Zhou <azhou@ovn.org> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-23 | net: openvswitch: expand the meters supported number | Tonghao Zhang | 3 | -63/+195
In the kernel datapath of Open vSwitch, there are only 1024 meter buckets per datapath. Installing more than 1024 (e.g. 8192) meters may lead to a performance drop. But in some cases, for example when Open vSwitch is used as an edge gateway, at least 20K meters are needed, where meters are used for IP address bandwidth limitation. [Open vSwitch userspace datapath has this issue too.] For more scalable meters, this patch uses a meter array instead of hash tables, and expands/shrinks the array when necessary, so that more meters than before can be installed in the datapath. With the newly introduced struct *dp_meter_instance, it is easy to expand the meter table by changing the *ti pointer in the struct *dp_meter_table. Cc: Pravin B Shelar <pshelar@ovn.org> Cc: Andy Zhou <azhou@ovn.org> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-23 | net: phy: bcm54140: fix less than zero comparison on an unsigned | Colin Ian King | 1 | -2/+4
Currently the unsigned variable tmp is being checked for a negative error return from the call to bcm_phy_read_rdb and this can never be true since tmp is unsigned. Fix this by making tmp a plain int. Addresses-Coverity: ("Unsigned compared against 0") Fixes: 4406d36dfdf1 ("net: phy: bcm54140: add hwmon support") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Michael Walle <michael@walle.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
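The bug class described above can be reproduced in a few lines. The sketch is a userspace model: `demo_read_reg_fail()` is a hypothetical stand-in for a register read that fails with a negative errno, not the real bcm_phy_read_rdb().

```c
#include <errno.h>

/* Hypothetical register read that always fails with a negative errno. */
static int demo_read_reg_fail(void)
{
	return -EIO;
}

/* Buggy shape: the error is stored in an unsigned variable. */
static int demo_check_unsigned(void)
{
	unsigned int tmp = demo_read_reg_fail();

	if (tmp < 0)		/* never true for an unsigned type */
		return tmp;
	return 0;		/* the error is silently swallowed */
}

/* Fixed shape: a plain int keeps the sign, so the check works. */
static int demo_check_signed(void)
{
	int tmp = demo_read_reg_fail();

	if (tmp < 0)
		return tmp;
	return 0;
}
```

Compilers may flag the first comparison (for example with GCC's -Wtype-limits), which is the same condition the static analyzer reported.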
2020-04-23 | qed: Make ll2_cbs static | Zou Wei | 1 | -1/+1
Fix the following sparse warning: drivers/net/ethernet/qlogic/qed/qed_ll2.c:2334:20: warning: symbol 'll2_cbs' was not declared. Should it be static? Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zou Wei <zou_wei@huawei.com> Acked-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-23 | net: sched : Remove unnecessary cast in kfree | Xu Wang | 1 | -1/+1
Remove unnecessary casts in the argument to kfree. Signed-off-by: Xu Wang <vulab@iscas.ac.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-23 | Merge branch 'net-ethernet-ti-cpts-add-irq-and-HW_TS_PUSH-events' | David S. Miller | 7 | -149/+363
Grygorii Strashko says: ==================== net: ethernet: ti: cpts: add irq and HW_TS_PUSH events This is re-spin of patches to add CPSW IRQ and HW_TS_PUSH events support I've sent long time ago [1]. In this series, I've tried to restructure and split changes, and also add few additional optimizations comparing to initial RFC submission [1]. The HW_TS_PUSH events intended to serve for different timesync purposes on of which is to add PPS generation function, which can be implemented as below: +-----------------+ | Control | | application | +------->+ +----------+ | | | | | | | | | +-----------------+ | | | | | | PTP_EXTTS_REQUEST | | | | | +----------------------------------------------------------------+ | | Kernel +-------+----------+ +-------v--------+ | \dev\ptpX | | /sys/class/pwm/| | | | | +-------^----------+ +-------+--------+ | | | | | +-------v-------------------+ +-------+----------+ | | | CPTS driver | |pwm/pwm-omap-dmtimer.c | | | +---------------------------+ +-------^----------+ |clocksource/timer_ti_dm.c | | +-------+-------------------+ |HWx_TS_PUSH evt | +----------------------------------------------------------------+ | | HW +-------+----------+ +-------v--------+ | CPTS | | DMTimer | | | | | | HWx_TS_PUSH X<-----------------+ | | + | | +------------------+ +-------+--------+ | X timer4 As per my knowledge there is at least one public implemented above PPS generation schema from Tusori Tibor [2] based on initial HW_TS_PUSH enable submission[1]. And now there is work done by Lokesh Vutla <lokeshvutla@ti.com> published to enable PWM enable/improve PWM adjustment from user space [3][4][5]. 
Main changes comparing to initial submission: - TX timestamp processing deferred to ptp worker only - both CPTS IRQ and polling events processing supported to make it work for Keystone 2 also - switch to use new .gettimex64() interface - no DT updates as number of HWx_TS_PUSH inputs is static per HW Testing on am571x-idk/omap2plus_defconfig/+CONFIG_PREEMPT=y: 1) testing HW_TS_PUSH - enable pwm in DT pwm16: dmtimer-pwm { compatible = "ti,omap-dmtimer-pwm"; ti,timers = <&timer16>; #pwm-cells = <3>; }; - configure and start pwm echo 0 > /sys/class/pwm/pwmchip0/export echo 1000000000 > /sys/class/pwm/pwmchip0/pwm0/period echo 500000000 > /sys/class/pwm/pwmchip0/pwm0/duty_cycle echo 1 > /sys/class/pwm/pwmchip0/pwm0/enable - test HWx_TS_PUSH using Kernel selftest testptp application ./tools/testing/selftests/ptp/testptp -d /dev/ptp0 -e 1000 -i 3 2) testing phc2sys phc2sys[1616.791]: eth0 rms 408190379792180864 max 1580914543017209856 freq +864 +/- 4635 delay 645 +/- 29 phc2sys[1646.795]: eth0 rms 41 max 108 freq +0 +/- 36 delay 656 +/- 29 phc2sys[1676.800]: eth0 rms 43 max 83 freq +2 +/- 38 delay 650 +/- 0 phc2sys[1706.804]: eth0 rms 39 max 87 freq +4 +/- 34 delay 672 +/- 55 phc2sys[1736.808]: eth0 rms 35 max 66 freq +1 +/- 30 delay 667 +/- 49 phc2sys[1766.813]: eth0 rms 38 max 79 freq +2 +/- 33 delay 656 +/- 29 phc2sys[1796.817]: eth0 rms 45 max 98 freq +1 +/- 39 delay 656 +/- 29 phc2sys[1826.821]: eth0 rms 40 max 87 freq +5 +/- 35 delay 650 +/- 0 phc2sys[1856.826]: eth0 rms 29 max 76 freq -0 +/- 25 delay 656 +/- 29 phc2sys[1886.830]: eth0 rms 40 max 97 freq +4 +/- 35 delay 667 +/- 49 phc2sys[1916.834]: eth0 rms 42 max 94 freq +2 +/- 36 delay 661 +/- 41 phc2sys[1946.839]: eth0 rms 40 max 91 freq +2 +/- 35 delay 661 +/- 41 phc2sys[1976.843]: eth0 rms 46 max 88 freq -0 +/- 40 delay 667 +/- 49 phc2sys[2006.847]: eth0 rms 49 max 97 freq +2 +/- 43 delay 650 +/- 0 3) testing ptp4l - 1G connection ptp4l[862.891]: port 1: UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED 
ptp4l[923.894]: rms 1019697354682 max 5768279314068 freq +26053 +/- 72 delay 488 +/- 1 ptp4l[987.896]: rms 13 max 26 freq +26005 +/- 29 delay 488 +/- 1 ptp4l[1051.899]: rms 14 max 50 freq +25895 +/- 21 delay 488 +/- 1 ptp4l[1115.901]: rms 11 max 27 freq +25878 +/- 17 delay 488 +/- 1 ptp4l[1179.904]: rms 10 max 27 freq +25857 +/- 12 delay 488 +/- 1 ptp4l[1243.906]: rms 14 max 37 freq +25851 +/- 15 delay 488 +/- 1 ptp4l[1307.909]: rms 12 max 33 freq +25835 +/- 15 delay 488 +/- 1 ptp4l[1371.911]: rms 11 max 27 freq +25832 +/- 14 delay 488 +/- 1 ptp4l[1435.914]: rms 11 max 26 freq +25823 +/- 11 delay 488 +/- 1 ptp4l[1499.916]: rms 10 max 29 freq +25829 +/- 11 delay 489 +/- 1 ptp4l[1563.919]: rms 11 max 27 freq +25827 +/- 12 delay 488 +/- 1 - 10M connection ptp4l[51.955]: port 1: UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED ptp4l[112.957]: rms 279468848453933920 max 1580914542977391360 freq +25390 +/- 3207 delay 8222 +/- 36 ptp4l[176.960]: rms 254 max 522 freq +25809 +/- 219 delay 8271 +/- 30 ptp4l[240.962]: rms 271 max 684 freq +25868 +/- 234 delay 8249 +/- 22 ptp4l[304.965]: rms 263 max 556 freq +25894 +/- 227 delay 8225 +/- 47 ptp4l[368.967]: rms 238 max 648 freq +25908 +/- 204 delay 8234 +/- 40 ptp4l[432.970]: rms 274 max 658 freq +25932 +/- 237 delay 8241 +/- 22 ptp4l[496.972]: rms 247 max 557 freq +25943 +/- 213 delay 8223 +/- 26 ptp4l[560.974]: rms 291 max 756 freq +25968 +/- 251 delay 8244 +/- 41 ptp4l[624.977]: rms 249 max 697 freq +25975 +/- 216 delay 8258 +/- 22 Changes in v5: - fixed build issue Changes in v4: - fixed comments from Richard Cochran - dropped patch "net: ethernet: ti: cpts: move rx timestamp processing to ptp worker only" - added "Acked-by" from Richard Cochran <richardcochran@gmail.com> - dependencies resolved, patch merged Changes in v3: - fixed rebase mess - fixed build issues Changes in v2 (broken): - fixed (formatting) comments from David Miller <davem@davemloft.net> v4: 
https://patchwork.ozlabs.org/project/netdev/cover/20200422201254.15232-1-grygorii.strashko@ti.com/ v3: https://patchwork.ozlabs.org/project/netdev/cover/20200320194244.4703-1-grygorii.strashko@ti.com/ v2: https://patchwork.ozlabs.org/cover/1258339/ v1: https://patchwork.ozlabs.org/cover/1254708/ [1] https://lore.kernel.org/patchwork/cover/799251/ [2] https://usermanual.wiki/Document/SetupGuide.632280828.pdf https://github.com/t-tibor/msc_thesis [3] https://patchwork.kernel.org/cover/11421329/ [4] https://patchwork.kernel.org/cover/11433197/ [5] https://sourceforge.net/p/linuxptp/mailman/message/36943248/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-23 | net: ethernet: ti: cpsw: enable cpts irq | Grygorii Strashko | 4 | -0/+55
The CPSW misc IRQ needs to be enabled for CPTS event_pend IRQ processing. This patch adds the corresponding support to the CPSW driver. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-23 | net: ethernet: ti: cpts: add support for HW_TS_PUSH events | Grygorii Strashko | 4 | -5/+57
Now that CPTS IRQ support is in place, the HW_TS_PUSH events can be added. PWM-capable DmTimers can be used to generate input signals for CPTS on TI AM335x/AM437x/DRA7 SoCs to be timestamped: AM335x/AM437x: timer4 - timer7 DRA7/AM57xx: timer13 - timer16 Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-23 | net: ethernet: ti: cpts: add irq support | Grygorii Strashko | 2 | -1/+38
Add CPTS IRQ support, but do not enable it. By default, the CPTS driver will continue working using polling mode which is required for CPTS to continue working on platforms other than CPSW, like Keystone 2. The CPTS IRQ support is required to enable support for HW_TS_PUSH events. The CPSW CPTS IRQ and HW_TS_PUSH events support will be enabled in follow up patches. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-23 | net: ethernet: ti: cpts: rework locking | Grygorii Strashko | 2 | -24/+32
Currently a spinlock is used to synchronize everything, which is not required. Add a mutex and use it to sync access to the PTP interface and the PTP worker, and use the spinlock only to sync FIFO/events processing. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-23 | net: ethernet: ti: cpts: move tx timestamp processing to ptp worker only | Grygorii Strashko | 1 | -71/+94
Currently the TX timestamp processing happens from different contexts - softirq and thread/PTP worker. Enabling the IRQ will add one more hard_irq context. This makes the overall deferred TX timestamp processing and locking overcomplicated. Always move TX timestamp processing to the PTP worker instead. napi_rx->cpts_tx_timestamp if ptp_packet then push to txq ptp_schedule_worker() do_aux_work->cpts_overflow_check cpts_process_events() Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-23 | net: ethernet: ti: cpts: optimize packet to event matching | Grygorii Strashko | 1 | -33/+58
Currently the CPTS driver performs packet (skb) parsing every time it needs to match a packet to a CPTS event (including ptp_classify_raw() calls). This patch optimizes the matching process by parsing the packet only once upon arrival and storing the PTP-specific data in skb->cb using the same format as in the CPTS HW event. As a result, all future matching reduces to comparing two u32 values. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
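The parse-once, compare-later idea can be sketched as below. The bit layout (event type in bits 20-23, PTP message type in bits 16-19, sequence ID in bits 0-15) is an assumption made for illustration, and all `demo_` names are hypothetical; the actual field positions are defined by the CPTS hardware event format.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Pack the identifying PTP fields of a packet into one 32-bit key,
 * using the same (assumed) layout as the hardware event word.
 */
static uint32_t demo_pack_id(uint8_t ev_type, uint8_t msg_type, uint16_t seqid)
{
	return ((uint32_t)(ev_type & 0xf) << 20) |
	       ((uint32_t)(msg_type & 0xf) << 16) |
	       seqid;
}

/* Matching a stored packet key against an event is one comparison. */
static bool demo_match(uint32_t skb_id, uint32_t event_id)
{
	return skb_id == event_id;
}
```

The key is computed once when the packet arrives and cached with it, so each later match against a hardware event avoids re-parsing the headers.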
2020-04-23 | net: ethernet: ti: cpts: switch to use new .gettimex64() interface | Grygorii Strashko | 1 | -8/+14
The CPTS HW latches and saves the CPTS counter value in the CPTS FIFO immediately after a write to CPSW_CPTS_PUSH.TS_PUSH (bit 0), so the total time that the driver needs to read the CPTS timestamp is the time required for the CPSW_CPTS_PUSH write to actually reach the HW. Hence switch the CPTS driver to implement the new .gettimex64() callback for more precise measurement of the offset between a PHC and the system clock, which is measured as the time between write(CPSW_CPTS_PUSH) and read(CPSW_CPTS_PUSH). Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-23 | net: ethernet: ti: cpts: move tc mult update in cpts_fifo_read() | Grygorii Strashko | 2 | -2/+7
Currently the CPTS driver's .adjfreq() generates a request to read the current CPTS time (CPTS_EV_PUSH) with the intention of processing all pending events using the previous frequency adjustment values before switching to the new ones. So CPTS_EV_PUSH works as a marker to switch to the new frequency adjustment values. The current code assumes that all the work is done in .adjfreq(), but after enabling the IRQ this will not be true any more. Hence save the new frequency adjustment values (mult) and perform the actual freq adjustment in cpts_fifo_read() immediately after CPTS_EV_PUSH is received. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-23 | net: ethernet: ti: cpts: separate hw counter read from timecounter | Grygorii Strashko | 2 | -26/+29
Currently the CPTS HW time reading code is implemented in the timecounter->cyclecounter .read() callback and performs the following operations: timecounter_read() ->cc.read() -> cpts_systim_read() - request current CPTS HW time CPTS_TS_PUSH.TS_PUSH = 1 - poll CPTS FIFO for CPTS_EV_PUSH event with current HW timestamp This approach needs to be changed for the future switch to the PTP PHC .gettimex64() callback, which requires separating the request for the current CPTS HW time from the processing of the CPTS FIFO, and for the follow-up patch, which improves the .adjfreq() implementation. This patch moves the code accessing CPTS HW out of the timecounter code as follows: - convert the HW timestamp of every CPTS event to PTP time (us) and store it as part of struct cpts_event; - add a CPTS context field to store the current CPTS HW time (counter) value and update it on CPTS_EV_PUSH reception; - move the code accessing CPTS HW out of the timecounter code and use the current CPTS HW time (counter) from the CPTS context instead; - ensure timecounter->cycle_last is updated on CPTS_EV_PUSH reception. After this change the CPTS timecounter will only perform the timekeeper role without actually accessing CPTS HW. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-23 | net: ethernet: ti: cpts: use dev_yy() api for logs | Grygorii Strashko | 1 | -6/+6
Use dev_yy() API instead of pr_yy() for log outputs. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-23 | Merge branch 'net-napi-addition-of-napi_defer_hard_irqs' | David S. Miller | 6 | -25/+52
Eric Dumazet says: ==================== net: napi: addition of napi_defer_hard_irqs This patch series augments the gro_flush_timeout feature with napi_defer_hard_irqs. As extensively described in the first patch changelog, this can suppress the chit-chat traffic between NIC and host to signal interrupts and re-arm them, since this can be an issue on high speed NICs with many queues. The last patch in this series converts mlx4 TX completion to napi_complete_done(), to enable this new mechanism. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-23 | net/mlx4_en: use napi_complete_done() in TX completion | Eric Dumazet | 3 | -13/+13
In order to benefit from the new napi_defer_hard_irqs feature, we need to use napi_complete_done() variant in this driver. RX path is already using it, this patch implements TX completion side. mlx4_en_process_tx_cq() now returns the amount of retired packets, instead of a boolean, so that mlx4_en_poll_tx_cq() can pass this value to napi_complete_done(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-23 | net: napi: use READ_ONCE()/WRITE_ONCE() | Eric Dumazet | 2 | -5/+5
gro_flush_timeout and napi_defer_hard_irqs can be read from napi_complete_done() while other cpus write the value, without explicit synchronization. Use READ_ONCE()/WRITE_ONCE() to annotate the races. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
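The annotation pattern can be sketched in userspace with simplified macros. The `DEMO_` macros below are hypothetical, reduced versions built on a volatile cast and the GCC/Clang `__typeof__` extension; the kernel's real READ_ONCE()/WRITE_ONCE() are more elaborate. The structure is a stand-in holding the two tunables named above.

```c
/* Simplified single-access helpers (assumes GCC or Clang). */
#define DEMO_READ_ONCE(x)	(*(const volatile __typeof__(x) *)&(x))
#define DEMO_WRITE_ONCE(x, val)	(*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical stand-in for the per-device tunables. */
struct demo_dev {
	unsigned long gro_flush_timeout;
	int napi_defer_hard_irqs;
};

/* Writer side, e.g. a sysfs store handler running on another CPU. */
static void demo_store_timeout(struct demo_dev *dev, unsigned long val)
{
	DEMO_WRITE_ONCE(dev->gro_flush_timeout, val);
}

/* Reader side: load the value exactly once and work on the copy. */
static unsigned long demo_load_timeout(struct demo_dev *dev)
{
	return DEMO_READ_ONCE(dev->gro_flush_timeout);
}
```

The point of the annotation is that the compiler may neither tear nor repeat the access, and that readers of the code can see the lockless access is intentional.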
2020-04-23 | net: napi: add hard irqs deferral feature | Eric Dumazet | 3 | -11/+38
Back in commit 3b47d30396ba ("net: gro: add a per device gro flush timer") we added the ability to arm one high resolution timer, that we used to keep not-complete packets in GRO engine a bit longer, hoping that further frames might be added to them. Since then, we added the napi_complete_done() interface, and commit 364b6055738b ("net: busy-poll: return busypolling status to drivers") allowed drivers to avoid re-arming NIC interrupts if we made a promise that their NAPI poll() handler would be called in the near future. This infrastructure can be leveraged, thanks to a new device parameter, which allows arming the napi hrtimer instead of re-arming the device hard IRQ. We have noticed that on some servers with 32 RX queues or more, the chit-chat between the NIC and the host caused by IRQ delivery and re-arming could hurt throughput by ~20% on 100Gbit NIC. In contrast, hrtimers are using local (percpu) resources and might have lower cost. The new tunable, named napi_defer_hard_irqs, is placed in the same hierarchy as gro_flush_timeout (/sys/class/net/ethX/) By default, both gro_flush_timeout and napi_defer_hard_irqs are zero. This patch does not change the prior behavior of gro_flush_timeout if used alone: NIC hard irqs should be rearmed as before. One concrete usage can be : echo 20000 >/sys/class/net/eth1/gro_flush_timeout echo 10 >/sys/class/net/eth1/napi_defer_hard_irqs If at least one packet is retired, then we will reset napi counter to 10 (napi_defer_hard_irqs), ensuring at least 10 periodic scans of the queue. On busy queues, this should avoid NIC hard IRQ, while before this patch IRQ avoidance was only possible if napi->poll() was exhausting its budget and did not call napi_complete_done(). This feature also can be used to work around some non-optimal NIC irq coalescing strategies. Having the ability to insert XX usec delays between each napi->poll() can increase cache efficiency, since we increase batch sizes. 
It also keeps serving cpus not idle too long, reducing tail latencies. Co-developed-by: Luigi Rizzo <lrizzo@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
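The counter behavior described in this changelog can be modeled with a small state machine. This is a simplified sketch under stated assumptions, not the kernel's napi_complete_done(): the structures and `demo_` names are hypothetical, and the function only decides between "re-arm the hard IRQ" and "arm the timer instead".

```c
#include <stdbool.h>

/* Hypothetical per-device tunables (both default to zero). */
struct demo_dev_cfg {
	unsigned long gro_flush_timeout;	/* timer period, in ns */
	int napi_defer_hard_irqs;		/* scans allowed before re-arming */
};

/* Hypothetical per-NAPI state. */
struct demo_napi {
	int defer_hard_irqs_count;
};

/*
 * Returns true when the device hard IRQ should be re-armed, false when
 * the timer is armed instead and the queue will be polled again later.
 */
static bool demo_complete_done(struct demo_napi *n,
			       const struct demo_dev_cfg *cfg, int work_done)
{
	/* Any retired packet resets the budget of deferred scans. */
	if (work_done)
		n->defer_hard_irqs_count = cfg->napi_defer_hard_irqs;

	if (n->defer_hard_irqs_count > 0 && cfg->gro_flush_timeout) {
		n->defer_hard_irqs_count--;
		return false;
	}
	return true;
}
```

With both tunables left at zero the model always re-arms the IRQ, matching the statement that default behavior is unchanged; on a busy queue the counter keeps being reset, so the IRQ stays off.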
2020-04-23 | Merge branch 'qed-aer' | David S. Miller | 3 | -1/+77
Sudarsana Reddy Kalluru says: ==================== qed*: Add support for pcie advanced error recovery. The patch series adds qed/qede driver changes for PCIe Advanced Error Recovery (AER) support. Patch (1) adds qed changes to enable the device to send error messages to root port when detected. Patch (2) adds qede support for handling the detected errors (AERs). Changes from previous version: ------------------------------- v2: use pci_num_vf() instead of caching the value in edev. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-23 | qede: Add support for handling the pcie errors. | Sudarsana Reddy Kalluru | 2 | -1/+68
The error recovery is handled by management firmware (MFW) with the help of qed/qede drivers. Upon detecting the errors, driver informs MFW about this event which in turn starts a recovery process. MFW sends ERROR_RECOVERY notification to the driver which performs the required cleanup/recovery from the driver side. Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-23 | qed: Enable device error reporting capability. | Sudarsana Reddy Kalluru | 1 | -0/+9
The patch enables the device to send error messages to root port when an error is detected. Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com> Signed-off-by: Ariel Elior <aelior@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-23 | net: dsa: add GRO support via gro_cells | Alexander Lobakin | 4 | -2/+14
gro_cells lib is used by different encapsulating netdevices, such as geneve, macsec, vxlan etc. to speed up decapsulated traffic processing. CPU tag is a sort of "encapsulation", and we can use the same mechs to greatly improve overall DSA performance. skbs are passed to the GRO layer after removing CPU tags, so we don't need any new packet offload types as it was firstly proposed by me in the first GRO-over-DSA variant [1]. The size of struct gro_cells is sizeof(void *), so hot struct dsa_slave_priv becomes only 4/8 bytes bigger, and all critical fields remain in one 32-byte cacheline. The other positive side effect is that drivers for network devices that can be shipped as CPU ports of DSA-driven switches can now use napi_gro_frags() to pass skbs to kernel. Packets built that way are completely non-linear and are likely being dropped without GRO. This was tested on to-be-mainlined-soon Ethernet driver that uses napi_gro_frags(), and the overall performance was on par with the variant from [1], sometimes even better due to minimal overhead. net.core.gro_normal_batch tuning may help to push it to the limit on particular setups and platforms. iperf3 IPoE VLAN NAT TCP forwarding (port1.218 -> port0) setup on 1.2 GHz MIPS board: 5.7-rc2 baseline: [ID] Interval Transfer Bitrate Retr [ 5] 0.00-120.01 sec 9.00 GBytes 644 Mbits/sec 413 sender [ 5] 0.00-120.00 sec 8.99 GBytes 644 Mbits/sec receiver Iface RX packets TX packets eth0 7097731 7097702 port0 426050 6671829 port1 6671681 425862 port1.218 6671677 425851 With this patch: [ID] Interval Transfer Bitrate Retr [ 5] 0.00-120.01 sec 12.2 GBytes 870 Mbits/sec 122 sender [ 5] 0.00-120.00 sec 12.2 GBytes 870 Mbits/sec receiver Iface RX packets TX packets eth0 9474792 9474777 port0 455200 353288 port1 9019592 455035 port1.218 353144 455024 v2: - Add some performance examples in the commit message; - No functional changes. 
[1] https://lore.kernel.org/netdev/20191230143028.27313-1-alobakin@dlink.ru/ Signed-off-by: Alexander Lobakin <bloodyreaper@yandex.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-23 | ipv6: Honor all IPv6 PIO Valid Lifetime values | Fernando Gont | 2 | -22/+7
RFC4862 5.5.3 e) prevents received Router Advertisements from reducing the Valid Lifetime of configured addresses to less than two hours, thus preventing hosts from reacting to the information provided by a router that has positive knowledge that a prefix has become invalid. This patch makes hosts honor all Valid Lifetime values, as per draft-gont-6man-slaac-renum-06, Section 4.2. This is meant to help mitigate the problem discussed in draft-ietf-v6ops-slaac-renum. Note: Attacks aiming at disabling an advertised prefix via a Valid Lifetime of 0 are not really more harmful than other attacks that can be performed via forged RA messages, such as those aiming at completely disabling a next-hop router via an RA that advertises a Router Lifetime of 0, or performing a Denial of Service (DoS) attack by advertising illegitimate prefixes via forged PIOs. In scenarios where RA-based attacks are of concern, proper mitigations such as RA-Guard [RFC6105] [RFC7113] should be implemented. Signed-off-by: Fernando Gont <fgont@si6networks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22 | Merge branch 'dpaa2-eth-add-support-for-xdp-bulk-enqueue' | David S. Miller | 4 | -60/+87
Ioana Ciornei says: ==================== dpaa2-eth: add support for xdp bulk enqueue The first patch moves the DEV_MAP_BULK_SIZE macro into the xdp.h header file so that drivers can take advantage of it and use it. The following 3 patches are there to setup the scene for using the bulk enqueue feature. First of all, the prototype of the enqueue function is changed so that it returns the number of enqueued frames. Second, the bulk enqueue interface is used but without any functional changes, still one frame at a time is enqueued. Third, the .ndo_xdp_xmit callback is split into two stages, create all FDs for the xdp_frames received and then enqueue them. The last patch of the series builds on top of the others and instead of issuing an enqueue operation for each FD it issues a bulk enqueue call for as many frames as possible. This is repeated until all frames are enqueued or the maximum number of retries is hit. We do not use the XDP_XMIT_FLUSH flag since the architecture is not capable to store all frames dequeued in a NAPI cycle, instead we send out right away all frames received in a .ndo_xdp_xmit call. Changes in v2: - statically allocate an array of dpaa2_fd by frame queue - use the DEV_MAP_BULK_SIZE as the maximum number of xdp_frames received in .ndo_xdp_xmit() ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22dpaa2-eth: use bulk enqueue in .ndo_xdp_xmitIoana Ciornei2-29/+30
Take advantage of the bulk enqueue feature in .ndo_xdp_xmit. We cannot use the XDP_XMIT_FLUSH flag since the architecture cannot store all the frames dequeued in a NAPI cycle, so we instead enqueue all the frames received in a .ndo_xdp_xmit call right away. After setting up all FDs for the xdp_frames received, enqueue multiple frames at a time until all are sent or the maximum number of retries is hit. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
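The "enqueue until all are sent or the maximum number of retries is hit" loop can be sketched as follows. This is a standalone illustration under stated assumptions, not the dpaa2-eth code: the callback type, the MAX_RETRIES value, the use of plain ints in place of frame descriptors, and the fake_ring helper (a toy stand-in for the hardware) are all hypothetical.

```c
#include <stddef.h>

#define MAX_RETRIES 10 /* assumed limit, not taken from the driver */

/* Hypothetical bulk enqueue callback: returns how many frames it accepted. */
typedef int (*bulk_enqueue_fn)(void *ctx, const int *fds, int n);

/* Keep issuing bulk enqueues until everything is sent or retries run out. */
static int enqueue_all(bulk_enqueue_fn enqueue, void *ctx,
		       const int *fds, int n)
{
	int total = 0, retries = 0;

	while (total < n && retries < MAX_RETRIES) {
		int done = enqueue(ctx, &fds[total], n - total);

		if (done <= 0) {	/* nothing accepted: count a retry */
			retries++;
			continue;
		}
		total += done;		/* resume after the accepted frames */
	}
	return total;			/* number of frames actually enqueued */
}

/* Toy stand-in for the hardware: busy for a while, then takes a few frames. */
struct fake_ring {
	int per_call;	/* max frames accepted per call */
	int busy_calls;	/* calls that accept nothing before it frees up */
	int accepted;	/* running count of accepted frames */
};

static int fake_enqueue(void *ctx, const int *fds, int n)
{
	struct fake_ring *ring = ctx;
	int take = n < ring->per_call ? n : ring->per_call;

	(void)fds;
	if (ring->busy_calls > 0) {
		ring->busy_calls--;
		return 0;
	}
	ring->accepted += take;
	return take;
}
```

Returning the count of frames actually enqueued lets the caller account for, and free, whatever was left over once the retry budget is exhausted.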
2020-04-22dpaa2-eth: split the .ndo_xdp_xmit callback into two stagesIoana Ciornei1-36/+40
Instead of having a function that both creates a frame descriptor from an xdp_frame and enqueues it, split this into two stages. Add dpaa2_eth_xdp_create_fd(), which just transforms an xdp_frame into a FD, while the actual enqueue callback is called directly from the ndo for each frame. This is particularly useful in conjunction with bulk enqueue. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22dpaa2-eth: use the bulk ring mode enqueue interfaceIoana Ciornei2-14/+22
Update the dpaa2-eth driver to use the bulk enqueue function introduced with the change to QBMAN ring mode. At the moment, no functional changes are made but rather the driver just transitions to the new interface while still enqueuing just one frame at a time. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22dpaa2-eth: return num_enqueued frames from enqueue callbackIoana Ciornei2-13/+26
The enqueue dpaa2-eth callback now returns the number of successfully enqueued frames. This is a preliminary patch necessary for adding support for bulk ring mode enqueue. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22xdp: export the DEV_MAP_BULK_SIZE macroIoana Ciornei2-1/+2
Export the DEV_MAP_BULK_SIZE macro to the header file so that drivers can directly use it as the maximum number of xdp_frames received in the .ndo_xdp_xmit() callback. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22selftests: A few improvements to fib_nexthops.shDavid Ahern1-11/+14
Add nodad when adding IPv6 addresses and remove the sleep. A recent change to iproute2 moved the 'pref medium' to the prefix (where it belongs). Change the expected route check to strip 'pref medium' to be compatible with old and new iproute2. Add IPv4 runtime test with an IPv6 address as the gateway in the default route. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22Merge branch 'Add-selftests-for-pedit-ex-munge-ip6-dsfield'David S. Miller2-0/+91
Petr Machata says: ==================== Add selftests for pedit ex munge ip6 dsfield Patch #1 extends the existing generic forwarding selftests to cover pedit ex munge ip6 traffic_class as well. Patch #2 adds TDC test coverage. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22selftests: tc-testing: Add a TDC test for pedit munge ip6 dsfieldPetr Machata1-0/+25
Add a self-test for the IPv6 dsfield munge that iproute2 will support. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22selftests: forwarding: pedit_dsfield: Add pedit munge ip6 dsfieldPetr Machata1-0/+66
Extend the pedit_dsfield forwarding selftest with coverage of "pedit ex munge ip6 dsfield set". Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22Merge branch 'add-TJA1102-support'David S. Miller4-38/+308
Oleksij Rempel says: ==================== add TJA1102 support changes v5: - rename __of_mdiobus_register_phy() to of_mdiobus_phy_device_register() changes v4: - remove unused phy_id variable changes v3: - export part of of_mdiobus_register_phy() and reuse it in tja11xx driver - coding style fixes changes v2: - use .match_phy_device - add irq support - add delayed registration for PHY1 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>