path: root/drivers/net/ethernet/intel/i40e/i40e_txrx.c
Age | Commit message | Author | Files | Lines
2019-05-07 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next | Linus Torvalds | 1 | -2/+3
Pull networking updates from David Miller: "Highlights:
1) Support AES128-CCM ciphers in kTLS, from Vakul Garg.
2) Add fib_sync_mem to control the amount of dirty memory we allow to queue up between synchronize RCU calls, from David Ahern.
3) Make flow classifier more lockless, from Vlad Buslov.
4) Add PHY downshift support to aquantia driver, from Heiner Kallweit.
5) Add SKB cache for TCP rx and tx, from Eric Dumazet. This reduces contention on SLAB spinlocks in heavy RPC workloads.
6) Partial GSO offload support in XFRM, from Boris Pismenny.
7) Add fast link down support to ethtool, from Heiner Kallweit.
8) Use siphash for IP ID generator, from Eric Dumazet.
9) Pull nexthops even further out from ipv4/ipv6 routes and FIB entries, from David Ahern.
10) Move skb->xmit_more into a per-cpu variable, from Florian Westphal.
11) Improve eBPF verifier speed and increase maximum program size, from Alexei Starovoitov.
12) Eliminate per-bucket spinlocks in rhashtable, and instead use bit spinlocks. From Neil Brown.
13) Allow tunneling with GUE encap in ipvs, from Jacky Hu.
14) Improve link partner cap detection in generic PHY code, from Heiner Kallweit.
15) Add layer 2 encap support to bpf_skb_adjust_room(), from Alan Maguire.
16) Remove SKB list implementation assumptions in SCTP, yours truly.
17) Various cleanups, optimizations, and simplifications in r8169 driver. From Heiner Kallweit.
18) Add memory accounting on TX and RX path of SCTP, from Xin Long.
19) Switch PHY drivers over to use dynamic feature detection, from Heiner Kallweit.
20) Support flow steering without masking in dpaa2-eth, from Ioana Ciocoi.
21) Implement ndo_get_devlink_port in netdevsim driver, from Jiri Pirko.
22) Increase the strict parsing of current and future netlink attributes, also export such policies to userspace. From Johannes Berg.
23) Allow DSA tag drivers to be modular, from Andrew Lunn.
24) Remove legacy DSA probing support, also from Andrew Lunn.
25) Allow ll_temac driver to be used on non-x86 platforms, from Esben Haabendal.
26) Add a generic tracepoint for TX queue timeouts to ease debugging, from Cong Wang.
27) More indirect call optimizations, from Paolo Abeni"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1763 commits)
  cxgb4: Fix error path in cxgb4_init_module
  net: phy: improve pause mode reporting in phy_print_status
  dt-bindings: net: Fix a typo in the phy-mode list for ethernet bindings
  net: macb: Change interrupt and napi enable order in open
  net: ll_temac: Improve error message on error IRQ
  net/sched: remove block pointer from common offload structure
  net: ethernet: support of_get_mac_address new ERR_PTR error
  net: usb: smsc: fix warning reported by kbuild test robot
  staging: octeon-ethernet: Fix of_get_mac_address ERR_PTR check
  net: dsa: support of_get_mac_address new ERR_PTR error
  net: dsa: sja1105: Fix status initialization in sja1105_get_ethtool_stats
  vrf: sit mtu should not be updated when vrf netdev is the link
  net: dsa: Fix error cleanup path in dsa_init_module
  l2tp: Fix possible NULL pointer dereference
  taprio: add null check on sched_nest to avoid potential null pointer dereference
  net: mvpp2: cls: fix less than zero check on a u32 variable
  net_sched: sch_fq: handle non connected flows
  net_sched: sch_fq: do not assume EDT packets are ordered
  net: hns3: use devm_kcalloc when allocating desc_cb
  net: hns3: some cleanup for struct hns3_enet_ring
  ...
2019-04-23 | net: pass net_device argument to the eth_get_headlen | Stanislav Fomichev | 1 | -1/+2
Update all users of eth_get_headlen to pass network device, fetch network namespace from it and pass it down to the flow dissector. This commit is a noop until administrator inserts BPF flow dissector program. Cc: Maxim Krasnyansky <maxk@qti.qualcomm.com> Cc: Saeed Mahameed <saeedm@mellanox.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: intel-wired-lan@lists.osuosl.org Cc: Yisen Zhuang <yisen.zhuang@huawei.com> Cc: Salil Mehta <salil.mehta@huawei.com> Cc: Michael Chan <michael.chan@broadcom.com> Cc: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
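A minimal sketch of what the call-site change implies for a driver like i40e; the surrounding names (rx_ring, xdp, I40E_RX_HDR_SIZE) are illustrative, not lifted from the patch:

```c
/* before: the flow dissector had no way to resolve the netns */
headlen = eth_get_headlen(xdp->data, I40E_RX_HDR_SIZE);

/* after: pass the receiving net_device so the flow dissector can fetch
 * the network namespace (and any attached BPF flow dissector) from it */
headlen = eth_get_headlen(rx_ring->netdev, xdp->data, I40E_RX_HDR_SIZE);
```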
2019-04-08 | drivers: Remove explicit invocations of mmiowb() | Will Deacon | 1 | -5/+0
mmiowb() is now implied by spin_unlock() on architectures that require it, so there is no reason to call it from driver code. This patch was generated using coccinelle: @mmiowb@ @@ - mmiowb(); and invoked as: $ for d in drivers include/linux/qed sound; do \ spatch --include-headers --sp-file mmiowb.cocci --dir $d --in-place; done NOTE: mmiowb() has only ever guaranteed ordering in conjunction with spin_unlock(). However, pairing each mmiowb() removal in this patch with the corresponding call to spin_unlock() is not at all trivial, so there is a small chance that this change may regress any drivers incorrectly relying on mmiowb() to order MMIO writes between CPUs using lock-free synchronisation. If you've ended up bisecting to this commit, you can reintroduce the mmiowb() calls using wmb() instead, which should restore the old behaviour on all architectures other than some esoteric ia64 systems. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
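A hedged sketch of the pattern the series removes; the lock and register names are made up, and the point is simply that spin_unlock() now provides the ordering mmiowb() used to:

```c
spin_lock_irqsave(&ring->lock, flags);
writel(val, ring->tail);	/* posted MMIO write */
/* mmiowb();  -- removed: the unlock below implies it where needed */
spin_unlock_irqrestore(&ring->lock, flags);

/* If a bisection lands on this commit, the commit message suggests a
 * wmb() after the writel() to restore the old ordering. */
```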
2019-04-01 | net: move skb->xmit_more hint to softnet data | Florian Westphal | 1 | -1/+1
There are two reasons for this. First, the xmit_more flag conceptually doesn't fit into the skb, as xmit_more is not a property related to the skb. It's only a hint to the driver that the stack is about to transmit another packet immediately. Second, it was only done this way to not have to pass another argument to ndo_start_xmit(). We can place xmit_more in the softnet data, next to the device recursion. The recursion counter is already written to on each transmit. The "more" indicator is placed right next to it. Drivers can use the netdev_xmit_more() helper instead of skb->xmit_more to check the "more packets coming" hint. skb->xmit_more is retained (but always 0) to not cause build breakage. This change takes care of the simple s/skb->xmit_more/netdev_xmit_more()/ conversions. Remaining drivers are converted in the next patches. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
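A sketch of what the mechanical conversion looks like at a driver's tail-bump, assuming an i40e-style ring (txring_txq() is the existing i40e helper that maps a ring to its netdev_queue):

```c
/* old: if (!skb->xmit_more || netif_xmit_stopped(txring_txq(tx_ring))) */
if (!netdev_xmit_more() || netif_xmit_stopped(txring_txq(tx_ring)))
	writel(tx_ring->next_to_use, tx_ring->tail);
```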
2019-02-21 | i40e: fix XDP_REDIRECT/XDP xmit ring cleanup race | Björn Töpel | 1 | -1/+3
When the driver clears the XDP xmit ring due to re-configuration or teardown, an in-progress ndo_xdp_xmit must be taken into consideration. The ndo_xdp_xmit function is typically called from a NAPI context that the driver does not control. Therefore, we must be careful not to clear the XDP ring while the call is on-going. This patch adds a synchronize_rcu() to wait for napi(s) (preempt-disable regions and softirqs) prior to clearing the queue. Further, the __I40E_CONFIG_BUSY flag is checked in the ndo_xdp_xmit implementation to avoid touching the XDP xmit queue during re-configuration. Fixes: d9314c474d4f ("i40e: add support for XDP_REDIRECT") Fixes: 123cecd427b6 ("i40e: added queue pair disable/enable functions") Reported-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
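A hedged sketch of the two halves of the fix as described above; the helper that drains the ring is hypothetical, while __I40E_CONFIG_BUSY and pf->state mirror the commit message:

```c
/* reconfiguration / teardown side */
set_bit(__I40E_CONFIG_BUSY, pf->state);
synchronize_rcu();			/* wait out in-flight NAPI / ndo_xdp_xmit */
i40e_clear_xdp_xmit_ring(xdp_ring);	/* hypothetical helper name */

/* ndo_xdp_xmit side */
if (test_bit(__I40E_CONFIG_BUSY, pf->state))
	return -ENXIO;		/* sketch: refuse to touch the ring mid-reconfig */
```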
2018-12-20 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | David S. Miller | 1 | -31/+12
Lots of conflicts, but happily all cases of overlapping changes, parallel adds, things of that nature. Thanks to Stephen Rothwell, Saeed Mahameed, and others for their guidance in these resolutions. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-12 | i40e: DRY rx_ptype handling code | Michał Mirosław | 1 | -8/+4
Move rx_ptype extracting to i40e_process_skb_fields() to avoid duplicating the code. Signed-off-by: Michał Mirosław <michal.miroslaw@atendesoftware.pl> Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-12-12 | i40e: fix VLAN.TCI == 0 RX HW offload | Michał Mirosław | 1 | -23/+8
This fixes two bugs in hardware VLAN offload: 1. VLAN.TCI == 0 was being dropped 2. there was a race between disabling of VLAN RX feature in hardware and processing RX queue, where packets processed in this window could have their VLAN information dropped Fix moves the VLAN handling into i40e_process_skb_fields() to save on duplicated code. i40e_receive_skb() becomes trivial and so is removed. Signed-off-by: Michał Mirosław <michal.miroslaw@atendesoftware.pl> Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-11-21 | ethernet/intel: consolidate NAPI and NAPI exit | Jesse Brandeburg | 1 | -4/+5
While reviewing code, I noticed that Eric Dumazet recommends that drivers check the return code of napi_complete_done, and use that to decide whether to enable interrupts or not when exiting poll. One of the Intel drivers was already fixed (ixgbe). Upon looking at the Intel drivers as a whole, we are handling our polling and NAPI exit in a few different ways based on whether we have multiqueue and whether we have Tx cleanup included. Several drivers had the bug of exiting NAPI with return 0, which appears to mess up the accounting in the stack. Consolidate all the NAPI routines to follow the best-known way of exiting and to mostly look like each other:
1) check the return code of napi_complete_done to control interrupt enable
2) return the actual amount of work done
3) return budget immediately if NAPI needs to poll again
Tested the changes on e1000e with a high interrupt rate set, and it shows about an 8% reduction in the CPU utilization when busy polling because we aren't re-enabling interrupts when we're about to be polled. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
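A sketch of the consolidated exit path, with made-up helper names; only napi_complete_done() and the return-value conventions numbered above are the point here:

```c
static int example_napi_poll(struct napi_struct *napi, int budget)
{
	int work_done = example_clean_rings(napi, budget);

	/* (3) still busy: return budget so the stack polls us again */
	if (work_done >= budget)
		return budget;

	/* (1) only re-arm interrupts if NAPI really completed, i.e. we are
	 * not about to be busy-polled again */
	if (likely(napi_complete_done(napi, work_done)))
		example_irq_enable(napi);

	/* (2) report the actual amount of work done */
	return work_done;
}
```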
2018-11-14 | i40e: Use a local variable for readability | Jan Sokolowski | 1 | -2/+2
Use a local variable to make the code a bit more readable. Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-11-07 | intel-ethernet: software timestamp skbs as late as possible | Jacob Keller | 1 | -2/+2
Many of the Intel Ethernet drivers call skb_tx_timestamp() earlier than necessary. Move the calls to this function to the latest point possible, just prior to notifying hardware of the new Tx packet when we bump the tail register. This affects i40e, iavf, igb, igc, and ixgbe. The e100, e1000, e1000e, fm10k, and ice drivers already call the skb_tx_timestamp() function just prior to indicating the Tx packet to hardware, so they do not need to be changed. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
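The resulting ordering, sketched with simplified field names: timestamp the skb at the last possible moment, immediately before the tail write that hands the descriptors to hardware:

```c
skb_tx_timestamp(skb);				/* software TX timestamp */
writel(tx_ring->next_to_use, tx_ring->tail);	/* notify HW of new work */
```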
2018-10-25 | drivers: net: remove <net/busy_poll.h> inclusion when not needed | Eric Dumazet | 1 | -1/+0
Drivers using generic NAPI interface no longer need to include <net/busy_poll.h>, since busy polling was moved to core networking stack long ago. See commit 79e7fff47b7b ("net: remove support for per driver ndo_busy_poll()") for reference. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-25 | i40e: clean zero-copy XDP Rx ring on shutdown/reset | Björn Töpel | 1 | -1/+3
Outstanding Rx descriptors are temporarily stored on a stash/reuse queue. When/if the HW rings come up again, entries from the stash are used to re-populate the ring. The latter required some restructuring of the allocation scheme for the AF_XDP zero-copy implementation. There is now a fast and a slow allocation. The "fast allocation" is used from the fast path and obtains free buffers from the fill ring and the internal recycle mechanism. The "slow allocation" is only used in ring setup, and obtains buffers from the fill ring and the stash (if any). Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-09-25 | i40e: clean zero-copy XDP Tx ring on shutdown/reset | Björn Töpel | 1 | -6/+11
When the zero-copy enabled XDP Tx ring is torn down, due to configuration changes, outstanding frames on the hardware descriptor ring are queued on the completion ring. The completion ring has a back-pressure mechanism that will guarantee that there is sufficient space on the ring. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-08-29 | i40e: add AF_XDP zero-copy Tx support | Magnus Karlsson | 1 | -1/+5
This patch adds zero-copy Tx support for AF_XDP sockets. It implements the ndo_xsk_async_xmit netdev ndo and performs all the Tx logic from a NAPI context. This means pulling egress packets from the Tx ring, placing the frames on the NIC HW descriptor ring and completing sent frames back to the application via the completion ring. The regular XDP Tx ring is used for AF_XDP as well. The rationale for this is as follows: XDP_REDIRECT guarantees mutual exclusion between different NAPI contexts based on CPU id. In other words, a netdev can XDP_REDIRECT to another netdev with a different NAPI context, since the operation is bound to a specific core and each core has its own hardware ring. As the AF_XDP Tx action is running in the same NAPI context and using the same ring, it will also be protected from XDP_REDIRECT actions with the exact same mechanism. As with AF_XDP Rx, all AF_XDP Tx specific functions are added to i40e_xsk.c. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-29 | i40e: move common Tx functions to i40e_txrx_common.h | Magnus Karlsson | 1 | -33/+2
This patch prepares for the upcoming zero-copy Tx functionality by moving common functions and refactoring chunks of code into re-usable functions, used both by the regular path and the zero-copy path. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-29 | i40e: add AF_XDP zero-copy Rx support | Björn Töpel | 1 | -1/+8
This patch adds zero-copy Rx support for AF_XDP sockets. Instead of allocating buffers of type MEM_TYPE_PAGE_SHARED, the Rx frames are allocated as MEM_TYPE_ZERO_COPY when AF_XDP is enabled for a certain queue. All AF_XDP specific functions are added to a new file, i40e_xsk.c. Note that when AF_XDP zero-copy is enabled, the XDP action XDP_PASS will allocate a new buffer and copy the zero-copy frame prior to passing it to the kernel stack. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-29 | i40e: move common Rx functions to i40e_txrx_common.h | Björn Töpel | 1 | -20/+13
This patch prepares for the upcoming zero-copy Rx functionality, by moving/changing linkage of common functions, used both by the regular path and zero-copy path. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-29 | i40e: refactor Rx path for re-use | Björn Töpel | 1 | -34/+77
In this commit, the Rx path is refactored somewhat, as a step towards the introduction of AF_XDP Rx zero-copy. The page re-use counter is moved into i40e_reuse_rx_page(), instead of bumping the counter in many places. The Rx buffer page clearing is moved for better readability. Lastly, functions to update statistics and bump the XDP Tx ring are introduced. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-07 | i40e_txrx: mark expected switch fall-through | Gustavo A. R. Silva | 1 | -1/+2
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Addresses-Coverity-ID: 114791 ("Missing break in switch") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
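An illustrative (made-up) example of the annotation style; a "fall through" comment is one of the markers -Wimplicit-fallthrough accepts:

```c
switch (desc_type) {
case EXAMPLE_DESC_CONTEXT:
	setup_context(desc);
	/* fall through */
case EXAMPLE_DESC_DATA:
	setup_data(desc);
	break;
}
```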
2018-06-28 | i40e: split XDP_TX tail and XDP_REDIRECT map flushing | Jesper Dangaard Brouer | 1 | -9/+15
The driver was combining the XDP_TX tail flush and XDP_REDIRECT map flushing (xdp_do_flush_map). This is suboptimal, these two flush operations should be kept separate. It looks like the mistake was copy-pasted from ixgbe. Fixes: d9314c474d4f ("i40e: add support for XDP_REDIRECT") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
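A hedged sketch of the separated flushes at the end of the Rx NAPI loop; the bit names and tail-bump helper are illustrative, not the driver's exact identifiers:

```c
if (xdp_xmit & EXAMPLE_XDP_REDIR)
	xdp_do_flush_map();			/* flush XDP_REDIRECT maps */

if (xdp_xmit & EXAMPLE_XDP_TX)
	example_xdp_ring_update_tail(xdp_ring);	/* kick the local XDP_TX ring */
```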
2018-06-20 | bpf, xdp, i40e: fix i40e_build_skb skb reserve and truesize | Daniel Borkmann | 1 | -4/+3
Using skb_reserve(skb, I40E_SKB_PAD + (xdp->data - xdp->data_hard_start)) is clearly wrong since I40E_SKB_PAD already points to the offset where the original xdp->data was sitting since xdp->data_hard_start is defined as xdp->data - i40e_rx_offset(rx_ring) where latter offsets to I40E_SKB_PAD when build skb is used. However, also before cc5b114dcf98 ("bpf, i40e: add meta data support") this seems broken since bpf_xdp_adjust_head() helper could have been used to alter headroom and enlarge / shrink the frame and with that the assumption that the xdp->data remains unchanged does not hold and would push a bogus packet to upper stack. ixgbe got this right in 924708081629 ("ixgbe: add XDP support for pass and drop actions"). In any case, fix it by removing the I40E_SKB_PAD from both skb_reserve() and truesize calculation. Fixes: cc5b114dcf98 ("bpf, i40e: add meta data support") Fixes: 0c8493d90b6b ("i40e: add XDP support for pass and drop actions") Reported-by: Keith Busch <keith.busch@linux.intel.com> Reported-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Björn Töpel <bjorn.topel@intel.com> Cc: John Fastabend <john.fastabend@gmail.com> Tested-by: Keith Busch <keith.busch@linux.intel.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
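The before/after of the headroom accounting described above, assuming skb and xdp as they appear in i40e_build_skb():

```c
/* before: I40E_SKB_PAD is counted twice, and any headroom change made by
 * bpf_xdp_adjust_head() is ignored */
skb_reserve(skb, I40E_SKB_PAD + (xdp->data - xdp->data_hard_start));

/* after: the distance from the hard start to xdp->data is the headroom,
 * whatever the BPF program did to it */
skb_reserve(skb, xdp->data - xdp->data_hard_start);
```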
2018-06-05 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next | David S. Miller | 1 | -21/+12
Daniel Borkmann says:
====================
pull-request: bpf-next 2018-06-05
The following pull-request contains BPF updates for your *net-next* tree. The main changes are:
1) Add a new BPF hook for sendmsg similar to existing hooks for bind and connect: "This allows to override source IP (including the case when it's set via cmsg(3)) and destination IP:port for unconnected UDP (slow path). TCP and connected UDP (fast path) are not affected. This makes UDP support complete, that is, connected UDP is handled by connect hooks, unconnected by sendmsg ones.", from Andrey.
2) Rework of the AF_XDP API to allow extending it in future for type writer model if necessary. In this mode a memory window is passed to hardware and multiple frames might be filled into that window instead of just one that is the case in the current fixed frame-size model. With the new changes made this can be supported without having to add a new descriptor format. Also, core bits for the zero-copy support for AF_XDP have been merged as agreed upon, where i40e bits will be routed via Jeff later on. Various improvements to documentation and sample programs included as well, all from Björn and Magnus.
3) Given BPF's flexibility, a new program type has been added to implement infrared decoders. Quote: "The kernel IR decoders support the most widely used IR protocols, but there are many protocols which are not supported. [...] There is a 'long tail' of unsupported IR protocols, for which lircd is needed to decode the IR. IR encoding is done in such a way that some simple circuit can decode it; therefore, BPF is ideal. [...] user-space can define a decoder in BPF, attach it to the rc device through the lirc chardev.", from Sean.
4) Several improvements and fixes to BPF core, among others, dumping map and prog IDs into fdinfo which is a straight forward way to correlate BPF objects used by applications, removing an indirect call and therefore retpoline in all map lookup/update/delete calls by invoking the callback directly for 64 bit archs, adding a new bpf_skb_cgroup_id() BPF helper for tc BPF programs to have an efficient way of looking up cgroup v2 id for policy or other use cases. Fixes to make sure we zero tunnel/xfrm state that hasn't been filled, to allow context access wrt pt_regs in 32 bit archs for tracing, and last but not least various test cases for fixes that landed in bpf earlier, from Daniel.
5) Get rid of the ndo_xdp_flush API and extend the ndo_xdp_xmit with a XDP_XMIT_FLUSH flag instead which allows to avoid one indirect call as flushing is now merged directly into ndo_xdp_xmit(), from Jesper.
6) Add a new bpf_get_current_cgroup_id() helper that can be used in tracing to retrieve the cgroup id from the current process in order to allow for e.g. aggregation of container-level events, from Yonghong.
7) Two follow-up fixes for BTF to reject invalid input values and related to that also two test cases for BPF kselftests, from Martin.
8) Various API improvements to the bpf_fib_lookup() helper, that is, dropping MPLS bits which are not fully hashed out yet, rejecting invalid helper flags, returning error for unsupported address families as well as renaming flowlabel to flowinfo, from David.
9) Various fixes and improvements to sockmap BPF kselftests in particular in proper error detection and data verification, from Prashant.
10) Two arm32 BPF JIT improvements. One is to fix imm range check with regards to whether immediate fits into 24 bits, and a naming cleanup to get functions related to rsh handling consistent to those handling lsh, from Wang.
11) Two compile warning fixes in BPF, one for BTF and a false positive to silence gcc in stack_map_get_build_id_offset(), from Arnd.
12) Add missing seg6.h header into tools include infrastructure in order to fix compilation of BPF kselftests, from Mathieu.
13) Several formatting cleanups in the BPF UAPI helper description that also fix an error during rst2man compilation, from Quentin.
14) Hide an unused variable in sk_msg_convert_ctx_access() when IPv6 is not built into the kernel, from Yue.
15) Remove a useless double assignment in dev_map_enqueue(), from Colin.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-05 | i40e: remove ndo_xdp_flush call i40e_xdp_flush | Jesper Dangaard Brouer | 1 | -19/+0
Remove the ndo_xdp_flush call implementation i40e_xdp_flush as no callers of ndo_xdp_flush are left. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-06-04 | bpf, i40e: add meta data support | Daniel Borkmann | 1 | -8/+31
Add support for XDP meta data when using build skb variant of the i40e driver. Implementation is analogous to the existing ixgbe and ixgbevf support for meta data from 366a88fe2f40 ("bpf, ixgbe: add meta data support") and be8333322eff ("ixgbevf: Add support for meta data"). With the build skb variant we get 192 bytes of extra headroom which can be used for encaps or meta data. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Tested-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-06-03 | i40e: implement flush flag for ndo_xdp_xmit | Jesper Dangaard Brouer | 1 | -2/+8
When passed the XDP_XMIT_FLUSH flag i40e_xdp_xmit now performs the same kind of ring tail update as in i40e_xdp_flush. The advantage is that all the necessary checks have been performed and xdp_ring can be updated, instead of having to perform the exact same steps/checks in i40e_xdp_flush Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
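A hedged sketch of how the flag is honoured inside the driver's ndo_xdp_xmit; the ring lookup is elided and the tail-bump helper name is illustrative:

```c
static int example_xdp_xmit(struct net_device *dev, int n,
			    struct xdp_frame **frames, u32 flags)
{
	struct i40e_ring *xdp_ring;
	int sent;

	if (unlikely(flags & ~XDP_XMIT_FLUSH))
		return -EINVAL;

	/* ... pick this CPU's xdp_ring and enqueue the n frames,
	 * counting accepted frames in 'sent' ... */

	if (flags & XDP_XMIT_FLUSH)
		example_xdp_ring_update_tail(xdp_ring); /* the tail bump i40e_xdp_flush used to do */

	return sent;
}
```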
2018-06-03 | xdp: add flags argument to ndo_xdp_xmit API | Jesper Dangaard Brouer | 1 | -1/+5
This patch only changes the API and rejects any use of flags. This is an intermediate step that allows us to implement the flush flag operation later, for each individual driver in a separate patch. The plan is to implement the flush operation via the XDP_XMIT_FLUSH flag and then remove XDP_XMIT_FLAGS_NONE when done. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-24 | xdp: change ndo_xdp_xmit API to support bulking | Jesper Dangaard Brouer | 1 | -7/+19
This patch changes the API for ndo_xdp_xmit to support bulking xdp_frames. When the kernel is compiled with CONFIG_RETPOLINE, XDP sees a huge slowdown. Most of the slowdown is caused by DMA API indirect function calls, but also the net_device->ndo_xdp_xmit() call. Benchmarking the patch with CONFIG_RETPOLINE, using xdp_redirect_map with a single flow/core test (CPU E5-1650 v4 @ 3.60GHz), showed performance improved:
for driver ixgbe: 6,042,682 pps -> 6,853,768 pps = +811,086 pps
for driver i40e : 6,187,169 pps -> 6,724,519 pps = +537,350 pps
With frames available as a bulk inside the driver's ndo_xdp_xmit call, further optimizations are possible, like bulk DMA-mapping for TX. Testing without CONFIG_RETPOLINE shows the same performance for physical NIC drivers. The virtual NIC driver tun sees a huge performance boost, as it can avoid doing per-frame producer locking and instead amortize the locking cost over the bulk. V2: Fix compile errors reported by kbuild test robot <lkp@intel.com> V4: Isolated ndo, driver changes and callers. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
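A sketch of the ndo shape before and after bulking (the flags argument arrived in a later patch); these are not the literal kernel declarations:

```c
/* before: one indirect call per frame */
int (*ndo_xdp_xmit)(struct net_device *dev, struct xdp_frame *xdpf);

/* after: one indirect call per bulk of frames, amortising the
 * retpoline/indirect-call cost */
int (*ndo_xdp_xmit)(struct net_device *dev, int n, struct xdp_frame **frames);
```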
2018-04-30 | i40e/i40evf: cleanup incorrect function doxygen comments | Jacob Keller | 1 | -2/+4
Recent versions of the Linux kernel now warn about incorrect parameter definitions for function comments. Fix up several function comments to correctly reflect the current function arguments. This cleans up the warnings and helps ensure our documentation is accurate. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-27 | net: intel: Cleanup the copyright/license headers | Jeff Kirsher | 1 | -25/+1
After many years of having a ~30 line copyright and license header to our source files, we are finally able to reduce that to one line with the advent of the SPDX identifier. Also caught a few files missing the SPDX license identifier, so fixed them up. Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Acked-by: Shannon Nelson <shannon.nelson@oracle.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
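The resulting one-line header in the Intel drivers looks like this (the copyright year range shown is illustrative):

```c
// SPDX-License-Identifier: GPL-2.0
/* Copyright(c) 2013 - 2018 Intel Corporation. */
```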
2018-04-17 | xdp: transition into using xdp_frame for ndo_xdp_xmit | Jesper Dangaard Brouer | 1 | -13/+17
Change the ndo_xdp_xmit API to take a struct xdp_frame instead of a struct xdp_buff. This brings xdp_return_frame and ndo_xdp_xmit in sync. This builds towards changing the API further to become a bulk API, because xdp_buff is not a queue-able object while xdp_frame is. V4: Adjust for commit 59655a5b6c83 ("tuntap: XDP_TX can use native XDP") V7: Adjust for commit d9314c474d4f ("i40e: add support for XDP_REDIRECT") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-17 | xdp: transition into using xdp_frame for return API | Jesper Dangaard Brouer | 1 | -3/+2
Changing the xdp_return_frame() API to take struct xdp_frame as its argument seems like a natural choice, but there are some subtle performance details here that need extra care, which is a deliberate choice. De-referencing the xdp_frame on a remote CPU during DMA-TX completion changes the cache line to the "Shared" state. Later, when the page is reused for RX, this xdp_frame cache line is written, which changes the state to "Modified". This situation already happens (naturally) for virtio_net, tun and cpumap, as the xdp_frame pointer is the queued object. In tun and cpumap, the ptr_ring is used for efficiently transferring cache lines (with pointers) between CPUs. Thus, the only option is to de-reference the xdp_frame. It is only the ixgbe driver that had an optimization with which it could avoid de-referencing the xdp_frame. The driver already has a TX-ring queue, which (in case of remote DMA-TX completion) has to be transferred between CPUs anyhow. In this data area, we stored a struct xdp_mem_info and a data pointer, which allowed us to avoid de-referencing the xdp_frame. To compensate for this, a prefetchw is used to tell the cache coherency protocol about our access pattern. My benchmarks show that this prefetchw is enough to compensate in the ixgbe driver. V7: Adjust for commit d9314c474d4f ("i40e: add support for XDP_REDIRECT") V8: Adjust for commit bd658dda4237 ("net/mlx5e: Separate dma base address and offset in dma_sync call") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-17 | i40e: convert to use generic xdp_frame and xdp_return_frame API | Jesper Dangaard Brouer | 1 | -5/+15
Also convert driver i40e, which very recently got XDP_REDIRECT support in commit d9314c474d4f ("i40e: add support for XDP_REDIRECT"). V7: This patch got added in V7 of this patchset. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-26 | i40e: add support for XDP_REDIRECT | Björn Töpel | 1 | -10/+64
The driver now acts upon the XDP_REDIRECT return action. Two new ndos are implemented, ndo_xdp_xmit and ndo_xdp_flush. XDP_REDIRECT action enables XDP program to redirect frames to other netdevs. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-26 | i40e: tweak page counting for XDP_REDIRECT | Björn Töpel | 1 | -5/+4
This commit tweaks the page counting for XDP_REDIRECT to function properly. XDP_REDIRECT support will be added in a future commit. The current page counting scheme assumes that the reference count cannot decrease until the received frame is sent to the upper layers of the networking stack. This assumption does not hold for the XDP_REDIRECT action, since a page (pointed out by xdp_buff) can have its reference count decreased via the xdp_do_redirect call. To work around that, we now start off with a large page count and then don't allow a refcount of less than two. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-26 | i40e: move AUTO_DISABLED flags into the state field | Jacob Keller | 1 | -8/+13
The two Flow Director auto-disable flags are used at run time to mark when the Flow Director features need to be disabled. Thus the flags could change even when the RTNL lock is not held. They also have some code constructions which really should be test_and_set or test_and_clear using atomic bit operations. Create new state fields to mark this, and stop including them in pf->flags. This is part of a larger effort to remove the need for cmpxchg64 in i40e_set_priv_flags(). Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-23 | intel: add SPDX identifiers to all the Intel drivers | Jeff Kirsher | 1 | -0/+1
Add the SPDX identifiers to all the Intel wired LAN driver files, as outlined in Documentation/process/license-rules.rst. Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-26 | i40e/i40evf: use SW variables for hang detection | Alan Brady | 1 | -5/+11
The i40e_detect_recover_hung function uses the i40e_get_tx_pending function to determine if there are packets stalled on the ring. i40e_get_tx_pending calculates the pending packets using the head writeback value and the HW tail. If the queue is stopped and we lose the interrupt to update our next_to_clean, then we a) won't get another interrupt to clean because the queue is stopped, and b) won't catch the problem with i40e_detect_recover_hung because the HW values look like there are no packets waiting to be transmitted. Using the SW values we can catch the issue because next_to_clean will be out of sync with head writeback. This has the added benefit of being less CPU intensive because we don't need to reach into the hardware to get the values. Signed-off-by: Alan Brady <alan.brady@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
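A hedged sketch of what counting pending work purely from SW bookkeeping looks like; the function and field names approximate the driver but are not the exact code:

```c
static u32 example_get_tx_pending(struct i40e_ring *ring)
{
	u32 ntu = ring->next_to_use;	/* SW producer index */
	u32 ntc = ring->next_to_clean;	/* SW consumer index */

	/* no reads of head write-back or the HW tail register needed */
	return ntc <= ntu ? ntu - ntc : ntu + ring->count - ntc;
}
```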
2018-02-12 | i40e/i40evf: Add support for new mechanism of updating adaptive ITR | Alexander Duyck | 1 | -111/+251
This patch replaces the existing mechanism for determining the correct value to program for adaptive ITR with yet another new and more complicated approach. The basic idea from a 30K foot view is that this new approach will push the Rx interrupt moderation up so that by default it starts in low latency and is gradually pushed up into a higher latency setup as long as doing so increases the number of packets processed, if the number of packets drops to 4 to 1 per packet we will reset and just base our ITR on the size of the packets being received. For Tx we leave it floating at a high interrupt delay and do not pull it down unless we start processing more than 112 packets per interrupt. If we start exceeding that we will cut our interrupt rates in half until we are back below 112. The side effect of these patches are that we will be processing more packets per interrupt. This is both a good and a bad thing as it means we will not be blocking processing in the case of things like pktgen and XDP, but we will also be consuming a bit more CPU in the cases of things such as network throughput tests using netperf. One delta from this versus the ixgbe version of the changes is that I have made the interrupt moderation a bit more aggressive when we are in bulk mode by moving our "goldilocks zone" up from 48 to 96 to 56 to 112. The main motivation behind moving this is to address the fact that we need to update less frequently, and have more fine grained control due to the separate Tx and Rx ITR times. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-12 | i40e/i40evf: Split container ITR into current_itr and target_itr | Alexander Duyck | 1 | -29/+39
This patch is mostly prep-work for replacing the current approach to programming the dynamic aka adaptive ITR. Specifically here what we are doing is splitting the Tx and Rx ITR each into two separate values. The first value current_itr represents the current value of the register. The second value target_itr represents the desired value of the register. The general plan by doing this is to allow for deferring the update of the ITR value under certain circumstances. For now we will work with what we have, but in the future I hope to change the behavior so that we always only update one ITR at a time using some simple logic to determine which ITR requires an update. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-12 | i40e/i40evf: Use usec value instead of reg value for ITR defines | Alexander Duyck | 1 | -2/+9
Instead of using the register value for the defines when setting up the ring ITR we can just use the actual values and avoid the use of shifts and macros to translate between the values we have and the values we want. This helps to make the code more readable as we can quickly translate from one value to the other. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-12 | i40e/i40evf: Don't bother setting the CLEARPBA bit | Alexander Duyck | 1 | -1/+10
The CLEARPBA bit in the dynamic interrupt control register actually has no effect either way on the hardware. As per errata 28 in the XL710 specification update the interrupt is actually cleared any time the register is written with the INTENA_MSK bit set to 0. As such the act of toggling the enable bit actually will trigger the interrupt being cleared and could lead to potential lost events if auto-masking is not enabled. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-12 | i40e/i40evf: Clean up logic for adaptive ITR | Alexander Duyck | 1 | -41/+14
The logic for dynamic ITR update is confusing at best as there were odd paths chosen for how to find the rings associated with a given queue based on the vector index and other inconsistencies throughout the code. This patch is an attempt to clean up the logic so that we can more easily understand what is going on. Specifically if there is a Rx or Tx ring that is enabled in dynamic mode on the q_vector it is allowed to override the other side of the interrupt moderation. While it isn't correct all this patch is doing is cleaning up the logic for now so that when we come through and fix it we can more easily identify that this is wrong. The other big change made here is that we replace references to: vsi->rx_rings[q_vector->v_idx]->itr_setting with: q_vector->rx.ring->itr_setting The general idea is we can avoid the long pointer chase since just accessing q_vector->rx.ring is a single pointer access versus having to chase down vsi->rx_rings, and then finding the pointer in the array, and finally chasing down the itr_setting from there. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-12 | i40e/i40evf: Only track one ITR setting per ring instead of Tx/Rx | Alexander Duyck | 1 | -3/+3
The rings are already split out into Tx and Rx rings so it doesn't make sense to have any single ring store both a Tx and Rx itr_setting value. Since that is the case drop the pair in favor of storing just a single ITR value. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-12 | i40e: fix typo in function description | Alan Brady | 1 | -1/+1
'bufer' should be 'buffer' Signed-off-by: Alan Brady <alan.brady@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-26 | i40e/i40evf: Record ITR register location in the q_vector | Alexander Duyck | 1 | -8/+4
The drivers for i40e and i40evf had a reg_idx value stored in the q_vector that was going completely unused. I can only assume this was copied over from ixgbe and nobody knew how to use it. I'm going to make use of the value to avoid having to compute the vector and thus the register index for multiple paths throughout the drivers. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-23 | i40e/i40evf: Detect and recover hung queue scenario | Sudheer Mogilappagari | 1 | -0/+54
In VFs, there is a known issue which can cause writebacks to not occur when interrupts are disabled and there are fewer than 4 descriptors, resulting in a TX timeout. A timeout can also occur due to a lost interrupt. The current implementation for detecting and recovering from hung queues in the PF is problematic because it actually actively encourages lost interrupts. By triggering a SW interrupt, interrupts are forced on. If we are already in napi_poll and an interrupt fires, napi_poll will not be rescheduled and the interrupt is effectively lost; thereby potentially *causing* hung queues. This patch checks whether packets are being processed between every watchdog cycle, determines potentially hung queues, and triggers a SW interrupt only for the particular queue. Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-09 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | David S. Miller | 1 | -3/+23
2018-01-05 | i40e: setup xdp_rxq_info | Jesper Dangaard Brouer | 1 | -2/+16
The i40e driver has a special "FDIR" RX-ring (I40E_VSI_FDIR) which is a sideband channel for configuring/updating the flow director tables. This (i40e_vsi_)type does not invoke XDP-ebpf code. As suggested by Björn (V2): Instead of marking this I40E_VSI_FDIR RX-ring a special case, reverse the logic and only select RX-rings of type I40E_VSI_MAIN to register xdp_rxq_info's for. Driver hook points for xdp_rxq_info: * reg : i40e_setup_rx_descriptors (via i40e_vsi_setup_rx_resources) * unreg: i40e_free_rx_resources (via i40e_vsi_free_rx_resources) Tested on actual hardware with samples/bpf program. V2: Fixed bug in i40e_set_ringparam (memset zero) + match on I40E_VSI_MAIN. V4: Update patch desc that got out-of-sync with code. Cc: intel-wired-lan@lists.osuosl.org Cc: Björn Töpel <bjorn.topel@intel.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
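A hedged sketch of the two hook points named above, using the xdp_rxq_info_reg() signature of that era (a napi_id argument was added much later); the field paths approximate the driver:

```c
/* i40e_setup_rx_descriptors(): register only for main-VSI Rx rings */
if (rx_ring->vsi->type == I40E_VSI_MAIN)
	err = xdp_rxq_info_reg(&rx_ring->xdp_rxq, rx_ring->netdev,
			       rx_ring->queue_index);

/* i40e_free_rx_resources(): unregister if it was registered */
if (xdp_rxq_info_is_reg(&rx_ring->xdp_rxq))
	xdp_rxq_info_unreg(&rx_ring->xdp_rxq);
```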
2018-01-03 | i40e/i40evf: Account for frags split over multiple descriptors in check linearize | Alexander Duyck | 1 | -3/+23
The original code for __i40e_chk_linearize didn't take into account the fact that if a fragment is 16K in size or larger it has to be split over 2 descriptors and the smaller of those 2 descriptors will be on the trailing edge of the transmit. As a result we can get into situations where we didn't catch requests that could result in a Tx hang. This patch takes care of that by subtracting the length of all but the trailing edge of the stale fragment before we test for sum. By doing this we can guarantee that we have all cases covered, including the case of a fragment that spans multiple descriptors. We don't need to worry about checking the inner portions of this since 12K is the maximum aligned DMA size and that is larger than any MSS will ever be since the MTU limit for jumbos is something on the order of 9K. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>