Diffstat (limited to 'Documentation/networking')
35 files changed, 3393 insertions, 1992 deletions
diff --git a/Documentation/networking/00-INDEX b/Documentation/networking/00-INDEX deleted file mode 100644 index 02a323c43261..000000000000 --- a/Documentation/networking/00-INDEX +++ /dev/null @@ -1,234 +0,0 @@ -00-INDEX - - this file -3c509.txt - - information on the 3Com Etherlink III Series Ethernet cards. -6pack.txt - - info on the 6pack protocol, an alternative to KISS for AX.25 -LICENSE.qla3xxx - - GPLv2 for QLogic Linux Networking HBA Driver -LICENSE.qlge - - GPLv2 for QLogic Linux qlge NIC Driver -LICENSE.qlcnic - - GPLv2 for QLogic Linux qlcnic NIC Driver -PLIP.txt - - PLIP: The Parallel Line Internet Protocol device driver -README.ipw2100 - - README for the Intel PRO/Wireless 2100 driver. -README.ipw2200 - - README for the Intel PRO/Wireless 2915ABG and 2200BG driver. -README.sb1000 - - info on General Instrument/NextLevel SURFboard1000 cable modem. -altera_tse.txt - - Altera Triple-Speed Ethernet controller. -arcnet-hardware.txt - - tons of info on ARCnet, hubs, jumper settings for ARCnet cards, etc. -arcnet.txt - - info on the using the ARCnet driver itself. -atm.txt - - info on where to get ATM programs and support for Linux. -ax25.txt - - info on using AX.25 and NET/ROM code for Linux -baycom.txt - - info on the driver for Baycom style amateur radio modems -bonding.txt - - Linux Ethernet Bonding Driver HOWTO: link aggregation in Linux. -bridge.txt - - where to get user space programs for ethernet bridging with Linux. -cdc_mbim.txt - - 3G/LTE USB modem (Mobile Broadband Interface Model) -checksum-offloads.txt - - Explanation of checksum offloads; LCO, RCO -cops.txt - - info on the COPS LocalTalk Linux driver -cs89x0.txt - - the Crystal LAN (CS8900/20-based) Ethernet ISA adapter driver -cxacru.txt - - Conexant AccessRunner USB ADSL Modem -cxacru-cf.py - - Conexant AccessRunner USB ADSL Modem configuration file parser -cxgb.txt - - Release Notes for the Chelsio N210 Linux device driver. -dccp.txt - - the Datagram Congestion Control Protocol (DCCP) (RFC 4340..42). -dctcp.txt - - DataCenter TCP congestion control -de4x5.txt - - the Digital EtherWORKS DE4?? and DE5?? PCI Ethernet driver -decnet.txt - - info on using the DECnet networking layer in Linux. -dl2k.txt - - README for D-Link DL2000-based Gigabit Ethernet Adapters (dl2k.ko). -dm9000.txt - - README for the Simtec DM9000 Network driver. -dmfe.txt - - info on the Davicom DM9102(A)/DM9132/DM9801 fast ethernet driver. -dns_resolver.txt - - The DNS resolver module allows kernel servies to make DNS queries. -driver.txt - - Softnet driver issues. -ena.txt - - info on Amazon's Elastic Network Adapter (ENA) -e100.txt - - info on Intel's EtherExpress PRO/100 line of 10/100 boards -e1000.txt - - info on Intel's E1000 line of gigabit ethernet boards -e1000e.txt - - README for the Intel Gigabit Ethernet Driver (e1000e). -eql.txt - - serial IP load balancing -fib_trie.txt - - Level Compressed Trie (LC-trie) notes: a structure for routing. -filter.txt - - Linux Socket Filtering -fore200e.txt - - FORE Systems PCA-200E/SBA-200E ATM NIC driver info. -framerelay.txt - - info on using Frame Relay/Data Link Connection Identifier (DLCI). -gen_stats.txt - - Generic networking statistics for netlink users. -generic-hdlc.txt - - The generic High Level Data Link Control (HDLC) layer. -generic_netlink.txt - - info on Generic Netlink -gianfar.txt - - Gianfar Ethernet Driver. -i40e.txt - - README for the Intel Ethernet Controller XL710 Driver (i40e). 
-i40evf.txt - - Short note on the Driver for the Intel(R) XL710 X710 Virtual Function -ieee802154.txt - - Linux IEEE 802.15.4 implementation, API and drivers -igb.txt - - README for the Intel Gigabit Ethernet Driver (igb). -igbvf.txt - - README for the Intel Gigabit Ethernet Driver (igbvf). -ip-sysctl.txt - - /proc/sys/net/ipv4/* variables -ip_dynaddr.txt - - IP dynamic address hack e.g. for auto-dialup links -ipddp.txt - - AppleTalk-IP Decapsulation and AppleTalk-IP Encapsulation -iphase.txt - - Interphase PCI ATM (i)Chip IA Linux driver info. -ipsec.txt - - Note on not compressing IPSec payload and resulting failed policy check. -ipv6.txt - - Options to the ipv6 kernel module. -ipvs-sysctl.txt - - Per-inode explanation of the /proc/sys/net/ipv4/vs interface. -irda.txt - - where to get IrDA (infrared) utilities and info for Linux. -ixgb.txt - - README for the Intel 10 Gigabit Ethernet Driver (ixgb). -ixgbe.txt - - README for the Intel 10 Gigabit Ethernet Driver (ixgbe). -ixgbevf.txt - - README for the Intel Virtual Function (VF) Driver (ixgbevf). -l2tp.txt - - User guide to the L2TP tunnel protocol. -lapb-module.txt - - programming information of the LAPB module. -ltpc.txt - - the Apple or Farallon LocalTalk PC card driver -mac80211-auth-assoc-deauth.txt - - authentication and association / deauth-disassoc with max80211 -mac80211-injection.txt - - HOWTO use packet injection with mac80211 -multiqueue.txt - - HOWTO for multiqueue network device support. -netconsole.txt - - The network console module netconsole.ko: configuration and notes. -netdev-features.txt - - Network interface features API description. -netdevices.txt - - info on network device driver functions exported to the kernel. -netif-msg.txt - - Design of the network interface message level setting (NETIF_MSG_*). -netlink_mmap.txt - - memory mapped I/O with netlink -nf_conntrack-sysctl.txt - - list of netfilter-sysctl knobs. -nfc.txt - - The Linux Near Field Communication (NFS) subsystem. -openvswitch.txt - - Open vSwitch developer documentation. -operstates.txt - - Overview of network interface operational states. -packet_mmap.txt - - User guide to memory mapped packet socket rings (PACKET_[RT]X_RING). -phonet.txt - - The Phonet packet protocol used in Nokia cellular modems. -phy.txt - - The PHY abstraction layer. -pktgen.txt - - User guide to the kernel packet generator (pktgen.ko). -policy-routing.txt - - IP policy-based routing -ppp_generic.txt - - Information about the generic PPP driver. -proc_net_tcp.txt - - Per inode overview of the /proc/net/tcp and /proc/net/tcp6 interfaces. -radiotap-headers.txt - - Background on radiotap headers. -ray_cs.txt - - Raylink Wireless LAN card driver info. -rds.txt - - Background on the reliable, ordered datagram delivery method RDS. -regulatory.txt - - Overview of the Linux wireless regulatory infrastructure. -rxrpc.txt - - Guide to the RxRPC protocol. -s2io.txt - - Release notes for Neterion Xframe I/II 10GbE driver. -scaling.txt - - Explanation of network scaling techniques: RSS, RPS, RFS, aRFS, XPS. -sctp.txt - - Notes on the Linux kernel implementation of the SCTP protocol. -secid.txt - - Explanation of the secid member in flow structures. -skfp.txt - - SysKonnect FDDI (SK-5xxx, Compaq Netelligent) driver info. -smc9.txt - - the driver for SMC's 9000 series of Ethernet cards -spider_net.txt - - README for the Spidernet Driver (as found in PS3 / Cell BE). -stmmac.txt - - README for the STMicro Synopsys Ethernet driver. 
-tc-actions-env-rules.txt - - rules for traffic control (tc) actions. -timestamping.txt - - overview of network packet timestamping variants. -tcp.txt - - short blurb on how TCP output takes place. -tcp-thin.txt - - kernel tuning options for low rate 'thin' TCP streams. -team.txt - - pointer to information for ethernet teaming devices. -tlan.txt - - ThunderLAN (Compaq Netelligent 10/100, Olicom OC-2xxx) driver info. -tproxy.txt - - Transparent proxy support user guide. -tuntap.txt - - TUN/TAP device driver, allowing user space Rx/Tx of packets. -udplite.txt - - UDP-Lite protocol (RFC 3828) introduction. -vortex.txt - - info on using 3Com Vortex (3c590, 3c592, 3c595, 3c597) Ethernet cards. -vxge.txt - - README for the Neterion X3100 PCIe Server Adapter. -vxlan.txt - - Virtual extensible LAN overview -x25.txt - - general info on X.25 development. -x25-iface.txt - - description of the X.25 Packet Layer to LAPB device interface. -xfrm_device.txt - - description of XFRM offload API -xfrm_proc.txt - - description of the statistics package for XFRM. -xfrm_sync.txt - - sync patches for XFRM enable migration of an SA between hosts. -xfrm_sysctl.txt - - description of the XFRM configuration options. -z8530drv.txt - - info about Linux driver for Z8530 based HDLC cards for AX.25 diff --git a/Documentation/networking/af_xdp.rst b/Documentation/networking/af_xdp.rst index ff929cfab4f4..4ae4f9d8f8fe 100644 --- a/Documentation/networking/af_xdp.rst +++ b/Documentation/networking/af_xdp.rst @@ -159,8 +159,8 @@ log2(2048) LSB of the addr will be masked off, meaning that 2048, 2050 and 3000 refers to the same chunk. -UMEM Completetion Ring -~~~~~~~~~~~~~~~~~~~~~~ +UMEM Completion Ring +~~~~~~~~~~~~~~~~~~~~ The Completion Ring is used transfer ownership of UMEM frames from kernel-space to user-space. Just like the Fill ring, UMEM indicies are diff --git a/Documentation/networking/defza.txt b/Documentation/networking/defza.txt new file mode 100644 index 000000000000..663e4a906751 --- /dev/null +++ b/Documentation/networking/defza.txt @@ -0,0 +1,57 @@ +Notes on the DEC FDDIcontroller 700 (DEFZA-xx) driver v.1.1.4. + + +DEC FDDIcontroller 700 is DEC's first-generation TURBOchannel FDDI +network card, designed in 1990 specifically for the DECstation 5000 +model 200 workstation. The board is a single attachment station and +it was manufactured in two variations, both of which are supported. + +First is the SAS MMF DEFZA-AA option, the original design implementing +the standard MMF-PMD, however with a pair of ST connectors rather than +the usual MIC connector. The other one is the SAS ThinWire/STP DEFZA-CA +option, denoted 700-C, with the network medium selectable by a switch +between the DEC proprietary ThinWire-PMD using a BNC connector and the +standard STP-PMD using a DE-9F connector. This option can interface to +a DECconcentrator 500 device and, in the case of the STP-PMD, also other +FDDI equipment and was designed to make it easier to transition from +existing IEEE 802.3 10BASE2 Ethernet and IEEE 802.5 Token Ring networks +by providing means to reuse existing cabling. + +This driver handles any number of cards installed in a single system. +They get fddi0, fddi1, etc. interface names assigned in the order of +increasing TURBOchannel slot numbers. + +The board only supports DMA on the receive side. Transmission involves +the use of PIO. As a result under a heavy transmission load there will +be a significant impact on system performance. 
+
+The board supports a 64-entry CAM for matching destination addresses.
+Two entries are preoccupied by the Directed Beacon and Ring Purger
+multicast addresses and the rest is used as a multicast filter. An
+all-multi mode is also supported for LLC frames and it is used if
+requested explicitly or if the CAM overflows. The promiscuous mode
+supports separate enables for LLC and SMT frames, but this driver
+doesn't support changing them individually.
+
+
+Known problems:
+
+None.
+
+
+To do:
+
+5. MAC address change. The card does not support changing the Media
+   Access Controller's address registers but a similar effect can be
+   achieved by adding an alias to the CAM. There is no way to disable
+   matching against the original address though.
+
+7. Queueing incoming/outgoing SMT frames in the driver if the SMT
+   receive/RMC transmit ring is full. (?)
+
+8. Retrieving/reporting FDDI/SNMP stats.
+
+
+Both success and failure reports are welcome.
+
+Maciej W. Rozycki <macro@linux-mips.org>
diff --git a/Documentation/networking/devlink-params-bnxt.txt b/Documentation/networking/devlink-params-bnxt.txt
new file mode 100644
index 000000000000..481aa303d5b4
--- /dev/null
+++ b/Documentation/networking/devlink-params-bnxt.txt
@@ -0,0 +1,18 @@
+enable_sriov        [DEVICE, GENERIC]
+                    Configuration mode: Permanent
+
+ignore_ari          [DEVICE, GENERIC]
+                    Configuration mode: Permanent
+
+msix_vec_per_pf_max [DEVICE, GENERIC]
+                    Configuration mode: Permanent
+
+msix_vec_per_pf_min [DEVICE, GENERIC]
+                    Configuration mode: Permanent
+
+gre_ver_check       [DEVICE, DRIVER-SPECIFIC]
+                    Generic Routing Encapsulation (GRE) version check will
+                    be enabled in the device. If disabled, the device skips
+                    version checking for incoming packets.
+                    Type: Boolean
+                    Configuration mode: Permanent
diff --git a/Documentation/networking/devlink-params.txt b/Documentation/networking/devlink-params.txt
new file mode 100644
index 000000000000..ae444ffe73ac
--- /dev/null
+++ b/Documentation/networking/devlink-params.txt
@@ -0,0 +1,42 @@
+Devlink configuration parameters
+================================
+The following is the list of configuration parameters exposed via the
+devlink interface. All of them are device-level parameters, and each
+parameter can be either generic or driver-specific.
+
+Note that the driver-specific files should list the generic params
+they support, along with the supported configuration modes.
+
+Each parameter can be set in different configuration modes:
+    runtime    - set while the driver is running; no reset required.
+    driverinit - applied while the driver initializes; requires a driver
+                 restart via the devlink reload command.
+    permanent  - written to the device's non-volatile memory; a hard
+                 reset is required.
+
+Following is the list of parameters:
+====================================
+enable_sriov        [DEVICE, GENERIC]
+                    Enable Single Root I/O Virtualisation (SRIOV) in
+                    the device.
+                    Type: Boolean
+
+ignore_ari          [DEVICE, GENERIC]
+                    Ignore Alternative Routing-ID Interpretation (ARI)
+                    capability. If enabled, the adapter will ignore the
+                    ARI capability even when the platform has it enabled,
+                    and will create the same number of partitions as when
+                    the platform does not support ARI.
+                    Type: Boolean
+
+msix_vec_per_pf_max [DEVICE, GENERIC]
+                    Provides the maximum number of MSIX interrupts that
+                    a device can create. The value is the same across all
+                    physical functions (PFs) in the device.
+                    Type: u32
+
+msix_vec_per_pf_min [DEVICE, GENERIC]
+                    Provides the minimum number of MSIX interrupts required
+                    for device initialization. The value is the same across
+                    all physical functions (PFs) in the device.
+ Type: u32 diff --git a/Documentation/networking/dpaa2/ethernet-driver.rst b/Documentation/networking/dpaa2/ethernet-driver.rst new file mode 100644 index 000000000000..90ec940749e8 --- /dev/null +++ b/Documentation/networking/dpaa2/ethernet-driver.rst @@ -0,0 +1,185 @@ +.. SPDX-License-Identifier: GPL-2.0 +.. include:: <isonum.txt> + +=============================== +DPAA2 Ethernet driver +=============================== + +:Copyright: |copy| 2017-2018 NXP + +This file provides documentation for the Freescale DPAA2 Ethernet driver. + +Supported Platforms +=================== +This driver provides networking support for Freescale DPAA2 SoCs, e.g. +LS2080A, LS2088A, LS1088A. + + +Architecture Overview +===================== +Unlike regular NICs, in the DPAA2 architecture there is no single hardware block +representing network interfaces; instead, several separate hardware resources +concur to provide the networking functionality: + +- network interfaces +- queues, channels +- buffer pools +- MAC/PHY + +All hardware resources are allocated and configured through the Management +Complex (MC) portals. MC abstracts most of these resources as DPAA2 objects +and exposes ABIs through which they can be configured and controlled. A few +hardware resources, like queues, do not have a corresponding MC object and +are treated as internal resources of other objects. + +For a more detailed description of the DPAA2 architecture and its object +abstractions see *Documentation/networking/dpaa2/overview.rst*. + +Each Linux net device is built on top of a Datapath Network Interface (DPNI) +object and uses Buffer Pools (DPBPs), I/O Portals (DPIOs) and Concentrators +(DPCONs). + +Configuration interface:: + + ----------------------- + | DPAA2 Ethernet Driver | + ----------------------- + . . . + . . . + . . . . . . . . . . . . + . . . + . . . + ---------- ---------- ----------- + | DPBP API | | DPNI API | | DPCON API | + ---------- ---------- ----------- + . . . software + ======= . ========== . ============ . =================== + . . . hardware + ------------------------------------------ + | MC hardware portals | + ------------------------------------------ + . . . + . . . + ------ ------ ------- + | DPBP | | DPNI | | DPCON | + ------ ------ ------- + +The DPNIs are network interfaces without a direct one-on-one mapping to PHYs. +DPBPs represent hardware buffer pools. Packet I/O is performed in the context +of DPCON objects, using DPIO portals for managing and communicating with the +hardware resources. + +Datapath (I/O) interface:: + + ----------------------------------------------- + | DPAA2 Ethernet Driver | + ----------------------------------------------- + | ^ ^ | | + | | | | | + enqueue| dequeue| data | dequeue| seed | + (Tx) | (Rx, TxC)| avail.| request| buffers| + | | notify| | | + | | | | | + V | | V V + ----------------------------------------------- + | DPIO Driver | + ----------------------------------------------- + | | | | | software + | | | | | ================ + | | | | | hardware + ----------------------------------------------- + | I/O hardware portals | + ----------------------------------------------- + | ^ ^ | | + | | | | | + | | | V | + V | ================ V + ---------------------- | ------------- + queues ---------------------- | | Buffer pool | + ---------------------- | ------------- + ======================= + Channel + +Datapath I/O (DPIO) portals provide enqueue and dequeue services, data +availability notifications and buffer pool management. 
DPIOs are shared between
+all DPAA2 objects (and implicitly all DPAA2 kernel drivers) that work with data
+frames, but must be affine to the CPUs for the purpose of traffic distribution.
+
+Frames are transmitted and received through hardware frame queues, which can be
+grouped in channels for the purpose of hardware scheduling. The Ethernet driver
+enqueues TX frames on egress queues and after transmission is complete a TX
+confirmation frame is sent back to the CPU.
+
+When frames are available on ingress queues, a data availability notification
+is sent to the CPU; notifications are raised per channel, so even if multiple
+queues in the same channel have available frames, only one notification is sent.
+After a channel fires a notification, it must be explicitly rearmed.
+
+Each network interface can have multiple Rx, Tx and confirmation queues affined
+to CPUs, and one channel (DPCON) for each CPU that services at least one queue.
+DPCONs are used to distribute ingress traffic to different CPUs via the cores'
+affine DPIOs.
+
+The role of hardware buffer pools is storage of ingress frame data. Each network
+interface has a privately owned buffer pool which it seeds with kernel allocated
+buffers.
+
+
+DPNIs are decoupled from PHYs; a DPNI can be connected to a PHY through a DPMAC
+object or to another DPNI through an internal link, but the connection is
+managed by MC and completely transparent to the Ethernet driver.
+
+::
+
+      ---------     ---------     ---------
+     | eth if1 |   | eth if2 |   | eth ifn |
+      ---------     ---------     ---------
+          .             .             .
+          .             .             .
+          .             .             .
+      ---------------------------
+      |  DPAA2 Ethernet Driver  |
+      ---------------------------
+          .             .             .
+          .             .             .
+          .             .             .
+      ------        ------        ------        -------
+     | DPNI |      | DPNI |      | DPNI |      | DPMAC |----+
+      ------        ------        ------        -------     |
+        |             |             |              |        |
+        |             |             |              |      -----
+      ===========   ==================                   | PHY |
+                                                          -----
+
+Creating a Network Interface
+============================
+A net device is created for each DPNI object probed on the MC bus. Each DPNI has
+a number of properties which determine the network interface configuration
+options and associated hardware resources.
+
+DPNI objects (and the other DPAA2 objects needed for a network interface) can be
+added to a container on the MC bus in one of two ways: statically, through a
+Datapath Layout Binary file (DPL) that is parsed by MC at boot time; or created
+dynamically at runtime, via the DPAA2 objects APIs.
+
+
+Features & Offloads
+===================
+Hardware checksum offloading is supported for TCP and UDP over IPv4/6 frames.
+The checksum offloads can be independently configured on RX and TX through
+ethtool.
+
+Hardware offload of unicast and multicast MAC filtering is supported on the
+ingress path and permanently enabled.
+
+Scatter-gather frames are supported on both RX and TX paths. On TX, SG support
+is configurable via ethtool; on RX it is always enabled.
+
+The DPAA2 hardware can process jumbo Ethernet frames of up to 10K bytes.
+
+The Ethernet driver defines a static flow hashing scheme that distributes
+traffic based on a 5-tuple key: src IP, dst IP, IP proto, L4 src port,
+L4 dst port. No user configuration is supported for now.
+
+Hardware specific statistics for the network interface as well as some
+non-standard driver stats can be consulted through the ethtool -S option.
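+
+For example, assuming the DPNI-backed interface shows up as eth0 (interface
+names are system-dependent and used here only for illustration), the counters
+can be listed with::
+
+  ethtool -S eth0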
diff --git a/Documentation/networking/dpaa2/index.rst b/Documentation/networking/dpaa2/index.rst index 10bea113a7bc..67bd87fe6c53 100644 --- a/Documentation/networking/dpaa2/index.rst +++ b/Documentation/networking/dpaa2/index.rst @@ -7,3 +7,4 @@ DPAA2 Documentation overview dpio-driver + ethernet-driver diff --git a/Documentation/networking/e100.rst b/Documentation/networking/e100.rst index f81111eba9c5..5e2839b4ec92 100644 --- a/Documentation/networking/e100.rst +++ b/Documentation/networking/e100.rst @@ -1,4 +1,5 @@ -============================================================== +.. SPDX-License-Identifier: GPL-2.0+ + Linux* Base Driver for the Intel(R) PRO/100 Family of Adapters ============================================================== diff --git a/Documentation/networking/e1000.rst b/Documentation/networking/e1000.rst index f10dd4086921..6379d4d20771 100644 --- a/Documentation/networking/e1000.rst +++ b/Documentation/networking/e1000.rst @@ -1,4 +1,5 @@ -=========================================================== +.. SPDX-License-Identifier: GPL-2.0+ + Linux* Base Driver for Intel(R) Ethernet Network Connection =========================================================== diff --git a/Documentation/networking/e1000e.rst b/Documentation/networking/e1000e.rst new file mode 100644 index 000000000000..33554e5416c5 --- /dev/null +++ b/Documentation/networking/e1000e.rst @@ -0,0 +1,382 @@ +.. SPDX-License-Identifier: GPL-2.0+ + +Linux* Driver for Intel(R) Ethernet Network Connection +====================================================== + +Intel Gigabit Linux driver. +Copyright(c) 2008-2018 Intel Corporation. + +Contents +======== + +- Identifying Your Adapter +- Command Line Parameters +- Additional Configurations +- Support + + +Identifying Your Adapter +======================== +For information on how to identify your adapter, and for the latest Intel +network drivers, refer to the Intel Support website: +https://www.intel.com/support + + +Command Line Parameters +======================= +If the driver is built as a module, the following optional parameters are used +by entering them on the command line with the modprobe command using this +syntax:: + + modprobe e1000e [<option>=<VAL1>,<VAL2>,...] + +There needs to be a <VAL#> for each network port in the system supported by +this driver. The values will be applied to each instance, in function order. +For example:: + + modprobe e1000e InterruptThrottleRate=16000,16000 + +In this case, there are two network ports supported by e1000e in the system. +The default value for each parameter is generally the recommended setting, +unless otherwise noted. + +NOTE: A descriptor describes a data buffer and attributes related to the data +buffer. This information is accessed by the hardware. + +InterruptThrottleRate +--------------------- +:Valid Range: 0,1,3,4,100-100000 +:Default Value: 3 + +Interrupt Throttle Rate controls the number of interrupts each interrupt +vector can generate per second. Increasing ITR lowers latency at the cost of +increased CPU utilization, though it may help throughput in some circumstances. + +Setting InterruptThrottleRate to a value greater or equal to 100 +will program the adapter to send out a maximum of that many interrupts +per second, even if more packets have come in. This reduces interrupt +load on the system and can lower CPU utilization under heavy load, +but will increase latency as packets are not processed as quickly. 
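+
+For example, to cap each port of a hypothetical two-port adapter at a static
+10000 interrupts per second (the value is a placeholder, not a recommendation),
+the driver could be loaded as::
+
+  modprobe e1000e InterruptThrottleRate=10000,10000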
+ +The default behaviour of the driver previously assumed a static +InterruptThrottleRate value of 8000, providing a good fallback value for +all traffic types, but lacking in small packet performance and latency. +The hardware can handle many more small packets per second however, and +for this reason an adaptive interrupt moderation algorithm was implemented. + +The driver has two adaptive modes (setting 1 or 3) in which +it dynamically adjusts the InterruptThrottleRate value based on the traffic +that it receives. After determining the type of incoming traffic in the last +timeframe, it will adjust the InterruptThrottleRate to an appropriate value +for that traffic. + +The algorithm classifies the incoming traffic every interval into +classes. Once the class is determined, the InterruptThrottleRate value is +adjusted to suit that traffic type the best. There are three classes defined: +"Bulk traffic", for large amounts of packets of normal size; "Low latency", +for small amounts of traffic and/or a significant percentage of small +packets; and "Lowest latency", for almost completely small packets or +minimal traffic. + + - 0: Off + Turns off any interrupt moderation and may improve small packet latency. + However, this is generally not suitable for bulk throughput traffic due + to the increased CPU utilization of the higher interrupt rate. + - 1: Dynamic mode + This mode attempts to moderate interrupts per vector while maintaining + very low latency. This can sometimes cause extra CPU utilization. If + planning on deploying e1000e in a latency sensitive environment, this + parameter should be considered. + - 3: Dynamic Conservative mode (default) + In dynamic conservative mode, the InterruptThrottleRate value is set to + 4000 for traffic that falls in class "Bulk traffic". If traffic falls in + the "Low latency" or "Lowest latency" class, the InterruptThrottleRate is + increased stepwise to 20000. This default mode is suitable for most + applications. + - 4: Simplified Balancing mode + In simplified mode the interrupt rate is based on the ratio of TX and + RX traffic. If the bytes per second rate is approximately equal, the + interrupt rate will drop as low as 2000 interrupts per second. If the + traffic is mostly transmit or mostly receive, the interrupt rate could + be as high as 8000. + - 100-100000: + Setting InterruptThrottleRate to a value greater or equal to 100 + will program the adapter to send at most that many interrupts per second, + even if more packets have come in. This reduces interrupt load on the + system and can lower CPU utilization under heavy load, but will increase + latency as packets are not processed as quickly. + +NOTE: InterruptThrottleRate takes precedence over the TxAbsIntDelay and +RxAbsIntDelay parameters. In other words, minimizing the receive and/or +transmit absolute delays does not force the controller to generate more +interrupts than what the Interrupt Throttle Rate allows. + +RxIntDelay +---------- +:Valid Range: 0-65535 (0=off) +:Default Value: 0 + +This value delays the generation of receive interrupts in units of 1.024 +microseconds. Receive interrupt reduction can improve CPU efficiency if +properly tuned for specific network traffic. Increasing this value adds extra +latency to frame reception and can end up decreasing the throughput of TCP +traffic. If the system is reporting dropped receives, this value may be set +too high, causing the driver to run out of available receive descriptors. 
+
+CAUTION: When setting RxIntDelay to a value other than 0, adapters may hang
+(stop transmitting) under certain network conditions. If this occurs a NETDEV
+WATCHDOG message is logged in the system event log. In addition, the
+controller is automatically reset, restoring the network connection. To
+eliminate the potential for the hang ensure that RxIntDelay is set to 0.
+
+RxAbsIntDelay
+-------------
+:Valid Range: 0-65535 (0=off)
+:Default Value: 8
+
+This value, in units of 1.024 microseconds, limits the delay in which a
+receive interrupt is generated. This value ensures that an interrupt is
+generated after the initial packet is received within the set amount of time,
+which is useful only if RxIntDelay is non-zero. Proper tuning, along with
+RxIntDelay, may improve traffic throughput in specific network conditions.
+
+TxIntDelay
+----------
+:Valid Range: 0-65535 (0=off)
+:Default Value: 8
+
+This value delays the generation of transmit interrupts in units of 1.024
+microseconds. Transmit interrupt reduction can improve CPU efficiency if
+properly tuned for specific network traffic. If the system is reporting
+dropped transmits, this value may be set too high causing the driver to run
+out of available transmit descriptors.
+
+TxAbsIntDelay
+-------------
+:Valid Range: 0-65535 (0=off)
+:Default Value: 32
+
+This value, in units of 1.024 microseconds, limits the delay in which a
+transmit interrupt is generated. It is useful only if TxIntDelay is non-zero.
+It ensures that an interrupt is generated after the initial packet is sent on
+the wire within the set amount of time. Proper tuning, along with TxIntDelay,
+may improve traffic throughput in specific network conditions.
+
+copybreak
+---------
+:Valid Range: 0-xxxxxxx (0=off)
+:Default Value: 256
+
+The driver copies all packets at or below this size to a fresh receive
+buffer before handing it up the stack.
+This parameter differs from other parameters because it is a single (not 1,1,1
+etc.) parameter applied to all driver instances and it is also available
+during runtime at /sys/module/e1000e/parameters/copybreak.
+
+To use copybreak, type::
+
+  modprobe e1000e copybreak=128
+
+SmartPowerDownEnable
+--------------------
+:Valid Range: 0,1
+:Default Value: 0 (disabled)
+
+Allows the PHY to turn off in lower power states. The user can turn off this
+parameter in supported chipsets.
+
+KumeranLockLoss
+---------------
+:Valid Range: 0,1
+:Default Value: 1 (enabled)
+
+This workaround skips resetting the PHY at shutdown for the initial silicon
+releases of ICH8 systems.
+
+IntMode
+-------
+:Valid Range: 0-2
+:Default Value: 0
+
+  +-------+----------------+
+  | Value | Interrupt Mode |
+  +=======+================+
+  |   0   | Legacy         |
+  +-------+----------------+
+  |   1   | MSI            |
+  +-------+----------------+
+  |   2   | MSI-X          |
+  +-------+----------------+
+
+IntMode allows load time control over the type of interrupt registered for by
+the driver. MSI-X is required for multiple queue support, and some kernels and
+combinations of kernel .config options will force a lower level of interrupt
+support.
+
+This command will show different values for each type of interrupt::
+
+  cat /proc/interrupts
+
+CrcStripping
+------------
+:Valid Range: 0,1
+:Default Value: 1 (enabled)
+
+Strip the CRC from received packets before sending up the network stack. If
+you have a machine with a BMC enabled but cannot receive IPMI traffic after
+loading or enabling the driver, try disabling this feature.
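+
+For example, on a hypothetical system with two e1000e ports and a BMC that
+stops receiving IPMI traffic, CRC stripping could be disabled on both ports
+at load time with::
+
+  modprobe e1000e CrcStripping=0,0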
+ +WriteProtectNVM +--------------- +:Valid Range: 0,1 +:Default Value: 1 (enabled) + +If set to 1, configure the hardware to ignore all write/erase cycles to the +GbE region in the ICHx NVM (in order to prevent accidental corruption of the +NVM). This feature can be disabled by setting the parameter to 0 during initial +driver load. + +NOTE: The machine must be power cycled (full off/on) when enabling NVM writes +via setting the parameter to zero. Once the NVM has been locked (via the +parameter at 1 when the driver loads) it cannot be unlocked except via power +cycle. + +Debug +----- +:Valid Range: 0-16 (0=none,...,16=all) +:Default Value: 0 + +This parameter adjusts the level of debug messages displayed in the system logs. + + +Additional Features and Configurations +====================================== + +Jumbo Frames +------------ +Jumbo Frames support is enabled by changing the Maximum Transmission Unit (MTU) +to a value larger than the default value of 1500. + +Use the ifconfig command to increase the MTU size. For example, enter the +following where <x> is the interface number:: + + ifconfig eth<x> mtu 9000 up + +Alternatively, you can use the ip command as follows:: + + ip link set mtu 9000 dev eth<x> + ip link set up dev eth<x> + +This setting is not saved across reboots. The setting change can be made +permanent by adding 'MTU=9000' to the file: + +- For RHEL: /etc/sysconfig/network-scripts/ifcfg-eth<x> +- For SLES: /etc/sysconfig/network/<config_file> + +NOTE: The maximum MTU setting for Jumbo Frames is 8996. This value coincides +with the maximum Jumbo Frames size of 9018 bytes. + +NOTE: Using Jumbo frames at 10 or 100 Mbps is not supported and may result in +poor performance or loss of link. + +NOTE: The following adapters limit Jumbo Frames sized packets to a maximum of +4088 bytes: + + - Intel(R) 82578DM Gigabit Network Connection + - Intel(R) 82577LM Gigabit Network Connection + +The following adapters do not support Jumbo Frames: + + - Intel(R) PRO/1000 Gigabit Server Adapter + - Intel(R) PRO/1000 PM Network Connection + - Intel(R) 82562G 10/100 Network Connection + - Intel(R) 82562G-2 10/100 Network Connection + - Intel(R) 82562GT 10/100 Network Connection + - Intel(R) 82562GT-2 10/100 Network Connection + - Intel(R) 82562V 10/100 Network Connection + - Intel(R) 82562V-2 10/100 Network Connection + - Intel(R) 82566DC Gigabit Network Connection + - Intel(R) 82566DC-2 Gigabit Network Connection + - Intel(R) 82566DM Gigabit Network Connection + - Intel(R) 82566MC Gigabit Network Connection + - Intel(R) 82566MM Gigabit Network Connection + - Intel(R) 82567V-3 Gigabit Network Connection + - Intel(R) 82577LC Gigabit Network Connection + - Intel(R) 82578DC Gigabit Network Connection + +NOTE: Jumbo Frames cannot be configured on an 82579-based Network device if +MACSec is enabled on the system. + + +ethtool +------- +The driver utilizes the ethtool interface for driver configuration and +diagnostics, as well as displaying statistical information. The latest ethtool +version is required for this functionality. Download it at: + +https://www.kernel.org/pub/software/network/ethtool/ + +NOTE: When validating enable/disable tests on some parts (for example, 82578), +it is necessary to add a few seconds between tests when working with ethtool. + + +Speed and Duplex Configuration +------------------------------ +In addressing speed and duplex configuration issues, you need to distinguish +between copper-based adapters and fiber-based adapters. 
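+
+As a reference for the discussion that follows, these settings are inspected
+and changed with ethtool; a minimal sketch, assuming a copper interface named
+eth<x> and a reasonably recent ethtool::
+
+  ethtool eth<x>                                       # show current speed/duplex/autoneg
+  ethtool -s eth<x> speed 100 duplex full autoneg off  # force 100 Mbps full duplex
+  ethtool -s eth<x> advertise 0x020                    # advertise only 1000baseT/Full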
+ +In the default mode, an Intel(R) Ethernet Network Adapter using copper +connections will attempt to auto-negotiate with its link partner to determine +the best setting. If the adapter cannot establish link with the link partner +using auto-negotiation, you may need to manually configure the adapter and link +partner to identical settings to establish link and pass packets. This should +only be needed when attempting to link with an older switch that does not +support auto-negotiation or one that has been forced to a specific speed or +duplex mode. Your link partner must match the setting you choose. 1 Gbps speeds +and higher cannot be forced. Use the autonegotiation advertising setting to +manually set devices for 1 Gbps and higher. + +Speed, duplex, and autonegotiation advertising are configured through the +ethtool* utility. + +Caution: Only experienced network administrators should force speed and duplex +or change autonegotiation advertising manually. The settings at the switch must +always match the adapter settings. Adapter performance may suffer or your +adapter may not operate if you configure the adapter differently from your +switch. + +An Intel(R) Ethernet Network Adapter using fiber-based connections, however, +will not attempt to auto-negotiate with its link partner since those adapters +operate only in full duplex and only at their native speed. + + +Enabling Wake on LAN* (WoL) +--------------------------- +WoL is configured through the ethtool* utility. + +WoL will be enabled on the system during the next shut down or reboot. For +this driver version, in order to enable WoL, the e1000e driver must be loaded +prior to shutting down or suspending the system. + +NOTE: Wake on LAN is only supported on port A for the following devices: +- Intel(R) PRO/1000 PT Dual Port Network Connection +- Intel(R) PRO/1000 PT Dual Port Server Connection +- Intel(R) PRO/1000 PT Dual Port Server Adapter +- Intel(R) PRO/1000 PF Dual Port Server Adapter +- Intel(R) PRO/1000 PT Quad Port Server Adapter +- Intel(R) Gigabit PT Quad Port Server ExpressModule + + +Support +======= +For general information, go to the Intel support website at: + +https://www.intel.com/support/ + +or the Intel Wired Networking project hosted by Sourceforge at: + +https://sourceforge.net/projects/e1000 + +If an issue is identified with the released source code on a supported kernel +with a supported adapter, email the specific information related to the issue +to e1000-devel@lists.sf.net. diff --git a/Documentation/networking/e1000e.txt b/Documentation/networking/e1000e.txt deleted file mode 100644 index 12089547baed..000000000000 --- a/Documentation/networking/e1000e.txt +++ /dev/null @@ -1,312 +0,0 @@ -Linux* Driver for Intel(R) Ethernet Network Connection -====================================================== - -Intel Gigabit Linux driver. -Copyright(c) 1999 - 2013 Intel Corporation. - -Contents -======== - -- Identifying Your Adapter -- Command Line Parameters -- Additional Configurations -- Support - -Identifying Your Adapter -======================== - -The e1000e driver supports all PCI Express Intel(R) Gigabit Network -Connections, except those that are 82575, 82576 and 82580-based*. - -* NOTE: The Intel(R) PRO/1000 P Dual Port Server Adapter is supported by - the e1000 driver, not the e1000e driver due to the 82546 part being used - behind a PCI Express bridge. 
- -For more information on how to identify your adapter, go to the Adapter & -Driver ID Guide at: - - http://support.intel.com/support/go/network/adapter/idguide.htm - -For the latest Intel network drivers for Linux, refer to the following -website. In the search field, enter your adapter name or type, or use the -networking link on the left to search for your adapter: - - http://support.intel.com/support/go/network/adapter/home.htm - -Command Line Parameters -======================= - -The default value for each parameter is generally the recommended setting, -unless otherwise noted. - -NOTES: For more information about the InterruptThrottleRate, - RxIntDelay, TxIntDelay, RxAbsIntDelay, and TxAbsIntDelay - parameters, see the application note at: - http://www.intel.com/design/network/applnots/ap450.htm - -InterruptThrottleRate ---------------------- -Valid Range: 0,1,3,4,100-100000 (0=off, 1=dynamic, 3=dynamic conservative, - 4=simplified balancing) -Default Value: 3 - -The driver can limit the amount of interrupts per second that the adapter -will generate for incoming packets. It does this by writing a value to the -adapter that is based on the maximum amount of interrupts that the adapter -will generate per second. - -Setting InterruptThrottleRate to a value greater or equal to 100 -will program the adapter to send out a maximum of that many interrupts -per second, even if more packets have come in. This reduces interrupt -load on the system and can lower CPU utilization under heavy load, -but will increase latency as packets are not processed as quickly. - -The default behaviour of the driver previously assumed a static -InterruptThrottleRate value of 8000, providing a good fallback value for -all traffic types, but lacking in small packet performance and latency. -The hardware can handle many more small packets per second however, and -for this reason an adaptive interrupt moderation algorithm was implemented. - -The driver has two adaptive modes (setting 1 or 3) in which -it dynamically adjusts the InterruptThrottleRate value based on the traffic -that it receives. After determining the type of incoming traffic in the last -timeframe, it will adjust the InterruptThrottleRate to an appropriate value -for that traffic. - -The algorithm classifies the incoming traffic every interval into -classes. Once the class is determined, the InterruptThrottleRate value is -adjusted to suit that traffic type the best. There are three classes defined: -"Bulk traffic", for large amounts of packets of normal size; "Low latency", -for small amounts of traffic and/or a significant percentage of small -packets; and "Lowest latency", for almost completely small packets or -minimal traffic. - -In dynamic conservative mode, the InterruptThrottleRate value is set to 4000 -for traffic that falls in class "Bulk traffic". If traffic falls in the "Low -latency" or "Lowest latency" class, the InterruptThrottleRate is increased -stepwise to 20000. This default mode is suitable for most applications. - -For situations where low latency is vital such as cluster or -grid computing, the algorithm can reduce latency even more when -InterruptThrottleRate is set to mode 1. In this mode, which operates -the same as mode 3, the InterruptThrottleRate will be increased stepwise to -70000 for traffic in class "Lowest latency". - -In simplified mode the interrupt rate is based on the ratio of TX and -RX traffic. 
If the bytes per second rate is approximately equal, the -interrupt rate will drop as low as 2000 interrupts per second. If the -traffic is mostly transmit or mostly receive, the interrupt rate could -be as high as 8000. - -Setting InterruptThrottleRate to 0 turns off any interrupt moderation -and may improve small packet latency, but is generally not suitable -for bulk throughput traffic. - -NOTE: InterruptThrottleRate takes precedence over the TxAbsIntDelay and - RxAbsIntDelay parameters. In other words, minimizing the receive - and/or transmit absolute delays does not force the controller to - generate more interrupts than what the Interrupt Throttle Rate - allows. - -NOTE: When e1000e is loaded with default settings and multiple adapters - are in use simultaneously, the CPU utilization may increase non- - linearly. In order to limit the CPU utilization without impacting - the overall throughput, we recommend that you load the driver as - follows: - - modprobe e1000e InterruptThrottleRate=3000,3000,3000 - - This sets the InterruptThrottleRate to 3000 interrupts/sec for - the first, second, and third instances of the driver. The range - of 2000 to 3000 interrupts per second works on a majority of - systems and is a good starting point, but the optimal value will - be platform-specific. If CPU utilization is not a concern, use - RX_POLLING (NAPI) and default driver settings. - -RxIntDelay ----------- -Valid Range: 0-65535 (0=off) -Default Value: 0 - -This value delays the generation of receive interrupts in units of 1.024 -microseconds. Receive interrupt reduction can improve CPU efficiency if -properly tuned for specific network traffic. Increasing this value adds -extra latency to frame reception and can end up decreasing the throughput -of TCP traffic. If the system is reporting dropped receives, this value -may be set too high, causing the driver to run out of available receive -descriptors. - -CAUTION: When setting RxIntDelay to a value other than 0, adapters may - hang (stop transmitting) under certain network conditions. If - this occurs a NETDEV WATCHDOG message is logged in the system - event log. In addition, the controller is automatically reset, - restoring the network connection. To eliminate the potential - for the hang ensure that RxIntDelay is set to 0. - -RxAbsIntDelay -------------- -Valid Range: 0-65535 (0=off) -Default Value: 8 - -This value, in units of 1.024 microseconds, limits the delay in which a -receive interrupt is generated. Useful only if RxIntDelay is non-zero, -this value ensures that an interrupt is generated after the initial -packet is received within the set amount of time. Proper tuning, -along with RxIntDelay, may improve traffic throughput in specific network -conditions. - -TxIntDelay ----------- -Valid Range: 0-65535 (0=off) -Default Value: 8 - -This value delays the generation of transmit interrupts in units of -1.024 microseconds. Transmit interrupt reduction can improve CPU -efficiency if properly tuned for specific network traffic. If the -system is reporting dropped transmits, this value may be set too high -causing the driver to run out of available transmit descriptors. - -TxAbsIntDelay -------------- -Valid Range: 0-65535 (0=off) -Default Value: 32 - -This value, in units of 1.024 microseconds, limits the delay in which a -transmit interrupt is generated. Useful only if TxIntDelay is non-zero, -this value ensures that an interrupt is generated after the initial -packet is sent on the wire within the set amount of time. 
Proper tuning, -along with TxIntDelay, may improve traffic throughput in specific -network conditions. - -Copybreak ---------- -Valid Range: 0-xxxxxxx (0=off) -Default Value: 256 - -Driver copies all packets below or equaling this size to a fresh RX -buffer before handing it up the stack. - -This parameter is different than other parameters, in that it is a -single (not 1,1,1 etc.) parameter applied to all driver instances and -it is also available during runtime at -/sys/module/e1000e/parameters/copybreak - -SmartPowerDownEnable --------------------- -Valid Range: 0-1 -Default Value: 0 (disabled) - -Allows PHY to turn off in lower power states. The user can set this parameter -in supported chipsets. - -KumeranLockLoss ---------------- -Valid Range: 0-1 -Default Value: 1 (enabled) - -This workaround skips resetting the PHY at shutdown for the initial -silicon releases of ICH8 systems. - -IntMode -------- -Valid Range: 0-2 (0=legacy, 1=MSI, 2=MSI-X) -Default Value: 2 - -Allows changing the interrupt mode at module load time, without requiring a -recompile. If the driver load fails to enable a specific interrupt mode, the -driver will try other interrupt modes, from least to most compatible. The -interrupt order is MSI-X, MSI, Legacy. If specifying MSI (IntMode=1) -interrupts, only MSI and Legacy will be attempted. - -CrcStripping ------------- -Valid Range: 0-1 -Default Value: 1 (enabled) - -Strip the CRC from received packets before sending up the network stack. If -you have a machine with a BMC enabled but cannot receive IPMI traffic after -loading or enabling the driver, try disabling this feature. - -WriteProtectNVM ---------------- -Valid Range: 0,1 -Default Value: 1 - -If set to 1, configure the hardware to ignore all write/erase cycles to the -GbE region in the ICHx NVM (in order to prevent accidental corruption of the -NVM). This feature can be disabled by setting the parameter to 0 during initial -driver load. -NOTE: The machine must be power cycled (full off/on) when enabling NVM writes -via setting the parameter to zero. Once the NVM has been locked (via the -parameter at 1 when the driver loads) it cannot be unlocked except via power -cycle. - -Additional Configurations -========================= - - Jumbo Frames - ------------ - Jumbo Frames support is enabled by changing the MTU to a value larger than - the default of 1500. Use the ifconfig command to increase the MTU size. - For example: - - ifconfig eth<x> mtu 9000 up - - This setting is not saved across reboots. - - Notes: - - - The maximum MTU setting for Jumbo Frames is 9216. This value coincides - with the maximum Jumbo Frames size of 9234 bytes. - - - Using Jumbo frames at 10 or 100 Mbps is not supported and may result in - poor performance or loss of link. - - - Some adapters limit Jumbo Frames sized packets to a maximum of - 4096 bytes and some adapters do not support Jumbo Frames. - - - Jumbo Frames cannot be configured on an 82579-based Network device, if - MACSec is enabled on the system. - - ethtool - ------- - The driver utilizes the ethtool interface for driver configuration and - diagnostics, as well as displaying statistical information. We - strongly recommend downloading the latest version of ethtool at: - - https://kernel.org/pub/software/network/ethtool/ - - NOTE: When validating enable/disable tests on some parts (82578, for example) - you need to add a few seconds between tests when working with ethtool. 
- - Speed and Duplex - ---------------- - Speed and Duplex are configured through the ethtool* utility. For - instructions, refer to the ethtool man page. - - Enabling Wake on LAN* (WoL) - --------------------------- - WoL is configured through the ethtool* utility. For instructions on - enabling WoL with ethtool, refer to the ethtool man page. - - WoL will be enabled on the system during the next shut down or reboot. - For this driver version, in order to enable WoL, the e1000e driver must be - loaded when shutting down or rebooting the system. - - In most cases Wake On LAN is only supported on port A for multiple port - adapters. To verify if a port supports Wake on Lan run ethtool eth<X>. - -Support -======= - -For general information, go to the Intel support website at: - - www.intel.com/support/ - -or the Intel Wired Networking project hosted by Sourceforge at: - - http://sourceforge.net/projects/e1000 - -If an issue is identified with the released source code on the supported -kernel with a supported adapter, email the specific information related -to the issue to e1000-devel@lists.sf.net diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt index e6b4ebb2b243..2196b824e96c 100644 --- a/Documentation/networking/filter.txt +++ b/Documentation/networking/filter.txt @@ -203,11 +203,11 @@ opcodes as defined in linux/filter.h stand for: Instruction Addressing mode Description - ld 1, 2, 3, 4, 10 Load word into A + ld 1, 2, 3, 4, 12 Load word into A ldi 4 Load word into A ldh 1, 2 Load half-word into A ldb 1, 2 Load byte into A - ldx 3, 4, 5, 10 Load word into X + ldx 3, 4, 5, 12 Load word into X ldxi 4 Load word into X ldxb 5 Load byte into X @@ -216,14 +216,14 @@ opcodes as defined in linux/filter.h stand for: jmp 6 Jump to label ja 6 Jump to label - jeq 7, 8 Jump on A == k - jneq 8 Jump on A != k - jne 8 Jump on A != k - jlt 8 Jump on A < k - jle 8 Jump on A <= k - jgt 7, 8 Jump on A > k - jge 7, 8 Jump on A >= k - jset 7, 8 Jump on A & k + jeq 7, 8, 9, 10 Jump on A == <x> + jneq 9, 10 Jump on A != <x> + jne 9, 10 Jump on A != <x> + jlt 9, 10 Jump on A < <x> + jle 9, 10 Jump on A <= <x> + jgt 7, 8, 9, 10 Jump on A > <x> + jge 7, 8, 9, 10 Jump on A >= <x> + jset 7, 8, 9, 10 Jump on A & <x> add 0, 4 A + <x> sub 0, 4 A - <x> @@ -240,7 +240,7 @@ opcodes as defined in linux/filter.h stand for: tax Copy A into X txa Copy X into A - ret 4, 9 Return + ret 4, 11 Return The next table shows addressing formats from the 2nd column: @@ -254,9 +254,11 @@ The next table shows addressing formats from the 2nd column: 5 4*([k]&0xf) Lower nibble * 4 at byte offset k in the packet 6 L Jump label L 7 #k,Lt,Lf Jump to Lt if true, otherwise jump to Lf - 8 #k,Lt Jump to Lt if predicate is true - 9 a/%a Accumulator A - 10 extension BPF extension + 8 x/%x,Lt,Lf Jump to Lt if true, otherwise jump to Lf + 9 #k,Lt Jump to Lt if predicate is true + 10 x/%x,Lt Jump to Lt if predicate is true + 11 a/%a Accumulator A + 12 extension BPF extension The Linux kernel also has a couple of BPF extensions that are used along with the class of load instructions by "overloading" the k argument with @@ -1125,6 +1127,14 @@ pointer type. The types of pointers describe their base, as follows: PTR_TO_STACK Frame pointer. PTR_TO_PACKET skb->data. PTR_TO_PACKET_END skb->data + headlen; arithmetic forbidden. + PTR_TO_SOCKET Pointer to struct bpf_sock_ops, implicitly refcounted. 
+    PTR_TO_SOCKET_OR_NULL
+                        Either a pointer to a socket, or NULL; socket lookup
+                        returns this type, which becomes a PTR_TO_SOCKET when
+                        checked != NULL. PTR_TO_SOCKET is reference-counted,
+                        so programs must release the reference through the
+                        socket release function before the end of the program.
+                        Arithmetic on these pointers is forbidden.
However, a pointer may be offset from this base (as a result of pointer
arithmetic), and this is tracked in two parts: the 'fixed offset' and 'variable
offset'. The former is used when an exactly-known value (e.g. an immediate
@@ -1171,6 +1181,13 @@ over the Ethernet header, then reads IHL and adds (IHL * 4), the resulting
pointer will have a variable offset known to be 4n+2 for some n, so adding the 2
bytes (NET_IP_ALIGN) gives a 4-byte alignment and so word-sized accesses through
that pointer are safe.
+The 'id' field is also used on PTR_TO_SOCKET and PTR_TO_SOCKET_OR_NULL, common
+to all copies of the pointer returned from a socket lookup. This has similar
+behaviour to the handling for PTR_TO_MAP_VALUE_OR_NULL->PTR_TO_MAP_VALUE, but
+it also handles reference tracking for the pointer. PTR_TO_SOCKET implicitly
+represents a reference to the corresponding 'struct sock'. To ensure that the
+reference is not leaked, it is imperative to NULL-check the reference and, in
+the non-NULL case, pass the valid reference to the socket release function.

Direct packet access
--------------------
@@ -1444,6 +1461,55 @@ Error:
  8: (7a) *(u64 *)(r0 +0) = 1
  R0 invalid mem access 'imm'

+Program that performs a socket lookup then sets the pointer to NULL without
+checking it:
+value:
+  BPF_MOV64_IMM(BPF_REG_2, 0),
+  BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -8),
+  BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+  BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+  BPF_MOV64_IMM(BPF_REG_3, 4),
+  BPF_MOV64_IMM(BPF_REG_4, 0),
+  BPF_MOV64_IMM(BPF_REG_5, 0),
+  BPF_EMIT_CALL(BPF_FUNC_sk_lookup_tcp),
+  BPF_MOV64_IMM(BPF_REG_0, 0),
+  BPF_EXIT_INSN(),
+Error:
+  0: (b7) r2 = 0
+  1: (63) *(u32 *)(r10 -8) = r2
+  2: (bf) r2 = r10
+  3: (07) r2 += -8
+  4: (b7) r3 = 4
+  5: (b7) r4 = 0
+  6: (b7) r5 = 0
+  7: (85) call bpf_sk_lookup_tcp#65
+  8: (b7) r0 = 0
+  9: (95) exit
+  Unreleased reference id=1, alloc_insn=7
+
+Program that performs a socket lookup but does not NULL-check the returned
+value:
+  BPF_MOV64_IMM(BPF_REG_2, 0),
+  BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -8),
+  BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+  BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+  BPF_MOV64_IMM(BPF_REG_3, 4),
+  BPF_MOV64_IMM(BPF_REG_4, 0),
+  BPF_MOV64_IMM(BPF_REG_5, 0),
+  BPF_EMIT_CALL(BPF_FUNC_sk_lookup_tcp),
+  BPF_EXIT_INSN(),
+Error:
+  0: (b7) r2 = 0
+  1: (63) *(u32 *)(r10 -8) = r2
+  2: (bf) r2 = r10
+  3: (07) r2 += -8
+  4: (b7) r3 = 4
+  5: (b7) r4 = 0
+  6: (b7) r5 = 0
+  7: (85) call bpf_sk_lookup_tcp#65
+  8: (95) exit
+  Unreleased reference id=1, alloc_insn=7
+
Testing
-------
diff --git a/Documentation/networking/fm10k.rst b/Documentation/networking/fm10k.rst
new file mode 100644
index 000000000000..bf5e5942f28d
--- /dev/null
+++ b/Documentation/networking/fm10k.rst
@@ -0,0 +1,141 @@
+.. SPDX-License-Identifier: GPL-2.0+
+
+Linux* Base Driver for Intel(R) Ethernet Multi-host Controller
+==============================================================
+
+August 20, 2018
+Copyright(c) 2015-2018 Intel Corporation.
+ +Contents +======== +- Identifying Your Adapter +- Additional Configurations +- Performance Tuning +- Known Issues +- Support + +Identifying Your Adapter +======================== +The driver in this release is compatible with devices based on the Intel(R) +Ethernet Multi-host Controller. + +For information on how to identify your adapter, and for the latest Intel +network drivers, refer to the Intel Support website: +http://www.intel.com/support + + +Flow Control +------------ +The Intel(R) Ethernet Switch Host Interface Driver does not support Flow +Control. It will not send pause frames. This may result in dropped frames. + + +Virtual Functions (VFs) +----------------------- +Use sysfs to enable VFs. +Valid Range: 0-64 + +For example:: + + echo $num_vf_enabled > /sys/class/net/$dev/device/sriov_numvfs //enable VFs + echo 0 > /sys/class/net/$dev/device/sriov_numvfs //disable VFs + +NOTE: Neither the device nor the driver control how VFs are mapped into config +space. Bus layout will vary by operating system. On operating systems that +support it, you can check sysfs to find the mapping. + +NOTE: When SR-IOV mode is enabled, hardware VLAN filtering and VLAN tag +stripping/insertion will remain enabled. Please remove the old VLAN filter +before the new VLAN filter is added. For example:: + + ip link set eth0 vf 0 vlan 100 // set vlan 100 for VF 0 + ip link set eth0 vf 0 vlan 0 // Delete vlan 100 + ip link set eth0 vf 0 vlan 200 // set a new vlan 200 for VF 0 + + +Additional Features and Configurations +====================================== + +Jumbo Frames +------------ +Jumbo Frames support is enabled by changing the Maximum Transmission Unit (MTU) +to a value larger than the default value of 1500. + +Use the ifconfig command to increase the MTU size. For example, enter the +following where <x> is the interface number:: + + ifconfig eth<x> mtu 9000 up + +Alternatively, you can use the ip command as follows:: + + ip link set mtu 9000 dev eth<x> + ip link set up dev eth<x> + +This setting is not saved across reboots. The setting change can be made +permanent by adding 'MTU=9000' to the file: + +- For RHEL: /etc/sysconfig/network-scripts/ifcfg-eth<x> +- For SLES: /etc/sysconfig/network/<config_file> + +NOTE: The maximum MTU setting for Jumbo Frames is 15342. This value coincides +with the maximum Jumbo Frames size of 15364 bytes. + +NOTE: This driver will attempt to use multiple page sized buffers to receive +each jumbo packet. This should help to avoid buffer starvation issues when +allocating receive packets. + + +Generic Receive Offload, aka GRO +-------------------------------- +The driver supports the in-kernel software implementation of GRO. GRO has +shown that by coalescing Rx traffic into larger chunks of data, CPU +utilization can be significantly reduced when under large Rx load. GRO is an +evolution of the previously-used LRO interface. GRO is able to coalesce +other protocols besides TCP. It's also safe to use with configurations that +are problematic for LRO, namely bridging and iSCSI. + + + +Supported ethtool Commands and Options for Filtering +---------------------------------------------------- +-n --show-nfc + Retrieves the receive network flow classification configurations. + +rx-flow-hash tcp4|udp4|ah4|esp4|sctp4|tcp6|udp6|ah6|esp6|sctp6 + Retrieves the hash options for the specified network traffic type. + +-N --config-nfc + Configures the receive network flow classification. 
+ +rx-flow-hash tcp4|udp4|ah4|esp4|sctp4|tcp6|udp6|ah6|esp6|sctp6 m|v|t|s|d|f|n|r + Configures the hash options for the specified network traffic type. + +- udp4: UDP over IPv4 +- udp6: UDP over IPv6 +- f Hash on bytes 0 and 1 of the Layer 4 header of the rx packet. +- n Hash on bytes 2 and 3 of the Layer 4 header of the rx packet. + + +Known Issues/Troubleshooting +============================ + +Enabling SR-IOV in a 64-bit Microsoft* Windows Server* 2012/R2 guest OS under Linux KVM +--------------------------------------------------------------------------------------- +KVM Hypervisor/VMM supports direct assignment of a PCIe device to a VM. This +includes traditional PCIe devices, as well as SR-IOV-capable devices based on +the Intel Ethernet Controller XL710. + + +Support +======= +For general information, go to the Intel support website at: + +https://www.intel.com/support/ + +or the Intel Wired Networking project hosted by Sourceforge at: + +https://sourceforge.net/projects/e1000 + +If an issue is identified with the released source code on a supported kernel +with a supported adapter, email the specific information related to the issue +to e1000-devel@lists.sf.net. diff --git a/Documentation/networking/i40e.rst b/Documentation/networking/i40e.rst new file mode 100644 index 000000000000..0cc16c525d10 --- /dev/null +++ b/Documentation/networking/i40e.rst @@ -0,0 +1,770 @@ +.. SPDX-License-Identifier: GPL-2.0+ + +Linux* Base Driver for the Intel(R) Ethernet Controller 700 Series +================================================================== + +Intel 40 Gigabit Linux driver. +Copyright(c) 1999-2018 Intel Corporation. + +Contents +======== + +- Overview +- Identifying Your Adapter +- Intel(R) Ethernet Flow Director +- Additional Configurations +- Known Issues +- Support + + +Driver information can be obtained using ethtool, lspci, and ifconfig. +Instructions on updating ethtool can be found in the section Additional +Configurations later in this document. + +For questions related to hardware requirements, refer to the documentation +supplied with your Intel adapter. All hardware requirements listed apply to use +with Linux. + + +Identifying Your Adapter +======================== +The driver is compatible with devices based on the following: + + * Intel(R) Ethernet Controller X710 + * Intel(R) Ethernet Controller XL710 + * Intel(R) Ethernet Network Connection X722 + * Intel(R) Ethernet Controller XXV710 + +For the best performance, make sure the latest NVM/FW is installed on your +device. + +For information on how to identify your adapter, and for the latest NVM/FW +images and Intel network drivers, refer to the Intel Support website: +https://www.intel.com/support + +SFP+ and QSFP+ Devices +---------------------- +For information about supported media, refer to this document: +https://www.intel.com/content/dam/www/public/us/en/documents/release-notes/xl710-ethernet-controller-feature-matrix.pdf + +NOTE: Some adapters based on the Intel(R) Ethernet Controller 700 Series only +support Intel Ethernet Optics modules. On these adapters, other modules are not +supported and will not function. In all cases Intel recommends using Intel +Ethernet Optics; other modules may function but are not validated by Intel. +Contact Intel for supported media types. + +NOTE: For connections based on Intel(R) Ethernet Controller 700 Series, support +is dependent on your system board. Please see your vendor for details. 
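+
+As a quick way to check which module a given port has actually detected,
+ethtool can dump the module EEPROM contents (a sketch only; "eth0" is a
+placeholder interface name, and the output depends on the installed module
+and on driver support)::
+
+  # ethtool -m eth0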
+
+NOTE: In systems that do not have adequate airflow to cool the adapter and
+optical modules, you must use high temperature optical modules.
+
+Virtual Functions (VFs)
+-----------------------
+Use sysfs to enable VFs. For example::
+
+  #echo $num_vf_enabled > /sys/class/net/$dev/device/sriov_numvfs #enable VFs
+  #echo 0 > /sys/class/net/$dev/device/sriov_numvfs #disable VFs
+
+For example, the following instructions will configure PF eth0 and the first VF
+on VLAN 10::
+
+  $ ip link set dev eth0 vf 0 vlan 10
+
+VLAN Tag Packet Steering
+------------------------
+Allows you to send all packets with a specific VLAN tag to a particular SR-IOV
+virtual function (VF). Further, this feature allows you to designate a
+particular VF as trusted, and allows that trusted VF to request selective
+promiscuous mode on the Physical Function (PF).
+
+To set a VF as trusted or untrusted, enter the following command in the
+Hypervisor::
+
+  # ip link set dev eth0 vf 1 trust [on|off]
+
+Once the VF is designated as trusted, use the following commands in the VM to
+set the VF to promiscuous mode.
+
+::
+
+  For promiscuous all:
+  #ip link set eth2 promisc on
+  Where eth2 is a VF interface in the VM
+
+  For promiscuous Multicast:
+  #ip link set eth2 allmulticast on
+  Where eth2 is a VF interface in the VM
+
+NOTE: By default, the ethtool priv-flag vf-true-promisc-support is set to
+"off", meaning that promiscuous mode for the VF will be limited. To set the
+promiscuous mode for the VF to true promiscuous and allow the VF to see all
+ingress traffic, use the following command::
+
+  #ethtool --set-priv-flags p261p1 vf-true-promisc-support on
+
+The vf-true-promisc-support priv-flag does not enable promiscuous mode; rather,
+it designates which type of promiscuous mode (limited or true) you will get
+when you enable promiscuous mode using the ip link commands above. Note that
+this is a global setting that affects the entire device. However, the
+vf-true-promisc-support priv-flag is only exposed to the first PF of the
+device. The PF remains in limited promiscuous mode (unless it is in MFP mode)
+regardless of the vf-true-promisc-support setting.
+
+Now add a VLAN interface on the VF interface::
+
+  #ip link add link eth2 name eth2.100 type vlan id 100
+
+Note that the order in which you set the VF to promiscuous mode and add the
+VLAN interface does not matter (you can do either first). The end result in
+this example is that the VF will get all traffic that is tagged with VLAN 100.
+
+Intel(R) Ethernet Flow Director
+-------------------------------
+The Intel Ethernet Flow Director performs the following tasks:
+
+- Directs receive packets according to their flows to different queues.
+- Enables tight control on routing a flow in the platform.
+- Matches flows and CPU cores for flow affinity.
+- Supports multiple parameters for flexible flow classification and load
+  balancing (in SFP mode only).
+
+NOTE: The Linux i40e driver supports the following flow types: IPv4, TCPv4, and
+UDPv4. For a given flow type, it supports valid combinations of IP addresses
+(source or destination) and UDP/TCP ports (source and destination). For
+example, you can supply only a source IP address, a source IP address and a
+destination port, or any combination of one or more of these four parameters.
+
+NOTE: The Linux i40e driver allows you to filter traffic based on a
+user-defined flexible two-byte pattern and offset by using the ethtool user-def
+and mask fields.
+Only L3 and L4 flow types are supported for user-defined
+flexible filters. For a given flow type, you must clear all Intel Ethernet Flow
+Director filters before changing the input set (for that flow type).
+
+To enable or disable the Intel Ethernet Flow Director::
+
+  # ethtool -K ethX ntuple <on|off>
+
+When disabling ntuple filters, all the user-programmed filters are flushed from
+the driver cache and hardware. All needed filters must be re-added when ntuple
+is re-enabled.
+
+To add a filter that directs packets to queue 2, use the -U or -N switch::
+
+  # ethtool -N ethX flow-type tcp4 src-ip 192.168.10.1 dst-ip \
+  192.168.10.2 src-port 2000 dst-port 2001 action 2 [loc 1]
+
+To set a filter using only the source and destination IP address::
+
+  # ethtool -N ethX flow-type tcp4 src-ip 192.168.10.1 dst-ip \
+  192.168.10.2 action 2 [loc 1]
+
+To see the list of filters currently present::
+
+  # ethtool <-u|-n> ethX
+
+Application Targeted Routing (ATR) Perfect Filters
+--------------------------------------------------
+ATR is enabled by default when the kernel is in multiple transmit queue mode.
+An ATR Intel Ethernet Flow Director filter rule is added when a TCP-IP flow
+starts and is deleted when the flow ends. When a TCP-IP Intel Ethernet Flow
+Director rule is added from ethtool (Sideband filter), ATR is turned off by the
+driver. To re-enable ATR, the sideband can be disabled with the ethtool -K
+option. For example::
+
+  ethtool -K [adapter] ntuple [off|on]
+
+If sideband is re-enabled after ATR is re-enabled, ATR remains enabled until a
+TCP-IP flow is added. When all TCP-IP sideband rules are deleted, ATR is
+automatically re-enabled.
+
+Packets that match the ATR rules are counted in fdir_atr_match stats in
+ethtool, which also can be used to verify whether ATR rules still exist.
+
+Sideband Perfect Filters
+------------------------
+Sideband Perfect Filters are used to direct traffic that matches specified
+characteristics. They are enabled through ethtool's ntuple interface. To add a
+new filter use the following command::
+
+  ethtool -U <device> flow-type <type> src-ip <ip> dst-ip <ip> src-port <port> \
+  dst-port <port> action <queue>
+
+Where:
+  <device> - the ethernet device to program
+  <type> - can be ip4, tcp4, udp4, or sctp4
+  <ip> - the ip address to match on
+  <port> - the port number to match on
+  <queue> - the queue to direct traffic towards (-1 discards matching traffic)
+
+Use the following command to display all of the active filters::
+
+  ethtool -u <device>
+
+Use the following command to delete a filter::
+
+  ethtool -U <device> delete <N>
+
+Where <N> is the filter id displayed when printing all the active filters, and
+may also have been specified using "loc <N>" when adding the filter.
+
+The following example matches TCP traffic sent from 192.168.0.1, port 5300,
+directed to 192.168.0.5, port 80, and sends it to queue 7::
+
+  ethtool -U enp130s0 flow-type tcp4 src-ip 192.168.0.1 dst-ip 192.168.0.5 \
+  src-port 5300 dst-port 80 action 7
+
+For each flow-type, the programmed filters must all have the same matching
+input set.
+For example, issuing the following two commands is acceptable::
+
+  ethtool -U enp130s0 flow-type ip4 src-ip 192.168.0.1 src-port 5300 action 7
+  ethtool -U enp130s0 flow-type ip4 src-ip 192.168.0.5 src-port 55 action 10
+
+Issuing the next two commands, however, is not acceptable, since the first
+specifies src-ip and the second specifies dst-ip::
+
+  ethtool -U enp130s0 flow-type ip4 src-ip 192.168.0.1 src-port 5300 action 7
+  ethtool -U enp130s0 flow-type ip4 dst-ip 192.168.0.5 src-port 55 action 10
+
+The second command will fail with an error. You may program multiple filters
+with the same fields, using different values, but, on one device, you may not
+program two tcp4 filters with different matching fields.
+
+Matching on a sub-portion of a field is not supported by the i40e driver, thus
+partial mask fields are not supported.
+
+The driver also supports matching user-defined data within the packet payload.
+This flexible data is specified using the "user-def" field of the ethtool
+command in the following way:
+
++----------------------------+--------------------------+
+| 31    28    24    20    16 | 15    12    8    4    0  |
++----------------------------+--------------------------+
+| offset into packet payload | 2 bytes of flexible data |
++----------------------------+--------------------------+
+
+For example,
+
+::
+
+  ... user-def 0x4FFFF ...
+
+tells the filter to look 4 bytes into the payload and match that value against
+0xFFFF. The offset is based on the beginning of the payload, and not the
+beginning of the packet. Thus
+
+::
+
+  flow-type tcp4 ... user-def 0x8BEAF ...
+
+would match TCP/IPv4 packets which have the value 0xBEAF 8 bytes into the
+TCP/IPv4 payload.
+
+Note that ICMP headers are parsed as 4 bytes of header and 4 bytes of payload.
+Thus to match the first byte of the payload, you must actually add 4 bytes to
+the offset. Also note that ip4 filters match both ICMP frames as well as raw
+(unknown) ip4 frames, where the payload will be the L3 payload of the IP4 frame.
+
+The maximum offset is 64. The hardware will only read up to 64 bytes of data
+from the payload. The offset must be even because the flexible data is 2 bytes
+long and must be aligned to byte 0 of the packet payload.
+
+The user-defined flexible offset is also considered part of the input set and
+cannot be programmed separately for multiple filters of the same type. However,
+the flexible data is not part of the input set and multiple filters may use the
+same offset but match against different data.
+
+To create filters that direct traffic to a specific Virtual Function, use the
+"action" parameter. Specify the action as a 64-bit value, where the lower 32
+bits represent the queue number, while the next 8 bits represent which VF.
+Note that 0 is the PF, so the VF identifier is offset by 1. For example::
+
+  ... action 0x800000002 ...
+
+directs traffic to queue 2 of Virtual Function 7 (8 minus 1).
+
+Note that these filters will not break internal routing rules, and will not
+route traffic that otherwise would not have been sent to the specified Virtual
+Function.
+
+Setting the link-down-on-close Private Flag
+-------------------------------------------
+When the link-down-on-close private flag is set to "on", the port's link will
+go down when the interface is brought down using the ifconfig ethX down command.
+ +Use ethtool to view and set link-down-on-close, as follows:: + + ethtool --show-priv-flags ethX + ethtool --set-priv-flags ethX link-down-on-close [on|off] + +Viewing Link Messages +--------------------- +Link messages will not be displayed to the console if the distribution is +restricting system messages. In order to see network driver link messages on +your console, set dmesg to eight by entering the following:: + + dmesg -n 8 + +NOTE: This setting is not saved across reboots. + +Jumbo Frames +------------ +Jumbo Frames support is enabled by changing the Maximum Transmission Unit (MTU) +to a value larger than the default value of 1500. + +Use the ifconfig command to increase the MTU size. For example, enter the +following where <x> is the interface number:: + + ifconfig eth<x> mtu 9000 up + +Alternatively, you can use the ip command as follows:: + + ip link set mtu 9000 dev eth<x> + ip link set up dev eth<x> + +This setting is not saved across reboots. The setting change can be made +permanent by adding 'MTU=9000' to the file:: + + /etc/sysconfig/network-scripts/ifcfg-eth<x> // for RHEL + /etc/sysconfig/network/<config_file> // for SLES + +NOTE: The maximum MTU setting for Jumbo Frames is 9702. This value coincides +with the maximum Jumbo Frames size of 9728 bytes. + +NOTE: This driver will attempt to use multiple page sized buffers to receive +each jumbo packet. This should help to avoid buffer starvation issues when +allocating receive packets. + +ethtool +------- +The driver utilizes the ethtool interface for driver configuration and +diagnostics, as well as displaying statistical information. The latest ethtool +version is required for this functionality. Download it at: +https://www.kernel.org/pub/software/network/ethtool/ + +Supported ethtool Commands and Options for Filtering +---------------------------------------------------- +-n --show-nfc + Retrieves the receive network flow classification configurations. + +rx-flow-hash tcp4|udp4|ah4|esp4|sctp4|tcp6|udp6|ah6|esp6|sctp6 + Retrieves the hash options for the specified network traffic type. + +-N --config-nfc + Configures the receive network flow classification. + +rx-flow-hash tcp4|udp4|ah4|esp4|sctp4|tcp6|udp6|ah6|esp6|sctp6 m|v|t|s|d|f|n|r... + Configures the hash options for the specified network traffic type. + +udp4 UDP over IPv4 +udp6 UDP over IPv6 + +f Hash on bytes 0 and 1 of the Layer 4 header of the Rx packet. +n Hash on bytes 2 and 3 of the Layer 4 header of the Rx packet. + +Speed and Duplex Configuration +------------------------------ +In addressing speed and duplex configuration issues, you need to distinguish +between copper-based adapters and fiber-based adapters. + +In the default mode, an Intel(R) Ethernet Network Adapter using copper +connections will attempt to auto-negotiate with its link partner to determine +the best setting. If the adapter cannot establish link with the link partner +using auto-negotiation, you may need to manually configure the adapter and link +partner to identical settings to establish link and pass packets. This should +only be needed when attempting to link with an older switch that does not +support auto-negotiation or one that has been forced to a specific speed or +duplex mode. Your link partner must match the setting you choose. 1 Gbps speeds +and higher cannot be forced. Use the autonegotiation advertising setting to +manually set devices for 1 Gbps and higher. 
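+
+For example (a sketch only; "ethX" is a placeholder, and the exact bitmap
+depends on the link modes your device reports via "ethtool ethX"), to
+advertise only 10Gbps full duplex while leaving autonegotiation enabled::
+
+  ethtool -s ethX autoneg on advertise 0x1000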
+
+NOTE: You cannot set the speed on devices based on the Intel(R) Ethernet
+Network Adapter XXV710.
+
+Speed, duplex, and autonegotiation advertising are configured through the
+ethtool* utility.
+
+Caution: Only experienced network administrators should force speed and duplex
+or change autonegotiation advertising manually. The settings at the switch must
+always match the adapter settings. Adapter performance may suffer or your
+adapter may not operate if you configure the adapter differently from your
+switch.
+
+An Intel(R) Ethernet Network Adapter using fiber-based connections, however,
+will not attempt to auto-negotiate with its link partner since those adapters
+operate only in full duplex and only at their native speed.
+
+NAPI
+----
+NAPI (Rx polling mode) is supported in the i40e driver.
+For more information on NAPI, see
+https://wiki.linuxfoundation.org/networking/napi
+
+Flow Control
+------------
+Ethernet Flow Control (IEEE 802.3x) can be configured with ethtool to enable
+receiving and transmitting pause frames for i40e. When transmit is enabled,
+pause frames are generated when the receive packet buffer crosses a predefined
+threshold. When receive is enabled, the transmit unit will halt for the time
+delay specified when a pause frame is received.
+
+NOTE: You must have a flow control capable link partner.
+
+Flow Control is on by default.
+
+Use ethtool to change the flow control settings.
+
+To enable or disable Rx or Tx Flow Control::
+
+  ethtool -A eth? rx <on|off> tx <on|off>
+
+Note: This command only enables or disables Flow Control if auto-negotiation is
+disabled. If auto-negotiation is enabled, this command changes the parameters
+used for auto-negotiation with the link partner.
+
+To enable or disable auto-negotiation::
+
+  ethtool -s eth? autoneg <on|off>
+
+Note: Flow Control auto-negotiation is part of link auto-negotiation. Depending
+on your device, you may not be able to change the auto-negotiation setting.
+
+RSS Hash Flow
+-------------
+Allows you to set the hash bytes per flow type and any combination of one or
+more options for Receive Side Scaling (RSS) hash byte configuration.
+
+::
+
+  # ethtool -N <dev> rx-flow-hash <type> <option>
+
+Where <type> is:
+  tcp4 signifying TCP over IPv4
+  udp4 signifying UDP over IPv4
+  tcp6 signifying TCP over IPv6
+  udp6 signifying UDP over IPv6
+And <option> is one or more of:
+  s Hash on the IP source address of the Rx packet.
+  d Hash on the IP destination address of the Rx packet.
+  f Hash on bytes 0 and 1 of the Layer 4 header of the Rx packet.
+  n Hash on bytes 2 and 3 of the Layer 4 header of the Rx packet.
+
+MAC and VLAN anti-spoofing feature
+----------------------------------
+When a malicious driver attempts to send a spoofed packet, it is dropped by the
+hardware and not transmitted.
+NOTE: This feature can be disabled for a specific Virtual Function (VF)::
+
+  ip link set <pf dev> vf <vf id> spoofchk {off|on}
+
+IEEE 1588 Precision Time Protocol (PTP) Hardware Clock (PHC)
+------------------------------------------------------------
+Precision Time Protocol (PTP) is used to synchronize clocks in a computer
+network. PTP support varies among Intel devices that support this driver. Use
+"ethtool -T <netdev name>" to get a definitive list of PTP capabilities
+supported by the device.
+
+IEEE 802.1ad (QinQ) Support
+---------------------------
+The IEEE 802.1ad standard, informally known as QinQ, allows for multiple VLAN
+IDs within a single Ethernet frame.
+VLAN IDs are sometimes referred to as
+"tags," and multiple VLAN IDs are thus referred to as a "tag stack." Tag stacks
+allow L2 tunneling and the ability to segregate traffic within a particular
+VLAN ID, among other uses.
+
+The following are examples of how to configure 802.1ad (QinQ)::
+
+  ip link add link eth0 eth0.24 type vlan proto 802.1ad id 24
+  ip link add link eth0.24 eth0.24.371 type vlan proto 802.1Q id 371
+
+Where "24" and "371" are example VLAN IDs.
+
+NOTES:
+  Receive checksum offloads, cloud filters, and VLAN acceleration are not
+  supported for 802.1ad (QinQ) packets.
+
+VXLAN and GENEVE Overlay HW Offloading
+--------------------------------------
+Virtual Extensible LAN (VXLAN) allows you to extend an L2 network over an L3
+network, which may be useful in a virtualized or cloud environment. Some
+Intel(R) Ethernet Network devices perform VXLAN processing, offloading it from
+the operating system. This reduces CPU utilization.
+
+VXLAN offloading is controlled by the Tx and Rx checksum offload options
+provided by ethtool. That is, if Tx checksum offload is enabled, and the
+adapter has the capability, VXLAN offloading is also enabled.
+
+Support for VXLAN and GENEVE HW offloading is dependent on kernel support of
+the HW offloading features.
+
+Multiple Functions per Port
+---------------------------
+Some adapters based on the Intel Ethernet Controller X710/XL710 support
+multiple functions on a single physical port. Configure these functions through
+the System Setup/BIOS.
+
+Minimum TX Bandwidth is the guaranteed minimum data transmission bandwidth, as
+a percentage of the full physical port link speed, that the partition will
+receive. The bandwidth the partition is awarded will never fall below the level
+you specify.
+
+The range for the minimum bandwidth values is:
+1 to ((100 minus # of partitions on the physical port) plus 1)
+For example, if a physical port has 4 partitions, the range would be:
+1 to ((100 - 4) + 1 = 97)
+
+The Maximum Bandwidth percentage represents the maximum transmit bandwidth
+allocated to the partition as a percentage of the full physical port link
+speed. The accepted range of values is 1-100. The value is used as a limiter,
+should you choose to keep any one particular function from consuming 100%
+of a port's bandwidth (should it be available). The sum of all the values for
+Maximum Bandwidth is not restricted, because no more than 100% of a port's
+bandwidth can ever be used.
+
+NOTE: X710/XXV710 devices fail to enable Max VFs (64) when Multiple Functions
+per Port (MFP) and SR-IOV are enabled. An error from i40e is logged that says
+"add vsi failed for VF N, aq_err 16". To work around the issue, enable fewer
+than 64 virtual functions (VFs).
+
+Data Center Bridging (DCB)
+--------------------------
+DCB is a configuration Quality of Service implementation in hardware. It uses
+the VLAN priority tag (802.1p) to filter traffic. That means that there are 8
+different priorities that traffic can be filtered into. It also enables
+priority flow control (802.1Qbb) which can limit or eliminate the number of
+dropped packets during network stress. Bandwidth can be allocated to each of
+these priorities, which is enforced at the hardware level (802.1Qaz).
+
+Adapter firmware implements LLDP and DCBX protocol agents as per 802.1AB and
+802.1Qaz respectively. The firmware-based DCBX agent runs in willing mode only
+and can accept settings from a DCBX capable peer.
+Software configuration of
+DCBX parameters via dcbtool/lldptool is not supported.
+
+NOTE: Firmware LLDP can be disabled by setting the private flag disable-fw-lldp.
+
+The i40e driver implements the DCB netlink interface layer to allow user-space
+to communicate with the driver and query DCB configuration for the port.
+
+NOTE:
+The kernel assumes that TC0 is available, and will disable Priority Flow
+Control (PFC) on the device if TC0 is not available. To fix this, ensure TC0 is
+enabled when setting up DCB on your switch.
+
+Interrupt Rate Limiting
+-----------------------
+:Valid Range: 0-235 (0=no limit)
+
+The Intel(R) Ethernet Controller XL710 family supports an interrupt rate
+limiting mechanism. The user can control, via ethtool, the number of
+microseconds between interrupts.
+
+Syntax::
+
+  # ethtool -C ethX rx-usecs-high N
+
+The range of 0-235 microseconds provides an effective range of 4,310 to 250,000
+interrupts per second. The value of rx-usecs-high can be set independently of
+rx-usecs and tx-usecs in the same ethtool command, and is also independent of
+the adaptive interrupt moderation algorithm. The underlying hardware supports
+granularity in 4-microsecond intervals, so adjacent values may result in the
+same interrupt rate.
+
+One possible use case is the following::
+
+  # ethtool -C ethX adaptive-rx off adaptive-tx off rx-usecs-high 20 rx-usecs \
+  5 tx-usecs 5
+
+The above command would disable adaptive interrupt moderation, and allow a
+maximum of 5 microseconds before indicating a receive or transmit was complete.
+However, instead of resulting in as many as 200,000 interrupts per second, it
+limits total interrupts per second to 50,000 via the rx-usecs-high parameter.
+
+Performance Optimization
+========================
+Driver defaults are meant to fit a wide variety of workloads, but if further
+optimization is required we recommend experimenting with the following settings.
+
+NOTE: For better performance when processing small (64B) frame sizes, try
+enabling Hyper-Threading in the BIOS in order to increase the number of logical
+cores in the system and subsequently increase the number of queues available to
+the adapter.
+
+Virtualized Environments
+------------------------
+1. Disable XPS on both ends by using the included virt_perf_default script
+or by running the following command as root::
+
+  for file in `ls /sys/class/net/<ethX>/queues/tx-*/xps_cpus`;
+  do echo 0 > $file; done
+
+2. Using the appropriate mechanism (vcpupin) in the VM, pin the vCPUs to
+individual logical CPUs, making sure to use a set of CPUs included in the
+device's local_cpulist: /sys/class/net/<ethX>/device/local_cpulist.
+
+3. Configure as many Rx/Tx queues in the VM as available. Do not rely on
+the default setting of 1.
+
+
+Non-virtualized Environments
+----------------------------
+Pin the adapter's IRQs to specific cores by disabling the irqbalance service
+(an example follows this section) and using the included set_irq_affinity
+script. Please see the script's help text for further options.
+
+- The following settings will distribute the IRQs across all the cores evenly::
+
+  # scripts/set_irq_affinity -x all <interface1> , [ <interface2>, ... ]
+
+- The following settings will distribute the IRQs across all the cores that are
+  local to the adapter (same NUMA node)::
+
+  # scripts/set_irq_affinity -x local <interface1> ,[ <interface2>, ... ]
+
+For very CPU-intensive workloads, we recommend pinning the IRQs to all cores.
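+
+For example, on systemd-based distributions the irqbalance service can be
+stopped and kept from restarting before running the script (a sketch; service
+management varies by distribution)::
+
+  # systemctl disable --now irqbalance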
+
+For IP Forwarding: Disable Adaptive ITR and lower Rx and Tx interrupts per
+queue using ethtool.
+
+- Setting rx-usecs and tx-usecs to 125 will limit interrupts to about 8000
+  interrupts per second per queue.
+
+::
+
+  # ethtool -C <interface> adaptive-rx off adaptive-tx off rx-usecs 125 \
+  tx-usecs 125
+
+For lower CPU utilization: Disable Adaptive ITR and lower Rx and Tx interrupts
+per queue using ethtool.
+
+- Setting rx-usecs and tx-usecs to 250 will limit interrupts to about 4000
+  interrupts per second per queue.
+
+::
+
+  # ethtool -C <interface> adaptive-rx off adaptive-tx off rx-usecs 250 \
+  tx-usecs 250
+
+For lower latency: Disable Adaptive ITR and ITR by setting Rx and Tx to 0 using
+ethtool.
+
+::
+
+  # ethtool -C <interface> adaptive-rx off adaptive-tx off rx-usecs 0 \
+  tx-usecs 0
+
+Application Device Queues (ADq)
+-------------------------------
+Application Device Queues (ADq) allows you to dedicate one or more queues to a
+specific application. This can reduce latency for the specified application,
+and allow Tx traffic to be rate limited per application. Follow the steps below
+to set ADq.
+
+1. Create traffic classes (TCs). Maximum of 8 TCs can be created per interface.
+The shaper bw_rlimit parameter is optional.
+
+Example: Sets up two tcs, tc0 and tc1, with 16 queues each and max tx rate set
+to 1Gbit for tc0 and 3Gbit for tc1.
+
+::
+
+  # tc qdisc add dev <interface> root mqprio num_tc 2 map 0 0 0 0 1 1 1 1
+  queues 16@0 16@16 hw 1 mode channel shaper bw_rlimit min_rate 1Gbit 2Gbit
+  max_rate 1Gbit 3Gbit
+
+map: priority mapping for up to 16 priorities to tcs (e.g. map 0 0 0 0 1 1 1 1
+sets priorities 0-3 to use tc0 and 4-7 to use tc1)
+
+queues: for each tc, <num queues>@<offset> (e.g. queues 16@0 16@16 assigns
+16 queues to tc0 at offset 0 and 16 queues to tc1 at offset 16. Max total
+number of queues for all tcs is 64 or number of cores, whichever is lower.)
+
+hw 1 mode channel: 'channel' with 'hw' set to 1 is a new hardware
+offload mode in mqprio that makes full use of the mqprio options, the
+TCs, the queue configurations, and the QoS parameters.
+
+shaper bw_rlimit: for each tc, sets minimum and maximum bandwidth rates.
+Totals must be equal to or less than port speed.
+
+For example, with min_rate 1Gbit 3Gbit, verify the bandwidth limit using
+network monitoring tools such as ifstat or sar -n DEV [interval] [number of
+samples]
+
+2. Enable HW TC offload on interface::
+
+  # ethtool -K <interface> hw-tc-offload on
+
+3. Apply TCs to ingress (RX) flow of interface::
+
+  # tc qdisc add dev <interface> ingress
+
+NOTES:
+ - Run all tc commands from the iproute2 <pathtoiproute2>/tc/ directory.
+ - ADq is not compatible with cloud filters.
+ - Setting up channels via ethtool (ethtool -L) is not supported when the
+   TCs are configured using mqprio.
+ - You must have the latest version of iproute2.
+ - NVM version 6.01 or later is required.
+ - ADq cannot be enabled when any of the following features are enabled: Data
+   Center Bridging (DCB), Multiple Functions per Port (MFP), or Sideband
+   Filters.
+ - If another driver (for example, DPDK) has set cloud filters, you cannot
+   enable ADq.
+ - Tunnel filters are not supported in ADq. If encapsulated packets do
+   arrive in non-tunnel mode, filtering will be done on the inner headers.
+   For example, for VXLAN traffic in non-tunnel mode, PCTYPE is identified
+   as a VXLAN encapsulated packet, outer headers are ignored. Therefore,
+   inner headers are matched.
+ - If a TC filter on a PF matches traffic over a VF (on the PF), that
+   traffic will be routed to the appropriate queue of the PF, and will
+   not be passed on to the VF. Such traffic will end up getting dropped higher
+   up in the TCP/IP stack as it does not match PF address data.
+ - If traffic matches multiple TC filters that point to different TCs,
+   that traffic will be duplicated and sent to all matching TC queues.
+   The hardware switch mirrors the packet to a VSI list when multiple
+   filters are matched.
+
+
+Known Issues/Troubleshooting
+============================
+
+NOTE: 1 Gb devices based on the Intel(R) Ethernet Network Connection X722 do
+not support the following features:
+
+  * Data Center Bridging (DCB)
+  * QOS
+  * VMQ
+  * SR-IOV
+  * Tunnel Encapsulation offload (VXLAN, NVGRE)
+  * Energy Efficient Ethernet (EEE)
+  * Auto-media detect
+
+Unexpected Issues when the device driver and DPDK share a device
+----------------------------------------------------------------
+Unexpected issues may result when an i40e device is in multi-driver mode and
+the kernel driver and DPDK driver are sharing the device. This is because
+access to the global NIC resources is not synchronized between multiple
+drivers. Any change to the global NIC configuration (writing to a global
+register, setting global configuration by AQ, or changing switch modes) will
+affect all ports and drivers on the device. Loading DPDK with the
+"multi-driver" module parameter may mitigate some of the issues.
+
+TC0 must be enabled when setting up DCB on a switch
+---------------------------------------------------
+The kernel assumes that TC0 is available, and will disable Priority Flow
+Control (PFC) on the device if TC0 is not available. To fix this, ensure TC0 is
+enabled when setting up DCB on your switch.
+
+
+Support
+=======
+For general information, go to the Intel support website at:
+
+https://www.intel.com/support/
+
+or the Intel Wired Networking project hosted by Sourceforge at:
+
+https://sourceforge.net/projects/e1000
+
+If an issue is identified with the released source code on a supported kernel
+with a supported adapter, email the specific information related to the issue
+to e1000-devel@lists.sf.net.
diff --git a/Documentation/networking/i40e.txt b/Documentation/networking/i40e.txt
deleted file mode 100644
index c2d6e1824b29..000000000000
--- a/Documentation/networking/i40e.txt
+++ /dev/null
@@ -1,190 +0,0 @@
-Linux Base Driver for the Intel(R) Ethernet Controller XL710 Family
-===================================================================
-
-Intel i40e Linux driver.
-Copyright(c) 2013 Intel Corporation.
-
-Contents
-========
-
-- Identifying Your Adapter
-- Additional Configurations
-- Performance Tuning
-- Known Issues
-- Support
-
-
-Identifying Your Adapter
-========================
-
-The driver in this release is compatible with the Intel Ethernet
-Controller XL710 Family.
-
-For more information on how to identify your adapter, go to the Adapter &
-Driver ID Guide at:
-
-    http://support.intel.com/support/network/sb/CS-012904.htm
-
-
-Enabling the driver
-===================
-
-The driver is enabled via the standard kernel configuration system,
-using the make command:
-
-     make config/oldconfig/menuconfig/etc.
- -The driver is located in the menu structure at: - - -> Device Drivers - -> Network device support (NETDEVICES [=y]) - -> Ethernet driver support - -> Intel devices - -> Intel(R) Ethernet Controller XL710 Family - -Additional Configurations -========================= - - Generic Receive Offload (GRO) - ----------------------------- - The driver supports the in-kernel software implementation of GRO. GRO has - shown that by coalescing Rx traffic into larger chunks of data, CPU - utilization can be significantly reduced when under large Rx load. GRO is - an evolution of the previously-used LRO interface. GRO is able to coalesce - other protocols besides TCP. It's also safe to use with configurations that - are problematic for LRO, namely bridging and iSCSI. - - Ethtool - ------- - The driver utilizes the ethtool interface for driver configuration and - diagnostics, as well as displaying statistical information. The latest - ethtool version is required for this functionality. - - The latest release of ethtool can be found from - https://www.kernel.org/pub/software/network/ethtool - - - Flow Director n-ntuple traffic filters (FDir) - --------------------------------------------- - The driver utilizes the ethtool interface for configuring ntuple filters, - via "ethtool -N <device> <filter>". - - The sctp4, ip4, udp4, and tcp4 flow types are supported with the standard - fields including src-ip, dst-ip, src-port and dst-port. The driver only - supports fully enabling or fully masking the fields, so use of the mask - fields for partial matches is not supported. - - Additionally, the driver supports using the action to specify filters for a - Virtual Function. You can specify the action as a 64bit value, where the - lower 32 bits represents the queue number, while the next 8 bits represent - which VF. Note that 0 is the PF, so the VF identifier is offset by 1. For - example: - - ... action 0x800000002 ... - - Would indicate to direct traffic for Virtual Function 7 (8 minus 1) on queue - 2 of that VF. - - The driver also supports using the user-defined field to specify 2 bytes of - arbitrary data to match within the packet payload in addition to the regular - fields. The data is specified in the lower 32bits of the user-def field in - the following way: - - +----------------------------+---------------------------+ - | 31 28 24 20 16 | 15 12 8 4 0| - +----------------------------+---------------------------+ - | offset into packet payload | 2 bytes of flexible data | - +----------------------------+---------------------------+ - - As an example, - - ... user-def 0x4FFFF .... - - means to match the value 0xFFFF 4 bytes into the packet payload. Note that - the offset is based on the beginning of the payload, and not the beginning - of the packet. Thus - - flow-type tcp4 ... user-def 0x8BEAF .... - - would match TCP/IPv4 packets which have the value 0xBEAF 8bytes into the - TCP/IPv4 payload. - - For ICMP, the hardware parses the ICMP header as 4 bytes of header and 4 - bytes of payload, so if you want to match an ICMP frames payload you may need - to add 4 to the offset in order to match the data. - - Furthermore, the offset can only be up to a value of 64, as the hardware - will only read up to 64 bytes of data from the payload. It must also be even - as the flexible data is 2 bytes long and must be aligned to byte 0 of the - packet payload. - - When programming filters, the hardware is limited to using a single input - set for each flow type. 
This means that it is an error to program two - different filters with the same type that don't match on the same fields. - Thus the second of the following two commands will fail: - - ethtool -N <device> flow-type tcp4 src-ip 192.168.0.7 action 5 - ethtool -N <device> flow-type tcp4 dst-ip 192.168.15.18 action 1 - - This is because the first filter will be accepted and reprogram the input - set for TCPv4 filters, but the second filter will be unable to reprogram the - input set until all the conflicting TCPv4 filters are first removed. - - Note that the user-defined flexible offset is also considered part of the - input set and cannot be programmed separately for multiple filters of the - same type. However, the flexible data is not part of the input set and - multiple filters may use the same offset but match against different data. - - Data Center Bridging (DCB) - -------------------------- - DCB configuration is not currently supported. - - FCoE - ---- - The driver supports Fiber Channel over Ethernet (FCoE) and Data Center - Bridging (DCB) functionality. Configuring DCB and FCoE is outside the scope - of this driver doc. Refer to http://www.open-fcoe.org/ for FCoE project - information and http://www.open-lldp.org/ or email list - e1000-eedc@lists.sourceforge.net for DCB information. - - MAC and VLAN anti-spoofing feature - ---------------------------------- - When a malicious driver attempts to send a spoofed packet, it is dropped by - the hardware and not transmitted. An interrupt is sent to the PF driver - notifying it of the spoof attempt. - - When a spoofed packet is detected the PF driver will send the following - message to the system log (displayed by the "dmesg" command): - - Spoof event(s) detected on VF (n) - - Where n=the VF that attempted to do the spoofing. - - -Performance Tuning -================== - -An excellent article on performance tuning can be found at: - -http://www.redhat.com/promo/summit/2008/downloads/pdf/Thursday/Mark_Wagner.pdf - - -Known Issues -============ - - -Support -======= - -For general information, go to the Intel support website at: - - http://support.intel.com - -or the Intel Wired Networking project hosted by Sourceforge at: - - http://e1000.sourceforge.net - -If an issue is identified with the released source code on the supported -kernel with a supported adapter, email the specific information related -to the issue to e1000-devel@lists.sourceforge.net and copy -netdev@vger.kernel.org. diff --git a/Documentation/networking/i40evf.txt b/Documentation/networking/i40evf.txt deleted file mode 100644 index e9b3035b95d0..000000000000 --- a/Documentation/networking/i40evf.txt +++ /dev/null @@ -1,54 +0,0 @@ -Linux* Base Driver for Intel(R) Network Connection -================================================== - -Intel Ethernet Adaptive Virtual Function Linux driver. -Copyright(c) 2013-2017 Intel Corporation. - -Contents -======== - -- Identifying Your Adapter -- Known Issues/Troubleshooting -- Support - -This file describes the i40evf Linux* Base Driver. - -The i40evf driver supports the below mentioned virtual function -devices and can only be activated on kernels running the i40e or -newer Physical Function (PF) driver compiled with CONFIG_PCI_IOV. -The i40evf driver requires CONFIG_PCI_MSI to be enabled. - -The guest OS loading the i40evf driver must support MSI-X interrupts. 
- -Supported Hardware -================== -Intel XL710 X710 Virtual Function -Intel Ethernet Adaptive Virtual Function -Intel X722 Virtual Function - -Identifying Your Adapter -======================== - -For more information on how to identify your adapter, go to the -Adapter & Driver ID Guide at: - - http://support.intel.com/support/go/network/adapter/idguide.htm - -Known Issues/Troubleshooting -============================ - - -Support -======= - -For general information, go to the Intel support website at: - - http://support.intel.com - -or the Intel Wired Networking project hosted by Sourceforge at: - - http://sourceforge.net/projects/e1000 - -If an issue is identified with the released source code on the supported -kernel with a supported adapter, email the specific information related -to the issue to e1000-devel@lists.sf.net diff --git a/Documentation/networking/iavf.rst b/Documentation/networking/iavf.rst new file mode 100644 index 000000000000..f8b42b64eb28 --- /dev/null +++ b/Documentation/networking/iavf.rst @@ -0,0 +1,281 @@ +.. SPDX-License-Identifier: GPL-2.0+ + +Linux* Base Driver for Intel(R) Ethernet Adaptive Virtual Function +================================================================== + +Intel Ethernet Adaptive Virtual Function Linux driver. +Copyright(c) 2013-2018 Intel Corporation. + +Contents +======== + +- Identifying Your Adapter +- Additional Configurations +- Known Issues/Troubleshooting +- Support + +This file describes the iavf Linux* Base Driver. This driver was formerly +called i40evf. + +The iavf driver supports the below mentioned virtual function devices and +can only be activated on kernels running the i40e or newer Physical Function +(PF) driver compiled with CONFIG_PCI_IOV. The iavf driver requires +CONFIG_PCI_MSI to be enabled. + +The guest OS loading the iavf driver must support MSI-X interrupts. + +Identifying Your Adapter +======================== +The driver in this kernel is compatible with devices based on the following: + * Intel(R) XL710 X710 Virtual Function + * Intel(R) X722 Virtual Function + * Intel(R) XXV710 Virtual Function + * Intel(R) Ethernet Adaptive Virtual Function + +For the best performance, make sure the latest NVM/FW is installed on your +device. + +For information on how to identify your adapter, and for the latest NVM/FW +images and Intel network drivers, refer to the Intel Support website: +http://www.intel.com/support + + +Additional Features and Configurations +====================================== + +Viewing Link Messages +--------------------- +Link messages will not be displayed to the console if the distribution is +restricting system messages. In order to see network driver link messages on +your console, set dmesg to eight by entering the following:: + + dmesg -n 8 + +NOTE: This setting is not saved across reboots. + +ethtool +------- +The driver utilizes the ethtool interface for driver configuration and +diagnostics, as well as displaying statistical information. The latest ethtool +version is required for this functionality. Download it at: +https://www.kernel.org/pub/software/network/ethtool/ + +Setting VLAN Tag Stripping +-------------------------- +If you have applications that require Virtual Functions (VFs) to receive +packets with VLAN tags, you can disable VLAN tag stripping for the VF. The +Physical Function (PF) processes requests issued from the VF to enable or +disable VLAN tag stripping. 
Note that if the PF has assigned a VLAN to a VF, +then requests from that VF to set VLAN tag stripping will be ignored. + +To enable/disable VLAN tag stripping for a VF, issue the following command +from inside the VM in which you are running the VF:: + + ethtool -K <if_name> rxvlan on/off + +or alternatively:: + + ethtool --offload <if_name> rxvlan on/off + +Adaptive Virtual Function +------------------------- +Adaptive Virtual Function (AVF) allows the virtual function driver, or VF, to +adapt to changing feature sets of the physical function driver (PF) with which +it is associated. This allows system administrators to update a PF without +having to update all the VFs associated with it. All AVFs have a single common +device ID and branding string. + +AVFs have a minimum set of features known as "base mode," but may provide +additional features depending on what features are available in the PF with +which the AVF is associated. The following are base mode features: + +- 4 Queue Pairs (QP) and associated Configuration Status Registers (CSRs) + for Tx/Rx. +- i40e descriptors and ring format. +- Descriptor write-back completion. +- 1 control queue, with i40e descriptors, CSRs and ring format. +- 5 MSI-X interrupt vectors and corresponding i40e CSRs. +- 1 Interrupt Throttle Rate (ITR) index. +- 1 Virtual Station Interface (VSI) per VF. +- 1 Traffic Class (TC), TC0 +- Receive Side Scaling (RSS) with 64 entry indirection table and key, + configured through the PF. +- 1 unicast MAC address reserved per VF. +- 16 MAC address filters for each VF. +- Stateless offloads - non-tunneled checksums. +- AVF device ID. +- HW mailbox is used for VF to PF communications (including on Windows). + +IEEE 802.1ad (QinQ) Support +--------------------------- +The IEEE 802.1ad standard, informally known as QinQ, allows for multiple VLAN +IDs within a single Ethernet frame. VLAN IDs are sometimes referred to as +"tags," and multiple VLAN IDs are thus referred to as a "tag stack." Tag stacks +allow L2 tunneling and the ability to segregate traffic within a particular +VLAN ID, among other uses. + +The following are examples of how to configure 802.1ad (QinQ):: + + ip link add link eth0 eth0.24 type vlan proto 802.1ad id 24 + ip link add link eth0.24 eth0.24.371 type vlan proto 802.1Q id 371 + +Where "24" and "371" are example VLAN IDs. + +NOTES: + Receive checksum offloads, cloud filters, and VLAN acceleration are not + supported for 802.1ad (QinQ) packets. + +Application Device Queues (ADq) +------------------------------- +Application Device Queues (ADq) allows you to dedicate one or more queues to a +specific application. This can reduce latency for the specified application, +and allow Tx traffic to be rate limited per application. Follow the steps below +to set ADq. + +1. Create traffic classes (TCs). Maximum of 8 TCs can be created per interface. +The shaper bw_rlimit parameter is optional. + +Example: Sets up two tcs, tc0 and tc1, with 16 queues each and max tx rate set +to 1Gbit for tc0 and 3Gbit for tc1. + +:: + + # tc qdisc add dev <interface> root mqprio num_tc 2 map 0 0 0 0 1 1 1 1 + queues 16@0 16@16 hw 1 mode channel shaper bw_rlimit min_rate 1Gbit 2Gbit + max_rate 1Gbit 3Gbit + +map: priority mapping for up to 16 priorities to tcs (e.g. map 0 0 0 0 1 1 1 1 +sets priorities 0-3 to use tc0 and 4-7 to use tc1) + +queues: for each tc, <num queues>@<offset> (e.g. queues 16@0 16@16 assigns +16 queues to tc0 at offset 0 and 16 queues to tc1 at offset 16. 
Max total
+number of queues for all tcs is 64 or number of cores, whichever is lower.)
+
+hw 1 mode channel: 'channel' with 'hw' set to 1 is a new hardware
+offload mode in mqprio that makes full use of the mqprio options, the
+TCs, the queue configurations, and the QoS parameters.
+
+shaper bw_rlimit: for each tc, sets minimum and maximum bandwidth rates.
+Totals must be equal to or less than port speed.
+
+For example, with min_rate 1Gbit 3Gbit, verify the bandwidth limit using
+network monitoring tools such as ifstat or sar -n DEV [interval] [number of
+samples]
+
+2. Enable HW TC offload on interface::
+
+  # ethtool -K <interface> hw-tc-offload on
+
+3. Apply TCs to ingress (RX) flow of interface::
+
+  # tc qdisc add dev <interface> ingress
+
+NOTES:
+ - Run all tc commands from the iproute2 <pathtoiproute2>/tc/ directory.
+ - ADq is not compatible with cloud filters.
+ - Setting up channels via ethtool (ethtool -L) is not supported when the TCs
+   are configured using mqprio.
+ - You must have the latest version of iproute2.
+ - NVM version 6.01 or later is required.
+ - ADq cannot be enabled when any of the following features are enabled: Data
+   Center Bridging (DCB), Multiple Functions per Port (MFP), or Sideband Filters.
+ - If another driver (for example, DPDK) has set cloud filters, you cannot
+   enable ADq.
+ - Tunnel filters are not supported in ADq. If encapsulated packets do arrive
+   in non-tunnel mode, filtering will be done on the inner headers. For example,
+   for VXLAN traffic in non-tunnel mode, PCTYPE is identified as a VXLAN
+   encapsulated packet, outer headers are ignored. Therefore, inner headers are
+   matched.
+ - If a TC filter on a PF matches traffic over a VF (on the PF), that traffic
+   will be routed to the appropriate queue of the PF, and will not be passed on
+   to the VF. Such traffic will end up getting dropped higher up in the TCP/IP
+   stack as it does not match PF address data.
+ - If traffic matches multiple TC filters that point to different TCs, that
+   traffic will be duplicated and sent to all matching TC queues. The hardware
+   switch mirrors the packet to a VSI list when multiple filters are matched.
+
+
+Known Issues/Troubleshooting
+============================
+
+Traffic Is Not Being Passed Between VM and Client
+-------------------------------------------------
+You may not be able to pass traffic between a client system and a
+Virtual Machine (VM) running on a separate host if the Virtual Function
+(VF, or Virtual NIC) is not in trusted mode and spoof checking is enabled
+on the VF. Note that this situation can occur in any combination of client,
+host, and guest operating system. For information on how to set the VF to
+trusted mode, refer to the section "VLAN Tag Packet Steering" in this
+readme document. For information on setting spoof checking, refer to the
+section "MAC and VLAN anti-spoofing feature" in this readme document.
+
+Do not unload port driver if VF with active VM is bound to it
+-------------------------------------------------------------
+Do not unload a port's driver if a Virtual Function (VF) with an active Virtual
+Machine (VM) is bound to it. Doing so will cause the port to appear to hang.
+Once the VM shuts down, or otherwise releases the VF, the command will complete.
+
+Virtual machine does not get link
+---------------------------------
+If the virtual machine has more than one virtual port assigned to it, and those
+virtual ports are bound to different physical ports, you may not get link on
+all of the virtual ports.
The following command may work around the issue:: + + ethtool -r <PF> + +Where <PF> is the PF interface in the host, for example: p5p1. You may need to +run the command more than once to get link on all virtual ports. + +MAC address of Virtual Function changes unexpectedly +---------------------------------------------------- +If a Virtual Function's MAC address is not assigned in the host, then the VF +(virtual function) driver will use a random MAC address. This random MAC +address may change each time the VF driver is reloaded. You can assign a static +MAC address in the host machine. This static MAC address will survive +a VF driver reload. + +Driver Buffer Overflow Fix +-------------------------- +The fix to resolve CVE-2016-8105, referenced in Intel SA-00069 +https://www.intel.com/content/www/us/en/security-center/advisory/intel-sa-00069.html +is included in this and future versions of the driver. + +Multiple Interfaces on Same Ethernet Broadcast Network +------------------------------------------------------ +Due to the default ARP behavior on Linux, it is not possible to have one system +on two IP networks in the same Ethernet broadcast domain (non-partitioned +switch) behave as expected. All Ethernet interfaces will respond to IP traffic +for any IP address assigned to the system. This results in unbalanced receive +traffic. + +If you have multiple interfaces in a server, either turn on ARP filtering by +entering:: + + echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter + +NOTE: This setting is not saved across reboots. The configuration change can be +made permanent by adding the following line to the file /etc/sysctl.conf:: + + net.ipv4.conf.all.arp_filter = 1 + +Another alternative is to install the interfaces in separate broadcast domains +(either in different switches or in a switch partitioned to VLANs). + +Rx Page Allocation Errors +------------------------- +'Page allocation failure. order:0' errors may occur under stress. +This is caused by the way the Linux kernel reports this stressed condition. + + +Support +======= +For general information, go to the Intel support website at: + +https://support.intel.com + +or the Intel Wired Networking project hosted by Sourceforge at: + +https://sourceforge.net/projects/e1000 + +If an issue is identified with the released source code on the supported kernel +with a supported adapter, email the specific information related to the issue +to e1000-devel@lists.sf.net diff --git a/Documentation/networking/ice.rst b/Documentation/networking/ice.rst new file mode 100644 index 000000000000..4d118b827bbb --- /dev/null +++ b/Documentation/networking/ice.rst @@ -0,0 +1,45 @@ +.. SPDX-License-Identifier: GPL-2.0+ + +Linux* Base Driver for the Intel(R) Ethernet Connection E800 Series +=================================================================== + +Intel ice Linux driver. +Copyright(c) 2018 Intel Corporation. + +Contents +======== + +- Enabling the driver +- Support + +The driver in this release supports Intel's E800 Series of products. For +more information, visit Intel's support page at https://support.intel.com. + +Enabling the driver +=================== +The driver is enabled via the standard kernel configuration system, +using the make command:: + + make oldconfig/menuconfig/etc. 
+ +The driver is located in the menu structure at: + + -> Device Drivers + -> Network device support (NETDEVICES [=y]) + -> Ethernet driver support + -> Intel devices + -> Intel(R) Ethernet Connection E800 Series Support + +Support +======= +For general information, go to the Intel support website at: + +https://www.intel.com/support/ + +or the Intel Wired Networking project hosted by Sourceforge at: + +https://sourceforge.net/projects/e1000 + +If an issue is identified with the released source code on a supported kernel +with a supported adapter, email the specific information related to the issue +to e1000-devel@lists.sf.net. diff --git a/Documentation/networking/ice.txt b/Documentation/networking/ice.txt deleted file mode 100644 index 6261c46378e1..000000000000 --- a/Documentation/networking/ice.txt +++ /dev/null @@ -1,39 +0,0 @@ -Intel(R) Ethernet Connection E800 Series Linux Driver -=================================================================== - -Intel ice Linux driver. -Copyright(c) 2018 Intel Corporation. - -Contents -======== -- Enabling the driver -- Support - -The driver in this release supports Intel's E800 Series of products. For -more information, visit Intel's support page at http://support.intel.com. - -Enabling the driver -=================== - -The driver is enabled via the standard kernel configuration system, -using the make command: - - Make oldconfig/silentoldconfig/menuconfig/etc. - -The driver is located in the menu structure at: - - -> Device Drivers - -> Network device support (NETDEVICES [=y]) - -> Ethernet driver support - -> Intel devices - -> Intel(R) Ethernet Connection E800 Series Support - -Support -======= - -For general information, go to the Intel support website at: - - http://support.intel.com - -If an issue is identified with the released source code, please email -the maintainer listed in the MAINTAINERS file. diff --git a/Documentation/networking/igb.rst b/Documentation/networking/igb.rst new file mode 100644 index 000000000000..ba16b86d5593 --- /dev/null +++ b/Documentation/networking/igb.rst @@ -0,0 +1,193 @@ +.. SPDX-License-Identifier: GPL-2.0+ + +Linux* Base Driver for Intel(R) Ethernet Network Connection +=========================================================== + +Intel Gigabit Linux driver. +Copyright(c) 1999-2018 Intel Corporation. + +Contents +======== + +- Identifying Your Adapter +- Command Line Parameters +- Additional Configurations +- Support + + +Identifying Your Adapter +======================== +For information on how to identify your adapter, and for the latest Intel +network drivers, refer to the Intel Support website: +http://www.intel.com/support + + +Command Line Parameters +======================== +If the driver is built as a module, the following optional parameters are used +by entering them on the command line with the modprobe command using this +syntax:: + + modprobe igb [<option>=<VAL1>,<VAL2>,...] + +There needs to be a <VAL#> for each network port in the system supported by +this driver. The values will be applied to each instance, in function order. +For example:: + + modprobe igb max_vfs=2,4 + +In this case, there are two network ports supported by igb in the system. + +NOTE: A descriptor describes a data buffer and attributes related to the data +buffer. This information is accessed by the hardware. + +max_vfs +------- +:Valid Range: 0-7 + +This parameter adds support for SR-IOV. It causes the driver to spawn up to +max_vfs worth of virtual functions. 
If the value is greater than 0 it will
+also force the VMDq parameter to be 1 or more.
+
+The parameters for the driver are referenced by position. Thus, if you have a
+dual port adapter, or more than one adapter in your system, and want N virtual
+functions per port, you must specify a number for each port with each parameter
+separated by a comma. For example::
+
+  modprobe igb max_vfs=4
+
+This will spawn 4 VFs on the first port.
+
+::
+
+  modprobe igb max_vfs=2,4
+
+This will spawn 2 VFs on the first port and 4 VFs on the second port.
+
+NOTE: Caution must be used in loading the driver with these parameters.
+Depending on your system configuration, number of slots, etc., it is not
+possible in all cases to predict which command line positions map to which
+ports.
+
+NOTE: Neither the device nor the driver control how VFs are mapped into config
+space. Bus layout will vary by operating system. On operating systems that
+support it, you can check sysfs to find the mapping.
+
+NOTE: When either SR-IOV mode or VMDq mode is enabled, hardware VLAN filtering
+and VLAN tag stripping/insertion will remain enabled. Please remove the old
+VLAN filter before the new VLAN filter is added. For example::
+
+  ip link set eth0 vf 0 vlan 100 // set VLAN 100 for VF 0
+  ip link set eth0 vf 0 vlan 0   // delete VLAN 100
+  ip link set eth0 vf 0 vlan 200 // set a new VLAN 200 for VF 0
+
+Debug
+-----
+:Valid Range: 0-16 (0=none,...,16=all)
+:Default Value: 0
+
+This parameter adjusts the level of debug messages displayed in the system
+logs.
+
+
+Additional Features and Configurations
+======================================
+
+Jumbo Frames
+------------
+Jumbo Frames support is enabled by changing the Maximum Transmission Unit (MTU)
+to a value larger than the default value of 1500.
+
+Use the ifconfig command to increase the MTU size. For example, enter the
+following where <x> is the interface number::
+
+  ifconfig eth<x> mtu 9000 up
+
+Alternatively, you can use the ip command as follows::
+
+  ip link set mtu 9000 dev eth<x>
+  ip link set up dev eth<x>
+
+This setting is not saved across reboots. The setting change can be made
+permanent by adding 'MTU=9000' to the file:
+
+- For RHEL: /etc/sysconfig/network-scripts/ifcfg-eth<x>
+- For SLES: /etc/sysconfig/network/<config_file>
+
+NOTE: The maximum MTU setting for Jumbo Frames is 9216. This value coincides
+with the maximum Jumbo Frames size of 9234 bytes.
+
+NOTE: Using Jumbo frames at 10 or 100 Mbps is not supported and may result in
+poor performance or loss of link.
+
+
+ethtool
+-------
+The driver utilizes the ethtool interface for driver configuration and
+diagnostics, as well as displaying statistical information. The latest ethtool
+version is required for this functionality. Download it at:
+
+https://www.kernel.org/pub/software/network/ethtool/
+
+
+Enabling Wake on LAN* (WoL)
+---------------------------
+WoL is configured through the ethtool* utility.
+
+WoL will be enabled on the system during the next shut down or reboot. For
+this driver version, in order to enable WoL, the igb driver must be loaded
+prior to shutting down or suspending the system.
+
+NOTE: Wake on LAN is only supported on port A of multi-port devices. Also,
+Wake on LAN is not supported for the following device:
+
+- Intel(R) Gigabit VT Quad Port Server Adapter
+
+
+Multiqueue
+----------
+In this mode, a separate MSI-X vector is allocated for each queue and one for
+"other" interrupts such as link status change and errors. All interrupts are
+throttled via interrupt moderation.
Interrupt moderation must be used to avoid
+interrupt storms while the driver is processing one interrupt. The moderation
+value should be at least as large as the expected time for the driver to
+process an interrupt. Multiqueue is off by default.
+
+REQUIREMENTS: MSI-X support is required for Multiqueue. If MSI-X is not found,
+the system will fall back to MSI or to Legacy interrupts. This driver supports
+receive multiqueue on all kernels that support MSI-X.
+
+NOTE: On some kernels a reboot is required to switch between single queue mode
+and multiqueue mode or vice-versa.
+
+
+MAC and VLAN anti-spoofing feature
+----------------------------------
+When a malicious driver attempts to send a spoofed packet, it is dropped by the
+hardware and not transmitted.
+
+An interrupt is sent to the PF driver notifying it of the spoof attempt. When a
+spoofed packet is detected, the PF driver will send the following message to
+the system log (displayed by the "dmesg" command)::
+
+  Spoof event(s) detected on VF(n)
+
+where n is the VF that attempted to do the spoofing.
+
+
+Setting MAC Address, VLAN and Rate Limit Using IProute2 Tool
+------------------------------------------------------------
+You can set the MAC address of a Virtual Function (VF), a default VLAN and the
+rate limit using the IProute2 tool. Download the latest version of the
+IProute2 tool from Sourceforge if your version does not have all the features
+you require.
+
+
+Support
+=======
+For general information, go to the Intel support website at:
+
+https://www.intel.com/support/
+
+or the Intel Wired Networking project hosted by Sourceforge at:
+
+https://sourceforge.net/projects/e1000
+
+If an issue is identified with the released source code on a supported kernel
+with a supported adapter, email the specific information related to the issue
+to e1000-devel@lists.sf.net.
diff --git a/Documentation/networking/igb.txt b/Documentation/networking/igb.txt
deleted file mode 100644
index f90643ef39c9..000000000000
--- a/Documentation/networking/igb.txt
+++ /dev/null
@@ -1,129 +0,0 @@
-Linux* Base Driver for Intel(R) Ethernet Network Connection
-===========================================================
-
-Intel Gigabit Linux driver.
-Copyright(c) 1999 - 2013 Intel Corporation.
-
-Contents
-========
-
-- Identifying Your Adapter
-- Additional Configurations
-- Support
-
-Identifying Your Adapter
-========================
-
-This driver supports all 82575, 82576 and 82580-based Intel (R) gigabit network
-connections.
-
-For specific information on how to identify your adapter, go to the Adapter &
-Driver ID Guide at:
-
- http://support.intel.com/support/go/network/adapter/idguide.htm
-
-Command Line Parameters
-=======================
-
-The default value for each parameter is generally the recommended setting,
-unless otherwise noted.
-
-max_vfs
--------
-Valid Range: 0-7
-Default Value: 0
-
-This parameter adds support for SR-IOV. It causes the driver to spawn up to
-max_vfs worth of virtual function.
-
-Additional Configurations
-=========================
-
- Jumbo Frames
- ------------
- Jumbo Frames support is enabled by changing the MTU to a value larger than
- the default of 1500. Use the ip command to increase the MTU size.
- For example:
-
- ip link set dev eth<x> mtu 9000
-
- This setting is not saved across reboots.
-
- Notes:
-
- - The maximum MTU setting for Jumbo Frames is 9216. This value coincides
- with the maximum Jumbo Frames size of 9234 bytes.
-
- - Using Jumbo frames at 10 or 100 Mbps is not supported and may result in
- poor performance or loss of link.
-
- ethtool
- -------
- The driver utilizes the ethtool interface for driver configuration and
- diagnostics, as well as displaying statistical information. The latest
- version of ethtool can be found at:
-
- https://www.kernel.org/pub/software/network/ethtool/
-
- Enabling Wake on LAN* (WoL)
- ---------------------------
- WoL is configured through the ethtool* utility.
-
- For instructions on enabling WoL with ethtool, refer to the ethtool man page.
-
- WoL will be enabled on the system during the next shut down or reboot.
- For this driver version, in order to enable WoL, the igb driver must be
- loaded when shutting down or rebooting the system.
-
- Wake On LAN is only supported on port A of multi-port adapters.
-
- Wake On LAN is not supported for the Intel(R) Gigabit VT Quad Port Server
- Adapter.
-
- Multiqueue
- ----------
- In this mode, a separate MSI-X vector is allocated for each queue and one
- for "other" interrupts such as link status change and errors. All
- interrupts are throttled via interrupt moderation. Interrupt moderation
- must be used to avoid interrupt storms while the driver is processing one
- interrupt. The moderation value should be at least as large as the expected
- time for the driver to process an interrupt. Multiqueue is off by default.
-
- REQUIREMENTS: MSI-X support is required for Multiqueue. If MSI-X is not
- found, the system will fallback to MSI or to Legacy interrupts.
-
- MAC and VLAN anti-spoofing feature
- ----------------------------------
- When a malicious driver attempts to send a spoofed packet, it is dropped by
- the hardware and not transmitted. An interrupt is sent to the PF driver
- notifying it of the spoof attempt.
-
- When a spoofed packet is detected the PF driver will send the following
- message to the system log (displayed by the "dmesg" command):
-
- Spoof event(s) detected on VF(n)
-
- Where n=the VF that attempted to do the spoofing.
-
- Setting MAC Address, VLAN and Rate Limit Using IProute2 Tool
- ------------------------------------------------------------
- You can set a MAC address of a Virtual Function (VF), a default VLAN and the
- rate limit using the IProute2 tool. Download the latest version of the
- iproute2 tool from Sourceforge if your version does not have all the
- features you require.
-
-
-Support
-=======
-
-For general information, go to the Intel support website at:
-
- www.intel.com/support/
-
-or the Intel Wired Networking project hosted by Sourceforge at:
-
- http://sourceforge.net/projects/e1000
-
-If an issue is identified with the released source code on the supported
-kernel with a supported adapter, email the specific information related
-to the issue to e1000-devel@lists.sf.net
diff --git a/Documentation/networking/igbvf.rst b/Documentation/networking/igbvf.rst
new file mode 100644
index 000000000000..a8a9ffa4f8d3
--- /dev/null
+++ b/Documentation/networking/igbvf.rst
@@ -0,0 +1,64 @@
+.. SPDX-License-Identifier: GPL-2.0+
+
+Linux* Base Virtual Function Driver for Intel(R) 1G Ethernet
+============================================================
+
+Intel Gigabit Virtual Function Linux driver.
+Copyright(c) 1999-2018 Intel Corporation.
+
+Contents
+========
+- Identifying Your Adapter
+- Additional Configurations
+- Support
+
+This driver supports Intel 82576-based virtual function devices that can only
+be activated on kernels that support SR-IOV.
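+
+As an illustrative sketch (the PF interface name eth0 and the VF count are
+assumptions, not requirements), the virtual function devices this driver binds
+to can be created on the host through the standard sysfs interface::
+
+  # create two VFs on the 82576 PF; the igbvf driver attaches to them
+  echo 2 > /sys/class/net/eth0/device/sriov_numvfs
+  # confirm the VFs appeared on the PCI bus
+  lspci | grep -i "virtual function"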
+ +SR-IOV requires the correct platform and OS support. + +The guest OS loading this driver must support MSI-X interrupts. + +For questions related to hardware requirements, refer to the documentation +supplied with your Intel adapter. All hardware requirements listed apply to use +with Linux. + +Driver information can be obtained using ethtool, lspci, and ifconfig. +Instructions on updating ethtool can be found in the section Additional +Configurations later in this document. + +NOTE: There is a limit of a total of 32 shared VLANs to 1 or more VFs. + + +Identifying Your Adapter +======================== +For information on how to identify your adapter, and for the latest Intel +network drivers, refer to the Intel Support website: +http://www.intel.com/support + + +Additional Features and Configurations +====================================== + +ethtool +------- +The driver utilizes the ethtool interface for driver configuration and +diagnostics, as well as displaying statistical information. The latest ethtool +version is required for this functionality. Download it at: + +https://www.kernel.org/pub/software/network/ethtool/ + + +Support +======= +For general information, go to the Intel support website at: + +https://www.intel.com/support/ + +or the Intel Wired Networking project hosted by Sourceforge at: + +https://sourceforge.net/projects/e1000 + +If an issue is identified with the released source code on a supported kernel +with a supported adapter, email the specific information related to the issue +to e1000-devel@lists.sf.net. diff --git a/Documentation/networking/igbvf.txt b/Documentation/networking/igbvf.txt deleted file mode 100644 index bd404735fb46..000000000000 --- a/Documentation/networking/igbvf.txt +++ /dev/null @@ -1,80 +0,0 @@ -Linux* Base Driver for Intel(R) Ethernet Network Connection -=========================================================== - -Intel Gigabit Linux driver. -Copyright(c) 1999 - 2013 Intel Corporation. - -Contents -======== - -- Identifying Your Adapter -- Additional Configurations -- Support - -This file describes the igbvf Linux* Base Driver for Intel Network Connection. - -The igbvf driver supports 82576-based virtual function devices that can only -be activated on kernels that support SR-IOV. SR-IOV requires the correct -platform and OS support. - -The igbvf driver requires the igb driver, version 2.0 or later. The igbvf -driver supports virtual functions generated by the igb driver with a max_vfs -value of 1 or greater. For more information on the max_vfs parameter refer -to the README included with the igb driver. - -The guest OS loading the igbvf driver must support MSI-X interrupts. - -This driver is only supported as a loadable module at this time. Intel is -not supplying patches against the kernel source to allow for static linking -of the driver. For questions related to hardware requirements, refer to the -documentation supplied with your Intel Gigabit adapter. All hardware -requirements listed apply to use with Linux. - -Instructions on updating ethtool can be found in the section "Additional -Configurations" later in this document. - -VLANs: There is a limit of a total of 32 shared VLANs to 1 or more VFs. - -Identifying Your Adapter -======================== - -The igbvf driver supports 82576-based virtual function devices that can only -be activated on kernels that support SR-IOV. 
- -For more information on how to identify your adapter, go to the Adapter & -Driver ID Guide at: - - http://support.intel.com/support/go/network/adapter/idguide.htm - -For the latest Intel network drivers for Linux, refer to the following -website. In the search field, enter your adapter name or type, or use the -networking link on the left to search for your adapter: - - http://downloadcenter.intel.com/scripts-df-external/Support_Intel.aspx - -Additional Configurations -========================= - - ethtool - ------- - The driver utilizes the ethtool interface for driver configuration and - diagnostics, as well as displaying statistical information. The ethtool - version 3.0 or later is required for this functionality, although we - strongly recommend downloading the latest version at: - - https://www.kernel.org/pub/software/network/ethtool/ - -Support -======= - -For general information, go to the Intel support website at: - - http://support.intel.com - -or the Intel Wired Networking project hosted by Sourceforge at: - - http://sourceforge.net/projects/e1000 - -If an issue is identified with the released source code on the supported -kernel with a supported adapter, email the specific information related -to the issue to e1000-devel@lists.sf.net diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst index fcd710f2cc7a..bd89dae8d578 100644 --- a/Documentation/networking/index.rst +++ b/Documentation/networking/index.rst @@ -14,6 +14,16 @@ Contents: dpaa2/index e100 e1000 + e1000e + fm10k + igb + igbvf + ixgb + ixgbe + ixgbevf + i40e + iavf + ice kapi z8530book msg_zerocopy diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index 8313a636dd53..32b21571adfe 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -316,6 +316,17 @@ tcp_frto - INTEGER By default it's enabled with a non-zero value. 0 disables F-RTO. +tcp_fwmark_accept - BOOLEAN + If set, incoming connections to listening sockets that do not have a + socket mark will set the mark of the accepting socket to the fwmark of + the incoming SYN packet. This will cause all packets on that connection + (starting from the first SYNACK) to be sent with that fwmark. The + listening socket's mark is unchanged. Listening sockets that already + have a fwmark set via setsockopt(SOL_SOCKET, SO_MARK, ...) are + unaffected. + + Default: 0 + tcp_invalid_ratelimit - INTEGER Limit the maximal rate for sending duplicate acknowledgments in response to incoming TCP packets that are for an existing @@ -425,7 +436,7 @@ tcp_mtu_probing - INTEGER 1 - Disabled by default, enabled when an ICMP black hole detected 2 - Always enabled, use initial MSS of tcp_base_mss. -tcp_probe_interval - INTEGER +tcp_probe_interval - UNSIGNED INTEGER Controls how often to start TCP Packetization-Layer Path MTU Discovery reprobe. The default is reprobing every 10 minutes as per RFC4821. @@ -1442,6 +1453,14 @@ max_hbh_length - INTEGER header. Default: INT_MAX (unlimited) +skip_notify_on_dev_down - BOOLEAN + Controls whether an RTM_DELROUTE message is generated for routes + removed when a device is taken down or deleted. IPv4 does not + generate this message; IPv6 does by default. Setting this sysctl + to true skips the message, making IPv4 and IPv6 on par in relying + on userspace caches to track link events and evict routes. 
+ Default: false (generate message) + IPv6 Fragmentation: ip6frag_high_thresh - INTEGER diff --git a/Documentation/networking/ixgb.rst b/Documentation/networking/ixgb.rst new file mode 100644 index 000000000000..8bd80e27843d --- /dev/null +++ b/Documentation/networking/ixgb.rst @@ -0,0 +1,467 @@ +.. SPDX-License-Identifier: GPL-2.0+ + +Linux Base Driver for 10 Gigabit Intel(R) Ethernet Network Connection +===================================================================== + +October 1, 2018 + + +Contents +======== + +- In This Release +- Identifying Your Adapter +- Command Line Parameters +- Improving Performance +- Additional Configurations +- Known Issues/Troubleshooting +- Support + + + +In This Release +=============== + +This file describes the ixgb Linux Base Driver for the 10 Gigabit Intel(R) +Network Connection. This driver includes support for Itanium(R)2-based +systems. + +For questions related to hardware requirements, refer to the documentation +supplied with your 10 Gigabit adapter. All hardware requirements listed apply +to use with Linux. + +The following features are available in this kernel: + - Native VLANs + - Channel Bonding (teaming) + - SNMP + +Channel Bonding documentation can be found in the Linux kernel source: +/Documentation/networking/bonding.txt + +The driver information previously displayed in the /proc filesystem is not +supported in this release. Alternatively, you can use ethtool (version 1.6 +or later), lspci, and iproute2 to obtain the same information. + +Instructions on updating ethtool can be found in the section "Additional +Configurations" later in this document. + + +Identifying Your Adapter +======================== + +The following Intel network adapters are compatible with the drivers in this +release: + ++------------+------------------------------+----------------------------------+ +| Controller | Adapter Name | Physical Layer | ++============+==============================+==================================+ +| 82597EX | Intel(R) PRO/10GbE LR/SR/CX4 | - 10G Base-LR (fiber) | +| | Server Adapters | - 10G Base-SR (fiber) | +| | | - 10G Base-CX4 (copper) | ++------------+------------------------------+----------------------------------+ + +For more information on how to identify your adapter, go to the Adapter & +Driver ID Guide at: + + https://support.intel.com + + +Command Line Parameters +======================= + +If the driver is built as a module, the following optional parameters are +used by entering them on the command line with the modprobe command using +this syntax:: + + modprobe ixgb [<option>=<VAL1>,<VAL2>,...] + +For example, with two 10GbE PCI adapters, entering:: + + modprobe ixgb TxDescriptors=80,128 + +loads the ixgb driver with 80 TX resources for the first adapter and 128 TX +resources for the second adapter. + +The default value for each parameter is generally the recommended setting, +unless otherwise noted. + +Copybreak +--------- +:Valid Range: 0-XXXX +:Default Value: 256 + + This is the maximum size of packet that is copied to a new buffer on + receive. + +Debug +----- +:Valid Range: 0-16 (0=none,...,16=all) +:Default Value: 0 + + This parameter adjusts the level of debug messages displayed in the + system logs. + +FlowControl +----------- +:Valid Range: 0-3 (0=none, 1=Rx only, 2=Tx only, 3=Rx&Tx) +:Default Value: 1 if no EEPROM, otherwise read from EEPROM + + This parameter controls the automatic generation(Tx) and response(Rx) to + Ethernet PAUSE frames. 
There are hardware bugs associated with enabling + Tx flow control so beware. + +RxDescriptors +------------- +:Valid Range: 64-4096 +:Default Value: 1024 + + This value is the number of receive descriptors allocated by the driver. + Increasing this value allows the driver to buffer more incoming packets. + Each descriptor is 16 bytes. A receive buffer is also allocated for + each descriptor and can be either 2048, 4056, 8192, or 16384 bytes, + depending on the MTU setting. When the MTU size is 1500 or less, the + receive buffer size is 2048 bytes. When the MTU is greater than 1500 the + receive buffer size will be either 4056, 8192, or 16384 bytes. The + maximum MTU size is 16114. + +TxDescriptors +------------- +:Valid Range: 64-4096 +:Default Value: 256 + + This value is the number of transmit descriptors allocated by the driver. + Increasing this value allows the driver to queue more transmits. Each + descriptor is 16 bytes. + +RxIntDelay +---------- +:Valid Range: 0-65535 (0=off) +:Default Value: 72 + + This value delays the generation of receive interrupts in units of + 0.8192 microseconds. Receive interrupt reduction can improve CPU + efficiency if properly tuned for specific network traffic. Increasing + this value adds extra latency to frame reception and can end up + decreasing the throughput of TCP traffic. If the system is reporting + dropped receives, this value may be set too high, causing the driver to + run out of available receive descriptors. + +TxIntDelay +---------- +:Valid Range: 0-65535 (0=off) +:Default Value: 32 + + This value delays the generation of transmit interrupts in units of + 0.8192 microseconds. Transmit interrupt reduction can improve CPU + efficiency if properly tuned for specific network traffic. Increasing + this value adds extra latency to frame transmission and can end up + decreasing the throughput of TCP traffic. If this value is set too high, + it will cause the driver to run out of available transmit descriptors. + +XsumRX +------ +:Valid Range: 0-1 +:Default Value: 1 + + A value of '1' indicates that the driver should enable IP checksum + offload for received packets (both UDP and TCP) to the adapter hardware. + +RxFCHighThresh +-------------- +:Valid Range: 1,536-262,136 (0x600 - 0x3FFF8, 8 byte granularity) +:Default Value: 196,608 (0x30000) + + Receive Flow control high threshold (when we send a pause frame) + +RxFCLowThresh +------------- +:Valid Range: 64-262,136 (0x40 - 0x3FFF8, 8 byte granularity) +:Default Value: 163,840 (0x28000) + + Receive Flow control low threshold (when we send a resume frame) + +FCReqTimeout +------------ +:Valid Range: 1-65535 +:Default Value: 65535 + + Flow control request timeout (how long to pause the link partner's tx) + +IntDelayEnable +-------------- +:Value Range: 0,1 +:Default Value: 1 + + Interrupt Delay, 0 disables transmit interrupt delay and 1 enables it. + + +Improving Performance +===================== + +With the 10 Gigabit server adapters, the default Linux configuration will +very likely limit the total available throughput artificially. There is a set +of configuration changes that, when applied together, will increase the ability +of Linux to transmit and receive data. The following enhancements were +originally acquired from settings published at http://www.spec.org/web99/ for +various submitted results using Linux. + +NOTE: + These changes are only suggestions, and serve as a starting point for + tuning your network performance. 
+
+The changes are made in three major ways, listed in order of greatest effect:
+
+- Use ip link to modify the mtu (maximum transmission unit) and the txqueuelen
+  parameter.
+- Use sysctl to modify /proc parameters (essentially kernel tuning)
+- Use setpci to modify the MMRBC field in PCI-X configuration space to increase
+  transmit burst lengths on the bus.
+
+NOTE:
+  setpci modifies the adapter's configuration registers to allow it to read
+  up to 4k bytes at a time (for transmits). However, for some systems the
+  behavior after modifying this register may be undefined (possibly errors of
+  some kind). A power-cycle, hard reset or explicitly setting the e6 register
+  back to 22 (setpci -d 8086:1a48 e6.b=22) may be required to get back to a
+  stable configuration.
+
+- COPY these lines and paste them into ixgb_perf.sh:
+
+::
+
+  #!/bin/bash
+  echo "configuring network performance, edit this file to change the interface
+  or device ID of 10GbE card"
+  # set mmrbc to 4k reads, modify only Intel 10GbE device IDs
+  # replace 1a48 with appropriate 10GbE device's ID installed on the system,
+  # if needed.
+  setpci -d 8086:1a48 e6.b=2e
+  # set the MTU (max transmission unit) - it requires your switch and clients
+  # to change as well.
+  # set the txqueuelen
+  # your ixgb adapter should be loaded as eth1 for this to work, change if needed
+  ip li set dev eth1 mtu 9000 txqueuelen 1000 up
+  # call the sysctl utility to modify /proc/sys entries
+  sysctl -p ./sysctl_ixgb.conf
+
+- COPY these lines and paste them into sysctl_ixgb.conf:
+
+::
+
+  # some of the defaults may be different for your kernel
+  # call this file with sysctl -p <this file>
+  # these are just suggested values that worked well to increase throughput in
+  # several network benchmark tests, your mileage may vary
+
+  ### IPV4 specific settings
+  # turn TCP timestamp support off, default 1, reduces CPU use
+  net.ipv4.tcp_timestamps = 0
+  # turn SACK support off, default on
+  # on systems with a VERY fast bus -> memory interface this is the big gainer
+  net.ipv4.tcp_sack = 0
+  # set min/default/max TCP read buffer, default 4096 87380 174760
+  net.ipv4.tcp_rmem = 10000000 10000000 10000000
+  # set min/pressure/max TCP write buffer, default 4096 16384 131072
+  net.ipv4.tcp_wmem = 10000000 10000000 10000000
+  # set min/pressure/max TCP buffer space, default 31744 32256 32768
+  net.ipv4.tcp_mem = 10000000 10000000 10000000
+
+  ### CORE settings (mostly for socket and UDP effect)
+  # set maximum receive socket buffer size, default 131071
+  net.core.rmem_max = 524287
+  # set maximum send socket buffer size, default 131071
+  net.core.wmem_max = 524287
+  # set default receive socket buffer size, default 65535
+  net.core.rmem_default = 524287
+  # set default send socket buffer size, default 65535
+  net.core.wmem_default = 524287
+  # set maximum amount of option memory buffers, default 10240
+  net.core.optmem_max = 524287
+  # set number of unprocessed input packets before kernel starts dropping them; default 300
+  net.core.netdev_max_backlog = 300000
+
+Edit the ixgb_perf.sh script if necessary to change eth1 to whatever interface
+your ixgb driver is using and/or replace '1a48' with the appropriate 10GbE
+device ID installed on the system.
+
+NOTE:
+  Unless these scripts are added to the boot process, these changes will
+  only last until the next system reboot.
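+
+One hedged way to make the sysctl portion persistent (it assumes a
+distribution that reads drop-in files from /etc/sysctl.d at boot) is::
+
+  cp sysctl_ixgb.conf /etc/sysctl.d/99-ixgb.conf
+  sysctl --system        # re-apply all configured sysctl values now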
+ + +Resolving Slow UDP Traffic +-------------------------- +If your server does not seem to be able to receive UDP traffic as fast as it +can receive TCP traffic, it could be because Linux, by default, does not set +the network stack buffers as large as they need to be to support high UDP +transfer rates. One way to alleviate this problem is to allow more memory to +be used by the IP stack to store incoming data. + +For instance, use the commands:: + + sysctl -w net.core.rmem_max=262143 + +and:: + + sysctl -w net.core.rmem_default=262143 + +to increase the read buffer memory max and default to 262143 (256k - 1) from +defaults of max=131071 (128k - 1) and default=65535 (64k - 1). These variables +will increase the amount of memory used by the network stack for receives, and +can be increased significantly more if necessary for your application. + + +Additional Configurations +========================= + +Configuring the Driver on Different Distributions +------------------------------------------------- +Configuring a network driver to load properly when the system is started is +distribution dependent. Typically, the configuration process involves adding +an alias line to /etc/modprobe.conf as well as editing other system startup +scripts and/or configuration files. Many popular Linux distributions ship +with tools to make these changes for you. To learn the proper way to +configure a network device for your system, refer to your distribution +documentation. If during this process you are asked for the driver or module +name, the name for the Linux Base Driver for the Intel 10GbE Family of +Adapters is ixgb. + +Viewing Link Messages +--------------------- +Link messages will not be displayed to the console if the distribution is +restricting system messages. In order to see network driver link messages on +your console, set dmesg to eight by entering the following:: + + dmesg -n 8 + +NOTE: This setting is not saved across reboots. + +Jumbo Frames +------------ +The driver supports Jumbo Frames for all adapters. Jumbo Frames support is +enabled by changing the MTU to a value larger than the default of 1500. +The maximum value for the MTU is 16114. Use the ip command to +increase the MTU size. For example:: + + ip li set dev ethx mtu 9000 + +The maximum MTU setting for Jumbo Frames is 16114. This value coincides +with the maximum Jumbo Frames size of 16128. + +Ethtool +------- +The driver utilizes the ethtool interface for driver configuration and +diagnostics, as well as displaying statistical information. The ethtool +version 1.6 or later is required for this functionality. + +The latest release of ethtool can be found from +https://www.kernel.org/pub/software/network/ethtool/ + +NOTE: + The ethtool version 1.6 only supports a limited set of ethtool options. + Support for a more complete ethtool feature set can be enabled by + upgrading to the latest version. + +NAPI +---- +NAPI (Rx polling mode) is supported in the ixgb driver. + +See https://wiki.linuxfoundation.org/networking/napi for more information on +NAPI. + + +Known Issues/Troubleshooting +============================ + +NOTE: + After installing the driver, if your Intel Network Connection is not + working, verify in the "In This Release" section of the readme that you have + installed the correct driver. 
+
+Cable Interoperability Issue with Fujitsu XENPAK Module in SmartBits Chassis
+----------------------------------------------------------------------------
+Excessive CRC errors may be observed if the Intel(R) PRO/10GbE CX4
+Server adapter is connected to a Fujitsu XENPAK CX4 module in a SmartBits
+chassis using 15 m/24AWG cable assemblies manufactured by Fujitsu or Leoni.
+The CRC errors may be received either by the Intel(R) PRO/10GbE CX4
+Server adapter or the SmartBits. If this situation occurs, using a different
+cable assembly may resolve the issue.
+
+Cable Interoperability Issues with HP Procurve 3400cl Switch Port
+-----------------------------------------------------------------
+Excessive CRC errors may be observed if the Intel(R) PRO/10GbE CX4 Server
+adapter is connected to an HP Procurve 3400cl switch port using short cables
+(1 m or shorter). If this situation occurs, using a longer cable may resolve
+the issue.
+
+Excessive CRC errors may be observed using Fujitsu 24AWG cable assemblies that
+are 10 m or longer or when using a Leoni 15 m/24AWG cable assembly. The CRC
+errors may be received either by the CX4 Server adapter or at the switch. If
+this situation occurs, using a different cable assembly may resolve the issue.
+
+Jumbo Frames System Requirement
+-------------------------------
+Memory allocation failures have been observed on Linux systems with 64 MB
+of RAM or less that are running Jumbo Frames. If you are using Jumbo
+Frames, your system may require more than the advertised minimum
+requirement of 64 MB of system memory.
+
+Performance Degradation with Jumbo Frames
+-----------------------------------------
+Degradation in throughput performance may be observed in some Jumbo frames
+environments. If this is observed, increasing the application's socket buffer
+size and/or increasing the /proc/sys/net/ipv4/tcp_*mem entry values may help.
+See the specific application manual and /usr/src/linux*/Documentation/
+networking/ip-sysctl.txt for more details.
+
+Allocating Rx Buffers when Using Jumbo Frames
+---------------------------------------------
+Allocating Rx buffers when using Jumbo Frames on 2.6.x kernels may fail if
+the available memory is heavily fragmented. This issue may be seen with PCI-X
+adapters or with packet split disabled. This can be reduced or eliminated
+by changing the amount of available memory for receive buffer allocation, by
+increasing /proc/sys/vm/min_free_kbytes.
+
+Multiple Interfaces on Same Ethernet Broadcast Network
+------------------------------------------------------
+Due to the default ARP behavior on Linux, it is not possible to have
+one system on two IP networks in the same Ethernet broadcast domain
+(non-partitioned switch) behave as expected. All Ethernet interfaces
+will respond to IP traffic for any IP address assigned to the system.
+This results in unbalanced receive traffic.
+
+If you have multiple interfaces in a server, do either of the following:
+
+  - Turn on ARP filtering by entering::
+
+      echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter
+
+  - Install the interfaces in separate broadcast domains - either in
+    different switches or in a switch partitioned to VLANs.
+
+UDP Stress Test Dropped Packet Issue
+--------------------------------------
+Under a small-packet UDP stress test with the 10GbE driver, the Linux system
+may drop UDP packets because the socket buffers fill up. You may want
+to change the driver's Flow Control variables to the minimum value for
+controlling packet reception.
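+
+For example (a sketch only, using the parameter minimums documented in the
+Command Line Parameters section above rather than tuned recommendations)::
+
+  modprobe ixgb RxFCHighThresh=1536 RxFCLowThresh=64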
+ +Tx Hangs Possible Under Stress +------------------------------ +Under stress conditions, if TX hangs occur, turning off TSO +"ethtool -K eth0 tso off" may resolve the problem. + + +Support +======= +For general information, go to the Intel support website at: + +https://www.intel.com/support/ + +or the Intel Wired Networking project hosted by Sourceforge at: + +https://sourceforge.net/projects/e1000 + +If an issue is identified with the released source code on a supported kernel +with a supported adapter, email the specific information related to the issue +to e1000-devel@lists.sf.net diff --git a/Documentation/networking/ixgb.txt b/Documentation/networking/ixgb.txt deleted file mode 100644 index 09f71d71920a..000000000000 --- a/Documentation/networking/ixgb.txt +++ /dev/null @@ -1,433 +0,0 @@ -Linux Base Driver for 10 Gigabit Intel(R) Ethernet Network Connection -===================================================================== - -March 14, 2011 - - -Contents -======== - -- In This Release -- Identifying Your Adapter -- Building and Installation -- Command Line Parameters -- Improving Performance -- Additional Configurations -- Known Issues/Troubleshooting -- Support - - - -In This Release -=============== - -This file describes the ixgb Linux Base Driver for the 10 Gigabit Intel(R) -Network Connection. This driver includes support for Itanium(R)2-based -systems. - -For questions related to hardware requirements, refer to the documentation -supplied with your 10 Gigabit adapter. All hardware requirements listed apply -to use with Linux. - -The following features are available in this kernel: - - Native VLANs - - Channel Bonding (teaming) - - SNMP - -Channel Bonding documentation can be found in the Linux kernel source: -/Documentation/networking/bonding.txt - -The driver information previously displayed in the /proc filesystem is not -supported in this release. Alternatively, you can use ethtool (version 1.6 -or later), lspci, and iproute2 to obtain the same information. - -Instructions on updating ethtool can be found in the section "Additional -Configurations" later in this document. - - -Identifying Your Adapter -======================== - -The following Intel network adapters are compatible with the drivers in this -release: - -Controller Adapter Name Physical Layer ----------- ------------ -------------- -82597EX Intel(R) PRO/10GbE LR/SR/CX4 10G Base-LR (1310 nm optical fiber) - Server Adapters 10G Base-SR (850 nm optical fiber) - 10G Base-CX4(twin-axial copper cabling) - -For more information on how to identify your adapter, go to the Adapter & -Driver ID Guide at: - - http://support.intel.com/support/network/sb/CS-012904.htm - - -Building and Installation -========================= - -select m for "Intel(R) PRO/10GbE support" located at: - Location: - -> Device Drivers - -> Network device support (NETDEVICES [=y]) - -> Ethernet (10000 Mbit) (NETDEV_10000 [=y]) -1. make modules && make modules_install - -2. Load the module: - -    modprobe ixgb <parameter>=<value> - - The insmod command can be used if the full - path to the driver module is specified. For example: - - insmod /lib/modules/<KERNEL VERSION>/kernel/drivers/net/ixgb/ixgb.ko - - With 2.6 based kernels also make sure that older ixgb drivers are - removed from the kernel, before loading the new module: - - rmmod ixgb; modprobe ixgb - -3. Assign an IP address to the interface by entering the following, where - x is the interface number: - - ip addr add ethx <IP_address> - -4. Verify that the interface works. 
Enter the following, where <IP_address> - is the IP address for another machine on the same subnet as the interface - that is being tested: - - ping <IP_address> - - -Command Line Parameters -======================= - -If the driver is built as a module, the following optional parameters are -used by entering them on the command line with the modprobe command using -this syntax: - - modprobe ixgb [<option>=<VAL1>,<VAL2>,...] - -For example, with two 10GbE PCI adapters, entering: - - modprobe ixgb TxDescriptors=80,128 - -loads the ixgb driver with 80 TX resources for the first adapter and 128 TX -resources for the second adapter. - -The default value for each parameter is generally the recommended setting, -unless otherwise noted. - -FlowControl -Valid Range: 0-3 (0=none, 1=Rx only, 2=Tx only, 3=Rx&Tx) -Default: Read from the EEPROM - If EEPROM is not detected, default is 1 - This parameter controls the automatic generation(Tx) and response(Rx) to - Ethernet PAUSE frames. There are hardware bugs associated with enabling - Tx flow control so beware. - -RxDescriptors -Valid Range: 64-512 -Default Value: 512 - This value is the number of receive descriptors allocated by the driver. - Increasing this value allows the driver to buffer more incoming packets. - Each descriptor is 16 bytes. A receive buffer is also allocated for - each descriptor and can be either 2048, 4056, 8192, or 16384 bytes, - depending on the MTU setting. When the MTU size is 1500 or less, the - receive buffer size is 2048 bytes. When the MTU is greater than 1500 the - receive buffer size will be either 4056, 8192, or 16384 bytes. The - maximum MTU size is 16114. - -RxIntDelay -Valid Range: 0-65535 (0=off) -Default Value: 72 - This value delays the generation of receive interrupts in units of - 0.8192 microseconds. Receive interrupt reduction can improve CPU - efficiency if properly tuned for specific network traffic. Increasing - this value adds extra latency to frame reception and can end up - decreasing the throughput of TCP traffic. If the system is reporting - dropped receives, this value may be set too high, causing the driver to - run out of available receive descriptors. - -TxDescriptors -Valid Range: 64-4096 -Default Value: 256 - This value is the number of transmit descriptors allocated by the driver. - Increasing this value allows the driver to queue more transmits. Each - descriptor is 16 bytes. - -XsumRX -Valid Range: 0-1 -Default Value: 1 - A value of '1' indicates that the driver should enable IP checksum - offload for received packets (both UDP and TCP) to the adapter hardware. - - -Improving Performance -===================== - -With the 10 Gigabit server adapters, the default Linux configuration will -very likely limit the total available throughput artificially. There is a set -of configuration changes that, when applied together, will increase the ability -of Linux to transmit and receive data. The following enhancements were -originally acquired from settings published at http://www.spec.org/web99/ for -various submitted results using Linux. - -NOTE: These changes are only suggestions, and serve as a starting point for - tuning your network performance. - -The changes are made in three major ways, listed in order of greatest effect: -- Use ip link to modify the mtu (maximum transmission unit) and the txqueuelen - parameter. 
-- Use sysctl to modify /proc parameters (essentially kernel tuning) -- Use setpci to modify the MMRBC field in PCI-X configuration space to increase - transmit burst lengths on the bus. - -NOTE: setpci modifies the adapter's configuration registers to allow it to read -up to 4k bytes at a time (for transmits). However, for some systems the -behavior after modifying this register may be undefined (possibly errors of -some kind). A power-cycle, hard reset or explicitly setting the e6 register -back to 22 (setpci -d 8086:1a48 e6.b=22) may be required to get back to a -stable configuration. - -- COPY these lines and paste them into ixgb_perf.sh: -#!/bin/bash -echo "configuring network performance , edit this file to change the interface -or device ID of 10GbE card" -# set mmrbc to 4k reads, modify only Intel 10GbE device IDs -# replace 1a48 with appropriate 10GbE device's ID installed on the system, -# if needed. -setpci -d 8086:1a48 e6.b=2e -# set the MTU (max transmission unit) - it requires your switch and clients -# to change as well. -# set the txqueuelen -# your ixgb adapter should be loaded as eth1 for this to work, change if needed -ip li set dev eth1 mtu 9000 txqueuelen 1000 up -# call the sysctl utility to modify /proc/sys entries -sysctl -p ./sysctl_ixgb.conf -- END ixgb_perf.sh - -- COPY these lines and paste them into sysctl_ixgb.conf: -# some of the defaults may be different for your kernel -# call this file with sysctl -p <this file> -# these are just suggested values that worked well to increase throughput in -# several network benchmark tests, your mileage may vary - -### IPV4 specific settings -# turn TCP timestamp support off, default 1, reduces CPU use -net.ipv4.tcp_timestamps = 0 -# turn SACK support off, default on -# on systems with a VERY fast bus -> memory interface this is the big gainer -net.ipv4.tcp_sack = 0 -# set min/default/max TCP read buffer, default 4096 87380 174760 -net.ipv4.tcp_rmem = 10000000 10000000 10000000 -# set min/pressure/max TCP write buffer, default 4096 16384 131072 -net.ipv4.tcp_wmem = 10000000 10000000 10000000 -# set min/pressure/max TCP buffer space, default 31744 32256 32768 -net.ipv4.tcp_mem = 10000000 10000000 10000000 - -### CORE settings (mostly for socket and UDP effect) -# set maximum receive socket buffer size, default 131071 -net.core.rmem_max = 524287 -# set maximum send socket buffer size, default 131071 -net.core.wmem_max = 524287 -# set default receive socket buffer size, default 65535 -net.core.rmem_default = 524287 -# set default send socket buffer size, default 65535 -net.core.wmem_default = 524287 -# set maximum amount of option memory buffers, default 10240 -net.core.optmem_max = 524287 -# set number of unprocessed input packets before kernel starts dropping them; default 300 -net.core.netdev_max_backlog = 300000 -- END sysctl_ixgb.conf - -Edit the ixgb_perf.sh script if necessary to change eth1 to whatever interface -your ixgb driver is using and/or replace '1a48' with appropriate 10GbE device's -ID installed on the system. - -NOTE: Unless these scripts are added to the boot process, these changes will - only last only until the next system reboot. - - -Resolving Slow UDP Traffic --------------------------- -If your server does not seem to be able to receive UDP traffic as fast as it -can receive TCP traffic, it could be because Linux, by default, does not set -the network stack buffers as large as they need to be to support high UDP -transfer rates. 
One way to alleviate this problem is to allow more memory to -be used by the IP stack to store incoming data. - -For instance, use the commands: - sysctl -w net.core.rmem_max=262143 -and - sysctl -w net.core.rmem_default=262143 -to increase the read buffer memory max and default to 262143 (256k - 1) from -defaults of max=131071 (128k - 1) and default=65535 (64k - 1). These variables -will increase the amount of memory used by the network stack for receives, and -can be increased significantly more if necessary for your application. - - -Additional Configurations -========================= - - Configuring the Driver on Different Distributions - ------------------------------------------------- - Configuring a network driver to load properly when the system is started is - distribution dependent. Typically, the configuration process involves adding - an alias line to /etc/modprobe.conf as well as editing other system startup - scripts and/or configuration files. Many popular Linux distributions ship - with tools to make these changes for you. To learn the proper way to - configure a network device for your system, refer to your distribution - documentation. If during this process you are asked for the driver or module - name, the name for the Linux Base Driver for the Intel 10GbE Family of - Adapters is ixgb. - - Viewing Link Messages - --------------------- - Link messages will not be displayed to the console if the distribution is - restricting system messages. In order to see network driver link messages on - your console, set dmesg to eight by entering the following: - - dmesg -n 8 - - NOTE: This setting is not saved across reboots. - - - Jumbo Frames - ------------ - The driver supports Jumbo Frames for all adapters. Jumbo Frames support is - enabled by changing the MTU to a value larger than the default of 1500. - The maximum value for the MTU is 16114. Use the ip command to - increase the MTU size. For example: - - ip li set dev ethx mtu 9000 - - The maximum MTU setting for Jumbo Frames is 16114. This value coincides - with the maximum Jumbo Frames size of 16128. - - - ethtool - ------- - The driver utilizes the ethtool interface for driver configuration and - diagnostics, as well as displaying statistical information. The ethtool - version 1.6 or later is required for this functionality. - - The latest release of ethtool can be found from - https://www.kernel.org/pub/software/network/ethtool/ - - NOTE: The ethtool version 1.6 only supports a limited set of ethtool options. - Support for a more complete ethtool feature set can be enabled by - upgrading to the latest version. - - - NAPI - ---- - - NAPI (Rx polling mode) is supported in the ixgb driver. NAPI is enabled - or disabled based on the configuration of the kernel. see CONFIG_IXGB_NAPI - - See www.cyberus.ca/~hadi/usenix-paper.tgz for more information on NAPI. - - -Known Issues/Troubleshooting -============================ - - NOTE: After installing the driver, if your Intel Network Connection is not - working, verify in the "In This Release" section of the readme that you have - installed the correct driver. - - Intel(R) PRO/10GbE CX4 Server Adapter Cable Interoperability Issue with - Fujitsu XENPAK Module in SmartBits Chassis - --------------------------------------------------------------------- - Excessive CRC errors may be observed if the Intel(R) PRO/10GbE CX4 - Server adapter is connected to a Fujitsu XENPAK CX4 module in a SmartBits - chassis using 15 m/24AWG cable assemblies manufactured by Fujitsu or Leoni. 
- The CRC errors may be received either by the Intel(R) PRO/10GbE CX4 - Server adapter or the SmartBits. If this situation occurs using a different - cable assembly may resolve the issue. - - CX4 Server Adapter Cable Interoperability Issues with HP Procurve 3400cl - Switch Port - ------------------------------------------------------------------------ - Excessive CRC errors may be observed if the Intel(R) PRO/10GbE CX4 Server - adapter is connected to an HP Procurve 3400cl switch port using short cables - (1 m or shorter). If this situation occurs, using a longer cable may resolve - the issue. - - Excessive CRC errors may be observed using Fujitsu 24AWG cable assemblies that - Are 10 m or longer or where using a Leoni 15 m/24AWG cable assembly. The CRC - errors may be received either by the CX4 Server adapter or at the switch. If - this situation occurs, using a different cable assembly may resolve the issue. - - - Jumbo Frames System Requirement - ------------------------------- - Memory allocation failures have been observed on Linux systems with 64 MB - of RAM or less that are running Jumbo Frames. If you are using Jumbo - Frames, your system may require more than the advertised minimum - requirement of 64 MB of system memory. - - - Performance Degradation with Jumbo Frames - ----------------------------------------- - Degradation in throughput performance may be observed in some Jumbo frames - environments. If this is observed, increasing the application's socket buffer - size and/or increasing the /proc/sys/net/ipv4/tcp_*mem entry values may help. - See the specific application manual and /usr/src/linux*/Documentation/ - networking/ip-sysctl.txt for more details. - - - Allocating Rx Buffers when Using Jumbo Frames - --------------------------------------------- - Allocating Rx buffers when using Jumbo Frames on 2.6.x kernels may fail if - the available memory is heavily fragmented. This issue may be seen with PCI-X - adapters or with packet split disabled. This can be reduced or eliminated - by changing the amount of available memory for receive buffer allocation, by - increasing /proc/sys/vm/min_free_kbytes. - - - Multiple Interfaces on Same Ethernet Broadcast Network - ------------------------------------------------------ - Due to the default ARP behavior on Linux, it is not possible to have - one system on two IP networks in the same Ethernet broadcast domain - (non-partitioned switch) behave as expected. All Ethernet interfaces - will respond to IP traffic for any IP address assigned to the system. - This results in unbalanced receive traffic. - - If you have multiple interfaces in a server, do either of the following: - - - Turn on ARP filtering by entering: - echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter - - - Install the interfaces in separate broadcast domains - either in - different switches or in a switch partitioned to VLANs. - - - UDP Stress Test Dropped Packet Issue - -------------------------------------- - Under small packets UDP stress test with 10GbE driver, the Linux system - may drop UDP packets due to the fullness of socket buffers. You may want - to change the driver's Flow Control variables to the minimum value for - controlling packet reception. - - - Tx Hangs Possible Under Stress - ------------------------------ - Under stress conditions, if TX hangs occur, turning off TSO - "ethtool -K eth0 tso off" may resolve the problem. 
- - -Support -======= - -For general information, go to the Intel support website at: - - http://support.intel.com - -or the Intel Wired Networking project hosted by Sourceforge at: - - http://sourceforge.net/projects/e1000 - -If an issue is identified with the released source code on the supported -kernel with a supported adapter, email the specific information related -to the issue to e1000-devel@lists.sf.net diff --git a/Documentation/networking/ixgbe.rst b/Documentation/networking/ixgbe.rst new file mode 100644 index 000000000000..725fc697fd8f --- /dev/null +++ b/Documentation/networking/ixgbe.rst @@ -0,0 +1,527 @@ +.. SPDX-License-Identifier: GPL-2.0+ + +Linux* Base Driver for the Intel(R) Ethernet 10 Gigabit PCI Express Adapters +============================================================================= + +Intel 10 Gigabit Linux driver. +Copyright(c) 1999-2018 Intel Corporation. + +Contents +======== + +- Identifying Your Adapter +- Command Line Parameters +- Additional Configurations +- Known Issues +- Support + +Identifying Your Adapter +======================== +The driver is compatible with devices based on the following: + + * Intel(R) Ethernet Controller 82598 + * Intel(R) Ethernet Controller 82599 + * Intel(R) Ethernet Controller X520 + * Intel(R) Ethernet Controller X540 + * Intel(R) Ethernet Controller x550 + * Intel(R) Ethernet Controller X552 + * Intel(R) Ethernet Controller X553 + +For information on how to identify your adapter, and for the latest Intel +network drivers, refer to the Intel Support website: +https://www.intel.com/support + +SFP+ Devices with Pluggable Optics +---------------------------------- + +82599-BASED ADAPTERS +~~~~~~~~~~~~~~~~~~~~ +NOTES: +- If your 82599-based Intel(R) Network Adapter came with Intel optics or is an +Intel(R) Ethernet Server Adapter X520-2, then it only supports Intel optics +and/or the direct attach cables listed below. +- When 82599-based SFP+ devices are connected back to back, they should be set +to the same Speed setting via ethtool. Results may vary if you mix speed +settings. + ++---------------+---------------------------------------+------------------+ +| Supplier | Type | Part Numbers | ++===============+=======================================+==================+ +| SR Modules | ++---------------+---------------------------------------+------------------+ +| Intel | DUAL RATE 1G/10G SFP+ SR (bailed) | FTLX8571D3BCV-IT | ++---------------+---------------------------------------+------------------+ +| Intel | DUAL RATE 1G/10G SFP+ SR (bailed) | AFBR-703SDZ-IN2 | ++---------------+---------------------------------------+------------------+ +| Intel | DUAL RATE 1G/10G SFP+ SR (bailed) | AFBR-703SDDZ-IN1 | ++---------------+---------------------------------------+------------------+ +| LR Modules | ++---------------+---------------------------------------+------------------+ +| Intel | DUAL RATE 1G/10G SFP+ LR (bailed) | FTLX1471D3BCV-IT | ++---------------+---------------------------------------+------------------+ +| Intel | DUAL RATE 1G/10G SFP+ LR (bailed) | AFCT-701SDZ-IN2 | ++---------------+---------------------------------------+------------------+ +| Intel | DUAL RATE 1G/10G SFP+ LR (bailed) | AFCT-701SDDZ-IN1 | ++---------------+---------------------------------------+------------------+ + +The following is a list of 3rd party SFP+ modules that have received some +testing. Not all modules are applicable to all devices. 
+ ++---------------+---------------------------------------+------------------+ +| Supplier | Type | Part Numbers | ++===============+=======================================+==================+ +| Finisar | SFP+ SR bailed, 10g single rate | FTLX8571D3BCL | ++---------------+---------------------------------------+------------------+ +| Avago | SFP+ SR bailed, 10g single rate | AFBR-700SDZ | ++---------------+---------------------------------------+------------------+ +| Finisar | SFP+ LR bailed, 10g single rate | FTLX1471D3BCL | ++---------------+---------------------------------------+------------------+ +| Finisar | DUAL RATE 1G/10G SFP+ SR (No Bail) | FTLX8571D3QCV-IT | ++---------------+---------------------------------------+------------------+ +| Avago | DUAL RATE 1G/10G SFP+ SR (No Bail) | AFBR-703SDZ-IN1 | ++---------------+---------------------------------------+------------------+ +| Finisar | DUAL RATE 1G/10G SFP+ LR (No Bail) | FTLX1471D3QCV-IT | ++---------------+---------------------------------------+------------------+ +| Avago | DUAL RATE 1G/10G SFP+ LR (No Bail) | AFCT-701SDZ-IN1 | ++---------------+---------------------------------------+------------------+ +| Finisar | 1000BASE-T SFP | FCLF8522P2BTL | ++---------------+---------------------------------------+------------------+ +| Avago | 1000BASE-T | ABCU-5710RZ | ++---------------+---------------------------------------+------------------+ +| HP | 1000BASE-SX SFP | 453153-001 | ++---------------+---------------------------------------+------------------+ + +82599-based adapters support all passive and active limiting direct attach +cables that comply with SFF-8431 v4.1 and SFF-8472 v10.4 specifications. + +Laser turns off for SFP+ when ifconfig ethX down +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +"ifconfig ethX down" turns off the laser for 82599-based SFP+ fiber adapters. +"ifconfig ethX up" turns on the laser. +Alternatively, you can use "ip link set [down/up] dev ethX" to turn the +laser off and on. + + +82599-based QSFP+ Adapters +~~~~~~~~~~~~~~~~~~~~~~~~~~ +NOTES: +- If your 82599-based Intel(R) Network Adapter came with Intel optics, it only +supports Intel optics. +- 82599-based QSFP+ adapters only support 4x10 Gbps connections. 1x40 Gbps +connections are not supported. QSFP+ link partners must be configured for +4x10 Gbps. +- 82599-based QSFP+ adapters do not support automatic link speed detection. +The link speed must be configured to either 10 Gbps or 1 Gbps to match the link +partners speed capabilities. Incorrect speed configurations will result in +failure to link. +- Intel(R) Ethernet Converged Network Adapter X520-Q1 only supports the optics +and direct attach cables listed below. + ++---------------+---------------------------------------+------------------+ +| Supplier | Type | Part Numbers | ++===============+=======================================+==================+ +| Intel | DUAL RATE 1G/10G QSFP+ SRL (bailed) | E10GQSFPSR | ++---------------+---------------------------------------+------------------+ + +82599-based QSFP+ adapters support all passive and active limiting QSFP+ +direct attach cables that comply with SFF-8436 v4.1 specifications. + +82598-BASED ADAPTERS +~~~~~~~~~~~~~~~~~~~~ +NOTES: +- Intel(r) Ethernet Network Adapters that support removable optical modules +only support their original module type (for example, the Intel(R) 10 Gigabit +SR Dual Port Express Module only supports SR optical modules). If you plug in +a different type of module, the driver will not load. 
+- Hot Swapping/hot plugging optical modules is not supported.
+- Only single speed, 10 gigabit modules are supported.
+- LAN on Motherboard (LOM) devices may support DA, SR, or LR modules. Other
+module types are not supported. Please see your system documentation for
+details.
+
+The following is a list of SFP+ modules and direct attach cables that have
+received some testing. Not all modules are applicable to all devices.
+
++---------------+---------------------------------------+------------------+
+| Supplier      | Type                                  | Part Numbers     |
++===============+=======================================+==================+
+| Finisar       | SFP+ SR bailed, 10g single rate       | FTLX8571D3BCL    |
++---------------+---------------------------------------+------------------+
+| Avago         | SFP+ SR bailed, 10g single rate       | AFBR-700SDZ      |
++---------------+---------------------------------------+------------------+
+| Finisar       | SFP+ LR bailed, 10g single rate       | FTLX1471D3BCL    |
++---------------+---------------------------------------+------------------+
+
+82598-based adapters support all passive direct attach cables that comply with
+SFF-8431 v4.1 and SFF-8472 v10.4 specifications. Active direct attach cables
+are not supported.
+
+Third party optic modules and cables referred to above are listed only for the
+purpose of highlighting third party specifications and potential
+compatibility, and are not recommendations or endorsements or sponsorship of
+any third party's product by Intel. Intel is not endorsing or promoting
+products made by any third party and the third party reference is provided
+only to share information regarding certain optic modules and cables with the
+above specifications. There may be other manufacturers or suppliers, producing
+or supplying optic modules and cables with similar or matching descriptions.
+Customers must use their own discretion and diligence to purchase optic
+modules and cables from any third party of their choice. Customers are solely
+responsible for assessing the suitability of the product and/or devices and
+for the selection of the vendor for purchasing any product. THE OPTIC MODULES
+AND CABLES REFERRED TO ABOVE ARE NOT WARRANTED OR SUPPORTED BY INTEL. INTEL
+ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED
+WARRANTY, RELATING TO SALE AND/OR USE OF SUCH THIRD PARTY PRODUCTS OR
+SELECTION OF VENDOR BY CUSTOMERS.
+
+Command Line Parameters
+=======================
+
+max_vfs
+-------
+:Valid Range: 1-63
+
+This parameter adds support for SR-IOV. It causes the driver to spawn up to
+max_vfs worth of virtual functions.
+If the value is greater than 0, it will also force the VMDq parameter to be 1
+or more.
+
+NOTE: This parameter is only used on kernel 3.7.x and below. On kernel 3.8.x
+and above, use sysfs to enable VFs. Also, for Red Hat distributions, this
+parameter is only used on version 6.6 and older. For version 6.7 and newer,
+use sysfs. For example::
+
+  # echo $num_vf_enabled > /sys/class/net/$dev/device/sriov_numvfs  # enable VFs
+  # echo 0 > /sys/class/net/$dev/device/sriov_numvfs                # disable VFs
+
+The parameters for the driver are referenced by position. Thus, if you have a
+dual port adapter, or more than one adapter in your system, and want N virtual
+functions per port, you must specify a number for each port with each
+parameter separated by a comma. For example::
+
+  modprobe ixgbe max_vfs=4
+
+This will spawn 4 VFs on the first port.
+
+::
+
+  modprobe ixgbe max_vfs=2,4
+
+This will spawn 2 VFs on the first port and 4 VFs on the second port.
+
+NOTE: Caution must be used in loading the driver with these parameters.
+Depending on your system configuration, number of slots, etc., it is not
+always possible to predict which command line position maps to which port.
+
+NOTE: Neither the device nor the driver control how VFs are mapped into config
+space. Bus layout will vary by operating system. On operating systems that
+support it, you can check sysfs to find the mapping.
+
+NOTE: When either SR-IOV mode or VMDq mode is enabled, hardware VLAN filtering
+and VLAN tag stripping/insertion will remain enabled. Please remove the old
+VLAN filter before the new VLAN filter is added. For example,
+
+::
+
+  ip link set eth0 vf 0 vlan 100  # set VLAN 100 for VF 0
+  ip link set eth0 vf 0 vlan 0    # delete VLAN 100
+  ip link set eth0 vf 0 vlan 200  # set a new VLAN 200 for VF 0
+
+With kernel 3.6, the driver supports the simultaneous usage of max_vfs and DCB
+features, subject to the constraints described below. Prior to kernel 3.6, the
+driver did not support the simultaneous operation of max_vfs greater than 0
+and the DCB features (multiple traffic classes utilizing Priority Flow Control
+and Extended Transmission Selection).
+
+When DCB is enabled, network traffic is transmitted and received through
+multiple traffic classes (packet buffers in the NIC). The traffic is
+associated with a specific class based on priority, which has a value of 0
+through 7 used in the VLAN tag. When SR-IOV is not enabled, each traffic class
+is associated with a set of receive/transmit descriptor queue pairs. The
+number of queue pairs for a given traffic class depends on the hardware
+configuration. When SR-IOV is enabled, the descriptor queue pairs are grouped
+into pools. The Physical Function (PF) and each Virtual Function (VF) is
+allocated a pool of receive/transmit descriptor queue pairs. When multiple
+traffic classes are configured (for example, DCB is enabled), each pool
+contains a queue pair from each traffic class. When a single traffic class is
+configured in the hardware, the pools contain multiple queue pairs from the
+single traffic class.
+
+The number of VFs that can be allocated depends on the number of traffic
+classes that can be enabled. The configurable number of traffic classes for
+each enabled VF is as follows:
+
+0 - 15 VFs = Up to 8 traffic classes, depending on device support
+16 - 31 VFs = Up to 4 traffic classes
+32 - 63 VFs = 1 traffic class
+
+When VFs are configured, the PF is allocated one pool as well. The PF supports
+the DCB features with the constraint that each traffic class will only use a
+single queue pair. When zero VFs are configured, the PF can support multiple
+queue pairs per traffic class.
+
+allow_unsupported_sfp
+---------------------
+:Valid Range: 0,1
+:Default Value: 0 (disabled)
+
+This parameter allows unsupported and untested SFP+ modules on 82599-based
+adapters, as long as the type of module is known to the driver.
+
+debug
+-----
+:Valid Range: 0-16 (0=none,...,16=all)
+:Default Value: 0
+
+This parameter adjusts the level of debug messages displayed in the system
+logs.
+
+
+Additional Features and Configurations
+======================================
+
+Flow Control
+------------
+Ethernet Flow Control (IEEE 802.3x) can be configured with ethtool to enable
+receiving and transmitting pause frames for ixgbe. When transmit is enabled,
+pause frames are generated when the receive packet buffer crosses a
+predefined threshold. When receive is enabled, the transmit unit will halt
+for the time delay specified when a pause frame is received.
+
+NOTE: You must have a flow control capable link partner.
+
+Flow Control is enabled by default.
+
+Use ethtool to change the flow control settings. To enable or disable Rx or
+Tx Flow Control::
+
+  ethtool -A eth? rx <on|off> tx <on|off>
+
+Note: This command only enables or disables Flow Control if auto-negotiation
+is disabled. If auto-negotiation is enabled, this command changes the
+parameters used for auto-negotiation with the link partner.
+
+To enable or disable auto-negotiation::
+
+  ethtool -s eth? autoneg <on|off>
+
+Note: Flow Control auto-negotiation is part of link auto-negotiation.
+Depending on your device, you may not be able to change the auto-negotiation
+setting.
+
+NOTE: For 82598 backplane cards entering 1 gigabit mode, flow control default
+behavior is changed to off. Flow control in 1 gigabit mode on these devices
+can lead to transmit hangs.
+
+Intel(R) Ethernet Flow Director
+-------------------------------
+The Intel Ethernet Flow Director performs the following tasks:
+
+- Directs receive packets according to their flows to different queues.
+- Enables tight control on routing a flow in the platform.
+- Matches flows and CPU cores for flow affinity.
+- Supports multiple parameters for flexible flow classification and load
+  balancing (in SFP mode only).
+
+NOTE: Intel Ethernet Flow Director masking works in the opposite manner from
+subnet masking. In the following command::
+
+  # ethtool -N eth11 flow-type ip4 src-ip 172.4.1.2 m 255.0.0.0 dst-ip \
+  172.21.1.1 m 255.128.0.0 action 31
+
+The src-ip value that is written to the filter will be 0.4.1.2, not 172.0.0.0
+as might be expected. Similarly, the dst-ip value written to the filter will
+be 0.21.1.1, not 172.0.0.0.
+
+To enable or disable the Intel Ethernet Flow Director::
+
+  # ethtool -K ethX ntuple <on|off>
+
+When disabling ntuple filters, all the user programmed filters are flushed
+from the driver cache and hardware. All needed filters must be re-added when
+ntuple is re-enabled.
+
+To add a filter that directs packets to queue 2, use the -U or -N switch::
+
+  # ethtool -N ethX flow-type tcp4 src-ip 192.168.10.1 dst-ip \
+  192.168.10.2 src-port 2000 dst-port 2001 action 2 [loc 1]
+
+To see the list of filters currently present::
+
+  # ethtool <-u|-n> ethX
+
+Sideband Perfect Filters
+------------------------
+Sideband Perfect Filters are used to direct traffic that matches specified
+characteristics. They are enabled through ethtool's ntuple interface. To add a
+new filter use the following command::
+
+  ethtool -U <device> flow-type <type> src-ip <ip> dst-ip <ip> src-port <port> \
+  dst-port <port> action <queue>
+
+Where:
+
+  <device> - the ethernet device to program
+  <type> - can be ip4, tcp4, udp4, or sctp4
+  <ip> - the IP address to match on
+  <port> - the port number to match on
+  <queue> - the queue to direct traffic towards (-1 discards the matched
+  traffic)
+
+Use the following command to delete a filter::
+
+  ethtool -U <device> delete <N>
+
+Where <N> is the filter id displayed when printing all the active filters, and
+may also have been specified using "loc <N>" when adding the filter.
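+
+As a usage sketch (the interface name and filter location here are
+hypothetical), a filter previously added with "loc 1" could be removed with::
+
+  ethtool -U enp130s0 delete 1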
+
+The following example matches TCP traffic sent from 192.168.0.1, port 5300,
+directed to 192.168.0.5, port 80, and sends it to queue 7::
+
+  ethtool -U enp130s0 flow-type tcp4 src-ip 192.168.0.1 dst-ip 192.168.0.5 \
+  src-port 5300 dst-port 80 action 7
+
+For each flow-type, the programmed filters must all have the same matching
+input set. For example, issuing the following two commands is acceptable::
+
+  ethtool -U enp130s0 flow-type ip4 src-ip 192.168.0.1 src-port 5300 action 7
+  ethtool -U enp130s0 flow-type ip4 src-ip 192.168.0.5 src-port 55 action 10
+
+Issuing the next two commands, however, is not acceptable, since the first
+specifies src-ip and the second specifies dst-ip::
+
+  ethtool -U enp130s0 flow-type ip4 src-ip 192.168.0.1 src-port 5300 action 7
+  ethtool -U enp130s0 flow-type ip4 dst-ip 192.168.0.5 src-port 55 action 10
+
+The second command will fail with an error. You may program multiple filters
+with the same fields, using different values, but, on one device, you may not
+program two TCP4 filters with different matching fields.
+
+Matching on a sub-portion of a field is not supported by the ixgbe driver;
+thus, partial mask fields are not supported.
+
+To create filters that direct traffic to a specific Virtual Function, use the
+"user-def" parameter. Specify the user-def as a 64-bit value, where the lower
+32 bits represent the queue number, while the next 8 bits represent which VF.
+Note that 0 is the PF, so the VF identifier is offset by 1. For example::
+
+  ... user-def 0x800000002 ...
+
+specifies to direct traffic to Virtual Function 7 (8 minus 1) into queue 2 of
+that VF.
+
+Note that these filters will not break internal routing rules, and will not
+route traffic that otherwise would not have been sent to the specified
+Virtual Function.
+
+Jumbo Frames
+------------
+Jumbo Frames support is enabled by changing the Maximum Transmission Unit
+(MTU) to a value larger than the default value of 1500.
+
+Use the ifconfig command to increase the MTU size. For example, enter the
+following where <x> is the interface number::
+
+  ifconfig eth<x> mtu 9000 up
+
+Alternatively, you can use the ip command as follows::
+
+  ip link set mtu 9000 dev eth<x>
+  ip link set up dev eth<x>
+
+This setting is not saved across reboots. The setting change can be made
+permanent by adding 'MTU=9000' to the file::
+
+  /etc/sysconfig/network-scripts/ifcfg-eth<x>  # for RHEL
+  /etc/sysconfig/network/<config_file>         # for SLES
+
+NOTE: The maximum MTU setting for Jumbo Frames is 9710. This value coincides
+with the maximum Jumbo Frames size of 9728 bytes.
+
+NOTE: This driver will attempt to use multiple page sized buffers to receive
+each jumbo packet. This should help to avoid buffer starvation issues when
+allocating receive packets.
+
+NOTE: For 82599-based network connections, if you are enabling jumbo frames in
+a virtual function (VF), jumbo frames must first be enabled in the physical
+function (PF). The VF MTU setting cannot be larger than the PF MTU.
+
+Generic Receive Offload, aka GRO
+--------------------------------
+The driver supports the in-kernel software implementation of GRO. GRO has
+shown that by coalescing Rx traffic into larger chunks of data, CPU
+utilization can be significantly reduced when under large Rx load. GRO is an
+evolution of the previously-used LRO interface. GRO is able to coalesce
+other protocols besides TCP. It's also safe to use with configurations that
+are problematic for LRO, namely bridging and iSCSI.
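+
+As a minimal sketch (ethX is a placeholder interface name; this is standard
+ethtool usage rather than anything ixgbe-specific), the current GRO state can
+be queried and toggled with::
+
+  # ethtool -k ethX | grep generic-receive-offload
+  # ethtool -K ethX gro <on|off>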
+
+Data Center Bridging (DCB)
+--------------------------
+NOTE:
+The kernel assumes that TC0 is available, and will disable Priority Flow
+Control (PFC) on the device if TC0 is not available. To fix this, ensure TC0
+is enabled when setting up DCB on your switch.
+
+DCB is a Quality of Service implementation in hardware. It uses
+the VLAN priority tag (802.1p) to filter traffic. That means that there are 8
+different priorities that traffic can be filtered into. It also enables
+priority flow control (802.1Qbb) which can limit or eliminate the number of
+dropped packets during network stress. Bandwidth can be allocated to each of
+these priorities, which is enforced at the hardware level (802.1Qaz).
+
+Adapter firmware implements LLDP and DCBX protocol agents as per 802.1AB and
+802.1Qaz respectively. The firmware based DCBX agent runs in willing mode only
+and can accept settings from a DCBX capable peer. Software configuration of
+DCBX parameters via dcbtool/lldptool is not supported.
+
+The ixgbe driver implements the DCB netlink interface layer to allow
+user-space to communicate with the driver and query DCB configuration for the
+port.
+
+ethtool
+-------
+The driver utilizes the ethtool interface for driver configuration and
+diagnostics, as well as displaying statistical information. The latest ethtool
+version is required for this functionality. Download it at:
+https://www.kernel.org/pub/software/network/ethtool/
+
+FCoE
+----
+The ixgbe driver supports Fibre Channel over Ethernet (FCoE) and Data Center
+Bridging (DCB). This code has no default effect on the regular driver
+operation. Configuring DCB and FCoE is outside the scope of this README. Refer
+to http://www.open-fcoe.org/ for FCoE project information and contact
+ixgbe-eedc@lists.sourceforge.net for DCB information.
+
+MAC and VLAN anti-spoofing feature
+----------------------------------
+When a malicious driver attempts to send a spoofed packet, it is dropped by
+the hardware and not transmitted.
+
+An interrupt is sent to the PF driver notifying it of the spoof attempt. When
+a spoofed packet is detected, the PF driver will send the following message to
+the system log (displayed by the "dmesg" command)::
+
+  ixgbe ethX: ixgbe_spoof_check: n spoofed packets detected
+
+where "X" is the PF interface number and "n" is the number of spoofed packets.
+
+NOTE: This feature can be disabled for a specific Virtual Function (VF)::
+
+  ip link set <pf dev> vf <vf id> spoofchk {off|on}
+
+
+Known Issues/Troubleshooting
+============================
+
+Enabling SR-IOV in a 64-bit Microsoft* Windows Server* 2012/R2 guest OS
+-----------------------------------------------------------------------
+Linux KVM Hypervisor/VMM supports direct assignment of a PCIe device to a VM.
+This includes traditional PCIe devices, as well as SR-IOV-capable devices
+supported by this driver.
+
+
+Support
+=======
+For general information, go to the Intel support website at:
+
+https://www.intel.com/support/
+
+or the Intel Wired Networking project hosted by Sourceforge at:
+
+https://sourceforge.net/projects/e1000
+
+If an issue is identified with the released source code on a supported kernel
+with a supported adapter, email the specific information related to the issue
+to e1000-devel@lists.sf.net.
diff --git a/Documentation/networking/ixgbe.txt b/Documentation/networking/ixgbe.txt deleted file mode 100644 index 687835415707..000000000000 --- a/Documentation/networking/ixgbe.txt +++ /dev/null @@ -1,349 +0,0 @@ -Linux* Base Driver for the Intel(R) Ethernet 10 Gigabit PCI Express Family of -Adapters -============================================================================= - -Intel 10 Gigabit Linux driver. -Copyright(c) 1999 - 2013 Intel Corporation. - -Contents -======== - -- Identifying Your Adapter -- Additional Configurations -- Performance Tuning -- Known Issues -- Support - -Identifying Your Adapter -======================== - -The driver in this release is compatible with 82598, 82599 and X540-based -Intel Network Connections. - -For more information on how to identify your adapter, go to the Adapter & -Driver ID Guide at: - - http://support.intel.com/support/network/sb/CS-012904.htm - -SFP+ Devices with Pluggable Optics ----------------------------------- - -82599-BASED ADAPTERS - -NOTES: If your 82599-based Intel(R) Network Adapter came with Intel optics, or -is an Intel(R) Ethernet Server Adapter X520-2, then it only supports Intel -optics and/or the direct attach cables listed below. - -When 82599-based SFP+ devices are connected back to back, they should be set to -the same Speed setting via ethtool. Results may vary if you mix speed settings. -82598-based adapters support all passive direct attach cables that comply -with SFF-8431 v4.1 and SFF-8472 v10.4 specifications. Active direct attach -cables are not supported. - -Supplier Type Part Numbers - -SR Modules -Intel DUAL RATE 1G/10G SFP+ SR (bailed) FTLX8571D3BCV-IT -Intel DUAL RATE 1G/10G SFP+ SR (bailed) AFBR-703SDDZ-IN1 -Intel DUAL RATE 1G/10G SFP+ SR (bailed) AFBR-703SDZ-IN2 -LR Modules -Intel DUAL RATE 1G/10G SFP+ LR (bailed) FTLX1471D3BCV-IT -Intel DUAL RATE 1G/10G SFP+ LR (bailed) AFCT-701SDDZ-IN1 -Intel DUAL RATE 1G/10G SFP+ LR (bailed) AFCT-701SDZ-IN2 - -The following is a list of 3rd party SFP+ modules and direct attach cables that -have received some testing. Not all modules are applicable to all devices. - -Supplier Type Part Numbers - -Finisar SFP+ SR bailed, 10g single rate FTLX8571D3BCL -Avago SFP+ SR bailed, 10g single rate AFBR-700SDZ -Finisar SFP+ LR bailed, 10g single rate FTLX1471D3BCL - -Finisar DUAL RATE 1G/10G SFP+ SR (No Bail) FTLX8571D3QCV-IT -Avago DUAL RATE 1G/10G SFP+ SR (No Bail) AFBR-703SDZ-IN1 -Finisar DUAL RATE 1G/10G SFP+ LR (No Bail) FTLX1471D3QCV-IT -Avago DUAL RATE 1G/10G SFP+ LR (No Bail) AFCT-701SDZ-IN1 -Finistar 1000BASE-T SFP FCLF8522P2BTL -Avago 1000BASE-T SFP ABCU-5710RZ - -82599-based adapters support all passive and active limiting direct attach -cables that comply with SFF-8431 v4.1 and SFF-8472 v10.4 specifications. - -Laser turns off for SFP+ when device is down -------------------------------------------- -"ip link set down" turns off the laser for 82599-based SFP+ fiber adapters. -"ip link set up" turns on the laser. - - -82598-BASED ADAPTERS - -NOTES for 82598-Based Adapters: -- Intel(R) Network Adapters that support removable optical modules only support - their original module type (i.e., the Intel(R) 10 Gigabit SR Dual Port - Express Module only supports SR optical modules). If you plug in a different - type of module, the driver will not load. -- Hot Swapping/hot plugging optical modules is not supported. -- Only single speed, 10 gigabit modules are supported. -- LAN on Motherboard (LOMs) may support DA, SR, or LR modules. Other module - types are not supported. 
Please see your system documentation for details. - -The following is a list of 3rd party SFP+ modules and direct attach cables that -have received some testing. Not all modules are applicable to all devices. - -Supplier Type Part Numbers - -Finisar SFP+ SR bailed, 10g single rate FTLX8571D3BCL -Avago SFP+ SR bailed, 10g single rate AFBR-700SDZ -Finisar SFP+ LR bailed, 10g single rate FTLX1471D3BCL - -82598-based adapters support all passive direct attach cables that comply -with SFF-8431 v4.1 and SFF-8472 v10.4 specifications. Active direct attach -cables are not supported. - - -Flow Control ------------- -Ethernet Flow Control (IEEE 802.3x) can be configured with ethtool to enable -receiving and transmitting pause frames for ixgbe. When TX is enabled, PAUSE -frames are generated when the receive packet buffer crosses a predefined -threshold. When rx is enabled, the transmit unit will halt for the time delay -specified when a PAUSE frame is received. - -Flow Control is enabled by default. If you want to disable a flow control -capable link partner, use ethtool: - - ethtool -A eth? autoneg off RX off TX off - -NOTE: For 82598 backplane cards entering 1 gig mode, flow control default -behavior is changed to off. Flow control in 1 gig mode on these devices can -lead to Tx hangs. - -Intel(R) Ethernet Flow Director -------------------------------- -Supports advanced filters that direct receive packets by their flows to -different queues. Enables tight control on routing a flow in the platform. -Matches flows and CPU cores for flow affinity. Supports multiple parameters -for flexible flow classification and load balancing. - -Flow director is enabled only if the kernel is multiple TX queue capable. - -An included script (set_irq_affinity.sh) automates setting the IRQ to CPU -affinity. - -You can verify that the driver is using Flow Director by looking at the counter -in ethtool: fdir_miss and fdir_match. - -Other ethtool Commands: -To enable Flow Director - ethtool -K ethX ntuple on -To add a filter - Use -U switch. e.g., ethtool -U ethX flow-type tcp4 src-ip 10.0.128.23 - action 1 -To see the list of filters currently present: - ethtool -u ethX - -Perfect Filter: Perfect filter is an interface to load the filter table that -funnels all flow into queue_0 unless an alternative queue is specified using -"action". In that case, any flow that matches the filter criteria will be -directed to the appropriate queue. - -If the queue is defined as -1, filter will drop matching packets. - -To account for filter matches and misses, there are two stats in ethtool: -fdir_match and fdir_miss. In addition, rx_queue_N_packets shows the number of -packets processed by the Nth queue. - -NOTE: Receive Packet Steering (RPS) and Receive Flow Steering (RFS) are not -compatible with Flow Director. IF Flow Director is enabled, these will be -disabled. - -The following three parameters impact Flow Director. - -FdirMode --------- -Valid Range: 0-2 (0=off, 1=ATR, 2=Perfect filter mode) -Default Value: 1 - - Flow Director filtering modes. - -FdirPballoc ------------ -Valid Range: 0-2 (0=64k, 1=128k, 2=256k) -Default Value: 0 - - Flow Director allocated packet buffer size. - -AtrSampleRate --------------- -Valid Range: 1-100 -Default Value: 20 - - Software ATR Tx packet sample rate. For example, when set to 20, every 20th - packet, looks to see if the packet will create a new flow. - -Node ----- -Valid Range: 0-n -Default Value: 1 (off) - - 0 - n: where n is the number of NUMA nodes (i.e. 
0 - 3) currently online in - your system - 1: turns this option off - - The Node parameter will allow you to pick which NUMA node you want to have - the adapter allocate memory on. - -max_vfs -------- -Valid Range: 1-63 -Default Value: 0 - - If the value is greater than 0 it will also force the VMDq parameter to be 1 - or more. - - This parameter adds support for SR-IOV. It causes the driver to spawn up to - max_vfs worth of virtual function. - - -Additional Configurations -========================= - - Jumbo Frames - ------------ - The driver supports Jumbo Frames for all adapters. Jumbo Frames support is - enabled by changing the MTU to a value larger than the default of 1500. - The maximum value for the MTU is 16110. Use the ip command to - increase the MTU size. For example: - - ip link set dev ethx mtu 9000 - - The maximum MTU setting for Jumbo Frames is 9710. This value coincides - with the maximum Jumbo Frames size of 9728. - - Generic Receive Offload, aka GRO - -------------------------------- - The driver supports the in-kernel software implementation of GRO. GRO has - shown that by coalescing Rx traffic into larger chunks of data, CPU - utilization can be significantly reduced when under large Rx load. GRO is an - evolution of the previously-used LRO interface. GRO is able to coalesce - other protocols besides TCP. It's also safe to use with configurations that - are problematic for LRO, namely bridging and iSCSI. - - Data Center Bridging, aka DCB - ----------------------------- - DCB is a configuration Quality of Service implementation in hardware. - It uses the VLAN priority tag (802.1p) to filter traffic. That means - that there are 8 different priorities that traffic can be filtered into. - It also enables priority flow control which can limit or eliminate the - number of dropped packets during network stress. Bandwidth can be - allocated to each of these priorities, which is enforced at the hardware - level. - - To enable DCB support in ixgbe, you must enable the DCB netlink layer to - allow the userspace tools (see below) to communicate with the driver. - This can be found in the kernel configuration here: - - -> Networking support - -> Networking options - -> Data Center Bridging support - - Once this is selected, DCB support must be selected for ixgbe. This can - be found here: - - -> Device Drivers - -> Network device support (NETDEVICES [=y]) - -> Ethernet (10000 Mbit) (NETDEV_10000 [=y]) - -> Intel(R) 10GbE PCI Express adapters support - -> Data Center Bridging (DCB) Support - - After these options are selected, you must rebuild your kernel and your - modules. - - In order to use DCB, userspace tools must be downloaded and installed. - The dcbd tools can be found at: - - http://e1000.sf.net - - Ethtool - ------- - The driver utilizes the ethtool interface for driver configuration and - diagnostics, as well as displaying statistical information. The latest - ethtool version is required for this functionality. - - The latest release of ethtool can be found from - https://www.kernel.org/pub/software/network/ethtool/ - - FCoE - ---- - This release of the ixgbe driver contains new code to enable users to use - Fiber Channel over Ethernet (FCoE) and Data Center Bridging (DCB) - functionality that is supported by the 82598-based hardware. This code has - no default effect on the regular driver operation, and configuring DCB and - FCoE is outside the scope of this driver README. 
Refer to - http://www.open-fcoe.org/ for FCoE project information and contact - e1000-eedc@lists.sourceforge.net for DCB information. - - MAC and VLAN anti-spoofing feature - ---------------------------------- - When a malicious driver attempts to send a spoofed packet, it is dropped by - the hardware and not transmitted. An interrupt is sent to the PF driver - notifying it of the spoof attempt. - - When a spoofed packet is detected the PF driver will send the following - message to the system log (displayed by the "dmesg" command): - - Spoof event(s) detected on VF (n) - - Where n=the VF that attempted to do the spoofing. - - -Performance Tuning -================== - -An excellent article on performance tuning can be found at: - -http://www.redhat.com/promo/summit/2008/downloads/pdf/Thursday/Mark_Wagner.pdf - - -Known Issues -============ - - Enabling SR-IOV in a 32-bit or 64-bit Microsoft* Windows* Server 2008/R2 - Guest OS using Intel (R) 82576-based GbE or Intel (R) 82599-based 10GbE - controller under KVM - ------------------------------------------------------------------------ - KVM Hypervisor/VMM supports direct assignment of a PCIe device to a VM. This - includes traditional PCIe devices, as well as SR-IOV-capable devices using - Intel 82576-based and 82599-based controllers. - - While direct assignment of a PCIe device or an SR-IOV Virtual Function (VF) - to a Linux-based VM running 2.6.32 or later kernel works fine, there is a - known issue with Microsoft Windows Server 2008 VM that results in a "yellow - bang" error. This problem is within the KVM VMM itself, not the Intel driver, - or the SR-IOV logic of the VMM, but rather that KVM emulates an older CPU - model for the guests, and this older CPU model does not support MSI-X - interrupts, which is a requirement for Intel SR-IOV. - - If you wish to use the Intel 82576 or 82599-based controllers in SR-IOV mode - with KVM and a Microsoft Windows Server 2008 guest try the following - workaround. The workaround is to tell KVM to emulate a different model of CPU - when using qemu to create the KVM guest: - - "-cpu qemu64,model=13" - - -Support -======= - -For general information, go to the Intel support website at: - - http://support.intel.com - -or the Intel Wired Networking project hosted by Sourceforge at: - - http://e1000.sourceforge.net - -If an issue is identified with the released source code on the supported -kernel with a supported adapter, email the specific information related -to the issue to e1000-devel@lists.sf.net diff --git a/Documentation/networking/ixgbevf.rst b/Documentation/networking/ixgbevf.rst new file mode 100644 index 000000000000..56cde6366c2f --- /dev/null +++ b/Documentation/networking/ixgbevf.rst @@ -0,0 +1,66 @@ +.. SPDX-License-Identifier: GPL-2.0+ + +Linux* Base Virtual Function Driver for Intel(R) 10G Ethernet +============================================================= + +Intel 10 Gigabit Virtual Function Linux driver. +Copyright(c) 1999-2018 Intel Corporation. + +Contents +======== + +- Identifying Your Adapter +- Known Issues +- Support + +This driver supports 82599, X540, X550, and X552-based virtual function devices +that can only be activated on kernels that support SR-IOV. + +For questions related to hardware requirements, refer to the documentation +supplied with your Intel adapter. All hardware requirements listed apply to use +with Linux. 
+
+
+Identifying Your Adapter
+========================
+The driver is compatible with devices based on the following:
+
+ * Intel(R) Ethernet Controller 82598
+ * Intel(R) Ethernet Controller 82599
+ * Intel(R) Ethernet Controller X520
+ * Intel(R) Ethernet Controller X540
+ * Intel(R) Ethernet Controller X550
+ * Intel(R) Ethernet Controller X552
+ * Intel(R) Ethernet Controller X553
+
+For information on how to identify your adapter, and for the latest Intel
+network drivers, refer to the Intel Support website:
+https://www.intel.com/support
+
+Known Issues/Troubleshooting
+============================
+
+SR-IOV requires the correct platform and OS support.
+
+The guest OS loading this driver must support MSI-X interrupts.
+
+This driver is only supported as a loadable module at this time. Intel is not
+supplying patches against the kernel source to allow for static linking of the
+drivers.
+
+VLANs: There is a limit of a total of 64 shared VLANs to 1 or more VFs.
+
+
+Support
+=======
+For general information, go to the Intel support website at:
+
+https://www.intel.com/support/
+
+or the Intel Wired Networking project hosted by Sourceforge at:
+
+https://sourceforge.net/projects/e1000
+
+If an issue is identified with the released source code on a supported kernel
+with a supported adapter, email the specific information related to the issue
+to e1000-devel@lists.sf.net.
diff --git a/Documentation/networking/ixgbevf.txt b/Documentation/networking/ixgbevf.txt
deleted file mode 100644
index 53d8d2a5a6a3..000000000000
--- a/Documentation/networking/ixgbevf.txt
+++ /dev/null
@@ -1,52 +0,0 @@
-Linux* Base Driver for Intel(R) Ethernet Network Connection
-===========================================================
-
-Intel Gigabit Linux driver.
-Copyright(c) 1999 - 2013 Intel Corporation.
-
-Contents
-========
-
-- Identifying Your Adapter
-- Known Issues/Troubleshooting
-- Support
-
-This file describes the ixgbevf Linux* Base Driver for Intel Network
-Connection.
-
-The ixgbevf driver supports 82599-based virtual function devices that can only
-be activated on kernels with CONFIG_PCI_IOV enabled.
-
-The ixgbevf driver supports virtual functions generated by the ixgbe driver
-with a max_vfs value of 1 or greater.
-
-The guest OS loading the ixgbevf driver must support MSI-X interrupts.
-
-VLANs: There is a limit of a total of 32 shared VLANs to 1 or more VFs.
-
-Identifying Your Adapter
-========================
-
-For more information on how to identify your adapter, go to the Adapter &
-Driver ID Guide at:
-
-    http://support.intel.com/support/go/network/adapter/idguide.htm
-
-Known Issues/Troubleshooting
-============================
-
-
-Support
-=======
-
-For general information, go to the Intel support website at:
-
-    http://support.intel.com
-
-or the Intel Wired Networking project hosted by Sourceforge at:
-
-    http://sourceforge.net/projects/e1000
-
-If an issue is identified with the released source code on the supported
-kernel with a supported adapter, email the specific information related
-to the issue to e1000-devel@lists.sf.net
diff --git a/Documentation/networking/netvsc.txt b/Documentation/networking/netvsc.txt
index 92f5b31392fa..3bfa635bbbd5 100644
--- a/Documentation/networking/netvsc.txt
+++ b/Documentation/networking/netvsc.txt
@@ -45,6 +45,15 @@ Features
   like packets and significantly reduces CPU usage under heavy Rx
   load.
 
+  Large Receive Offload (LRO), or Receive Side Coalescing (RSC)
+  -------------------------------------------------------------
+  The driver supports LRO/RSC in the vSwitch feature. It reduces the
+  per-packet processing overhead by coalescing multiple TCP segments when
+  possible. The feature is enabled by default on VMs running on Windows
+  Server 2019 and later. It may be changed with the ethtool command:
+    ethtool -K eth0 lro on
+    ethtool -K eth0 lro off
+
   SR-IOV support
   --------------
   Hyper-V supports SR-IOV as a hardware acceleration option. If SR-IOV
diff --git a/Documentation/networking/rxrpc.txt b/Documentation/networking/rxrpc.txt
index b5407163d53b..605e00cdd6be 100644
--- a/Documentation/networking/rxrpc.txt
+++ b/Documentation/networking/rxrpc.txt
@@ -1069,6 +1069,31 @@ The kernel interface functions are as follows:
 
      This function may transmit a PING ACK.
 
+ (*) Get reply timestamp.
+
+	bool rxrpc_kernel_get_reply_time(struct socket *sock,
+					 struct rxrpc_call *call,
+					 ktime_t *_ts)
+
+     This allows the timestamp on the first DATA packet of the reply of a
+     client call to be queried, provided that it is still in the Rx ring. If
+     successful, the timestamp will be stored into *_ts and true will be
+     returned; false will be returned otherwise.
+
+ (*) Get remote client epoch.
+
+	u32 rxrpc_kernel_get_epoch(struct socket *sock,
+				   struct rxrpc_call *call)
+
+     This allows the epoch that's contained in packets of an incoming client
+     call to be queried. This value is returned. The function is always
+     successful if the call is still in progress. It shouldn't be called once
+     the call has expired. Note that calling this on a local client call only
+     returns the local epoch.
+
+     This value can be used to determine if the remote client has been
+     restarted as it shouldn't change otherwise.
+
 
 =======================
 CONFIGURABLE PARAMETERS
diff --git a/Documentation/networking/tcp.txt b/Documentation/networking/tcp.txt
deleted file mode 100644
index 9c7139d57e57..000000000000
--- a/Documentation/networking/tcp.txt
+++ /dev/null
@@ -1,101 +0,0 @@
-TCP protocol
-============
-
-Last updated: 3 June 2017
-
-Contents
-========
-
-- Congestion control
-- How the new TCP output machine [nyi] works
-
-Congestion control
-==================
-
-The following variables are used in the tcp_sock for congestion control:
-snd_cwnd        The size of the congestion window
-snd_ssthresh    Slow start threshold. We are in slow start if
-                snd_cwnd is less than this.
-snd_cwnd_cnt    A counter used to slow down the rate of increase
-                once we exceed slow start threshold.
-snd_cwnd_clamp  This is the maximum size that snd_cwnd can grow to.
-snd_cwnd_stamp  Timestamp for when congestion window last validated.
-snd_cwnd_used   Used as a highwater mark for how much of the
-                congestion window is in use. It is used to adjust
-                snd_cwnd down when the link is limited by the
-                application rather than the network.
-
-As of 2.6.13, Linux supports pluggable congestion control algorithms.
-A congestion control mechanism can be registered through functions in
-tcp_cong.c. The functions used by the congestion control mechanism are
-registered via passing a tcp_congestion_ops struct to
-tcp_register_congestion_control. As a minimum, the congestion control
-mechanism must provide a valid name and must implement either ssthresh,
-cong_avoid and undo_cwnd hooks or the "omnipotent" cong_control hook.
-
-Private data for a congestion control mechanism is stored in tp->ca_priv.
-tcp_ca(tp) returns a pointer to this space. This is preallocated space - it
This is preallocated space - it -is important to check the size of your private data will fit this space, or -alternatively, space could be allocated elsewhere and a pointer to it could -be stored here. - -There are three kinds of congestion control algorithms currently: The -simplest ones are derived from TCP reno (highspeed, scalable) and just -provide an alternative congestion window calculation. More complex -ones like BIC try to look at other events to provide better -heuristics. There are also round trip time based algorithms like -Vegas and Westwood+. - -Good TCP congestion control is a complex problem because the algorithm -needs to maintain fairness and performance. Please review current -research and RFC's before developing new modules. - -The default congestion control mechanism is chosen based on the -DEFAULT_TCP_CONG Kconfig parameter. If you really want a particular default -value then you can set it using sysctl net.ipv4.tcp_congestion_control. The -module will be autoloaded if needed and you will get the expected protocol. If -you ask for an unknown congestion method, then the sysctl attempt will fail. - -If you remove a TCP congestion control module, then you will get the next -available one. Since reno cannot be built as a module, and cannot be -removed, it will always be available. - -How the new TCP output machine [nyi] works. -=========================================== - -Data is kept on a single queue. The skb->users flag tells us if the frame is -one that has been queued already. To add a frame we throw it on the end. Ack -walks down the list from the start. - -We keep a set of control flags - - - sk->tcp_pend_event - - TCP_PEND_ACK Ack needed - TCP_ACK_NOW Needed now - TCP_WINDOW Window update check - TCP_WINZERO Zero probing - - - sk->transmit_queue The transmission frame begin - sk->transmit_new First new frame pointer - sk->transmit_end Where to add frames - - sk->tcp_last_tx_ack Last ack seen - sk->tcp_dup_ack Dup ack count for fast retransmit - - -Frames are queued for output by tcp_write. We do our best to send the frames -off immediately if possible, but otherwise queue and compute the body -checksum in the copy. - -When a write is done we try to clear any pending events and piggy back them. -If the window is full we queue full sized frames. On the first timeout in -zero window we split this. - -On a timer we walk the retransmit list to send any retransmits, update the -backoff timers etc. A change of route table stamp causes a change of header -and recompute. We add any new tcp level headers and refinish the checksum -before sending. - diff --git a/Documentation/networking/xfrm_device.txt b/Documentation/networking/xfrm_device.txt index 50c34ca65efe..267f55b5f54a 100644 --- a/Documentation/networking/xfrm_device.txt +++ b/Documentation/networking/xfrm_device.txt @@ -68,6 +68,10 @@ and an indication of whether it is for Rx or Tx. The driver should - verify the algorithm is supported for offloads - store the SA information (key, salt, target-ip, protocol, etc) - enable the HW offload of the SA + - return status value: + 0 success + -EOPNETSUPP offload not supported, try SW IPsec + other fail the request The driver can also set an offload_handle in the SA, an opaque void pointer that can be used to convey context into the fast-path offload requests. |