diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2017-05-02 16:40:27 -0700 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2017-05-02 16:40:27 -0700 |
commit | 8d65b08debc7e62b2c6032d7fe7389d895b92cbc (patch) | |
tree | 0c3141b60c3a03cc32742b5750c5e763b9dae489 /Documentation/networking | |
parent | 5a0387a8a8efb90ae7fea1e2e5c62de3efa74691 (diff) | |
parent | 5d15af6778b8e4ed1fd41b040283af278e7a9a72 (diff) | |
download | linux-8d65b08debc7e62b2c6032d7fe7389d895b92cbc.tar.bz2 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Millar:
"Here are some highlights from the 2065 networking commits that
happened this development cycle:
1) XDP support for IXGBE (John Fastabend) and thunderx (Sunil Kowuri)
2) Add a generic XDP driver, so that anyone can test XDP even if they
lack a networking device whose driver has explicit XDP support
(me).
3) Sparc64 now has an eBPF JIT too (me)
4) Add a BPF program testing framework via BPF_PROG_TEST_RUN (Alexei
Starovoitov)
5) Make netfitler network namespace teardown less expensive (Florian
Westphal)
6) Add symmetric hashing support to nft_hash (Laura Garcia Liebana)
7) Implement NAPI and GRO in netvsc driver (Stephen Hemminger)
8) Support TC flower offload statistics in mlxsw (Arkadi Sharshevsky)
9) Multiqueue support in stmmac driver (Joao Pinto)
10) Remove TCP timewait recycling, it never really could possibly work
well in the real world and timestamp randomization really zaps any
hint of usability this feature had (Soheil Hassas Yeganeh)
11) Support level3 vs level4 ECMP route hashing in ipv4 (Nikolay
Aleksandrov)
12) Add socket busy poll support to epoll (Sridhar Samudrala)
13) Netlink extended ACK support (Johannes Berg, Pablo Neira Ayuso,
and several others)
14) IPSEC hw offload infrastructure (Steffen Klassert)"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2065 commits)
tipc: refactor function tipc_sk_recv_stream()
tipc: refactor function tipc_sk_recvmsg()
net: thunderx: Optimize page recycling for XDP
net: thunderx: Support for XDP header adjustment
net: thunderx: Add support for XDP_TX
net: thunderx: Add support for XDP_DROP
net: thunderx: Add basic XDP support
net: thunderx: Cleanup receive buffer allocation
net: thunderx: Optimize CQE_TX handling
net: thunderx: Optimize RBDR descriptor handling
net: thunderx: Support for page recycling
ipx: call ipxitf_put() in ioctl error path
net: sched: add helpers to handle extended actions
qed*: Fix issues in the ptp filter config implementation.
qede: Fix concurrency issue in PTP Tx path processing.
stmmac: Add support for SIMATIC IOT2000 platform
net: hns: fix ethtool_get_strings overflow in hns driver
tcp: fix wraparound issue in tcp_lp
bpf, arm64: fix jit branch offset related to ldimm64
bpf, arm64: implement jiting of BPF_XADD
...
Diffstat (limited to 'Documentation/networking')
-rw-r--r-- | Documentation/networking/filter.txt | 7 | ||||
-rw-r--r-- | Documentation/networking/i40e.txt | 72 | ||||
-rw-r--r-- | Documentation/networking/ip-sysctl.txt | 45 | ||||
-rw-r--r-- | Documentation/networking/ipvs-sysctl.txt | 68 | ||||
-rw-r--r-- | Documentation/networking/mpls-sysctl.txt | 19 | ||||
-rw-r--r-- | Documentation/networking/switchdev.txt | 70 |
6 files changed, 226 insertions, 55 deletions
diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt index 683ada5ad81d..b69b205501de 100644 --- a/Documentation/networking/filter.txt +++ b/Documentation/networking/filter.txt @@ -595,10 +595,9 @@ got from bpf_prog_create(), and 'ctx' the given context (e.g. skb pointer). All constraints and restrictions from bpf_check_classic() apply before a conversion to the new layout is being done behind the scenes! -Currently, the classic BPF format is being used for JITing on most of the -architectures. x86-64, aarch64 and s390x perform JIT compilation from eBPF -instruction set, however, future work will migrate other JIT compilers as well, -so that they will profit from the very same benefits. +Currently, the classic BPF format is being used for JITing on most 32-bit +architectures, whereas x86-64, aarch64, s390x, powerpc64, sparc64 perform JIT +compilation from eBPF instruction set. Some core changes of the new internal format: diff --git a/Documentation/networking/i40e.txt b/Documentation/networking/i40e.txt index a251bf4fe9c9..57e616ed10b0 100644 --- a/Documentation/networking/i40e.txt +++ b/Documentation/networking/i40e.txt @@ -63,6 +63,78 @@ Additional Configurations The latest release of ethtool can be found from https://www.kernel.org/pub/software/network/ethtool + + Flow Director n-ntuple traffic filters (FDir) + --------------------------------------------- + The driver utilizes the ethtool interface for configuring ntuple filters, + via "ethtool -N <device> <filter>". + + The sctp4, ip4, udp4, and tcp4 flow types are supported with the standard + fields including src-ip, dst-ip, src-port and dst-port. The driver only + supports fully enabling or fully masking the fields, so use of the mask + fields for partial matches is not supported. + + Additionally, the driver supports using the action to specify filters for a + Virtual Function. You can specify the action as a 64bit value, where the + lower 32 bits represents the queue number, while the next 8 bits represent + which VF. Note that 0 is the PF, so the VF identifier is offset by 1. For + example: + + ... action 0x800000002 ... + + Would indicate to direct traffic for Virtual Function 7 (8 minus 1) on queue + 2 of that VF. + + The driver also supports using the user-defined field to specify 2 bytes of + arbitrary data to match within the packet payload in addition to the regular + fields. The data is specified in the lower 32bits of the user-def field in + the following way: + + +----------------------------+---------------------------+ + | 31 28 24 20 16 | 15 12 8 4 0| + +----------------------------+---------------------------+ + | offset into packet payload | 2 bytes of flexible data | + +----------------------------+---------------------------+ + + As an example, + + ... user-def 0x4FFFF .... + + means to match the value 0xFFFF 4 bytes into the packet payload. Note that + the offset is based on the beginning of the payload, and not the beginning + of the packet. Thus + + flow-type tcp4 ... user-def 0x8BEAF .... + + would match TCP/IPv4 packets which have the value 0xBEAF 8bytes into the + TCP/IPv4 payload. + + For ICMP, the hardware parses the ICMP header as 4 bytes of header and 4 + bytes of payload, so if you want to match an ICMP frames payload you may need + to add 4 to the offset in order to match the data. + + Furthermore, the offset can only be up to a value of 64, as the hardware + will only read up to 64 bytes of data from the payload. It must also be even + as the flexible data is 2 bytes long and must be aligned to byte 0 of the + packet payload. + + When programming filters, the hardware is limited to using a single input + set for each flow type. This means that it is an error to program two + different filters with the same type that don't match on the same fields. + Thus the second of the following two commands will fail: + + ethtool -N <device> flow-type tcp4 src-ip 192.168.0.7 action 5 + ethtool -N <device> flow-type tcp4 dst-ip 192.168.15.18 action 1 + + This is because the first filter will be accepted and reprogram the input + set for TCPv4 filters, but the second filter will be unable to reprogram the + input set until all the conflicting TCPv4 filters are first removed. + + Note that the user-defined flexible offset is also considered part of the + input set and cannot be programmed separately for multiple filters of the + same type. However, the flexible data is not part of the input set and + multiple filters may use the same offset but match against different data. + Data Center Bridging (DCB) -------------------------- DCB configuration is not currently supported. diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index ab0230461377..974ab47ae53a 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -73,6 +73,14 @@ fib_multipath_use_neigh - BOOLEAN 0 - disabled 1 - enabled +fib_multipath_hash_policy - INTEGER + Controls which hash policy to use for multipath routes. Only valid + for kernels built with CONFIG_IP_ROUTE_MULTIPATH enabled. + Default: 0 (Layer 3) + Possible values: + 0 - Layer 3 + 1 - Layer 4 + route/max_size - INTEGER Maximum number of routes allowed in the kernel. Increase this when using large numbers of interfaces and/or routes. @@ -594,6 +602,14 @@ tcp_fastopen - INTEGER Note that that additional client or server features are only effective if the basic support (0x1 and 0x2) are enabled respectively. +tcp_fastopen_blackhole_timeout_sec - INTEGER + Initial time period in second to disable Fastopen on active TCP sockets + when a TFO firewall blackhole issue happens. + This time period will grow exponentially when more blackhole issues + get detected right after Fastopen is re-enabled and will reset to + initial value when the blackhole issue goes away. + By default, it is set to 1hr. + tcp_syn_retries - INTEGER Number of times initial SYNs for an active TCP connection attempt will be retransmitted. Should not be higher than 127. Default value @@ -640,11 +656,6 @@ tcp_tso_win_divisor - INTEGER building larger TSO frames. Default: 3 -tcp_tw_recycle - BOOLEAN - Enable fast recycling TIME-WAIT sockets. Default value is 0. - It should not be changed without advice/request of technical - experts. - tcp_tw_reuse - BOOLEAN Allow to reuse TIME-WAIT sockets for new connections when it is safe from protocol viewpoint. Default value is 0. @@ -853,12 +864,21 @@ ip_dynaddr - BOOLEAN ip_early_demux - BOOLEAN Optimize input packet processing down to one demux for certain kinds of local sockets. Currently we only do this - for established TCP sockets. + for established TCP and connected UDP sockets. It may add an additional cost for pure routing workloads that reduces overall throughput, in such case you should disable it. Default: 1 +tcp_early_demux - BOOLEAN + Enable early demux for established TCP sockets. + Default: 1 + +udp_early_demux - BOOLEAN + Enable early demux for connected UDP sockets. Disable this if + your system could experience more unconnected load. + Default: 1 + icmp_echo_ignore_all - BOOLEAN If set non-zero, then the kernel will ignore all ICMP ECHO requests sent to it. @@ -1458,11 +1478,20 @@ accept_ra_pinfo - BOOLEAN Functional default: enabled if accept_ra is enabled. disabled if accept_ra is disabled. +accept_ra_rt_info_min_plen - INTEGER + Minimum prefix length of Route Information in RA. + + Route Information w/ prefix smaller than this variable shall + be ignored. + + Functional default: 0 if accept_ra_rtr_pref is enabled. + -1 if accept_ra_rtr_pref is disabled. + accept_ra_rt_info_max_plen - INTEGER Maximum prefix length of Route Information in RA. - Route Information w/ prefix larger than or equal to this - variable shall be ignored. + Route Information w/ prefix larger than this variable shall + be ignored. Functional default: 0 if accept_ra_rtr_pref is enabled. -1 if accept_ra_rtr_pref is disabled. diff --git a/Documentation/networking/ipvs-sysctl.txt b/Documentation/networking/ipvs-sysctl.txt index e6b1c025fdd8..056898685d40 100644 --- a/Documentation/networking/ipvs-sysctl.txt +++ b/Documentation/networking/ipvs-sysctl.txt @@ -175,6 +175,14 @@ nat_icmp_send - BOOLEAN for VS/NAT when the load balancer receives packets from real servers but the connection entries don't exist. +pmtu_disc - BOOLEAN + 0 - disabled + not 0 - enabled (default) + + By default, reject with FRAG_NEEDED all DF packets that exceed + the PMTU, irrespective of the forwarding method. For TUN method + the flag can be disabled to fragment such packets. + secure_tcp - INTEGER 0 - disabled (default) @@ -185,15 +193,59 @@ secure_tcp - INTEGER The value definition is the same as that of drop_entry and drop_packet. -sync_threshold - INTEGER - default 3 +sync_threshold - vector of 2 INTEGERs: sync_threshold, sync_period + default 3 50 + + It sets synchronization threshold, which is the minimum number + of incoming packets that a connection needs to receive before + the connection will be synchronized. A connection will be + synchronized, every time the number of its incoming packets + modulus sync_period equals the threshold. The range of the + threshold is from 0 to sync_period. + + When sync_period and sync_refresh_period are 0, send sync only + for state changes or only once when pkts matches sync_threshold + +sync_refresh_period - UNSIGNED INTEGER + default 0 + + In seconds, difference in reported connection timer that triggers + new sync message. It can be used to avoid sync messages for the + specified period (or half of the connection timeout if it is lower) + if connection state is not changed since last sync. + + This is useful for normal connections with high traffic to reduce + sync rate. Additionally, retry sync_retries times with period of + sync_refresh_period/8. + +sync_retries - INTEGER + default 0 + + Defines sync retries with period of sync_refresh_period/8. Useful + to protect against loss of sync messages. The range of the + sync_retries is from 0 to 3. + +sync_qlen_max - UNSIGNED LONG + + Hard limit for queued sync messages that are not sent yet. It + defaults to 1/32 of the memory pages but actually represents + number of messages. It will protect us from allocating large + parts of memory when the sending rate is lower than the queuing + rate. + +sync_sock_size - INTEGER + default 0 + + Configuration of SNDBUF (master) or RCVBUF (slave) socket limit. + Default value is 0 (preserve system defaults). + +sync_ports - INTEGER + default 1 - It sets synchronization threshold, which is the minimum number - of incoming packets that a connection needs to receive before - the connection will be synchronized. A connection will be - synchronized, every time the number of its incoming packets - modulus 50 equals the threshold. The range of the threshold is - from 0 to 49. + The number of threads that master and backup servers can use for + sync traffic. Every thread will use single UDP port, thread 0 will + use the default port 8848 while last thread will use port + 8848+sync_ports-1. snat_reroute - BOOLEAN 0 - disabled diff --git a/Documentation/networking/mpls-sysctl.txt b/Documentation/networking/mpls-sysctl.txt index 15d8d16934fd..2f24a1912a48 100644 --- a/Documentation/networking/mpls-sysctl.txt +++ b/Documentation/networking/mpls-sysctl.txt @@ -19,6 +19,25 @@ platform_labels - INTEGER Possible values: 0 - 1048575 Default: 0 +ip_ttl_propagate - BOOL + Control whether TTL is propagated from the IPv4/IPv6 header to + the MPLS header on imposing labels and propagated from the + MPLS header to the IPv4/IPv6 header on popping the last label. + + If disabled, the MPLS transport network will appear as a + single hop to transit traffic. + + 0 - disabled / RFC 3443 [Short] Pipe Model + 1 - enabled / RFC 3443 Uniform Model (default) + +default_ttl - BOOL + Default TTL value to use for MPLS packets where it cannot be + propagated from an IP header, either because one isn't present + or ip_ttl_propagate has been disabled. + + Possible values: 1 - 255 + Default: 255 + conf/<interface>/input - BOOL Control whether packets can be input on this interface. diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.txt index 2bbac05ab9e2..3e7b946dea27 100644 --- a/Documentation/networking/switchdev.txt +++ b/Documentation/networking/switchdev.txt @@ -13,43 +13,43 @@ an example setup using a data-center-class switch ASIC chip. Other setups with SR-IOV or soft switches, such as OVS, are possible. - User-space tools - - user space | - +-------------------------------------------------------------------+ - kernel | Netlink - | - +--------------+-------------------------------+ - | Network stack | - | (Linux) | - | | - +----------------------------------------------+ - + User-space tools + + user space | + +-------------------------------------------------------------------+ + kernel | Netlink + | + +--------------+-------------------------------+ + | Network stack | + | (Linux) | + | | + +----------------------------------------------+ + sw1p2 sw1p4 sw1p6 - sw1p1 + sw1p3 + sw1p5 + eth1 - + | + | + | + - | | | | | | | - +--+----+----+----+-+--+----+---+ +-----+-----+ - | Switch driver | | mgmt | - | (this document) | | driver | - | | | | - +--------------+----------------+ +-----------+ - | - kernel | HW bus (eg PCI) - +-------------------------------------------------------------------+ - hardware | - +--------------+---+------------+ - | Switch device (sw1) | - | +----+ +--------+ - | | v offloaded data path | mgmt port - | | | | - +--|----|----+----+----+----+---+ - | | | | | | - + + + + + + + sw1p1 + sw1p3 + sw1p5 + eth1 + + | + | + | + + | | | | | | | + +--+----+----+----+-+--+----+---+ +-----+-----+ + | Switch driver | | mgmt | + | (this document) | | driver | + | | | | + +--------------+----------------+ +-----------+ + | + kernel | HW bus (eg PCI) + +-------------------------------------------------------------------+ + hardware | + +--------------+---+------------+ + | Switch device (sw1) | + | +----+ +--------+ + | | v offloaded data path | mgmt port + | | | | + +--|----|----+----+----+----+---+ + | | | | | | + + + + + + + p1 p2 p3 p4 p5 p6 - - front-panel ports - + + front-panel ports + Fig 1. |