summaryrefslogtreecommitdiffstats
path: root/Documentation/networking/ip-sysctl.rst
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/networking/ip-sysctl.rst')
-rw-r--r--Documentation/networking/ip-sysctl.rst111
1 files changed, 111 insertions, 0 deletions
diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
index e7b3fa7bb3f7..7fbd060d6047 100644
--- a/Documentation/networking/ip-sysctl.rst
+++ b/Documentation/networking/ip-sysctl.rst
@@ -1069,6 +1069,81 @@ tcp_child_ehash_entries - INTEGER
Default: 0
+tcp_plb_enabled - BOOLEAN
+ If set and the underlying congestion control (e.g. DCTCP) supports
+ and enables PLB feature, TCP PLB (Protective Load Balancing) is
+ enabled. PLB is described in the following paper:
+ https://doi.org/10.1145/3544216.3544226. Based on PLB parameters,
+ upon sensing sustained congestion, TCP triggers a change in
+ flow label field for outgoing IPv6 packets. A change in flow label
+ field potentially changes the path of outgoing packets for switches
+ that use ECMP/WCMP for routing.
+
+ PLB changes socket txhash which results in a change in IPv6 Flow Label
+ field, and currently no-op for IPv4 headers. It is possible
+ to apply PLB for IPv4 with other network header fields (e.g. TCP
+ or IPv4 options) or using encapsulation where outer header is used
+ by switches to determine next hop. In either case, further host
+ and switch side changes will be needed.
+
+ When set, PLB assumes that congestion signal (e.g. ECN) is made
+ available and used by congestion control module to estimate a
+ congestion measure (e.g. ce_ratio). PLB needs a congestion measure to
+ make repathing decisions.
+
+ Default: FALSE
+
+tcp_plb_idle_rehash_rounds - INTEGER
+ Number of consecutive congested rounds (RTT) seen after which
+ a rehash can be performed, given there are no packets in flight.
+ This is referred to as M in PLB paper:
+ https://doi.org/10.1145/3544216.3544226.
+
+ Possible Values: 0 - 31
+
+ Default: 3
+
+tcp_plb_rehash_rounds - INTEGER
+ Number of consecutive congested rounds (RTT) seen after which
+ a forced rehash can be performed. Be careful when setting this
+ parameter, as a small value increases the risk of retransmissions.
+ This is referred to as N in PLB paper:
+ https://doi.org/10.1145/3544216.3544226.
+
+ Possible Values: 0 - 31
+
+ Default: 12
+
+tcp_plb_suspend_rto_sec - INTEGER
+ Time, in seconds, to suspend PLB in event of an RTO. In order to avoid
+ having PLB repath onto a connectivity "black hole", after an RTO a TCP
+ connection suspends PLB repathing for a random duration between 1x and
+ 2x of this parameter. Randomness is added to avoid concurrent rehashing
+ of multiple TCP connections. This should be set corresponding to the
+ amount of time it takes to repair a failed link.
+
+ Possible Values: 0 - 255
+
+ Default: 60
+
+tcp_plb_cong_thresh - INTEGER
+ Fraction of packets marked with congestion over a round (RTT) to
+ tag that round as congested. This is referred to as K in the PLB paper:
+ https://doi.org/10.1145/3544216.3544226.
+
+ The 0-1 fraction range is mapped to 0-256 range to avoid floating
+ point operations. For example, 128 means that if at least 50% of
+ the packets in a round were marked as congested then the round
+ will be tagged as congested.
+
+ Setting threshold to 0 means that PLB repaths every RTT regardless
+ of congestion. This is not intended behavior for PLB and should be
+ used only for experimentation purpose.
+
+ Possible Values: 0 - 256
+
+ Default: 128
+
UDP variables
=============
@@ -1102,6 +1177,33 @@ udp_rmem_min - INTEGER
udp_wmem_min - INTEGER
UDP does not have tx memory accounting and this tunable has no effect.
+udp_hash_entries - INTEGER
+ Show the number of hash buckets for UDP sockets in the current
+ networking namespace.
+
+ A negative value means the networking namespace does not own its
+ hash buckets and shares the initial networking namespace's one.
+
+udp_child_ehash_entries - INTEGER
+ Control the number of hash buckets for UDP sockets in the child
+ networking namespace, which must be set before clone() or unshare().
+
+ If the value is not 0, the kernel uses a value rounded up to 2^n
+ as the actual hash bucket size. 0 is a special value, meaning
+ the child networking namespace will share the initial networking
+ namespace's hash buckets.
+
+ Note that the child will use the global one in case the kernel
+ fails to allocate enough memory. In addition, the global hash
+ buckets are spread over available NUMA nodes, but the allocation
+ of the child hash table depends on the current process's NUMA
+ policy, which could result in performance differences.
+
+ Possible values: 0, 2^n (n: 7 (128) - 16 (64K))
+
+ Default: 0
+
+
RAW variables
=============
@@ -3025,6 +3127,15 @@ ecn_enable - BOOLEAN
Default: 1
+l3mdev_accept - BOOLEAN
+ Enabling this option allows a "global" bound socket to work
+ across L3 master domains (e.g., VRFs) with packets capable of
+ being received regardless of the L3 domain in which they
+ originated. Only valid when the kernel was compiled with
+ CONFIG_NET_L3_MASTER_DEV.
+
+ Default: 1 (enabled)
+
``/proc/sys/net/core/*``
========================