summaryrefslogtreecommitdiffstats
path: root/Documentation
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/00-INDEX2
-rw-r--r--Documentation/feature-removal-schedule.txt38
-rw-r--r--Documentation/ftape.txt307
-rw-r--r--Documentation/kernel-parameters.txt3
-rw-r--r--Documentation/networking/dccp.txt84
-rw-r--r--Documentation/networking/e1000.txt451
-rw-r--r--Documentation/networking/ip-sysctl.txt347
-rw-r--r--Documentation/networking/phy.txt13
-rw-r--r--Documentation/networking/udplite.txt281
-rw-r--r--Documentation/networking/xfrm_sync.txt5
-rw-r--r--Documentation/s390/CommonIO4
-rw-r--r--Documentation/s390/Debugging390.txt38
-rw-r--r--Documentation/s390/cds.txt12
-rw-r--r--Documentation/s390/crypto/crypto-API.txt6
-rw-r--r--Documentation/s390/s390dbf.txt8
15 files changed, 848 insertions, 751 deletions
diff --git a/Documentation/00-INDEX b/Documentation/00-INDEX
index 02457ec9c94f..f08ca9535733 100644
--- a/Documentation/00-INDEX
+++ b/Documentation/00-INDEX
@@ -104,8 +104,6 @@ firmware_class/
- request_firmware() hotplug interface info.
floppy.txt
- notes and driver options for the floppy disk driver.
-ftape.txt
- - notes about the floppy tape device driver.
hayes-esp.txt
- info on using the Hayes ESP serial driver.
highuid.txt
diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index d52c4aaaf17f..226ecf2ffd56 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -70,18 +70,6 @@ Who: Dominik Brodowski <linux@brodo.de>
---------------------------
-What: ip_queue and ip6_queue (old ipv4-only and ipv6-only netfilter queue)
-When: December 2005
-Why: This interface has been obsoleted by the new layer3-independent
- "nfnetlink_queue". The Kernel interface is compatible, so the old
- ip[6]tables "QUEUE" targets still work and will transparently handle
- all packets into nfnetlink queue number 0. Userspace users will have
- to link against API-compatible library on top of libnfnetlink_queue
- instead of the current 'libipq'.
-Who: Harald Welte <laforge@netfilter.org>
-
----------------------------
-
What: remove EXPORT_SYMBOL(kernel_thread)
When: August 2006
Files: arch/*/kernel/*_ksyms.c
@@ -227,21 +215,6 @@ Who: Patrick McHardy <kaber@trash.net>
---------------------------
-What: frame diverter
-When: November 2006
-Why: The frame diverter is included in most distribution kernels, but is
- broken. It does not correctly handle many things:
- - IPV6
- - non-linear skb's
- - network device RCU on removal
- - input frames not correctly checked for protocol errors
- It also adds allocation overhead even if not enabled.
- It is not clear if anyone is still using it.
-Who: Stephen Hemminger <shemminger@osdl.org>
-
----------------------------
-
-
What: PHYSDEVPATH, PHYSDEVBUS, PHYSDEVDRIVER in the uevent environment
When: October 2008
Why: The stacking of class devices makes these values misleading and
@@ -261,10 +234,11 @@ Who: Jean Delvare <khali@linux-fr.org>
---------------------------
-What: ftape
-When: 2.6.20
-Why: Orphaned for ages. SMP bugs long unfixed. Few users left
- in the world.
-Who: Jeff Garzik <jeff@garzik.org>
+What: IPv4 only connection tracking/NAT/helpers
+When: 2.6.22
+Why: The new layer 3 independant connection tracking replaces the old
+ IPv4 only version. After some stabilization of the new code the
+ old one will be removed.
+Who: Patrick McHardy <kaber@trash.net>
---------------------------
diff --git a/Documentation/ftape.txt b/Documentation/ftape.txt
deleted file mode 100644
index 7d8bb3384031..000000000000
--- a/Documentation/ftape.txt
+++ /dev/null
@@ -1,307 +0,0 @@
-Intro
-=====
-
-This file describes some issues involved when using the "ftape"
-floppy tape device driver that comes with the Linux kernel.
-
-ftape has a home page at
-
-http://ftape.dot-heine.de/
-
-which contains further information about ftape. Please cross check
-this WWW address against the address given (if any) in the MAINTAINERS
-file located in the top level directory of the Linux kernel source
-tree.
-
-NOTE: This is an unmaintained set of drivers, and it is not guaranteed to work.
-If you are interested in taking over maintenance, contact Claus-Justus Heine
-<ch@dot-heine.de>, the former maintainer.
-
-Contents
-========
-
-A minus 1: Ftape documentation
-
-A. Changes
- 1. Goal
- 2. I/O Block Size
- 3. Write Access when not at EOD (End Of Data) or BOT (Begin Of Tape)
- 4. Formatting
- 5. Interchanging cartridges with other operating systems
-
-B. Debugging Output
- 1. Introduction
- 2. Tuning the debugging output
-
-C. Boot and load time configuration
- 1. Setting boot time parameters
- 2. Module load time parameters
- 3. Ftape boot- and load time options
- 4. Example kernel parameter setting
- 5. Example module parameter setting
-
-D. Support and contacts
-
-*******************************************************************************
-
-A minus 1. Ftape documentation
-==============================
-
-Unluckily, the ftape-HOWTO is out of date. This really needs to be
-changed. Up to date documentation as well as recent development
-versions of ftape and useful links to related topics can be found at
-the ftape home page at
-
-http://ftape.dot-heine.de/
-
-*******************************************************************************
-
-A. Changes
-==========
-
-1. Goal
- ~~~~
- The goal of all that incompatibilities was to give ftape an interface
- that resembles the interface provided by SCSI tape drives as close
- as possible. Thus any Unix backup program that is known to work
- with SCSI tape drives should also work.
-
- The concept of a fixed block size for read/write transfers is
- rather unrelated to this SCSI tape compatibility at the file system
- interface level. It developed out of a feature of zftape, a
- block wise user transparent on-the-fly compression. That compression
- support will not be dropped in future releases for compatibility
- reasons with previous releases of zftape.
-
-2. I/O Block Size
- ~~~~~~~~~~~~~~
- The block size defaults to 10k which is the default block size of
- GNU tar.
-
- The block size can be tuned either during kernel configuration or
- at runtime with the MTIOCTOP ioctl using the MTSETBLK operation
- (i.e. do "mt -f /dev/qft0" setblk #BLKSZ). A block size of 0
- switches to variable block size mode i.e. "mt setblk 0" switches
- off the block size restriction. However, this disables zftape's
- built in on-the-fly compression which doesn't work with variable
- block size mode.
-
- The BLKSZ parameter must be given as a byte count and must be a
- multiple of 32k or 0, i.e. use "mt setblk 32768" to switch to a
- block size of 32k.
-
- The typical symptom of a block size mismatch is an "invalid
- argument" error message.
-
-3. Write Access when not at EOD (End Of Data) or BOT (Begin Of Tape)
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- zftape (the file system interface of ftape-3.x) denies write access
- to the tape cartridge when it isn't positioned either at BOT or
- EOD.
-
-4. Formatting
- ~~~~~~~~~~
- ftape DOES support formatting of floppy tape cartridges. You need the
- `ftformat' program that is shipped with the modules version of ftape.
- Please get the latest version of ftape from
-
- ftp://sunsite.unc.edu/pub/Linux/kernel/tapes
-
- or from the ftape home page at
-
- http://ftape.dot-heine.de/
-
- `ftformat' is contained in the `./contrib/' subdirectory of that
- separate ftape package.
-
-5. Interchanging cartridges with other operating systems
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
- The internal emulation of Unix tape device file marks has changed
- completely. ftape now uses the volume table segment as specified
- by the QIC-40/80/3010/3020/113 standards to emulate file marks. As
- a consequence there is limited support to interchange cartridges
- with other operating systems.
-
- To be more precise: ftape will detect volumes written by other OS's
- programs and other OS's programs will detect volumes written by
- ftape.
-
- However, it isn't possible to extract the data dumped to the tape
- by some MSDOS program with ftape. This exceeds the scope of a
- kernel device driver. If you need such functionality, then go ahead
- and write a user space utility that is able to do that. ftape already
- provides all kernel level support necessary to do that.
-
-*******************************************************************************
-
-B. Debugging Output
- ================
-
-1. Introduction
- ~~~~~~~~~~~~
- The ftape driver can be very noisy in that is can print lots of
- debugging messages to the kernel log files and the system console.
- While this is useful for debugging it might be annoying during
- normal use and enlarges the size of the driver by several kilobytes.
-
- To reduce the size of the driver you can trim the maximal amount of
- debugging information available during kernel configuration. Please
- refer to the kernel configuration script and its on-line help
- functionality.
-
- The amount of debugging output maps to the "tracing" boot time
- option and the "ft_tracing" modules option as follows:
-
- 0 bugs
- 1 + errors (with call-stack dump)
- 2 + warnings
- 3 + information
- 4 + more information
- 5 + program flow
- 6 + fdc/dma info
- 7 + data flow
- 8 + everything else
-
-2. Tuning the debugging output
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~
- To reduce the amount of debugging output printed to the system
- console you can
-
- i) trim the debugging output at run-time with
-
- mt -f /dev/nqft0 setdensity #DBGLVL
-
- where "#DBGLVL" is a number between 0 and 9
-
- ii) trim the debugging output at module load time with
-
- modprobe ftape ft_tracing=#DBGLVL
-
- Of course, this applies only if you have configured ftape to be
- compiled as a module.
-
- iii) trim the debugging output during system boot time. Add the
- following to the kernel command line:
-
- ftape=#DBGLVL,tracing
-
- Please refer also to the next section if you don't know how to
- set boot time parameters.
-
-*******************************************************************************
-
-C. Boot and load time configuration
- ================================
-
-1. Setting boot time parameters
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- Assuming that you use lilo, the LI)nux LO)ader, boot time kernel
- parameters can be set by adding a line
-
- append some_kernel_boot_time_parameter
-
- to `/etc/lilo.conf' or at real boot time by typing in the options
- at the prompt provided by LILO. I can't give you advice on how to
- specify those parameters with other loaders as I don't use them.
-
- For ftape, each "some_kernel_boot_time_parameter" looks like
- "ftape=value,option". As an example, the debugging output can be
- increased with
-
- ftape=4,tracing
-
- NOTE: the value precedes the option name.
-
-2. Module load time parameters
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~
- Module parameters can be specified either directly when invoking
- the program 'modprobe' at the shell prompt:
-
- modprobe ftape ft_tracing=4
-
- or by editing the file `/etc/modprobe.conf' in which case they take
- effect each time when the module is loaded with `modprobe' (please
- refer to the respective manual pages). Thus, you should add a line
-
- options ftape ft_tracing=4
-
- to `/etc/modprobe.conf` if you intend to increase the debugging
- output of the driver.
-
-
-3. Ftape boot- and load time options
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
- i. Controlling the amount of debugging output
- DBGLVL has to be replaced by a number between 0 and 8.
-
- module | kernel command line
- -----------------------|----------------------
- ft_tracing=DBGLVL | ftape=DBGLVL,tracing
-
- ii. Hardware setup
- BASE is the base address of your floppy disk controller,
- IRQ and DMA give its interrupt and DMA channel, respectively.
- BOOL is an integer, "0" means "no"; any other value means
- "yes". You don't need to specify anything if connecting your tape
- drive to the standard floppy disk controller. All of these
- values have reasonable defaults. The defaults can be modified
- during kernel configuration, i.e. while running "make config",
- "make menuconfig" or "make xconfig" in the top level directory
- of the Linux kernel source tree. Please refer also to the on
- line documentation provided during that kernel configuration
- process.
-
- ft_probe_fc10 is set to a non-zero value if you wish for ftape to
- probe for a Colorado FC-10 or FC-20 controller.
-
- ft_mach2 is set to a non-zero value if you wish for ftape to probe
- for a Mountain MACH-2 controller.
-
- module | kernel command line
- -----------------------|----------------------
- ft_fdc_base=BASE | ftape=BASE,ioport
- ft_fdc_irq=IRQ | ftape=IRQ,irq
- ft_fdc_dma=DMA | ftape=DMA,dma
- ft_probe_fc10=BOOL | ftape=BOOL,fc10
- ft_mach2=BOOL | ftape=BOOL,mach2
- ft_fdc_threshold=THR | ftape=THR,threshold
- ft_fdc_rate_limit=RATE | ftape=RATE,datarate
-
-4. Example kernel parameter setting
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- To configure ftape to probe for a Colorado FC-10/FC-20 controller
- and to increase the amount of debugging output a little bit, add
- the following line to `/etc/lilo.conf':
-
- append ftape=1,fc10 ftape=4,tracing
-
-5. Example module parameter setting
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- To do the same, but with ftape compiled as a loadable kernel
- module, add the following line to `/etc/modprobe.conf':
-
- options ftape ft_probe_fc10=1 ft_tracing=4
-
-*******************************************************************************
-
-D. Support and contacts
- ====================
-
- Ftape is distributed under the GNU General Public License. There is
- absolutely no warranty for this software. However, you can reach
- the current maintainer of the ftape package under the email address
- given in the MAINTAINERS file which is located in the top level
- directory of the Linux kernel source tree. There you'll find also
- the relevant mailing list to use as a discussion forum and the web
- page to query for the most recent documentation, related work and
- development versions of ftape.
-
- Changelog:
- ==========
-
-~1996: Original Document
-
-10-24-2004: General cleanup and updating, noting additional module options.
- James Nelson <james4765@gmail.com>
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 67473849f20e..15e4fed127f6 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -557,9 +557,6 @@ and is between 256 and 4096 characters. It is defined in the file
floppy= [HW]
See Documentation/floppy.txt.
- ftape= [HW] Floppy Tape subsystem debugging options.
- See Documentation/ftape.txt.
-
gamecon.map[2|3]=
[HW,JOY] Multisystem joystick and NES/SNES/PSX pad
support via parallel port (up to 5 devices per port)
diff --git a/Documentation/networking/dccp.txt b/Documentation/networking/dccp.txt
index 74563b38ffd9..dda15886bcb5 100644
--- a/Documentation/networking/dccp.txt
+++ b/Documentation/networking/dccp.txt
@@ -19,21 +19,17 @@ for real time and multimedia traffic.
It has a base protocol and pluggable congestion control IDs (CCIDs).
-It is at draft RFC status and the homepage for DCCP as a protocol is at:
- http://www.icir.org/kohler/dcp/
+It is at experimental RFC status and the homepage for DCCP as a protocol is at:
+ http://www.read.cs.ucla.edu/dccp/
Missing features
================
The DCCP implementation does not currently have all the features that are in
-the draft RFC.
+the RFC.
-In particular the following are missing:
-- CCID2 support
-- feature negotiation
-
-When testing against other implementations it appears that elapsed time
-options are not coded compliant to the specification.
+The known bugs are at:
+ http://linux-net.osdl.org/index.php/TODO#DCCP
Socket options
==============
@@ -47,12 +43,70 @@ the socket will fall back to 0 (which means that no meaningful service code
is present). Connecting sockets set at most one service option; for
listening sockets, multiple service codes can be specified.
+DCCP_SOCKOPT_SEND_CSCOV and DCCP_SOCKOPT_RECV_CSCOV are used for setting the
+partial checksum coverage (RFC 4340, sec. 9.2). The default is that checksums
+always cover the entire packet and that only fully covered application data is
+accepted by the receiver. Hence, when using this feature on the sender, it must
+be enabled at the receiver, too with suitable choice of CsCov.
+
+DCCP_SOCKOPT_SEND_CSCOV sets the sender checksum coverage. Values in the
+ range 0..15 are acceptable. The default setting is 0 (full coverage),
+ values between 1..15 indicate partial coverage.
+DCCP_SOCKOPT_SEND_CSCOV is for the receiver and has a different meaning: it
+ sets a threshold, where again values 0..15 are acceptable. The default
+ of 0 means that all packets with a partial coverage will be discarded.
+ Values in the range 1..15 indicate that packets with minimally such a
+ coverage value are also acceptable. The higher the number, the more
+ restrictive this setting (see [RFC 4340, sec. 9.2.1]).
+
+Sysctl variables
+================
+Several DCCP default parameters can be managed by the following sysctls
+(sysctl net.dccp.default or /proc/sys/net/dccp/default):
+
+request_retries
+ The number of active connection initiation retries (the number of
+ Requests minus one) before timing out. In addition, it also governs
+ the behaviour of the other, passive side: this variable also sets
+ the number of times DCCP repeats sending a Response when the initial
+ handshake does not progress from RESPOND to OPEN (i.e. when no Ack
+ is received after the initial Request). This value should be greater
+ than 0, suggested is less than 10. Analogue of tcp_syn_retries.
+
+retries1
+ How often a DCCP Response is retransmitted until the listening DCCP
+ side considers its connecting peer dead. Analogue of tcp_retries1.
+
+retries2
+ The number of times a general DCCP packet is retransmitted. This has
+ importance for retransmitted acknowledgments and feature negotiation,
+ data packets are never retransmitted. Analogue of tcp_retries2.
+
+send_ndp = 1
+ Whether or not to send NDP count options (sec. 7.7.2).
+
+send_ackvec = 1
+ Whether or not to send Ack Vector options (sec. 11.5).
+
+ack_ratio = 2
+ The default Ack Ratio (sec. 11.3) to use.
+
+tx_ccid = 2
+ Default CCID for the sender-receiver half-connection.
+
+rx_ccid = 2
+ Default CCID for the receiver-sender half-connection.
+
+seq_window = 100
+ The initial sequence window (sec. 7.5.2).
+
+tx_qlen = 5
+ The size of the transmit buffer in packets. A value of 0 corresponds
+ to an unbounded transmit buffer.
+
Notes
=====
-SELinux does not yet have support for DCCP. You will need to turn it off or
-else you will get EACCES.
-
-DCCP does not travel through NAT successfully at present. This is because
-the checksum covers the psuedo-header as per TCP and UDP. It should be
-relatively trivial to add Linux NAT support for DCCP.
+DCCP does not travel through NAT successfully at present on many boxes. This is
+because the checksum covers the psuedo-header as per TCP and UDP. Linux NAT
+support for DCCP has been added.
diff --git a/Documentation/networking/e1000.txt b/Documentation/networking/e1000.txt
index 5c0a5cc03998..61b171cf5313 100644
--- a/Documentation/networking/e1000.txt
+++ b/Documentation/networking/e1000.txt
@@ -1,7 +1,7 @@
Linux* Base Driver for the Intel(R) PRO/1000 Family of Adapters
===============================================================
-November 15, 2005
+September 26, 2006
Contents
@@ -9,6 +9,7 @@ Contents
- In This Release
- Identifying Your Adapter
+- Building and Installation
- Command Line Parameters
- Speed and Duplex Configuration
- Additional Configurations
@@ -41,6 +42,9 @@ or later), lspci, and ifconfig to obtain the same information.
Instructions on updating ethtool can be found in the section "Additional
Configurations" later in this document.
+NOTE: The Intel(R) 82562v 10/100 Network Connection only provides 10/100
+support.
+
Identifying Your Adapter
========================
@@ -51,28 +55,27 @@ Driver ID Guide at:
http://support.intel.com/support/network/adapter/pro100/21397.htm
For the latest Intel network drivers for Linux, refer to the following
-website. In the search field, enter your adapter name or type, or use the
+website. In the search field, enter your adapter name or type, or use the
networking link on the left to search for your adapter:
http://downloadfinder.intel.com/scripts-df/support_intel.asp
-Command Line Parameters =======================
+Command Line Parameters
+=======================
If the driver is built as a module, the following optional parameters
-are used by entering them on the command line with the modprobe or insmod
-command using this syntax:
+are used by entering them on the command line with the modprobe command
+using this syntax:
modprobe e1000 [<option>=<VAL1>,<VAL2>,...]
- insmod e1000 [<option>=<VAL1>,<VAL2>,...]
-
For example, with two PRO/1000 PCI adapters, entering:
- insmod e1000 TxDescriptors=80,128
+ modprobe e1000 TxDescriptors=80,128
-loads the e1000 driver with 80 TX descriptors for the first adapter and 128
-TX descriptors for the second adapter.
+loads the e1000 driver with 80 TX descriptors for the first adapter and
+128 TX descriptors for the second adapter.
The default value for each parameter is generally the recommended setting,
unless otherwise noted.
@@ -87,7 +90,7 @@ NOTES: For more information about the AutoNeg, Duplex, and Speed
http://www.intel.com/design/network/applnots/ap450.htm
A descriptor describes a data buffer and attributes related to
- the data buffer. This information is accessed by the hardware.
+ the data buffer. This information is accessed by the hardware.
AutoNeg
@@ -96,9 +99,9 @@ AutoNeg
Valid Range: 0x01-0x0F, 0x20-0x2F
Default Value: 0x2F
-This parameter is a bit mask that specifies which speed and duplex
-settings the board advertises. When this parameter is used, the Speed
-and Duplex parameters must not be specified.
+This parameter is a bit-mask that specifies the speed and duplex settings
+advertised by the adapter. When this parameter is used, the Speed and
+Duplex parameters must not be specified.
NOTE: Refer to the Speed and Duplex section of this readme for more
information on the AutoNeg parameter.
@@ -110,14 +113,15 @@ Duplex
Valid Range: 0-2 (0=auto-negotiate, 1=half, 2=full)
Default Value: 0
-Defines the direction in which data is allowed to flow. Can be either
-one or two-directional. If both Duplex and the link partner are set to
-auto-negotiate, the board auto-detects the correct duplex. If the link
-partner is forced (either full or half), Duplex defaults to half-duplex.
+This defines the direction in which data is allowed to flow. Can be
+either one or two-directional. If both Duplex and the link partner are
+set to auto-negotiate, the board auto-detects the correct duplex. If the
+link partner is forced (either full or half), Duplex defaults to half-
+duplex.
FlowControl
-----------
+-----------
Valid Range: 0-3 (0=none, 1=Rx only, 2=Tx only, 3=Rx&Tx)
Default Value: Reads flow control settings from the EEPROM
@@ -127,57 +131,107 @@ to Ethernet PAUSE frames.
InterruptThrottleRate
---------------------
-(not supported on Intel 82542, 82543 or 82544-based adapters)
-Valid Range: 100-100000 (0=off, 1=dynamic)
-Default Value: 8000
-
-This value represents the maximum number of interrupts per second the
-controller generates. InterruptThrottleRate is another setting used in
-interrupt moderation. Dynamic mode uses a heuristic algorithm to adjust
-InterruptThrottleRate based on the current traffic load.
+(not supported on Intel(R) 82542, 82543 or 82544-based adapters)
+Valid Range: 0,1,3,100-100000 (0=off, 1=dynamic, 3=dynamic conservative)
+Default Value: 3
+
+The driver can limit the amount of interrupts per second that the adapter
+will generate for incoming packets. It does this by writing a value to the
+adapter that is based on the maximum amount of interrupts that the adapter
+will generate per second.
+
+Setting InterruptThrottleRate to a value greater or equal to 100
+will program the adapter to send out a maximum of that many interrupts
+per second, even if more packets have come in. This reduces interrupt
+load on the system and can lower CPU utilization under heavy load,
+but will increase latency as packets are not processed as quickly.
+
+The default behaviour of the driver previously assumed a static
+InterruptThrottleRate value of 8000, providing a good fallback value for
+all traffic types,but lacking in small packet performance and latency.
+The hardware can handle many more small packets per second however, and
+for this reason an adaptive interrupt moderation algorithm was implemented.
+
+Since 7.3.x, the driver has two adaptive modes (setting 1 or 3) in which
+it dynamically adjusts the InterruptThrottleRate value based on the traffic
+that it receives. After determining the type of incoming traffic in the last
+timeframe, it will adjust the InterruptThrottleRate to an appropriate value
+for that traffic.
+
+The algorithm classifies the incoming traffic every interval into
+classes. Once the class is determined, the InterruptThrottleRate value is
+adjusted to suit that traffic type the best. There are three classes defined:
+"Bulk traffic", for large amounts of packets of normal size; "Low latency",
+for small amounts of traffic and/or a significant percentage of small
+packets; and "Lowest latency", for almost completely small packets or
+minimal traffic.
+
+In dynamic conservative mode, the InterruptThrottleRate value is set to 4000
+for traffic that falls in class "Bulk traffic". If traffic falls in the "Low
+latency" or "Lowest latency" class, the InterruptThrottleRate is increased
+stepwise to 20000. This default mode is suitable for most applications.
+
+For situations where low latency is vital such as cluster or
+grid computing, the algorithm can reduce latency even more when
+InterruptThrottleRate is set to mode 1. In this mode, which operates
+the same as mode 3, the InterruptThrottleRate will be increased stepwise to
+70000 for traffic in class "Lowest latency".
+
+Setting InterruptThrottleRate to 0 turns off any interrupt moderation
+and may improve small packet latency, but is generally not suitable
+for bulk throughput traffic.
NOTE: InterruptThrottleRate takes precedence over the TxAbsIntDelay and
- RxAbsIntDelay parameters. In other words, minimizing the receive
+ RxAbsIntDelay parameters. In other words, minimizing the receive
and/or transmit absolute delays does not force the controller to
generate more interrupts than what the Interrupt Throttle Rate
allows.
-CAUTION: If you are using the Intel PRO/1000 CT Network Connection
+CAUTION: If you are using the Intel(R) PRO/1000 CT Network Connection
(controller 82547), setting InterruptThrottleRate to a value
greater than 75,000, may hang (stop transmitting) adapters
- under certain network conditions. If this occurs a NETDEV
- WATCHDOG message is logged in the system event log. In
+ under certain network conditions. If this occurs a NETDEV
+ WATCHDOG message is logged in the system event log. In
addition, the controller is automatically reset, restoring
- the network connection. To eliminate the potential for the
+ the network connection. To eliminate the potential for the
hang, ensure that InterruptThrottleRate is set no greater
than 75,000 and is not set to 0.
NOTE: When e1000 is loaded with default settings and multiple adapters
are in use simultaneously, the CPU utilization may increase non-
- linearly. In order to limit the CPU utilization without impacting
+ linearly. In order to limit the CPU utilization without impacting
the overall throughput, we recommend that you load the driver as
follows:
- insmod e1000.o InterruptThrottleRate=3000,3000,3000
+ modprobe e1000 InterruptThrottleRate=3000,3000,3000
This sets the InterruptThrottleRate to 3000 interrupts/sec for
- the first, second, and third instances of the driver. The range
+ the first, second, and third instances of the driver. The range
of 2000 to 3000 interrupts per second works on a majority of
systems and is a good starting point, but the optimal value will
- be platform-specific. If CPU utilization is not a concern, use
+ be platform-specific. If CPU utilization is not a concern, use
RX_POLLING (NAPI) and default driver settings.
+
RxDescriptors
-------------
Valid Range: 80-256 for 82542 and 82543-based adapters
80-4096 for all other supported adapters
Default Value: 256
-This value specifies the number of receive descriptors allocated by the
-driver. Increasing this value allows the driver to buffer more incoming
-packets. Each descriptor is 16 bytes. A receive buffer is also
-allocated for each descriptor and is 2048.
+This value specifies the number of receive buffer descriptors allocated
+by the driver. Increasing this value allows the driver to buffer more
+incoming packets, at the expense of increased system memory utilization.
+
+Each descriptor is 16 bytes. A receive buffer is also allocated for each
+descriptor and can be either 2048, 4096, 8192, or 16384 bytes, depending
+on the MTU setting. The maximum MTU size is 16110.
+
+NOTE: MTU designates the frame size. It only needs to be set for Jumbo
+ Frames. Depending on the available system resources, the request
+ for a higher number of receive descriptors may be denied. In this
+ case, use a lower number.
RxIntDelay
@@ -187,17 +241,17 @@ Default Value: 0
This value delays the generation of receive interrupts in units of 1.024
microseconds. Receive interrupt reduction can improve CPU efficiency if
-properly tuned for specific network traffic. Increasing this value adds
+properly tuned for specific network traffic. Increasing this value adds
extra latency to frame reception and can end up decreasing the throughput
-of TCP traffic. If the system is reporting dropped receives, this value
+of TCP traffic. If the system is reporting dropped receives, this value
may be set too high, causing the driver to run out of available receive
descriptors.
CAUTION: When setting RxIntDelay to a value other than 0, adapters may
- hang (stop transmitting) under certain network conditions. If
+ hang (stop transmitting) under certain network conditions. If
this occurs a NETDEV WATCHDOG message is logged in the system
- event log. In addition, the controller is automatically reset,
- restoring the network connection. To eliminate the potential
+ event log. In addition, the controller is automatically reset,
+ restoring the network connection. To eliminate the potential
for the hang ensure that RxIntDelay is set to 0.
@@ -208,7 +262,7 @@ Valid Range: 0-65535 (0=off)
Default Value: 128
This value, in units of 1.024 microseconds, limits the delay in which a
-receive interrupt is generated. Useful only if RxIntDelay is non-zero,
+receive interrupt is generated. Useful only if RxIntDelay is non-zero,
this value ensures that an interrupt is generated after the initial
packet is received within the set amount of time. Proper tuning,
along with RxIntDelay, may improve traffic throughput in specific network
@@ -222,9 +276,9 @@ Valid Settings: 0, 10, 100, 1000
Default Value: 0 (auto-negotiate at all supported speeds)
Speed forces the line speed to the specified value in megabits per second
-(Mbps). If this parameter is not specified or is set to 0 and the link
+(Mbps). If this parameter is not specified or is set to 0 and the link
partner is set to auto-negotiate, the board will auto-detect the correct
-speed. Duplex should also be set when Speed is set to either 10 or 100.
+speed. Duplex should also be set when Speed is set to either 10 or 100.
TxDescriptors
@@ -234,7 +288,7 @@ Valid Range: 80-256 for 82542 and 82543-based adapters
Default Value: 256
This value is the number of transmit descriptors allocated by the driver.
-Increasing this value allows the driver to queue more transmits. Each
+Increasing this value allows the driver to queue more transmits. Each
descriptor is 16 bytes.
NOTE: Depending on the available system resources, the request for a
@@ -248,8 +302,8 @@ Valid Range: 0-65535 (0=off)
Default Value: 64
This value delays the generation of transmit interrupts in units of
-1.024 microseconds. Transmit interrupt reduction can improve CPU
-efficiency if properly tuned for specific network traffic. If the
+1.024 microseconds. Transmit interrupt reduction can improve CPU
+efficiency if properly tuned for specific network traffic. If the
system is reporting dropped transmits, this value may be set too high
causing the driver to run out of available transmit descriptors.
@@ -261,7 +315,7 @@ Valid Range: 0-65535 (0=off)
Default Value: 64
This value, in units of 1.024 microseconds, limits the delay in which a
-transmit interrupt is generated. Useful only if TxIntDelay is non-zero,
+transmit interrupt is generated. Useful only if TxIntDelay is non-zero,
this value ensures that an interrupt is generated after the initial
packet is sent on the wire within the set amount of time. Proper tuning,
along with TxIntDelay, may improve traffic throughput in specific
@@ -288,15 +342,15 @@ fiber interface board only links at 1000 Mbps full-duplex.
For copper-based boards, the keywords interact as follows:
- The default operation is auto-negotiate. The board advertises all
+ The default operation is auto-negotiate. The board advertises all
supported speed and duplex combinations, and it links at the highest
common speed and duplex mode IF the link partner is set to auto-negotiate.
If Speed = 1000, limited auto-negotiation is enabled and only 1000 Mbps
is advertised (The 1000BaseT spec requires auto-negotiation.)
- If Speed = 10 or 100, then both Speed and Duplex should be set. Auto-
- negotiation is disabled, and the AutoNeg parameter is ignored. Partner
+ If Speed = 10 or 100, then both Speed and Duplex should be set. Auto-
+ negotiation is disabled, and the AutoNeg parameter is ignored. Partner
SHOULD also be forced.
The AutoNeg parameter is used when more control is required over the
@@ -304,7 +358,7 @@ auto-negotiation process. It should be used when you wish to control which
speed and duplex combinations are advertised during the auto-negotiation
process.
-The parameter may be specified as either a decimal or hexidecimal value as
+The parameter may be specified as either a decimal or hexadecimal value as
determined by the bitmap below.
Bit position 7 6 5 4 3 2 1 0
@@ -337,20 +391,19 @@ Additional Configurations
Configuring the Driver on Different Distributions
-------------------------------------------------
-
Configuring a network driver to load properly when the system is started
- is distribution dependent. Typically, the configuration process involves
+ is distribution dependent. Typically, the configuration process involves
adding an alias line to /etc/modules.conf or /etc/modprobe.conf as well
- as editing other system startup scripts and/or configuration files. Many
+ as editing other system startup scripts and/or configuration files. Many
popular Linux distributions ship with tools to make these changes for you.
To learn the proper way to configure a network device for your system,
- refer to your distribution documentation. If during this process you are
+ refer to your distribution documentation. If during this process you are
asked for the driver or module name, the name for the Linux Base Driver
- for the Intel PRO/1000 Family of Adapters is e1000.
+ for the Intel(R) PRO/1000 Family of Adapters is e1000.
As an example, if you install the e1000 driver for two PRO/1000 adapters
(eth0 and eth1) and set the speed and duplex to 10full and 100half, add
- the following to modules.conf or modprobe.conf:
+ the following to modules.conf or or modprobe.conf:
alias eth0 e1000
alias eth1 e1000
@@ -358,9 +411,8 @@ Additional Configurations
Viewing Link Messages
---------------------
-
Link messages will not be displayed to the console if the distribution is
- restricting system messages. In order to see network driver link messages
+ restricting system messages. In order to see network driver link messages
on your console, set dmesg to eight by entering the following:
dmesg -n 8
@@ -369,11 +421,9 @@ Additional Configurations
Jumbo Frames
------------
-
- The driver supports Jumbo Frames for all adapters except 82542 and
- 82573-based adapters. Jumbo Frames support is enabled by changing the
- MTU to a value larger than the default of 1500. Use the ifconfig command
- to increase the MTU size. For example:
+ Jumbo Frames support is enabled by changing the MTU to a value larger than
+ the default of 1500. Use the ifconfig command to increase the MTU size.
+ For example:
ifconfig eth<x> mtu 9000 up
@@ -390,26 +440,49 @@ Additional Configurations
- To enable Jumbo Frames, increase the MTU size on the interface beyond
1500.
- - The maximum MTU setting for Jumbo Frames is 16110. This value coincides
+
+ - The maximum MTU setting for Jumbo Frames is 16110. This value coincides
with the maximum Jumbo Frames size of 16128.
+
- Using Jumbo Frames at 10 or 100 Mbps may result in poor performance or
loss of link.
+
- Some Intel gigabit adapters that support Jumbo Frames have a frame size
limit of 9238 bytes, with a corresponding MTU size limit of 9216 bytes.
- The adapters with this limitation are based on the Intel 82571EB and
- 82572EI controllers, which correspond to these product names:
- Intel® PRO/1000 PT Dual Port Server Adapter
- Intel® PRO/1000 PF Dual Port Server Adapter
- Intel® PRO/1000 PT Server Adapter
- Intel® PRO/1000 PT Desktop Adapter
- Intel® PRO/1000 PF Server Adapter
-
- - The Intel PRO/1000 PM Network Connection does not support jumbo frames.
+ The adapters with this limitation are based on the Intel(R) 82571EB,
+ 82572EI, 82573L and 80003ES2LAN controller. These correspond to the
+ following product names:
+ Intel(R) PRO/1000 PT Server Adapter
+ Intel(R) PRO/1000 PT Desktop Adapter
+ Intel(R) PRO/1000 PT Network Connection
+ Intel(R) PRO/1000 PT Dual Port Server Adapter
+ Intel(R) PRO/1000 PT Dual Port Network Connection
+ Intel(R) PRO/1000 PF Server Adapter
+ Intel(R) PRO/1000 PF Network Connection
+ Intel(R) PRO/1000 PF Dual Port Server Adapter
+ Intel(R) PRO/1000 PB Server Connection
+ Intel(R) PRO/1000 PL Network Connection
+ Intel(R) PRO/1000 EB Network Connection with I/O Acceleration
+ Intel(R) PRO/1000 EB Backplane Connection with I/O Acceleration
+ Intel(R) PRO/1000 PT Quad Port Server Adapter
+
+ - Adapters based on the Intel(R) 82542 and 82573V/E controller do not
+ support Jumbo Frames. These correspond to the following product names:
+ Intel(R) PRO/1000 Gigabit Server Adapter
+ Intel(R) PRO/1000 PM Network Connection
+
+ - The following adapters do not support Jumbo Frames:
+ Intel(R) 82562V 10/100 Network Connection
+ Intel(R) 82566DM Gigabit Network Connection
+ Intel(R) 82566DC Gigabit Network Connection
+ Intel(R) 82566MM Gigabit Network Connection
+ Intel(R) 82566MC Gigabit Network Connection
+ Intel(R) 82562GT 10/100 Network Connection
+ Intel(R) 82562G 10/100 Network Connection
Ethtool
-------
-
The driver utilizes the ethtool interface for driver configuration and
diagnostics, as well as displaying statistical information. Ethtool
version 1.6 or later is required for this functionality.
@@ -417,15 +490,14 @@ Additional Configurations
The latest release of ethtool can be found from
http://sourceforge.net/projects/gkernel.
- NOTE: Ethtool 1.6 only supports a limited set of ethtool options. Support
+ NOTE: Ethtool 1.6 only supports a limited set of ethtool options. Support
for a more complete ethtool feature set can be enabled by upgrading
ethtool to ethtool-1.8.1.
Enabling Wake on LAN* (WoL)
---------------------------
-
- WoL is configured through the Ethtool* utility. Ethtool is included with
- all versions of Red Hat after Red Hat 7.2. For other Linux distributions,
+ WoL is configured through the Ethtool* utility. Ethtool is included with
+ all versions of Red Hat after Red Hat 7.2. For other Linux distributions,
download and install Ethtool from the following website:
http://sourceforge.net/projects/gkernel.
@@ -436,11 +508,17 @@ Additional Configurations
For this driver version, in order to enable WoL, the e1000 driver must be
loaded when shutting down or rebooting the system.
+ Wake On LAN is only supported on port A for the following devices:
+ Intel(R) PRO/1000 PT Dual Port Network Connection
+ Intel(R) PRO/1000 PT Dual Port Server Connection
+ Intel(R) PRO/1000 PT Dual Port Server Adapter
+ Intel(R) PRO/1000 PF Dual Port Server Adapter
+ Intel(R) PRO/1000 PT Quad Port Server Adapter
+
NAPI
----
-
- NAPI (Rx polling mode) is supported in the e1000 driver. NAPI is enabled
- or disabled based on the configuration of the kernel. To override
+ NAPI (Rx polling mode) is supported in the e1000 driver. NAPI is enabled
+ or disabled based on the configuration of the kernel. To override
the default, use the following compile-time flags.
To enable NAPI, compile the driver module, passing in a configuration option:
@@ -457,88 +535,105 @@ Additional Configurations
Known Issues
============
- Jumbo Frames System Requirement
- -------------------------------
-
- Memory allocation failures have been observed on Linux systems with 64 MB
- of RAM or less that are running Jumbo Frames. If you are using Jumbo
- Frames, your system may require more than the advertised minimum
- requirement of 64 MB of system memory.
-
- Performance Degradation with Jumbo Frames
- -----------------------------------------
-
- Degradation in throughput performance may be observed in some Jumbo frames
- environments. If this is observed, increasing the application's socket
- buffer size and/or increasing the /proc/sys/net/ipv4/tcp_*mem entry values
- may help. See the specific application manual and
- /usr/src/linux*/Documentation/
- networking/ip-sysctl.txt for more details.
-
- Jumbo frames on Foundry BigIron 8000 switch
- -------------------------------------------
- There is a known issue using Jumbo frames when connected to a Foundry
- BigIron 8000 switch. This is a 3rd party limitation. If you experience
- loss of packets, lower the MTU size.
-
- Multiple Interfaces on Same Ethernet Broadcast Network
- ------------------------------------------------------
-
- Due to the default ARP behavior on Linux, it is not possible to have
- one system on two IP networks in the same Ethernet broadcast domain
- (non-partitioned switch) behave as expected. All Ethernet interfaces
- will respond to IP traffic for any IP address assigned to the system.
- This results in unbalanced receive traffic.
-
- If you have multiple interfaces in a server, either turn on ARP
- filtering by entering:
-
- echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter
- (this only works if your kernel's version is higher than 2.4.5),
-
- NOTE: This setting is not saved across reboots. The configuration
- change can be made permanent by adding the line:
- net.ipv4.conf.all.arp_filter = 1
- to the file /etc/sysctl.conf
-
- or,
-
- install the interfaces in separate broadcast domains (either in
- different switches or in a switch partitioned to VLANs).
-
- 82541/82547 can't link or are slow to link with some link partners
- -----------------------------------------------------------------
-
- There is a known compatibility issue with 82541/82547 and some
- low-end switches where the link will not be established, or will
- be slow to establish. In particular, these switches are known to
- be incompatible with 82541/82547:
-
- Planex FXG-08TE
- I-O Data ETG-SH8
-
- To workaround this issue, the driver can be compiled with an override
- of the PHY's master/slave setting. Forcing master or forcing slave
- mode will improve time-to-link.
-
- # make EXTRA_CFLAGS=-DE1000_MASTER_SLAVE=<n>
-
- Where <n> is:
-
- 0 = Hardware default
- 1 = Master mode
- 2 = Slave mode
- 3 = Auto master/slave
-
- Disable rx flow control with ethtool
- ------------------------------------
-
- In order to disable receive flow control using ethtool, you must turn
- off auto-negotiation on the same command line.
-
- For example:
-
- ethtool -A eth? autoneg off rx off
+Dropped Receive Packets on Half-duplex 10/100 Networks
+------------------------------------------------------
+If you have an Intel PCI Express adapter running at 10mbps or 100mbps, half-
+duplex, you may observe occasional dropped receive packets. There are no
+workarounds for this problem in this network configuration. The network must
+be updated to operate in full-duplex, and/or 1000mbps only.
+
+Jumbo Frames System Requirement
+-------------------------------
+Memory allocation failures have been observed on Linux systems with 64 MB
+of RAM or less that are running Jumbo Frames. If you are using Jumbo
+Frames, your system may require more than the advertised minimum
+requirement of 64 MB of system memory.
+
+Performance Degradation with Jumbo Frames
+-----------------------------------------
+Degradation in throughput performance may be observed in some Jumbo frames
+environments. If this is observed, increasing the application's socket
+buffer size and/or increasing the /proc/sys/net/ipv4/tcp_*mem entry values
+may help. See the specific application manual and
+/usr/src/linux*/Documentation/
+networking/ip-sysctl.txt for more details.
+
+Jumbo Frames on Foundry BigIron 8000 switch
+-------------------------------------------
+There is a known issue using Jumbo frames when connected to a Foundry
+BigIron 8000 switch. This is a 3rd party limitation. If you experience
+loss of packets, lower the MTU size.
+
+Allocating Rx Buffers when Using Jumbo Frames
+---------------------------------------------
+Allocating Rx buffers when using Jumbo Frames on 2.6.x kernels may fail if
+the available memory is heavily fragmented. This issue may be seen with PCI-X
+adapters or with packet split disabled. This can be reduced or eliminated
+by changing the amount of available memory for receive buffer allocation, by
+increasing /proc/sys/vm/min_free_kbytes.
+
+Multiple Interfaces on Same Ethernet Broadcast Network
+------------------------------------------------------
+Due to the default ARP behavior on Linux, it is not possible to have
+one system on two IP networks in the same Ethernet broadcast domain
+(non-partitioned switch) behave as expected. All Ethernet interfaces
+will respond to IP traffic for any IP address assigned to the system.
+This results in unbalanced receive traffic.
+
+If you have multiple interfaces in a server, either turn on ARP
+filtering by entering:
+
+ echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter
+(this only works if your kernel's version is higher than 2.4.5),
+
+NOTE: This setting is not saved across reboots. The configuration
+change can be made permanent by adding the line:
+ net.ipv4.conf.all.arp_filter = 1
+to the file /etc/sysctl.conf
+
+ or,
+
+install the interfaces in separate broadcast domains (either in
+different switches or in a switch partitioned to VLANs).
+
+82541/82547 can't link or are slow to link with some link partners
+-----------------------------------------------------------------
+There is a known compatibility issue with 82541/82547 and some
+low-end switches where the link will not be established, or will
+be slow to establish. In particular, these switches are known to
+be incompatible with 82541/82547:
+
+ Planex FXG-08TE
+ I-O Data ETG-SH8
+
+To workaround this issue, the driver can be compiled with an override
+of the PHY's master/slave setting. Forcing master or forcing slave
+mode will improve time-to-link.
+
+ # make CFLAGS_EXTRA=-DE1000_MASTER_SLAVE=<n>
+
+Where <n> is:
+
+ 0 = Hardware default
+ 1 = Master mode
+ 2 = Slave mode
+ 3 = Auto master/slave
+
+Disable rx flow control with ethtool
+------------------------------------
+In order to disable receive flow control using ethtool, you must turn
+off auto-negotiation on the same command line.
+
+For example:
+
+ ethtool -A eth? autoneg off rx off
+
+Unplugging network cable while ethtool -p is running
+----------------------------------------------------
+In kernel versions 2.5.50 and later (including 2.6 kernel), unplugging
+the network cable while ethtool -p is running will cause the system to
+become unresponsive to keyboard commands, except for control-alt-delete.
+Restarting the system appears to be the only remedy.
Support
@@ -548,24 +643,10 @@ For general information, go to the Intel support website at:
http://support.intel.com
- or the Intel Wired Networking project hosted by Sourceforge at:
+or the Intel Wired Networking project hosted by Sourceforge at:
http://sourceforge.net/projects/e1000
If an issue is identified with the released source code on the supported
kernel with a supported adapter, email the specific information related
-to the issue to e1000-devel@lists.sourceforge.net
-
-
-License
-=======
-
-This software program is released under the terms of a license agreement
-between you ('Licensee') and Intel. Do not use or load this software or any
-associated materials (collectively, the 'Software') until you have carefully
-read the full terms and conditions of the file COPYING located in this software
-package. By loading or using the Software, you agree to the terms of this
-Agreement. If you do not agree with the terms of this Agreement, do not
-install or use the Software.
-
-* Other names and brands may be claimed as the property of others.
+to the issue to e1000-devel@lists.sf.net
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index fd3c0c012351..a0f6842368c3 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -101,6 +101,11 @@ inet_peer_gc_maxtime - INTEGER
TCP variables:
+somaxconn - INTEGER
+ Limit of socket listen() backlog, known in userspace as SOMAXCONN.
+ Defaults to 128. See also tcp_max_syn_backlog for additional tuning
+ for TCP sockets.
+
tcp_abc - INTEGER
Controls Appropriate Byte Count (ABC) defined in RFC3465.
ABC is a way of increasing congestion window (cwnd) more slowly
@@ -112,48 +117,51 @@ tcp_abc - INTEGER
of two segments to compensate for delayed acknowledgments.
Default: 0 (off)
-tcp_syn_retries - INTEGER
- Number of times initial SYNs for an active TCP connection attempt
- will be retransmitted. Should not be higher than 255. Default value
- is 5, which corresponds to ~180seconds.
+tcp_abort_on_overflow - BOOLEAN
+ If listening service is too slow to accept new connections,
+ reset them. Default state is FALSE. It means that if overflow
+ occurred due to a burst, connection will recover. Enable this
+ option _only_ if you are really sure that listening daemon
+ cannot be tuned to accept connections faster. Enabling this
+ option can harm clients of your server.
-tcp_synack_retries - INTEGER
- Number of times SYNACKs for a passive TCP connection attempt will
- be retransmitted. Should not be higher than 255. Default value
- is 5, which corresponds to ~180seconds.
+tcp_adv_win_scale - INTEGER
+ Count buffering overhead as bytes/2^tcp_adv_win_scale
+ (if tcp_adv_win_scale > 0) or bytes-bytes/2^(-tcp_adv_win_scale),
+ if it is <= 0.
+ Default: 2
-tcp_keepalive_time - INTEGER
- How often TCP sends out keepalive messages when keepalive is enabled.
- Default: 2hours.
+tcp_allowed_congestion_control - STRING
+ Show/set the congestion control choices available to non-privileged
+ processes. The list is a subset of those listed in
+ tcp_available_congestion_control.
+ Default is "reno" and the default setting (tcp_congestion_control).
-tcp_keepalive_probes - INTEGER
- How many keepalive probes TCP sends out, until it decides that the
- connection is broken. Default value: 9.
+tcp_app_win - INTEGER
+ Reserve max(window/2^tcp_app_win, mss) of window for application
+ buffer. Value 0 is special, it means that nothing is reserved.
+ Default: 31
-tcp_keepalive_intvl - INTEGER
- How frequently the probes are send out. Multiplied by
- tcp_keepalive_probes it is time to kill not responding connection,
- after probes started. Default value: 75sec i.e. connection
- will be aborted after ~11 minutes of retries.
+tcp_available_congestion_control - STRING
+ Shows the available congestion control choices that are registered.
+ More congestion control algorithms may be available as modules,
+ but not loaded.
-tcp_retries1 - INTEGER
- How many times to retry before deciding that something is wrong
- and it is necessary to report this suspicion to network layer.
- Minimal RFC value is 3, it is default, which corresponds
- to ~3sec-8min depending on RTO.
+tcp_congestion_control - STRING
+ Set the congestion control algorithm to be used for new
+ connections. The algorithm "reno" is always available, but
+ additional choices may be available based on kernel configuration.
+ Default is set as part of kernel configuration.
-tcp_retries2 - INTEGER
- How may times to retry before killing alive TCP connection.
- RFC1122 says that the limit should be longer than 100 sec.
- It is too small number. Default value 15 corresponds to ~13-30min
- depending on RTO.
+tcp_dsack - BOOLEAN
+ Allows TCP to send "duplicate" SACKs.
-tcp_orphan_retries - INTEGER
- How may times to retry before killing TCP connection, closed
- by our side. Default value 7 corresponds to ~50sec-16min
- depending on RTO. If you machine is loaded WEB server,
- you should think about lowering this value, such sockets
- may consume significant resources. Cf. tcp_max_orphans.
+tcp_ecn - BOOLEAN
+ Enable Explicit Congestion Notification in TCP.
+
+tcp_fack - BOOLEAN
+ Enable FACK congestion avoidance and fast retransmission.
+ The value is not used, if tcp_sack is not enabled.
tcp_fin_timeout - INTEGER
Time to hold socket in state FIN-WAIT-2, if it was closed
@@ -166,24 +174,33 @@ tcp_fin_timeout - INTEGER
because they eat maximum 1.5K of memory, but they tend
to live longer. Cf. tcp_max_orphans.
-tcp_max_tw_buckets - INTEGER
- Maximal number of timewait sockets held by system simultaneously.
- If this number is exceeded time-wait socket is immediately destroyed
- and warning is printed. This limit exists only to prevent
- simple DoS attacks, you _must_ not lower the limit artificially,
- but rather increase it (probably, after increasing installed memory),
- if network conditions require more than default value.
+tcp_frto - BOOLEAN
+ Enables F-RTO, an enhanced recovery algorithm for TCP retransmission
+ timeouts. It is particularly beneficial in wireless environments
+ where packet loss is typically due to random radio interference
+ rather than intermediate router congestion.
-tcp_tw_recycle - BOOLEAN
- Enable fast recycling TIME-WAIT sockets. Default value is 0.
- It should not be changed without advice/request of technical
- experts.
+tcp_keepalive_time - INTEGER
+ How often TCP sends out keepalive messages when keepalive is enabled.
+ Default: 2hours.
-tcp_tw_reuse - BOOLEAN
- Allow to reuse TIME-WAIT sockets for new connections when it is
- safe from protocol viewpoint. Default value is 0.
- It should not be changed without advice/request of technical
- experts.
+tcp_keepalive_probes - INTEGER
+ How many keepalive probes TCP sends out, until it decides that the
+ connection is broken. Default value: 9.
+
+tcp_keepalive_intvl - INTEGER
+ How frequently the probes are send out. Multiplied by
+ tcp_keepalive_probes it is time to kill not responding connection,
+ after probes started. Default value: 75sec i.e. connection
+ will be aborted after ~11 minutes of retries.
+
+tcp_low_latency - BOOLEAN
+ If set, the TCP stack makes decisions that prefer lower
+ latency as opposed to higher throughput. By default, this
+ option is not set meaning that higher throughput is preferred.
+ An example of an application where this default should be
+ changed would be a Beowulf compute cluster.
+ Default: 0
tcp_max_orphans - INTEGER
Maximal number of TCP sockets not attached to any user file handle,
@@ -197,41 +214,6 @@ tcp_max_orphans - INTEGER
more aggressively. Let me to remind again: each orphan eats
up to ~64K of unswappable memory.
-tcp_abort_on_overflow - BOOLEAN
- If listening service is too slow to accept new connections,
- reset them. Default state is FALSE. It means that if overflow
- occurred due to a burst, connection will recover. Enable this
- option _only_ if you are really sure that listening daemon
- cannot be tuned to accept connections faster. Enabling this
- option can harm clients of your server.
-
-tcp_syncookies - BOOLEAN
- Only valid when the kernel was compiled with CONFIG_SYNCOOKIES
- Send out syncookies when the syn backlog queue of a socket
- overflows. This is to prevent against the common 'syn flood attack'
- Default: FALSE
-
- Note, that syncookies is fallback facility.
- It MUST NOT be used to help highly loaded servers to stand
- against legal connection rate. If you see synflood warnings
- in your logs, but investigation shows that they occur
- because of overload with legal connections, you should tune
- another parameters until this warning disappear.
- See: tcp_max_syn_backlog, tcp_synack_retries, tcp_abort_on_overflow.
-
- syncookies seriously violate TCP protocol, do not allow
- to use TCP extensions, can result in serious degradation
- of some services (f.e. SMTP relaying), visible not by you,
- but your clients and relays, contacting you. While you see
- synflood warnings in logs not being really flooded, your server
- is seriously misconfigured.
-
-tcp_stdurg - BOOLEAN
- Use the Host requirements interpretation of the TCP urg pointer field.
- Most hosts use the older BSD interpretation, so if you turn this on
- Linux might not communicate correctly with them.
- Default: FALSE
-
tcp_max_syn_backlog - INTEGER
Maximal number of remembered connection requests, which are
still did not receive an acknowledgment from connecting client.
@@ -239,24 +221,34 @@ tcp_max_syn_backlog - INTEGER
and 128 for low memory machines. If server suffers of overload,
try to increase this number.
-tcp_window_scaling - BOOLEAN
- Enable window scaling as defined in RFC1323.
+tcp_max_tw_buckets - INTEGER
+ Maximal number of timewait sockets held by system simultaneously.
+ If this number is exceeded time-wait socket is immediately destroyed
+ and warning is printed. This limit exists only to prevent
+ simple DoS attacks, you _must_ not lower the limit artificially,
+ but rather increase it (probably, after increasing installed memory),
+ if network conditions require more than default value.
-tcp_timestamps - BOOLEAN
- Enable timestamps as defined in RFC1323.
+tcp_mem - vector of 3 INTEGERs: min, pressure, max
+ min: below this number of pages TCP is not bothered about its
+ memory appetite.
-tcp_sack - BOOLEAN
- Enable select acknowledgments (SACKS).
+ pressure: when amount of memory allocated by TCP exceeds this number
+ of pages, TCP moderates its memory consumption and enters memory
+ pressure mode, which is exited when memory consumption falls
+ under "min".
-tcp_fack - BOOLEAN
- Enable FACK congestion avoidance and fast retransmission.
- The value is not used, if tcp_sack is not enabled.
+ max: number of pages allowed for queueing by all TCP sockets.
-tcp_dsack - BOOLEAN
- Allows TCP to send "duplicate" SACKs.
+ Defaults are calculated at boot time from amount of available
+ memory.
-tcp_ecn - BOOLEAN
- Enable Explicit Congestion Notification in TCP.
+tcp_orphan_retries - INTEGER
+ How may times to retry before killing TCP connection, closed
+ by our side. Default value 7 corresponds to ~50sec-16min
+ depending on RTO. If you machine is loaded WEB server,
+ you should think about lowering this value, such sockets
+ may consume significant resources. Cf. tcp_max_orphans.
tcp_reordering - INTEGER
Maximal reordering of packets in a TCP stream.
@@ -267,20 +259,23 @@ tcp_retrans_collapse - BOOLEAN
On retransmit try to send bigger packets to work around bugs in
certain TCP stacks.
-tcp_wmem - vector of 3 INTEGERs: min, default, max
- min: Amount of memory reserved for send buffers for TCP socket.
- Each TCP socket has rights to use it due to fact of its birth.
- Default: 4K
+tcp_retries1 - INTEGER
+ How many times to retry before deciding that something is wrong
+ and it is necessary to report this suspicion to network layer.
+ Minimal RFC value is 3, it is default, which corresponds
+ to ~3sec-8min depending on RTO.
- default: Amount of memory allowed for send buffers for TCP socket
- by default. This value overrides net.core.wmem_default used
- by other protocols, it is usually lower than net.core.wmem_default.
- Default: 16K
+tcp_retries2 - INTEGER
+ How may times to retry before killing alive TCP connection.
+ RFC1122 says that the limit should be longer than 100 sec.
+ It is too small number. Default value 15 corresponds to ~13-30min
+ depending on RTO.
- max: Maximal amount of memory allowed for automatically selected
- send buffers for TCP socket. This value does not override
- net.core.wmem_max, "static" selection via SO_SNDBUF does not use this.
- Default: 128K
+tcp_rfc1337 - BOOLEAN
+ If set, the TCP stack behaves conforming to RFC1337. If unset,
+ we are not conforming to RFC, but prevent TCP TIME_WAIT
+ assassination.
+ Default: 0
tcp_rmem - vector of 3 INTEGERs: min, default, max
min: Minimal size of receive buffer used by TCP sockets.
@@ -299,67 +294,91 @@ tcp_rmem - vector of 3 INTEGERs: min, default, max
net.core.rmem_max, "static" selection via SO_RCVBUF does not use this.
Default: 87380*2 bytes.
-tcp_mem - vector of 3 INTEGERs: min, pressure, max
- min: below this number of pages TCP is not bothered about its
- memory appetite.
+tcp_sack - BOOLEAN
+ Enable select acknowledgments (SACKS).
- pressure: when amount of memory allocated by TCP exceeds this number
- of pages, TCP moderates its memory consumption and enters memory
- pressure mode, which is exited when memory consumption falls
- under "min".
+tcp_slow_start_after_idle - BOOLEAN
+ If set, provide RFC2861 behavior and time out the congestion
+ window after an idle period. An idle period is defined at
+ the current RTO. If unset, the congestion window will not
+ be timed out after an idle period.
+ Default: 1
- max: number of pages allowed for queueing by all TCP sockets.
+tcp_stdurg - BOOLEAN
+ Use the Host requirements interpretation of the TCP urg pointer field.
+ Most hosts use the older BSD interpretation, so if you turn this on
+ Linux might not communicate correctly with them.
+ Default: FALSE
- Defaults are calculated at boot time from amount of available
- memory.
+tcp_synack_retries - INTEGER
+ Number of times SYNACKs for a passive TCP connection attempt will
+ be retransmitted. Should not be higher than 255. Default value
+ is 5, which corresponds to ~180seconds.
-tcp_app_win - INTEGER
- Reserve max(window/2^tcp_app_win, mss) of window for application
- buffer. Value 0 is special, it means that nothing is reserved.
- Default: 31
+tcp_syncookies - BOOLEAN
+ Only valid when the kernel was compiled with CONFIG_SYNCOOKIES
+ Send out syncookies when the syn backlog queue of a socket
+ overflows. This is to prevent against the common 'syn flood attack'
+ Default: FALSE
-tcp_adv_win_scale - INTEGER
- Count buffering overhead as bytes/2^tcp_adv_win_scale
- (if tcp_adv_win_scale > 0) or bytes-bytes/2^(-tcp_adv_win_scale),
- if it is <= 0.
- Default: 2
+ Note, that syncookies is fallback facility.
+ It MUST NOT be used to help highly loaded servers to stand
+ against legal connection rate. If you see synflood warnings
+ in your logs, but investigation shows that they occur
+ because of overload with legal connections, you should tune
+ another parameters until this warning disappear.
+ See: tcp_max_syn_backlog, tcp_synack_retries, tcp_abort_on_overflow.
-tcp_rfc1337 - BOOLEAN
- If set, the TCP stack behaves conforming to RFC1337. If unset,
- we are not conforming to RFC, but prevent TCP TIME_WAIT
- assassination.
- Default: 0
+ syncookies seriously violate TCP protocol, do not allow
+ to use TCP extensions, can result in serious degradation
+ of some services (f.e. SMTP relaying), visible not by you,
+ but your clients and relays, contacting you. While you see
+ synflood warnings in logs not being really flooded, your server
+ is seriously misconfigured.
-tcp_low_latency - BOOLEAN
- If set, the TCP stack makes decisions that prefer lower
- latency as opposed to higher throughput. By default, this
- option is not set meaning that higher throughput is preferred.
- An example of an application where this default should be
- changed would be a Beowulf compute cluster.
- Default: 0
+tcp_syn_retries - INTEGER
+ Number of times initial SYNs for an active TCP connection attempt
+ will be retransmitted. Should not be higher than 255. Default value
+ is 5, which corresponds to ~180seconds.
+
+tcp_timestamps - BOOLEAN
+ Enable timestamps as defined in RFC1323.
tcp_tso_win_divisor - INTEGER
- This allows control over what percentage of the congestion window
- can be consumed by a single TSO frame.
- The setting of this parameter is a choice between burstiness and
- building larger TSO frames.
- Default: 3
+ This allows control over what percentage of the congestion window
+ can be consumed by a single TSO frame.
+ The setting of this parameter is a choice between burstiness and
+ building larger TSO frames.
+ Default: 3
-tcp_frto - BOOLEAN
- Enables F-RTO, an enhanced recovery algorithm for TCP retransmission
- timeouts. It is particularly beneficial in wireless environments
- where packet loss is typically due to random radio interference
- rather than intermediate router congestion.
+tcp_tw_recycle - BOOLEAN
+ Enable fast recycling TIME-WAIT sockets. Default value is 0.
+ It should not be changed without advice/request of technical
+ experts.
-tcp_congestion_control - STRING
- Set the congestion control algorithm to be used for new
- connections. The algorithm "reno" is always available, but
- additional choices may be available based on kernel configuration.
+tcp_tw_reuse - BOOLEAN
+ Allow to reuse TIME-WAIT sockets for new connections when it is
+ safe from protocol viewpoint. Default value is 0.
+ It should not be changed without advice/request of technical
+ experts.
-somaxconn - INTEGER
- Limit of socket listen() backlog, known in userspace as SOMAXCONN.
- Defaults to 128. See also tcp_max_syn_backlog for additional tuning
- for TCP sockets.
+tcp_window_scaling - BOOLEAN
+ Enable window scaling as defined in RFC1323.
+
+tcp_wmem - vector of 3 INTEGERs: min, default, max
+ min: Amount of memory reserved for send buffers for TCP socket.
+ Each TCP socket has rights to use it due to fact of its birth.
+ Default: 4K
+
+ default: Amount of memory allowed for send buffers for TCP socket
+ by default. This value overrides net.core.wmem_default used
+ by other protocols, it is usually lower than net.core.wmem_default.
+ Default: 16K
+
+ max: Maximal amount of memory allowed for automatically selected
+ send buffers for TCP socket. This value does not override
+ net.core.wmem_max, "static" selection via SO_SNDBUF does not use this.
+ Default: 128K
tcp_workaround_signed_windows - BOOLEAN
If set, assume no receipt of a window scaling option means the
@@ -368,13 +387,6 @@ tcp_workaround_signed_windows - BOOLEAN
not receive a window scaling option from them.
Default: 0
-tcp_slow_start_after_idle - BOOLEAN
- If set, provide RFC2861 behavior and time out the congestion
- window after an idle period. An idle period is defined at
- the current RTO. If unset, the congestion window will not
- be timed out after an idle period.
- Default: 1
-
CIPSOv4 Variables:
cipso_cache_enable - BOOLEAN
@@ -974,4 +986,3 @@ no_cong_thresh FIXME
slot_timeout FIXME
warn_noreply_time FIXME
-$Id: ip-sysctl.txt,v 1.20 2001/12/13 09:00:18 davem Exp $
diff --git a/Documentation/networking/phy.txt b/Documentation/networking/phy.txt
index 29ccae409031..0bc95eab1512 100644
--- a/Documentation/networking/phy.txt
+++ b/Documentation/networking/phy.txt
@@ -1,7 +1,7 @@
-------
PHY Abstraction Layer
-(Updated 2005-07-21)
+(Updated 2006-11-30)
Purpose
@@ -97,11 +97,12 @@ Letting the PHY Abstraction Layer do Everything
Next, you need to know the device name of the PHY connected to this device.
The name will look something like, "phy0:0", where the first number is the
- bus id, and the second is the PHY's address on that bus.
+ bus id, and the second is the PHY's address on that bus. Typically,
+ the bus is responsible for making its ID unique.
Now, to connect, just call this function:
- phydev = phy_connect(dev, phy_name, &adjust_link, flags);
+ phydev = phy_connect(dev, phy_name, &adjust_link, flags, interface);
phydev is a pointer to the phy_device structure which represents the PHY. If
phy_connect is successful, it will return the pointer. dev, here, is the
@@ -115,6 +116,10 @@ Letting the PHY Abstraction Layer do Everything
This is useful if the system has put hardware restrictions on
the PHY/controller, of which the PHY needs to be aware.
+ interface is a u32 which specifies the connection type used
+ between the controller and the PHY. Examples are GMII, MII,
+ RGMII, and SGMII. For a full list, see include/linux/phy.h
+
Now just make sure that phydev->supported and phydev->advertising have any
values pruned from them which don't make sense for your controller (a 10/100
controller may be connected to a gigabit capable PHY, so you would need to
@@ -191,7 +196,7 @@ Doing it all yourself
start, or disables then frees them for stop.
struct phy_device * phy_attach(struct net_device *dev, const char *phy_id,
- u32 flags);
+ u32 flags, phy_interface_t interface);
Attaches a network device to a particular PHY, binding the PHY to a generic
driver if none was found during bus initialization. Passes in
diff --git a/Documentation/networking/udplite.txt b/Documentation/networking/udplite.txt
new file mode 100644
index 000000000000..dd6f46b83dab
--- /dev/null
+++ b/Documentation/networking/udplite.txt
@@ -0,0 +1,281 @@
+ ===========================================================================
+ The UDP-Lite protocol (RFC 3828)
+ ===========================================================================
+
+
+ UDP-Lite is a Standards-Track IETF transport protocol whose characteristic
+ is a variable-length checksum. This has advantages for transport of multimedia
+ (video, VoIP) over wireless networks, as partly damaged packets can still be
+ fed into the codec instead of being discarded due to a failed checksum test.
+
+ This file briefly describes the existing kernel support and the socket API.
+ For in-depth information, you can consult:
+
+ o The UDP-Lite Homepage: http://www.erg.abdn.ac.uk/users/gerrit/udp-lite/
+ Fom here you can also download some example application source code.
+
+ o The UDP-Lite HOWTO on
+ http://www.erg.abdn.ac.uk/users/gerrit/udp-lite/files/UDP-Lite-HOWTO.txt
+
+ o The Wireshark UDP-Lite WiKi (with capture files):
+ http://wiki.wireshark.org/Lightweight_User_Datagram_Protocol
+
+ o The Protocol Spec, RFC 3828, http://www.ietf.org/rfc/rfc3828.txt
+
+
+ I) APPLICATIONS
+
+ Several applications have been ported successfully to UDP-Lite. Ethereal
+ (now called wireshark) has UDP-Litev4/v6 support by default. The tarball on
+
+ http://www.erg.abdn.ac.uk/users/gerrit/udp-lite/files/udplite_linux.tar.gz
+
+ has source code for several v4/v6 client-server and network testing examples.
+
+ Porting applications to UDP-Lite is straightforward: only socket level and
+ IPPROTO need to be changed; senders additionally set the checksum coverage
+ length (default = header length = 8). Details are in the next section.
+
+
+ II) PROGRAMMING API
+
+ UDP-Lite provides a connectionless, unreliable datagram service and hence
+ uses the same socket type as UDP. In fact, porting from UDP to UDP-Lite is
+ very easy: simply add `IPPROTO_UDPLITE' as the last argument of the socket(2)
+ call so that the statement looks like:
+
+ s = socket(PF_INET, SOCK_DGRAM, IPPROTO_UDPLITE);
+
+ or, respectively,
+
+ s = socket(PF_INET6, SOCK_DGRAM, IPPROTO_UDPLITE);
+
+ With just the above change you are able to run UDP-Lite services or connect
+ to UDP-Lite servers. The kernel will assume that you are not interested in
+ using partial checksum coverage and so emulate UDP mode (full coverage).
+
+ To make use of the partial checksum coverage facilities requires setting a
+ single socket option, which takes an integer specifying the coverage length:
+
+ * Sender checksum coverage: UDPLITE_SEND_CSCOV
+
+ For example,
+
+ int val = 20;
+ setsockopt(s, SOL_UDPLITE, UDPLITE_SEND_CSCOV, &val, sizeof(int));
+
+ sets the checksum coverage length to 20 bytes (12b data + 8b header).
+ Of each packet only the first 20 bytes (plus the pseudo-header) will be
+ checksummed. This is useful for RTP applications which have a 12-byte
+ base header.
+
+
+ * Receiver checksum coverage: UDPLITE_RECV_CSCOV
+
+ This option is the receiver-side analogue. It is truly optional, i.e. not
+ required to enable traffic with partial checksum coverage. Its function is
+ that of a traffic filter: when enabled, it instructs the kernel to drop
+ all packets which have a coverage _less_ than this value. For example, if
+ RTP and UDP headers are to be protected, a receiver can enforce that only
+ packets with a minimum coverage of 20 are admitted:
+
+ int min = 20;
+ setsockopt(s, SOL_UDPLITE, UDPLITE_RECV_CSCOV, &min, sizeof(int));
+
+ The calls to getsockopt(2) are analogous. Being an extension and not a stand-
+ alone protocol, all socket options known from UDP can be used in exactly the
+ same manner as before, e.g. UDP_CORK or UDP_ENCAP.
+
+ A detailed discussion of UDP-Lite checksum coverage options is in section IV.
+
+
+ III) HEADER FILES
+
+ The socket API requires support through header files in /usr/include:
+
+ * /usr/include/netinet/in.h
+ to define IPPROTO_UDPLITE
+
+ * /usr/include/netinet/udplite.h
+ for UDP-Lite header fields and protocol constants
+
+ For testing purposes, the following can serve as a `mini' header file:
+
+ #define IPPROTO_UDPLITE 136
+ #define SOL_UDPLITE 136
+ #define UDPLITE_SEND_CSCOV 10
+ #define UDPLITE_RECV_CSCOV 11
+
+ Ready-made header files for various distros are in the UDP-Lite tarball.
+
+
+ IV) KERNEL BEHAVIOUR WITH REGARD TO THE VARIOUS SOCKET OPTIONS
+
+ To enable debugging messages, the log level need to be set to 8, as most
+ messages use the KERN_DEBUG level (7).
+
+ 1) Sender Socket Options
+
+ If the sender specifies a value of 0 as coverage length, the module
+ assumes full coverage, transmits a packet with coverage length of 0
+ and according checksum. If the sender specifies a coverage < 8 and
+ different from 0, the kernel assumes 8 as default value. Finally,
+ if the specified coverage length exceeds the packet length, the packet
+ length is used instead as coverage length.
+
+ 2) Receiver Socket Options
+
+ The receiver specifies the minimum value of the coverage length it
+ is willing to accept. A value of 0 here indicates that the receiver
+ always wants the whole of the packet covered. In this case, all
+ partially covered packets are dropped and an error is logged.
+
+ It is not possible to specify illegal values (<0 and <8); in these
+ cases the default of 8 is assumed.
+
+ All packets arriving with a coverage value less than the specified
+ threshold are discarded, these events are also logged.
+
+ 3) Disabling the Checksum Computation
+
+ On both sender and receiver, checksumming will always be performed
+ and can not be disabled using SO_NO_CHECK. Thus
+
+ setsockopt(sockfd, SOL_SOCKET, SO_NO_CHECK, ... );
+
+ will always will be ignored, while the value of
+
+ getsockopt(sockfd, SOL_SOCKET, SO_NO_CHECK, &value, ...);
+
+ is meaningless (as in TCP). Packets with a zero checksum field are
+ illegal (cf. RFC 3828, sec. 3.1) will be silently discarded.
+
+ 4) Fragmentation
+
+ The checksum computation respects both buffersize and MTU. The size
+ of UDP-Lite packets is determined by the size of the send buffer. The
+ minimum size of the send buffer is 2048 (defined as SOCK_MIN_SNDBUF
+ in include/net/sock.h), the default value is configurable as
+ net.core.wmem_default or via setting the SO_SNDBUF socket(7)
+ option. The maximum upper bound for the send buffer is determined
+ by net.core.wmem_max.
+
+ Given a payload size larger than the send buffer size, UDP-Lite will
+ split the payload into several individual packets, filling up the
+ send buffer size in each case.
+
+ The precise value also depends on the interface MTU. The interface MTU,
+ in turn, may trigger IP fragmentation. In this case, the generated
+ UDP-Lite packet is split into several IP packets, of which only the
+ first one contains the L4 header.
+
+ The send buffer size has implications on the checksum coverage length.
+ Consider the following example:
+
+ Payload: 1536 bytes Send Buffer: 1024 bytes
+ MTU: 1500 bytes Coverage Length: 856 bytes
+
+ UDP-Lite will ship the 1536 bytes in two separate packets:
+
+ Packet 1: 1024 payload + 8 byte header + 20 byte IP header = 1052 bytes
+ Packet 2: 512 payload + 8 byte header + 20 byte IP header = 540 bytes
+
+ The coverage packet covers the UDP-Lite header and 848 bytes of the
+ payload in the first packet, the second packet is fully covered. Note
+ that for the second packet, the coverage length exceeds the packet
+ length. The kernel always re-adjusts the coverage length to the packet
+ length in such cases.
+
+ As an example of what happens when one UDP-Lite packet is split into
+ several tiny fragments, consider the following example.
+
+ Payload: 1024 bytes Send buffer size: 1024 bytes
+ MTU: 300 bytes Coverage length: 575 bytes
+
+ +-+-----------+--------------+--------------+--------------+
+ |8| 272 | 280 | 280 | 280 |
+ +-+-----------+--------------+--------------+--------------+
+ 280 560 840 1032
+ ^
+ *****checksum coverage*************
+
+ The UDP-Lite module generates one 1032 byte packet (1024 + 8 byte
+ header). According to the interface MTU, these are split into 4 IP
+ packets (280 byte IP payload + 20 byte IP header). The kernel module
+ sums the contents of the entire first two packets, plus 15 bytes of
+ the last packet before releasing the fragments to the IP module.
+
+ To see the analogous case for IPv6 fragmentation, consider a link
+ MTU of 1280 bytes and a write buffer of 3356 bytes. If the checksum
+ coverage is less than 1232 bytes (MTU minus IPv6/fragment header
+ lengths), only the first fragment needs to be considered. When using
+ larger checksum coverage lengths, each eligible fragment needs to be
+ checksummed. Suppose we have a checksum coverage of 3062. The buffer
+ of 3356 bytes will be split into the following fragments:
+
+ Fragment 1: 1280 bytes carrying 1232 bytes of UDP-Lite data
+ Fragment 2: 1280 bytes carrying 1232 bytes of UDP-Lite data
+ Fragment 3: 948 bytes carrying 900 bytes of UDP-Lite data
+
+ The first two fragments have to be checksummed in full, of the last
+ fragment only 598 (= 3062 - 2*1232) bytes are checksummed.
+
+ While it is important that such cases are dealt with correctly, they
+ are (annoyingly) rare: UDP-Lite is designed for optimising multimedia
+ performance over wireless (or generally noisy) links and thus smaller
+ coverage lenghts are likely to be expected.
+
+
+ V) UDP-LITE RUNTIME STATISTICS AND THEIR MEANING
+
+ Exceptional and error conditions are logged to syslog at the KERN_DEBUG
+ level. Live statistics about UDP-Lite are available in /proc/net/snmp
+ and can (with newer versions of netstat) be viewed using
+
+ netstat -svu
+
+ This displays UDP-Lite statistics variables, whose meaning is as follows.
+
+ InDatagrams: Total number of received datagrams.
+
+ NoPorts: Number of packets received to an unknown port.
+ These cases are counted separately (not as InErrors).
+
+ InErrors: Number of erroneous UDP-Lite packets. Errors include:
+ * internal socket queue receive errors
+ * packet too short (less than 8 bytes or stated
+ coverage length exceeds received length)
+ * xfrm4_policy_check() returned with error
+ * application has specified larger min. coverage
+ length than that of incoming packet
+ * checksum coverage violated
+ * bad checksum
+
+ OutDatagrams: Total number of sent datagrams.
+
+ These statistics derive from the UDP MIB (RFC 2013).
+
+
+ VI) IPTABLES
+
+ There is packet match support for UDP-Lite as well as support for the LOG target.
+ If you copy and paste the following line into /etc/protcols,
+
+ udplite 136 UDP-Lite # UDP-Lite [RFC 3828]
+
+ then
+ iptables -A INPUT -p udplite -j LOG
+
+ will produce logging output to syslog. Dropping and rejecting packets also works.
+
+
+ VII) MAINTAINER ADDRESS
+
+ The UDP-Lite patch was developed at
+ University of Aberdeen
+ Electronics Research Group
+ Department of Engineering
+ Fraser Noble Building
+ Aberdeen AB24 3UE; UK
+ The current maintainer is Gerrit Renker, <gerrit@erg.abdn.ac.uk>. Initial
+ code was developed by William Stanislaus, <william@erg.abdn.ac.uk>.
diff --git a/Documentation/networking/xfrm_sync.txt b/Documentation/networking/xfrm_sync.txt
index 8be626f7c0b8..d7aac9dedeb4 100644
--- a/Documentation/networking/xfrm_sync.txt
+++ b/Documentation/networking/xfrm_sync.txt
@@ -47,10 +47,13 @@ aevent_id structure looks like:
struct xfrm_aevent_id {
struct xfrm_usersa_id sa_id;
+ xfrm_address_t saddr;
__u32 flags;
+ __u32 reqid;
};
-xfrm_usersa_id in this message layout identifies the SA.
+The unique SA is identified by the combination of xfrm_usersa_id,
+reqid and saddr.
flags are used to indicate different things. The possible
flags are:
diff --git a/Documentation/s390/CommonIO b/Documentation/s390/CommonIO
index d684a6ac69a8..22f82f21bc60 100644
--- a/Documentation/s390/CommonIO
+++ b/Documentation/s390/CommonIO
@@ -74,7 +74,7 @@ Command line parameters
Note: While already known devices can be added to the list of devices to be
ignored, there will be no effect on then. However, if such a device
- disappears and then reappeares, it will then be ignored.
+ disappears and then reappears, it will then be ignored.
For example,
"echo add 0.0.a000-0.0.accc, 0.0.af00-0.0.afff > /proc/cio_ignore"
@@ -82,7 +82,7 @@ Command line parameters
devices.
The devices can be specified either by bus id (0.0.abcd) or, for 2.4 backward
- compatibilty, by the device number in hexadecimal (0xabcd or abcd).
+ compatibility, by the device number in hexadecimal (0xabcd or abcd).
* /proc/s390dbf/cio_*/ (S/390 debug feature)
diff --git a/Documentation/s390/Debugging390.txt b/Documentation/s390/Debugging390.txt
index 4dd25ee549e9..3f9ddbc23b27 100644
--- a/Documentation/s390/Debugging390.txt
+++ b/Documentation/s390/Debugging390.txt
@@ -7,7 +7,7 @@
Overview of Document:
=====================
-This document is intended to give an good overview of how to debug
+This document is intended to give a good overview of how to debug
Linux for s/390 & z/Architecture. It isn't intended as a complete reference & not a
tutorial on the fundamentals of C & assembly. It doesn't go into
390 IO in any detail. It is intended to complement the documents in the
@@ -300,7 +300,7 @@ On z/Architecture our page indexes are now 2k in size
but only mess with 2 segment indices each time we mess with
a PMD.
-3) As z/Architecture supports upto a massive 5-level page table lookup we
+3) As z/Architecture supports up to a massive 5-level page table lookup we
can only use 3 currently on Linux ( as this is all the generic kernel
currently supports ) however this may change in future
this allows us to access ( according to my sums )
@@ -502,7 +502,7 @@ Notes:
------
1) The only requirement is that registers which are used
by the callee are saved, e.g. the compiler is perfectly
-capible of using r11 for purposes other than a frame a
+capable of using r11 for purposes other than a frame a
frame pointer if a frame pointer is not needed.
2) In functions with variable arguments e.g. printf the calling procedure
is identical to one without variable arguments & the same number of
@@ -846,7 +846,7 @@ of time searching for debugging info. The following self explanatory line should
instead if the code isn't compiled -g, as it is much faster:
objdump --disassemble-all --syms vmlinux > vmlinux.lst
-As hard drive space is valuble most of us use the following approach.
+As hard drive space is valuable most of us use the following approach.
1) Look at the emitted psw on the console to find the crash address in the kernel.
2) Look at the file System.map ( in the linux directory ) produced when building
the kernel to find the closest address less than the current PSW to find the
@@ -902,7 +902,7 @@ A. It is a tool for intercepting calls to the kernel & logging them
to a file & on the screen.
Q. What use is it ?
-A. You can used it to find out what files a particular program opens.
+A. You can use it to find out what files a particular program opens.
@@ -911,7 +911,7 @@ Example 1
If you wanted to know does ping work but didn't have the source
strace ping -c 1 127.0.0.1
& then look at the man pages for each of the syscalls below,
-( In fact this is sometimes easier than looking at some spagetti
+( In fact this is sometimes easier than looking at some spaghetti
source which conditionally compiles for several architectures ).
Not everything that it throws out needs to make sense immediately.
@@ -1037,7 +1037,7 @@ e.g. man strace, man alarm, man socket.
Performance Debugging
=====================
-gcc is capible of compiling in profiling code just add the -p option
+gcc is capable of compiling in profiling code just add the -p option
to the CFLAGS, this obviously affects program size & performance.
This can be used by the gprof gnu profiling tool or the
gcov the gnu code coverage tool ( code coverage is a means of testing
@@ -1419,7 +1419,7 @@ On a SMP guest issue a command to all CPUs try prefixing the command with cpu al
To issue a command to a particular cpu try cpu <cpu number> e.g.
CPU 01 TR I R 2000.3000
If you are running on a guest with several cpus & you have a IO related problem
-& cannot follow the flow of code but you know it isnt smp related.
+& cannot follow the flow of code but you know it isn't smp related.
from the bash prompt issue
shutdown -h now or halt.
do a Q CPUS to find out how many cpus you have
@@ -1602,7 +1602,7 @@ V000FFFD0 00010400 80010802 8001085A 000FFFA0
our 3rd return address is 8001085A
as the 04B52002 looks suspiciously like rubbish it is fair to assume that the kernel entry routines
-for the sake of optimisation dont set up a backchain.
+for the sake of optimisation don't set up a backchain.
now look at System.map to see if the addresses make any sense.
@@ -1638,11 +1638,11 @@ more useful information.
Unlike other bus architectures modern 390 systems do their IO using mostly
fibre optics & devices such as tapes & disks can be shared between several mainframes,
-also S390 can support upto 65536 devices while a high end PC based system might be choking
+also S390 can support up to 65536 devices while a high end PC based system might be choking
with around 64. Here is some of the common IO terminology
Subchannel:
-This is the logical number most IO commands use to talk to an IO device there can be upto
+This is the logical number most IO commands use to talk to an IO device there can be up to
0x10000 (65536) of these in a configuration typically there is a few hundred. Under VM
for simplicity they are allocated contiguously, however on the native hardware they are not
they typically stay consistent between boots provided no new hardware is inserted or removed.
@@ -1651,7 +1651,7 @@ HALT SUBCHANNEL,MODIFY SUBCHANNEL,RESUME SUBCHANNEL,START SUBCHANNEL,STORE SUBCH
TEST SUBCHANNEL ) we use this as the ID of the device we wish to talk to, the most
important of these instructions are START SUBCHANNEL ( to start IO ), TEST SUBCHANNEL ( to check
whether the IO completed successfully ), & HALT SUBCHANNEL ( to kill IO ), a subchannel
-can have up to 8 channel paths to a device this offers redunancy if one is not available.
+can have up to 8 channel paths to a device this offers redundancy if one is not available.
Device Number:
@@ -1659,7 +1659,7 @@ This number remains static & Is closely tied to the hardware, there are 65536 of
also they are made up of a CHPID ( Channel Path ID, the most significant 8 bits )
& another lsb 8 bits. These remain static even if more devices are inserted or removed
from the hardware, there is a 1 to 1 mapping between Subchannels & Device Numbers provided
-devices arent inserted or removed.
+devices aren't inserted or removed.
Channel Control Words:
CCWS are linked lists of instructions initially pointed to by an operation request block (ORB),
@@ -1674,7 +1674,7 @@ concurrently, you check how the IO went on by issuing a TEST SUBCHANNEL at each
from which you receive an Interruption response block (IRB). If you get channel & device end
status in the IRB without channel checks etc. your IO probably went okay. If you didn't you
probably need a doctor to examine the IRB & extended status word etc.
-If an error occurs, more sophistocated control units have a facitity known as
+If an error occurs, more sophisticated control units have a facility known as
concurrent sense this means that if an error occurs Extended sense information will
be presented in the Extended status word in the IRB if not you have to issue a
subsequent SENSE CCW command after the test subchannel.
@@ -1749,7 +1749,7 @@ Interface (OEMI).
This byte wide Parallel channel path/bus has parity & data on the "Bus" cable
& control lines on the "Tag" cable. These can operate in byte multiplex mode for
sharing between several slow devices or burst mode & monopolize the channel for the
-whole burst. Upto 256 devices can be addressed on one of these cables. These cables are
+whole burst. Up to 256 devices can be addressed on one of these cables. These cables are
about one inch in diameter. The maximum unextended length supported by these cables is
125 Meters but this can be extended up to 2km with a fibre optic channel extended
such as a 3044. The maximum burst speed supported is 4.5 megabytes per second however
@@ -1759,7 +1759,7 @@ One of these paths can be daisy chained to up to 8 control units.
ESCON if fibre optic it is also called FICON
Was introduced by IBM in 1990. Has 2 fibre optic cables & uses either leds or lasers
-for communication at a signaling rate of upto 200 megabits/sec. As 10bits are transferred
+for communication at a signaling rate of up to 200 megabits/sec. As 10bits are transferred
for every 8 bits info this drops to 160 megabits/sec & to 18.6 Megabytes/sec once
control info & CRC are added. ESCON only operates in burst mode.
@@ -1767,7 +1767,7 @@ ESCONs typical max cable length is 3km for the led version & 20km for the laser
known as XDF ( extended distance facility ). This can be further extended by using an
ESCON director which triples the above mentioned ranges. Unlike Bus & Tag as ESCON is
serial it uses a packet switching architecture the standard Bus & Tag control protocol
-is however present within the packets. Upto 256 devices can be attached to each control
+is however present within the packets. Up to 256 devices can be attached to each control
unit that uses one of these interfaces.
Common 390 Devices include:
@@ -2050,7 +2050,7 @@ list test.c:1,10
directory:
Adds directories to be searched for source if gdb cannot find the source.
-(note it is a bit sensititive about slashes)
+(note it is a bit sensitive about slashes)
e.g. To add the root of the filesystem to the searchpath do
directory //
@@ -2152,7 +2152,7 @@ program as if it just crashed on your system, it is usually called core & create
current working directory.
This is very useful in that a customer can mail a core dump to a technical support department
& the technical support department can reconstruct what happened.
-Provided the have an identical copy of this program with debugging symbols compiled in &
+Provided they have an identical copy of this program with debugging symbols compiled in &
the source base of this build is available.
In short it is far more useful than something like a crash log could ever hope to be.
diff --git a/Documentation/s390/cds.txt b/Documentation/s390/cds.txt
index 32a96cc39215..05a2b4f7e38f 100644
--- a/Documentation/s390/cds.txt
+++ b/Documentation/s390/cds.txt
@@ -98,7 +98,7 @@ The following chapters describe the I/O related interface routines the
Linux/390 common device support (CDS) provides to allow for device specific
driver implementations on the IBM ESA/390 hardware platform. Those interfaces
intend to provide the functionality required by every device driver
-implementaion to allow to drive a specific hardware device on the ESA/390
+implementation to allow to drive a specific hardware device on the ESA/390
platform. Some of the interface routines are specific to Linux/390 and some
of them can be found on other Linux platforms implementations too.
Miscellaneous function prototypes, data declarations, and macro definitions
@@ -114,7 +114,7 @@ the ESA/390 architecture has implemented a so called channel subsystem, that
provides a unified view of the devices physically attached to the systems.
Though the ESA/390 hardware platform knows about a huge variety of different
peripheral attachments like disk devices (aka. DASDs), tapes, communication
-controllers, etc. they can all by accessed by a well defined access method and
+controllers, etc. they can all be accessed by a well defined access method and
they are presenting I/O completion a unified way : I/O interruptions. Every
single device is uniquely identified to the system by a so called subchannel,
where the ESA/390 architecture allows for 64k devices be attached.
@@ -338,7 +338,7 @@ DOIO_REPORT_ALL - report all interrupt conditions
The ccw_device_start() function returns :
0 - successful completion or request successfully initiated
--EBUSY - The device is currently processing a previous I/O request, or ther is
+-EBUSY - The device is currently processing a previous I/O request, or there is
a status pending at the device.
-ENODEV - cdev is invalid, the device is not operational or the ccw_device is
not online.
@@ -361,7 +361,7 @@ first:
-EIO: the common I/O layer terminated the request due to an error state
If the concurrent sense flag in the extended status word in the irb is set, the
-field irb->scsw.count describes the numer of device specific sense bytes
+field irb->scsw.count describes the number of device specific sense bytes
available in the extended control word irb->scsw.ecw[0]. No device sensing by
the device driver itself is required.
@@ -410,7 +410,7 @@ ccw_device_start() must be called disabled and with the ccw device lock held.
The device driver is allowed to issue the next ccw_device_start() call from
within its interrupt handler already. It is not required to schedule a
-bottom-half, unless an non deterministically long running error recovery procedure
+bottom-half, unless a non deterministically long running error recovery procedure
or similar needs to be scheduled. During I/O processing the Linux/390 generic
I/O device driver support has already obtained the IRQ lock, i.e. the handler
must not try to obtain it again when calling ccw_device_start() or we end in a
@@ -431,7 +431,7 @@ information prior to device-end the device driver urgently relies on. In this
case all I/O interruptions are presented to the device driver until final
status is recognized.
-If a device is able to recover from asynchronosly presented I/O errors, it can
+If a device is able to recover from asynchronously presented I/O errors, it can
perform overlapping I/O using the DOIO_EARLY_NOTIFICATION flag. While some
devices always report channel-end and device-end together, with a single
interrupt, others present primary status (channel-end) when the channel is
diff --git a/Documentation/s390/crypto/crypto-API.txt b/Documentation/s390/crypto/crypto-API.txt
index 41a8b07da05a..71ae6ca9f2c2 100644
--- a/Documentation/s390/crypto/crypto-API.txt
+++ b/Documentation/s390/crypto/crypto-API.txt
@@ -17,8 +17,8 @@ arch/s390/crypto directory.
2. Probing for availability of MSA
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
It should be possible to use Kernels with the z990 crypto implementations both
-on machines with MSA available an on those without MSA (pre z990 or z990
-without MSA). Therefore a simple probing mechanisms has been implemented:
+on machines with MSA available and on those without MSA (pre z990 or z990
+without MSA). Therefore a simple probing mechanism has been implemented:
In the init function of each crypto module the availability of MSA and of the
respective crypto algorithm in particular will be tested. If the algorithm is
available the module will load and register its algorithm with the crypto API.
@@ -26,7 +26,7 @@ available the module will load and register its algorithm with the crypto API.
If the respective crypto algorithm is not available, the init function will
return -ENOSYS. In that case a fallback to the standard software implementation
of the crypto algorithm must be taken ( -> the standard crypto modules are
-also build when compiling the kernel).
+also built when compiling the kernel).
3. Ensuring z990 crypto module preference
diff --git a/Documentation/s390/s390dbf.txt b/Documentation/s390/s390dbf.txt
index 000230cd26db..0eb7c58916de 100644
--- a/Documentation/s390/s390dbf.txt
+++ b/Documentation/s390/s390dbf.txt
@@ -36,7 +36,7 @@ switches to the next debug area. This is done in order to be sure
that the records which describe the origin of the exception are not
overwritten when a wrap around for the current area occurs.
-The debug areas itselve are also ordered in form of a ring buffer.
+The debug areas themselves are also ordered in form of a ring buffer.
When an exception is thrown in the last debug area, the following debug
entries are then written again in the very first area.
@@ -55,7 +55,7 @@ The debug logs can be inspected in a live system through entries in
the debugfs-filesystem. Under the toplevel directory "s390dbf" there is
a directory for each registered component, which is named like the
corresponding component. The debugfs normally should be mounted to
-/sys/kernel/debug therefore the debug feature can be accessed unter
+/sys/kernel/debug therefore the debug feature can be accessed under
/sys/kernel/debug/s390dbf.
The content of the directories are files which represent different views
@@ -87,11 +87,11 @@ There are currently 2 possible triggers, which stop the debug feature
globally. The first possibility is to use the "debug_active" sysctl. If
set to 1 the debug feature is running. If "debug_active" is set to 0 the
debug feature is turned off.
-The second trigger which stops the debug feature is an kernel oops.
+The second trigger which stops the debug feature is a kernel oops.
That prevents the debug feature from overwriting debug information that
happened before the oops. After an oops you can reactivate the debug feature
by piping 1 to /proc/sys/s390dbf/debug_active. Nevertheless, its not
-suggested to use an oopsed kernel in an production environment.
+suggested to use an oopsed kernel in a production environment.
If you want to disallow the deactivation of the debug feature, you can use
the "debug_stoppable" sysctl. If you set "debug_stoppable" to 0 the debug
feature cannot be stopped. If the debug feature is already stopped, it