diff options
Diffstat (limited to 'Documentation')
123 files changed, 3499 insertions, 813 deletions
diff --git a/Documentation/00-INDEX b/Documentation/00-INDEX index 1b777b960492..1f89424c36a6 100644 --- a/Documentation/00-INDEX +++ b/Documentation/00-INDEX @@ -192,10 +192,6 @@ kernel-docs.txt - listing of various WWW + books that document kernel internals. kernel-parameters.txt - summary listing of command line / boot prompt args for the kernel. -keys-request-key.txt - - description of the kernel key request service. -keys.txt - - description of the kernel key retention service. kobject.txt - info of the kobject infrastructure of the Linux kernel. kprobes.txt @@ -294,6 +290,8 @@ scheduler/ - directory with info on the scheduler. scsi/ - directory with info on Linux scsi support. +security/ + - directory that contains security-related info serial/ - directory with info on the low level serial API. serial-console.txt diff --git a/Documentation/ABI/obsolete/o2cb b/Documentation/ABI/removed/o2cb index 9c49d8e6c0cc..7f5daa465093 100644 --- a/Documentation/ABI/obsolete/o2cb +++ b/Documentation/ABI/removed/o2cb @@ -1,11 +1,10 @@ What: /sys/o2cb symlink -Date: Dec 2005 -KernelVersion: 2.6.16 +Date: May 2011 +KernelVersion: 2.6.40 Contact: ocfs2-devel@oss.oracle.com -Description: This is a symlink: /sys/o2cb to /sys/fs/o2cb. The symlink will - be removed when new versions of ocfs2-tools which know to look +Description: This is a symlink: /sys/o2cb to /sys/fs/o2cb. The symlink is + removed when new versions of ocfs2-tools which know to look in /sys/fs/o2cb are sufficiently prevalent. Don't code new software to look here, it should try /sys/fs/o2cb instead. - See Documentation/ABI/stable/o2cb for more information on usage. Users: ocfs2-tools. It's sufficient to mail proposed changes to ocfs2-devel@oss.oracle.com. diff --git a/Documentation/ABI/testing/sysfs-block b/Documentation/ABI/testing/sysfs-block index 4873c759d535..c1eb41cb9876 100644 --- a/Documentation/ABI/testing/sysfs-block +++ b/Documentation/ABI/testing/sysfs-block @@ -142,3 +142,67 @@ Description: with the previous I/O request are enabled. When set to 2, all merge tries are disabled. The default value is 0 - which enables all types of merge tries. + +What: /sys/block/<disk>/discard_alignment +Date: May 2011 +Contact: Martin K. Petersen <martin.petersen@oracle.com> +Description: + Devices that support discard functionality may + internally allocate space in units that are bigger than + the exported logical block size. The discard_alignment + parameter indicates how many bytes the beginning of the + device is offset from the internal allocation unit's + natural alignment. + +What: /sys/block/<disk>/<partition>/discard_alignment +Date: May 2011 +Contact: Martin K. Petersen <martin.petersen@oracle.com> +Description: + Devices that support discard functionality may + internally allocate space in units that are bigger than + the exported logical block size. The discard_alignment + parameter indicates how many bytes the beginning of the + partition is offset from the internal allocation unit's + natural alignment. + +What: /sys/block/<disk>/queue/discard_granularity +Date: May 2011 +Contact: Martin K. Petersen <martin.petersen@oracle.com> +Description: + Devices that support discard functionality may + internally allocate space using units that are bigger + than the logical block size. The discard_granularity + parameter indicates the size of the internal allocation + unit in bytes if reported by the device. Otherwise the + discard_granularity will be set to match the device's + physical block size. A discard_granularity of 0 means + that the device does not support discard functionality. + +What: /sys/block/<disk>/queue/discard_max_bytes +Date: May 2011 +Contact: Martin K. Petersen <martin.petersen@oracle.com> +Description: + Devices that support discard functionality may have + internal limits on the number of bytes that can be + trimmed or unmapped in a single operation. Some storage + protocols also have inherent limits on the number of + blocks that can be described in a single command. The + discard_max_bytes parameter is set by the device driver + to the maximum number of bytes that can be discarded in + a single operation. Discard requests issued to the + device must not exceed this limit. A discard_max_bytes + value of 0 means that the device does not support + discard functionality. + +What: /sys/block/<disk>/queue/discard_zeroes_data +Date: May 2011 +Contact: Martin K. Petersen <martin.petersen@oracle.com> +Description: + Devices that support discard functionality may return + stale or random data when a previously discarded block + is read back. This can cause problems if the filesystem + expects discarded blocks to be explicitly cleared. If a + device reports that it deterministically returns zeroes + when a discarded area is read the discard_zeroes_data + parameter will be set to one. Otherwise it will be 0 and + the result of reading a discarded area is undefined. diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci index 36bf454ba855..349ecf26ce10 100644 --- a/Documentation/ABI/testing/sysfs-bus-pci +++ b/Documentation/ABI/testing/sysfs-bus-pci @@ -74,6 +74,15 @@ Description: hot-remove the PCI device and any of its children. Depends on CONFIG_HOTPLUG. +What: /sys/bus/pci/devices/.../pci_bus/.../rescan +Date: May 2011 +Contact: Linux PCI developers <linux-pci@vger.kernel.org> +Description: + Writing a non-zero value to this attribute will + force a rescan of the bus and all child buses, + and re-discover devices removed earlier from this + part of the device tree. Depends on CONFIG_HOTPLUG. + What: /sys/bus/pci/devices/.../rescan Date: January 2009 Contact: Linux PCI developers <linux-pci@vger.kernel.org> diff --git a/Documentation/ABI/testing/sysfs-class-backlight-driver-adp8870 b/Documentation/ABI/testing/sysfs-class-backlight-driver-adp8870 new file mode 100644 index 000000000000..aa11dbdd794b --- /dev/null +++ b/Documentation/ABI/testing/sysfs-class-backlight-driver-adp8870 @@ -0,0 +1,56 @@ +What: /sys/class/backlight/<backlight>/<ambient light zone>_max +What: /sys/class/backlight/<backlight>/l1_daylight_max +What: /sys/class/backlight/<backlight>/l2_bright_max +What: /sys/class/backlight/<backlight>/l3_office_max +What: /sys/class/backlight/<backlight>/l4_indoor_max +What: /sys/class/backlight/<backlight>/l5_dark_max +Date: Mai 2011 +KernelVersion: 2.6.40 +Contact: device-drivers-devel@blackfin.uclinux.org +Description: + Control the maximum brightness for <ambient light zone> + on this <backlight>. Values are between 0 and 127. This file + will also show the brightness level stored for this + <ambient light zone>. + +What: /sys/class/backlight/<backlight>/<ambient light zone>_dim +What: /sys/class/backlight/<backlight>/l2_bright_dim +What: /sys/class/backlight/<backlight>/l3_office_dim +What: /sys/class/backlight/<backlight>/l4_indoor_dim +What: /sys/class/backlight/<backlight>/l5_dark_dim +Date: Mai 2011 +KernelVersion: 2.6.40 +Contact: device-drivers-devel@blackfin.uclinux.org +Description: + Control the dim brightness for <ambient light zone> + on this <backlight>. Values are between 0 and 127, typically + set to 0. Full off when the backlight is disabled. + This file will also show the dim brightness level stored for + this <ambient light zone>. + +What: /sys/class/backlight/<backlight>/ambient_light_level +Date: Mai 2011 +KernelVersion: 2.6.40 +Contact: device-drivers-devel@blackfin.uclinux.org +Description: + Get conversion value of the light sensor. + This value is updated every 80 ms (when the light sensor + is enabled). Returns integer between 0 (dark) and + 8000 (max ambient brightness) + +What: /sys/class/backlight/<backlight>/ambient_light_zone +Date: Mai 2011 +KernelVersion: 2.6.40 +Contact: device-drivers-devel@blackfin.uclinux.org +Description: + Get/Set current ambient light zone. Reading returns + integer between 1..5 (1 = daylight, 2 = bright, ..., 5 = dark). + Writing a value between 1..5 forces the backlight controller + to enter the corresponding ambient light zone. + Writing 0 returns to normal/automatic ambient light level + operation. The ambient light sensing feature on these devices + is an extension to the API documented in + Documentation/ABI/stable/sysfs-class-backlight. + It can be enabled by writing the value stored in + /sys/class/backlight/<backlight>/max_brightness to + /sys/class/backlight/<backlight>/brightness.
\ No newline at end of file diff --git a/Documentation/ABI/testing/sysfs-kernel-mm-cleancache b/Documentation/ABI/testing/sysfs-kernel-mm-cleancache new file mode 100644 index 000000000000..662ae646ea12 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-kernel-mm-cleancache @@ -0,0 +1,11 @@ +What: /sys/kernel/mm/cleancache/ +Date: April 2011 +Contact: Dan Magenheimer <dan.magenheimer@oracle.com> +Description: + /sys/kernel/mm/cleancache/ contains a number of files which + record a count of various cleancache operations + (sum across all filesystems): + succ_gets + failed_gets + puts + flushes diff --git a/Documentation/ABI/testing/sysfs-ptp b/Documentation/ABI/testing/sysfs-ptp new file mode 100644 index 000000000000..d40d2b550502 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-ptp @@ -0,0 +1,98 @@ +What: /sys/class/ptp/ +Date: September 2010 +Contact: Richard Cochran <richardcochran@gmail.com> +Description: + This directory contains files and directories + providing a standardized interface to the ancillary + features of PTP hardware clocks. + +What: /sys/class/ptp/ptpN/ +Date: September 2010 +Contact: Richard Cochran <richardcochran@gmail.com> +Description: + This directory contains the attributes of the Nth PTP + hardware clock registered into the PTP class driver + subsystem. + +What: /sys/class/ptp/ptpN/clock_name +Date: September 2010 +Contact: Richard Cochran <richardcochran@gmail.com> +Description: + This file contains the name of the PTP hardware clock + as a human readable string. + +What: /sys/class/ptp/ptpN/max_adjustment +Date: September 2010 +Contact: Richard Cochran <richardcochran@gmail.com> +Description: + This file contains the PTP hardware clock's maximum + frequency adjustment value (a positive integer) in + parts per billion. + +What: /sys/class/ptp/ptpN/n_alarms +Date: September 2010 +Contact: Richard Cochran <richardcochran@gmail.com> +Description: + This file contains the number of periodic or one shot + alarms offer by the PTP hardware clock. + +What: /sys/class/ptp/ptpN/n_external_timestamps +Date: September 2010 +Contact: Richard Cochran <richardcochran@gmail.com> +Description: + This file contains the number of external timestamp + channels offered by the PTP hardware clock. + +What: /sys/class/ptp/ptpN/n_periodic_outputs +Date: September 2010 +Contact: Richard Cochran <richardcochran@gmail.com> +Description: + This file contains the number of programmable periodic + output channels offered by the PTP hardware clock. + +What: /sys/class/ptp/ptpN/pps_avaiable +Date: September 2010 +Contact: Richard Cochran <richardcochran@gmail.com> +Description: + This file indicates whether the PTP hardware clock + supports a Pulse Per Second to the host CPU. Reading + "1" means that the PPS is supported, while "0" means + not supported. + +What: /sys/class/ptp/ptpN/extts_enable +Date: September 2010 +Contact: Richard Cochran <richardcochran@gmail.com> +Description: + This write-only file enables or disables external + timestamps. To enable external timestamps, write the + channel index followed by a "1" into the file. + To disable external timestamps, write the channel + index followed by a "0" into the file. + +What: /sys/class/ptp/ptpN/fifo +Date: September 2010 +Contact: Richard Cochran <richardcochran@gmail.com> +Description: + This file provides timestamps on external events, in + the form of three integers: channel index, seconds, + and nanoseconds. + +What: /sys/class/ptp/ptpN/period +Date: September 2010 +Contact: Richard Cochran <richardcochran@gmail.com> +Description: + This write-only file enables or disables periodic + outputs. To enable a periodic output, write five + integers into the file: channel index, start time + seconds, start time nanoseconds, period seconds, and + period nanoseconds. To disable a periodic output, set + all the seconds and nanoseconds values to zero. + +What: /sys/class/ptp/ptpN/pps_enable +Date: September 2010 +Contact: Richard Cochran <richardcochran@gmail.com> +Description: + This write-only file enables or disables delivery of + PPS events to the Linux PPS subsystem. To enable PPS + events, write a "1" into the file. To disable events, + write a "0" into the file. diff --git a/Documentation/Changes b/Documentation/Changes index 5f4828a034e3..b17580885273 100644 --- a/Documentation/Changes +++ b/Documentation/Changes @@ -2,13 +2,7 @@ Intro ===== This document is designed to provide a list of the minimum levels of -software necessary to run the 2.6 kernels, as well as provide brief -instructions regarding any other "Gotchas" users may encounter when -trying life on the Bleeding Edge. If upgrading from a pre-2.4.x -kernel, please consult the Changes file included with 2.4.x kernels for -additional information; most of that information will not be repeated -here. Basically, this document assumes that your system is already -functional and running at least 2.4.x kernels. +software necessary to run the 3.0 kernels. This document is originally based on my "Changes" file for 2.0.x kernels and therefore owes credit to the same people as that file (Jared Mauch, @@ -22,11 +16,10 @@ Upgrade to at *least* these software revisions before thinking you've encountered a bug! If you're unsure what version you're currently running, the suggested command should tell you. -Again, keep in mind that this list assumes you are already -functionally running a Linux 2.4 kernel. Also, not all tools are -necessary on all systems; obviously, if you don't have any ISDN -hardware, for example, you probably needn't concern yourself with -isdn4k-utils. +Again, keep in mind that this list assumes you are already functionally +running a Linux kernel. Also, not all tools are necessary on all +systems; obviously, if you don't have any ISDN hardware, for example, +you probably needn't concern yourself with isdn4k-utils. o Gnu C 3.2 # gcc --version o Gnu make 3.80 # make --version @@ -114,12 +107,12 @@ Ksymoops If the unthinkable happens and your kernel oopses, you may need the ksymoops tool to decode it, but in most cases you don't. -In the 2.6 kernel it is generally preferred to build the kernel with -CONFIG_KALLSYMS so that it produces readable dumps that can be used as-is -(this also produces better output than ksymoops). -If for some reason your kernel is not build with CONFIG_KALLSYMS and -you have no way to rebuild and reproduce the Oops with that option, then -you can still decode that Oops with ksymoops. +It is generally preferred to build the kernel with CONFIG_KALLSYMS so +that it produces readable dumps that can be used as-is (this also +produces better output than ksymoops). If for some reason your kernel +is not build with CONFIG_KALLSYMS and you have no way to rebuild and +reproduce the Oops with that option, then you can still decode that Oops +with ksymoops. Module-Init-Tools ----------------- @@ -261,8 +254,8 @@ needs to be recompiled or (preferably) upgraded. NFS-utils --------- -In 2.4 and earlier kernels, the nfs server needed to know about any -client that expected to be able to access files via NFS. This +In ancient (2.4 and earlier) kernels, the nfs server needed to know +about any client that expected to be able to access files via NFS. This information would be given to the kernel by "mountd" when the client mounted the filesystem, or by "exportfs" at system startup. exportfs would take information about active clients from /var/lib/nfs/rmtab. @@ -272,11 +265,11 @@ which is not always easy, particularly when trying to implement fail-over. Even when the system is working well, rmtab suffers from getting lots of old entries that never get removed. -With 2.6 we have the option of having the kernel tell mountd when it -gets a request from an unknown host, and mountd can give appropriate -export information to the kernel. This removes the dependency on -rmtab and means that the kernel only needs to know about currently -active clients. +With modern kernels we have the option of having the kernel tell mountd +when it gets a request from an unknown host, and mountd can give +appropriate export information to the kernel. This removes the +dependency on rmtab and means that the kernel only needs to know about +currently active clients. To enable this new functionality, you need to: diff --git a/Documentation/CodingStyle b/Documentation/CodingStyle index 58b0bf917834..fa6e25b94a54 100644 --- a/Documentation/CodingStyle +++ b/Documentation/CodingStyle @@ -680,8 +680,8 @@ ones already enabled by DEBUG. Chapter 14: Allocating memory The kernel provides the following general purpose memory allocators: -kmalloc(), kzalloc(), kcalloc(), and vmalloc(). Please refer to the API -documentation for further information about them. +kmalloc(), kzalloc(), kcalloc(), vmalloc(), and vzalloc(). Please refer to +the API documentation for further information about them. The preferred form for passing a size of a struct is the following: diff --git a/Documentation/DocBook/.gitignore b/Documentation/DocBook/.gitignore index c6def352fe39..679034cbd686 100644 --- a/Documentation/DocBook/.gitignore +++ b/Documentation/DocBook/.gitignore @@ -8,3 +8,4 @@ *.dvi *.log *.out +media/ diff --git a/Documentation/DocBook/Makefile b/Documentation/DocBook/Makefile index 8436b018c289..3cebfa0d1611 100644 --- a/Documentation/DocBook/Makefile +++ b/Documentation/DocBook/Makefile @@ -73,7 +73,7 @@ installmandocs: mandocs ### #External programs used KERNELDOC = $(srctree)/scripts/kernel-doc -DOCPROC = $(objtree)/scripts/basic/docproc +DOCPROC = $(objtree)/scripts/docproc XMLTOFLAGS = -m $(srctree)/Documentation/DocBook/stylesheet.xsl XMLTOFLAGS += --skip-validation diff --git a/Documentation/DocBook/dvb/dvbapi.xml b/Documentation/DocBook/dvb/dvbapi.xml index ad8678d48916..9fad86ce7f5e 100644 --- a/Documentation/DocBook/dvb/dvbapi.xml +++ b/Documentation/DocBook/dvb/dvbapi.xml @@ -35,6 +35,14 @@ <revhistory> <!-- Put document revisions here, newest first. --> <revision> + <revnumber>2.0.4</revnumber> + <date>2011-05-06</date> + <authorinitials>mcc</authorinitials> + <revremark> + Add more information about DVB APIv5, better describing the frontend GET/SET props ioctl's. + </revremark> +</revision> +<revision> <revnumber>2.0.3</revnumber> <date>2010-07-03</date> <authorinitials>mcc</authorinitials> diff --git a/Documentation/DocBook/dvb/dvbproperty.xml b/Documentation/DocBook/dvb/dvbproperty.xml index 97f397e2fb3a..b5365f61d69b 100644 --- a/Documentation/DocBook/dvb/dvbproperty.xml +++ b/Documentation/DocBook/dvb/dvbproperty.xml @@ -1,6 +1,330 @@ -<section id="FE_GET_PROPERTY"> +<section id="FE_GET_SET_PROPERTY"> <title>FE_GET_PROPERTY/FE_SET_PROPERTY</title> +<programlisting> +/* Reserved fields should be set to 0 */ +struct dtv_property { + __u32 cmd; + union { + __u32 data; + struct { + __u8 data[32]; + __u32 len; + __u32 reserved1[3]; + void *reserved2; + } buffer; + } u; + int result; +} __attribute__ ((packed)); + +/* num of properties cannot exceed DTV_IOCTL_MAX_MSGS per ioctl */ +#define DTV_IOCTL_MAX_MSGS 64 + +struct dtv_properties { + __u32 num; + struct dtv_property *props; +}; +</programlisting> + +<section id="FE_GET_PROPERTY"> +<title>FE_GET_PROPERTY</title> +<para>DESCRIPTION +</para> +<informaltable><tgroup cols="1"><tbody><row><entry + align="char"> +<para>This ioctl call returns one or more frontend properties. This call only + requires read-only access to the device.</para> +</entry> + </row></tbody></tgroup></informaltable> +<para>SYNOPSIS +</para> +<informaltable><tgroup cols="1"><tbody><row><entry + align="char"> +<para>int ioctl(int fd, int request = <link linkend="FE_GET_PROPERTY">FE_GET_PROPERTY</link>, + dtv_properties ⋆props);</para> +</entry> + </row></tbody></tgroup></informaltable> +<para>PARAMETERS +</para> +<informaltable><tgroup cols="2"><tbody><row><entry align="char"> +<para>int fd</para> +</entry><entry + align="char"> +<para>File descriptor returned by a previous call to open().</para> +</entry> + </row><row><entry + align="char"> +<para>int num</para> +</entry><entry + align="char"> +<para>Equals <link linkend="FE_GET_PROPERTY">FE_GET_PROPERTY</link> for this command.</para> +</entry> + </row><row><entry + align="char"> +<para>struct dtv_property *props</para> +</entry><entry + align="char"> +<para>Points to the location where the front-end property commands are stored.</para> +</entry> + </row></tbody></tgroup></informaltable> +<para>ERRORS</para> +<informaltable><tgroup cols="2"><tbody><row> + <entry align="char"><para>EINVAL</para></entry> + <entry align="char"><para>Invalid parameter(s) received or number of parameters out of the range.</para></entry> + </row><row> + <entry align="char"><para>ENOMEM</para></entry> + <entry align="char"><para>Out of memory.</para></entry> + </row><row> + <entry align="char"><para>EFAULT</para></entry> + <entry align="char"><para>Failure while copying data from/to userspace.</para></entry> + </row><row> + <entry align="char"><para>EOPNOTSUPP</para></entry> + <entry align="char"><para>Property type not supported.</para></entry> + </row></tbody></tgroup></informaltable> +</section> + +<section id="FE_SET_PROPERTY"> +<title>FE_SET_PROPERTY</title> +<para>DESCRIPTION +</para> +<informaltable><tgroup cols="1"><tbody><row><entry + align="char"> +<para>This ioctl call sets one or more frontend properties. This call only + requires read-only access to the device.</para> +</entry> + </row></tbody></tgroup></informaltable> +<para>SYNOPSIS +</para> +<informaltable><tgroup cols="1"><tbody><row><entry + align="char"> +<para>int ioctl(int fd, int request = <link linkend="FE_SET_PROPERTY">FE_SET_PROPERTY</link>, + dtv_properties ⋆props);</para> +</entry> + </row></tbody></tgroup></informaltable> +<para>PARAMETERS +</para> +<informaltable><tgroup cols="2"><tbody><row><entry align="char"> +<para>int fd</para> +</entry><entry + align="char"> +<para>File descriptor returned by a previous call to open().</para> +</entry> + </row><row><entry + align="char"> +<para>int num</para> +</entry><entry + align="char"> +<para>Equals <link linkend="FE_SET_PROPERTY">FE_SET_PROPERTY</link> for this command.</para> +</entry> + </row><row><entry + align="char"> +<para>struct dtv_property *props</para> +</entry><entry + align="char"> +<para>Points to the location where the front-end property commands are stored.</para> +</entry> + </row></tbody></tgroup></informaltable> +<para>ERRORS +</para> +<informaltable><tgroup cols="2"><tbody><row> + <entry align="char"><para>EINVAL</para></entry> + <entry align="char"><para>Invalid parameter(s) received or number of parameters out of the range.</para></entry> + </row><row> + <entry align="char"><para>ENOMEM</para></entry> + <entry align="char"><para>Out of memory.</para></entry> + </row><row> + <entry align="char"><para>EFAULT</para></entry> + <entry align="char"><para>Failure while copying data from/to userspace.</para></entry> + </row><row> + <entry align="char"><para>EOPNOTSUPP</para></entry> + <entry align="char"><para>Property type not supported.</para></entry> + </row></tbody></tgroup></informaltable> +</section> + +<section> + <title>Property types</title> +<para> +On <link linkend="FE_GET_PROPERTY">FE_GET_PROPERTY</link>/<link linkend="FE_SET_PROPERTY">FE_SET_PROPERTY</link>, +the actual action is determined by the dtv_property cmd/data pairs. With one single ioctl, is possible to +get/set up to 64 properties. The actual meaning of each property is described on the next sections. +</para> + +<para>The available frontend property types are:</para> +<programlisting> +#define DTV_UNDEFINED 0 +#define DTV_TUNE 1 +#define DTV_CLEAR 2 +#define DTV_FREQUENCY 3 +#define DTV_MODULATION 4 +#define DTV_BANDWIDTH_HZ 5 +#define DTV_INVERSION 6 +#define DTV_DISEQC_MASTER 7 +#define DTV_SYMBOL_RATE 8 +#define DTV_INNER_FEC 9 +#define DTV_VOLTAGE 10 +#define DTV_TONE 11 +#define DTV_PILOT 12 +#define DTV_ROLLOFF 13 +#define DTV_DISEQC_SLAVE_REPLY 14 +#define DTV_FE_CAPABILITY_COUNT 15 +#define DTV_FE_CAPABILITY 16 +#define DTV_DELIVERY_SYSTEM 17 +#define DTV_ISDBT_PARTIAL_RECEPTION 18 +#define DTV_ISDBT_SOUND_BROADCASTING 19 +#define DTV_ISDBT_SB_SUBCHANNEL_ID 20 +#define DTV_ISDBT_SB_SEGMENT_IDX 21 +#define DTV_ISDBT_SB_SEGMENT_COUNT 22 +#define DTV_ISDBT_LAYERA_FEC 23 +#define DTV_ISDBT_LAYERA_MODULATION 24 +#define DTV_ISDBT_LAYERA_SEGMENT_COUNT 25 +#define DTV_ISDBT_LAYERA_TIME_INTERLEAVING 26 +#define DTV_ISDBT_LAYERB_FEC 27 +#define DTV_ISDBT_LAYERB_MODULATION 28 +#define DTV_ISDBT_LAYERB_SEGMENT_COUNT 29 +#define DTV_ISDBT_LAYERB_TIME_INTERLEAVING 30 +#define DTV_ISDBT_LAYERC_FEC 31 +#define DTV_ISDBT_LAYERC_MODULATION 32 +#define DTV_ISDBT_LAYERC_SEGMENT_COUNT 33 +#define DTV_ISDBT_LAYERC_TIME_INTERLEAVING 34 +#define DTV_API_VERSION 35 +#define DTV_CODE_RATE_HP 36 +#define DTV_CODE_RATE_LP 37 +#define DTV_GUARD_INTERVAL 38 +#define DTV_TRANSMISSION_MODE 39 +#define DTV_HIERARCHY 40 +#define DTV_ISDBT_LAYER_ENABLED 41 +#define DTV_ISDBS_TS_ID 42 +</programlisting> +</section> + +<section id="fe_property_common"> + <title>Parameters that are common to all Digital TV standards</title> + <section id="DTV_FREQUENCY"> + <title><constant>DTV_FREQUENCY</constant></title> + + <para>Central frequency of the channel, in HZ.</para> + + <para>Notes:</para> + <para>1)For ISDB-T, the channels are usually transmitted with an offset of 143kHz. + E.g. a valid frequncy could be 474143 kHz. The stepping is bound to the bandwidth of + the channel which is 6MHz.</para> + + <para>2)As in ISDB-Tsb the channel consists of only one or three segments the + frequency step is 429kHz, 3*429 respectively. As for ISDB-T the + central frequency of the channel is expected.</para> + </section> + + <section id="DTV_BANDWIDTH_HZ"> + <title><constant>DTV_BANDWIDTH_HZ</constant></title> + + <para>Bandwidth for the channel, in HZ.</para> + + <para>Possible values: + <constant>1712000</constant>, + <constant>5000000</constant>, + <constant>6000000</constant>, + <constant>7000000</constant>, + <constant>8000000</constant>, + <constant>10000000</constant>. + </para> + + <para>Notes:</para> + + <para>1) For ISDB-T it should be always 6000000Hz (6MHz)</para> + <para>2) For ISDB-Tsb it can vary depending on the number of connected segments</para> + <para>3) Bandwidth doesn't apply for DVB-C transmissions, as the bandwidth + for DVB-C depends on the symbol rate</para> + <para>4) Bandwidth in ISDB-T is fixed (6MHz) or can be easily derived from + other parameters (DTV_ISDBT_SB_SEGMENT_IDX, + DTV_ISDBT_SB_SEGMENT_COUNT).</para> + <para>5) DVB-T supports 6, 7 and 8MHz.</para> + <para>6) In addition, DVB-T2 supports 1.172, 5 and 10MHz.</para> + </section> + + <section id="DTV_DELIVERY_SYSTEM"> + <title><constant>DTV_DELIVERY_SYSTEM</constant></title> + + <para>Specifies the type of Delivery system</para> + + <para>Possible values: </para> +<programlisting> +typedef enum fe_delivery_system { + SYS_UNDEFINED, + SYS_DVBC_ANNEX_AC, + SYS_DVBC_ANNEX_B, + SYS_DVBT, + SYS_DSS, + SYS_DVBS, + SYS_DVBS2, + SYS_DVBH, + SYS_ISDBT, + SYS_ISDBS, + SYS_ISDBC, + SYS_ATSC, + SYS_ATSCMH, + SYS_DMBTH, + SYS_CMMB, + SYS_DAB, + SYS_DVBT2, +} fe_delivery_system_t; +</programlisting> + + </section> + + <section id="DTV_TRANSMISSION_MODE"> + <title><constant>DTV_TRANSMISSION_MODE</constant></title> + + <para>Specifies the number of carriers used by the standard</para> + + <para>Possible values are:</para> +<programlisting> +typedef enum fe_transmit_mode { + TRANSMISSION_MODE_2K, + TRANSMISSION_MODE_8K, + TRANSMISSION_MODE_AUTO, + TRANSMISSION_MODE_4K, + TRANSMISSION_MODE_1K, + TRANSMISSION_MODE_16K, + TRANSMISSION_MODE_32K, +} fe_transmit_mode_t; +</programlisting> + + <para>Notes:</para> + <para>1) ISDB-T supports three carrier/symbol-size: 8K, 4K, 2K. It is called + 'mode' in the standard: Mode 1 is 2K, mode 2 is 4K, mode 3 is 8K</para> + + <para>2) If <constant>DTV_TRANSMISSION_MODE</constant> is set the <constant>TRANSMISSION_MODE_AUTO</constant> the + hardware will try to find the correct FFT-size (if capable) and will + use TMCC to fill in the missing parameters.</para> + <para>3) DVB-T specifies 2K and 8K as valid sizes.</para> + <para>4) DVB-T2 specifies 1K, 2K, 4K, 8K, 16K and 32K.</para> + </section> + + <section id="DTV_GUARD_INTERVAL"> + <title><constant>DTV_GUARD_INTERVAL</constant></title> + + <para>Possible values are:</para> +<programlisting> +typedef enum fe_guard_interval { + GUARD_INTERVAL_1_32, + GUARD_INTERVAL_1_16, + GUARD_INTERVAL_1_8, + GUARD_INTERVAL_1_4, + GUARD_INTERVAL_AUTO, + GUARD_INTERVAL_1_128, + GUARD_INTERVAL_19_128, + GUARD_INTERVAL_19_256, +} fe_guard_interval_t; +</programlisting> + + <para>Notes:</para> + <para>1) If <constant>DTV_GUARD_INTERVAL</constant> is set the <constant>GUARD_INTERVAL_AUTO</constant> the hardware will + try to find the correct guard interval (if capable) and will use TMCC to fill + in the missing parameters.</para> + <para>2) Intervals 1/128, 19/128 and 19/256 are used only for DVB-T2 at present</para> + </section> +</section> + <section id="isdbt"> <title>ISDB-T frontend</title> <para>This section describes shortly what are the possible parameters in the Linux @@ -32,73 +356,6 @@ <para>Parameters used by ISDB-T and ISDB-Tsb.</para> - <section id="isdbt-parms"> - <title>Parameters that are common with DVB-T and ATSC</title> - - <section id="isdbt-freq"> - <title><constant>DTV_FREQUENCY</constant></title> - - <para>Central frequency of the channel.</para> - - <para>For ISDB-T the channels are usually transmitted with an offset of 143kHz. E.g. a - valid frequncy could be 474143 kHz. The stepping is bound to the bandwidth of - the channel which is 6MHz.</para> - - <para>As in ISDB-Tsb the channel consists of only one or three segments the - frequency step is 429kHz, 3*429 respectively. As for ISDB-T the - central frequency of the channel is expected.</para> - </section> - - <section id="isdbt-bw"> - <title><constant>DTV_BANDWIDTH_HZ</constant> (optional)</title> - - <para>Possible values:</para> - - <para>For ISDB-T it should be always 6000000Hz (6MHz)</para> - <para>For ISDB-Tsb it can vary depending on the number of connected segments</para> - - <para>Note: Hardware specific values might be given here, but standard - applications should not bother to set a value to this field as - standard demods are ignoring it anyway.</para> - - <para>Bandwidth in ISDB-T is fixed (6MHz) or can be easily derived from - other parameters (DTV_ISDBT_SB_SEGMENT_IDX, - DTV_ISDBT_SB_SEGMENT_COUNT).</para> - </section> - - <section id="isdbt-delivery-sys"> - <title><constant>DTV_DELIVERY_SYSTEM</constant></title> - - <para>Possible values: <constant>SYS_ISDBT</constant></para> - </section> - - <section id="isdbt-tx-mode"> - <title><constant>DTV_TRANSMISSION_MODE</constant></title> - - <para>ISDB-T supports three carrier/symbol-size: 8K, 4K, 2K. It is called - 'mode' in the standard: Mode 1 is 2K, mode 2 is 4K, mode 3 is 8K</para> - - <para>Possible values: <constant>TRANSMISSION_MODE_2K</constant>, <constant>TRANSMISSION_MODE_8K</constant>, - <constant>TRANSMISSION_MODE_AUTO</constant>, <constant>TRANSMISSION_MODE_4K</constant></para> - - <para>If <constant>DTV_TRANSMISSION_MODE</constant> is set the <constant>TRANSMISSION_MODE_AUTO</constant> the - hardware will try to find the correct FFT-size (if capable) and will - use TMCC to fill in the missing parameters.</para> - - <para><constant>TRANSMISSION_MODE_4K</constant> is added at the same time as the other new parameters.</para> - </section> - - <section id="isdbt-guard-interval"> - <title><constant>DTV_GUARD_INTERVAL</constant></title> - - <para>Possible values: <constant>GUARD_INTERVAL_1_32</constant>, <constant>GUARD_INTERVAL_1_16</constant>, <constant>GUARD_INTERVAL_1_8</constant>, - <constant>GUARD_INTERVAL_1_4</constant>, <constant>GUARD_INTERVAL_AUTO</constant></para> - - <para>If <constant>DTV_GUARD_INTERVAL</constant> is set the <constant>GUARD_INTERVAL_AUTO</constant> the hardware will - try to find the correct guard interval (if capable) and will use TMCC to fill - in the missing parameters.</para> - </section> - </section> <section id="isdbt-new-parms"> <title>ISDB-T only parameters</title> @@ -314,5 +571,20 @@ </section> </section> </section> + <section id="dvbt2-params"> + <title>DVB-T2 parameters</title> + + <para>This section covers parameters that apply only to the DVB-T2 delivery method. DVB-T2 + support is currently in the early stages development so expect this section to grow + and become more detailed with time.</para> + + <section id="dvbt2-plp-id"> + <title><constant>DTV_DVBT2_PLP_ID</constant></title> + + <para>DVB-T2 supports Physical Layer Pipes (PLP) to allow transmission of + many data types via a single multiplex. The API will soon support this + at which point this section will be expanded.</para> + </section> + </section> </section> </section> diff --git a/Documentation/DocBook/dvb/frontend.h.xml b/Documentation/DocBook/dvb/frontend.h.xml index d08e0d401418..d792f789ad3b 100644 --- a/Documentation/DocBook/dvb/frontend.h.xml +++ b/Documentation/DocBook/dvb/frontend.h.xml @@ -176,14 +176,20 @@ typedef enum fe_transmit_mode { TRANSMISSION_MODE_2K, TRANSMISSION_MODE_8K, TRANSMISSION_MODE_AUTO, - TRANSMISSION_MODE_4K + TRANSMISSION_MODE_4K, + TRANSMISSION_MODE_1K, + TRANSMISSION_MODE_16K, + TRANSMISSION_MODE_32K, } fe_transmit_mode_t; typedef enum fe_bandwidth { BANDWIDTH_8_MHZ, BANDWIDTH_7_MHZ, BANDWIDTH_6_MHZ, - BANDWIDTH_AUTO + BANDWIDTH_AUTO, + BANDWIDTH_5_MHZ, + BANDWIDTH_10_MHZ, + BANDWIDTH_1_712_MHZ, } fe_bandwidth_t; @@ -192,7 +198,10 @@ typedef enum fe_guard_interval { GUARD_INTERVAL_1_16, GUARD_INTERVAL_1_8, GUARD_INTERVAL_1_4, - GUARD_INTERVAL_AUTO + GUARD_INTERVAL_AUTO, + GUARD_INTERVAL_1_128, + GUARD_INTERVAL_19_128, + GUARD_INTERVAL_19_256, } fe_guard_interval_t; @@ -306,7 +315,9 @@ struct dvb_frontend_event { #define DTV_ISDBS_TS_ID 42 -#define DTV_MAX_COMMAND DTV_ISDBS_TS_ID +#define DTV_DVBT2_PLP_ID 43 + +#define DTV_MAX_COMMAND DTV_DVBT2_PLP_ID typedef enum fe_pilot { PILOT_ON, @@ -338,6 +349,7 @@ typedef enum fe_delivery_system { SYS_DMBTH, SYS_CMMB, SYS_DAB, + SYS_DVBT2, } fe_delivery_system_t; struct dtv_cmds_h { diff --git a/Documentation/DocBook/media-entities.tmpl b/Documentation/DocBook/media-entities.tmpl index fea63b45471a..e5fe09430fd9 100644 --- a/Documentation/DocBook/media-entities.tmpl +++ b/Documentation/DocBook/media-entities.tmpl @@ -270,6 +270,7 @@ <!ENTITY sub-write SYSTEM "v4l/func-write.xml"> <!ENTITY sub-io SYSTEM "v4l/io.xml"> <!ENTITY sub-grey SYSTEM "v4l/pixfmt-grey.xml"> +<!ENTITY sub-m420 SYSTEM "v4l/pixfmt-m420.xml"> <!ENTITY sub-nv12 SYSTEM "v4l/pixfmt-nv12.xml"> <!ENTITY sub-nv12m SYSTEM "v4l/pixfmt-nv12m.xml"> <!ENTITY sub-nv12mt SYSTEM "v4l/pixfmt-nv12mt.xml"> @@ -292,9 +293,11 @@ <!ENTITY sub-yuyv SYSTEM "v4l/pixfmt-yuyv.xml"> <!ENTITY sub-yvyu SYSTEM "v4l/pixfmt-yvyu.xml"> <!ENTITY sub-srggb10 SYSTEM "v4l/pixfmt-srggb10.xml"> +<!ENTITY sub-srggb12 SYSTEM "v4l/pixfmt-srggb12.xml"> <!ENTITY sub-srggb8 SYSTEM "v4l/pixfmt-srggb8.xml"> <!ENTITY sub-y10 SYSTEM "v4l/pixfmt-y10.xml"> <!ENTITY sub-y12 SYSTEM "v4l/pixfmt-y12.xml"> +<!ENTITY sub-y10b SYSTEM "v4l/pixfmt-y10b.xml"> <!ENTITY sub-pixfmt SYSTEM "v4l/pixfmt.xml"> <!ENTITY sub-cropcap SYSTEM "v4l/vidioc-cropcap.xml"> <!ENTITY sub-dbg-g-register SYSTEM "v4l/vidioc-dbg-g-register.xml"> @@ -371,9 +374,9 @@ <!ENTITY sub-media-indices SYSTEM "media-indices.tmpl"> <!ENTITY sub-media-controller SYSTEM "v4l/media-controller.xml"> -<!ENTITY sub-media-open SYSTEM "v4l/media-func-open.xml"> -<!ENTITY sub-media-close SYSTEM "v4l/media-func-close.xml"> -<!ENTITY sub-media-ioctl SYSTEM "v4l/media-func-ioctl.xml"> +<!ENTITY sub-media-func-open SYSTEM "v4l/media-func-open.xml"> +<!ENTITY sub-media-func-close SYSTEM "v4l/media-func-close.xml"> +<!ENTITY sub-media-func-ioctl SYSTEM "v4l/media-func-ioctl.xml"> <!ENTITY sub-media-ioc-device-info SYSTEM "v4l/media-ioc-device-info.xml"> <!ENTITY sub-media-ioc-enum-entities SYSTEM "v4l/media-ioc-enum-entities.xml"> <!ENTITY sub-media-ioc-enum-links SYSTEM "v4l/media-ioc-enum-links.xml"> diff --git a/Documentation/DocBook/mtdnand.tmpl b/Documentation/DocBook/mtdnand.tmpl index 6f242d5dee9a..17910e2052ad 100644 --- a/Documentation/DocBook/mtdnand.tmpl +++ b/Documentation/DocBook/mtdnand.tmpl @@ -189,8 +189,7 @@ static void __iomem *baseaddr; <title>Partition defines</title> <para> If you want to divide your device into partitions, then - enable the configuration switch CONFIG_MTD_PARTITIONS and define - a partitioning scheme suitable to your board. + define a partitioning scheme suitable to your board. </para> <programlisting> #define NUM_PARTITIONS 2 diff --git a/Documentation/DocBook/v4l/media-controller.xml b/Documentation/DocBook/v4l/media-controller.xml index 2dc25e1d4089..873ac3a621f0 100644 --- a/Documentation/DocBook/v4l/media-controller.xml +++ b/Documentation/DocBook/v4l/media-controller.xml @@ -78,9 +78,9 @@ <appendix id="media-user-func"> <title>Function Reference</title> <!-- Keep this alphabetically sorted. --> - &sub-media-open; - &sub-media-close; - &sub-media-ioctl; + &sub-media-func-open; + &sub-media-func-close; + &sub-media-func-ioctl; <!-- All ioctls go here. --> &sub-media-ioc-device-info; &sub-media-ioc-enum-entities; diff --git a/Documentation/DocBook/v4l/pixfmt-m420.xml b/Documentation/DocBook/v4l/pixfmt-m420.xml new file mode 100644 index 000000000000..ce4bc019e5c0 --- /dev/null +++ b/Documentation/DocBook/v4l/pixfmt-m420.xml @@ -0,0 +1,147 @@ + <refentry id="V4L2-PIX-FMT-M420"> + <refmeta> + <refentrytitle>V4L2_PIX_FMT_M420 ('M420')</refentrytitle> + &manvol; + </refmeta> + <refnamediv> + <refname><constant>V4L2_PIX_FMT_M420</constant></refname> + <refpurpose>Format with ½ horizontal and vertical chroma + resolution, also known as YUV 4:2:0. Hybrid plane line-interleaved + layout.</refpurpose> + </refnamediv> + <refsect1> + <title>Description</title> + + <para>M420 is a YUV format with ½ horizontal and vertical chroma + subsampling (YUV 4:2:0). Pixels are organized as interleaved luma and + chroma planes. Two lines of luma data are followed by one line of chroma + data.</para> + <para>The luma plane has one byte per pixel. The chroma plane contains + interleaved CbCr pixels subsampled by ½ in the horizontal and + vertical directions. Each CbCr pair belongs to four pixels. For example, +Cb<subscript>0</subscript>/Cr<subscript>0</subscript> belongs to +Y'<subscript>00</subscript>, Y'<subscript>01</subscript>, +Y'<subscript>10</subscript>, Y'<subscript>11</subscript>.</para> + + <para>All line lengths are identical: if the Y lines include pad bytes + so do the CbCr lines.</para> + + <example> + <title><constant>V4L2_PIX_FMT_M420</constant> 4 × 4 +pixel image</title> + + <formalpara> + <title>Byte Order.</title> + <para>Each cell is one byte. + <informaltable frame="none"> + <tgroup cols="5" align="center"> + <colspec align="left" colwidth="2*" /> + <tbody valign="top"> + <row> + <entry>start + 0:</entry> + <entry>Y'<subscript>00</subscript></entry> + <entry>Y'<subscript>01</subscript></entry> + <entry>Y'<subscript>02</subscript></entry> + <entry>Y'<subscript>03</subscript></entry> + </row> + <row> + <entry>start + 4:</entry> + <entry>Y'<subscript>10</subscript></entry> + <entry>Y'<subscript>11</subscript></entry> + <entry>Y'<subscript>12</subscript></entry> + <entry>Y'<subscript>13</subscript></entry> + </row> + <row> + <entry>start + 8:</entry> + <entry>Cb<subscript>00</subscript></entry> + <entry>Cr<subscript>00</subscript></entry> + <entry>Cb<subscript>01</subscript></entry> + <entry>Cr<subscript>01</subscript></entry> + </row> + <row> + <entry>start + 16:</entry> + <entry>Y'<subscript>20</subscript></entry> + <entry>Y'<subscript>21</subscript></entry> + <entry>Y'<subscript>22</subscript></entry> + <entry>Y'<subscript>23</subscript></entry> + </row> + <row> + <entry>start + 20:</entry> + <entry>Y'<subscript>30</subscript></entry> + <entry>Y'<subscript>31</subscript></entry> + <entry>Y'<subscript>32</subscript></entry> + <entry>Y'<subscript>33</subscript></entry> + </row> + <row> + <entry>start + 24:</entry> + <entry>Cb<subscript>10</subscript></entry> + <entry>Cr<subscript>10</subscript></entry> + <entry>Cb<subscript>11</subscript></entry> + <entry>Cr<subscript>11</subscript></entry> + </row> + </tbody> + </tgroup> + </informaltable> + </para> + </formalpara> + + <formalpara> + <title>Color Sample Location.</title> + <para> + <informaltable frame="none"> + <tgroup cols="7" align="center"> + <tbody valign="top"> + <row> + <entry></entry> + <entry>0</entry><entry></entry><entry>1</entry><entry></entry> + <entry>2</entry><entry></entry><entry>3</entry> + </row> + <row> + <entry>0</entry> + <entry>Y</entry><entry></entry><entry>Y</entry><entry></entry> + <entry>Y</entry><entry></entry><entry>Y</entry> + </row> + <row> + <entry></entry> + <entry></entry><entry>C</entry><entry></entry><entry></entry> + <entry></entry><entry>C</entry><entry></entry> + </row> + <row> + <entry>1</entry> + <entry>Y</entry><entry></entry><entry>Y</entry><entry></entry> + <entry>Y</entry><entry></entry><entry>Y</entry> + </row> + <row> + <entry></entry> + </row> + <row> + <entry>2</entry> + <entry>Y</entry><entry></entry><entry>Y</entry><entry></entry> + <entry>Y</entry><entry></entry><entry>Y</entry> + </row> + <row> + <entry></entry> + <entry></entry><entry>C</entry><entry></entry><entry></entry> + <entry></entry><entry>C</entry><entry></entry> + </row> + <row> + <entry>3</entry> + <entry>Y</entry><entry></entry><entry>Y</entry><entry></entry> + <entry>Y</entry><entry></entry><entry>Y</entry> + </row> + </tbody> + </tgroup> + </informaltable> + </para> + </formalpara> + </example> + </refsect1> + </refentry> + + <!-- +Local Variables: +mode: sgml +sgml-parent-document: "pixfmt.sgml" +indent-tabs-mode: nil +End: + --> diff --git a/Documentation/DocBook/v4l/pixfmt-y10b.xml b/Documentation/DocBook/v4l/pixfmt-y10b.xml new file mode 100644 index 000000000000..adb0ad808c93 --- /dev/null +++ b/Documentation/DocBook/v4l/pixfmt-y10b.xml @@ -0,0 +1,43 @@ +<refentry id="V4L2-PIX-FMT-Y10BPACK"> + <refmeta> + <refentrytitle>V4L2_PIX_FMT_Y10BPACK ('Y10B')</refentrytitle> + &manvol; + </refmeta> + <refnamediv> + <refname><constant>V4L2_PIX_FMT_Y10BPACK</constant></refname> + <refpurpose>Grey-scale image as a bit-packed array</refpurpose> + </refnamediv> + <refsect1> + <title>Description</title> + + <para>This is a packed grey-scale image format with a depth of 10 bits per + pixel. Pixels are stored in a bit-packed array of 10bit bits per pixel, + with no padding between them and with the most significant bits coming + first from the left.</para> + + <example> + <title><constant>V4L2_PIX_FMT_Y10BPACK</constant> 4 pixel data stream taking 5 bytes</title> + + <formalpara> + <title>Bit-packed representation</title> + <para>pixels cross the byte boundary and have a ratio of 5 bytes for each 4 + pixels. + <informaltable frame="all"> + <tgroup cols="5" align="center"> + <colspec align="left" colwidth="2*" /> + <tbody valign="top"> + <row> + <entry>Y'<subscript>00[9:2]</subscript></entry> + <entry>Y'<subscript>00[1:0]</subscript>Y'<subscript>01[9:4]</subscript></entry> + <entry>Y'<subscript>01[3:0]</subscript>Y'<subscript>02[9:6]</subscript></entry> + <entry>Y'<subscript>02[5:0]</subscript>Y'<subscript>03[9:8]</subscript></entry> + <entry>Y'<subscript>03[7:0]</subscript></entry> + </row> + </tbody> + </tgroup> + </informaltable> + </para> + </formalpara> + </example> + </refsect1> +</refentry> diff --git a/Documentation/DocBook/v4l/pixfmt.xml b/Documentation/DocBook/v4l/pixfmt.xml index 40af4beb48b9..deb660207f94 100644 --- a/Documentation/DocBook/v4l/pixfmt.xml +++ b/Documentation/DocBook/v4l/pixfmt.xml @@ -673,6 +673,7 @@ access the palette, this must be done with ioctls of the Linux framebuffer API.< &sub-srggb8; &sub-sbggr16; &sub-srggb10; + &sub-srggb12; </section> <section id="yuv-formats"> @@ -697,6 +698,7 @@ information.</para> &sub-grey; &sub-y10; &sub-y12; + &sub-y10b; &sub-y16; &sub-yuyv; &sub-uyvy; @@ -712,6 +714,7 @@ information.</para> &sub-nv12m; &sub-nv12mt; &sub-nv16; + &sub-m420; </section> <section> diff --git a/Documentation/DocBook/v4l/subdev-formats.xml b/Documentation/DocBook/v4l/subdev-formats.xml index d7ccd25edcc1..8d3409d2c632 100644 --- a/Documentation/DocBook/v4l/subdev-formats.xml +++ b/Documentation/DocBook/v4l/subdev-formats.xml @@ -2522,5 +2522,51 @@ </tgroup> </table> </section> + + <section> + <title>JPEG Compressed Formats</title> + + <para>Those data formats consist of an ordered sequence of 8-bit bytes + obtained from JPEG compression process. Additionally to the + <constant>_JPEG</constant> prefix the format code is made of + the following information. + <itemizedlist> + <listitem><para>The number of bus samples per entropy encoded byte.</para></listitem> + <listitem><para>The bus width.</para></listitem> + </itemizedlist> + </para> + + <para>For instance, for a JPEG baseline process and an 8-bit bus width + the format will be named <constant>V4L2_MBUS_FMT_JPEG_1X8</constant>. + </para> + + <para>The following table lists existing JPEG compressed formats.</para> + + <table pgwide="0" frame="none" id="v4l2-mbus-pixelcode-jpeg"> + <title>JPEG Formats</title> + <tgroup cols="3"> + <colspec colname="id" align="left" /> + <colspec colname="code" align="left"/> + <colspec colname="remarks" align="left"/> + <thead> + <row> + <entry>Identifier</entry> + <entry>Code</entry> + <entry>Remarks</entry> + </row> + </thead> + <tbody valign="top"> + <row id="V4L2-MBUS-FMT-JPEG-1X8"> + <entry>V4L2_MBUS_FMT_JPEG_1X8</entry> + <entry>0x4001</entry> + <entry>Besides of its usage for the parallel bus this format is + recommended for transmission of JPEG data over MIPI CSI bus + using the User Defined 8-bit Data types. + </entry> + </row> + </tbody> + </tgroup> + </table> + </section> </section> </section> diff --git a/Documentation/DocBook/v4l/videodev2.h.xml b/Documentation/DocBook/v4l/videodev2.h.xml index 2b796a2ee98a..c50536a4f596 100644 --- a/Documentation/DocBook/v4l/videodev2.h.xml +++ b/Documentation/DocBook/v4l/videodev2.h.xml @@ -311,6 +311,9 @@ struct <link linkend="v4l2-pix-format">v4l2_pix_format</link> { #define <link linkend="V4L2-PIX-FMT-Y10">V4L2_PIX_FMT_Y10</link> v4l2_fourcc('Y', '1', '0', ' ') /* 10 Greyscale */ #define <link linkend="V4L2-PIX-FMT-Y16">V4L2_PIX_FMT_Y16</link> v4l2_fourcc('Y', '1', '6', ' ') /* 16 Greyscale */ +/* Grey bit-packed formats */ +#define <link linkend="V4L2-PIX-FMT-Y10BPACK">V4L2_PIX_FMT_Y10BPACK</link> v4l2_fourcc('Y', '1', '0', 'B') /* 10 Greyscale bit-packed */ + /* Palette formats */ #define <link linkend="V4L2-PIX-FMT-PAL8">V4L2_PIX_FMT_PAL8</link> v4l2_fourcc('P', 'A', 'L', '8') /* 8 8-bit palette */ @@ -333,6 +336,7 @@ struct <link linkend="v4l2-pix-format">v4l2_pix_format</link> { #define <link linkend="V4L2-PIX-FMT-YUV420">V4L2_PIX_FMT_YUV420</link> v4l2_fourcc('Y', 'U', '1', '2') /* 12 YUV 4:2:0 */ #define <link linkend="V4L2-PIX-FMT-HI240">V4L2_PIX_FMT_HI240</link> v4l2_fourcc('H', 'I', '2', '4') /* 8 8-bit color */ #define <link linkend="V4L2-PIX-FMT-HM12">V4L2_PIX_FMT_HM12</link> v4l2_fourcc('H', 'M', '1', '2') /* 8 YUV 4:2:0 16x16 macroblocks */ +#define <link linkend="V4L2-PIX-FMT-M420">V4L2_PIX_FMT_M420</link> v4l2_fourcc('M', '4', '2', '0') /* 12 YUV 4:2:0 2 lines y, 1 line uv interleaved */ /* two planes -- one Y, one Cr + Cb interleaved */ #define <link linkend="V4L2-PIX-FMT-NV12">V4L2_PIX_FMT_NV12</link> v4l2_fourcc('N', 'V', '1', '2') /* 12 Y/CbCr 4:2:0 */ diff --git a/Documentation/HOWTO b/Documentation/HOWTO index 365bda9a0d94..81bc1a9ab9d8 100644 --- a/Documentation/HOWTO +++ b/Documentation/HOWTO @@ -209,7 +209,7 @@ tools. One such tool that is particularly recommended is the Linux Cross-Reference project, which is able to present source code in a self-referential, indexed webpage format. An excellent up-to-date repository of the kernel code may be found at: - http://users.sosdg.org/~qiyong/lxr/ + http://lxr.linux.no/+trees The development process diff --git a/Documentation/IRQ-affinity.txt b/Documentation/IRQ-affinity.txt index b4a615b78403..7890fae18529 100644 --- a/Documentation/IRQ-affinity.txt +++ b/Documentation/IRQ-affinity.txt @@ -4,10 +4,11 @@ ChangeLog: SMP IRQ affinity -/proc/irq/IRQ#/smp_affinity specifies which target CPUs are permitted -for a given IRQ source. It's a bitmask of allowed CPUs. It's not allowed -to turn off all CPUs, and if an IRQ controller does not support IRQ -affinity then the value will not change from the default 0xffffffff. +/proc/irq/IRQ#/smp_affinity and /proc/irq/IRQ#/smp_affinity_list specify +which target CPUs are permitted for a given IRQ source. It's a bitmask +(smp_affinity) or cpu list (smp_affinity_list) of allowed CPUs. It's not +allowed to turn off all CPUs, and if an IRQ controller does not support +IRQ affinity then the value will not change from the default of all cpus. /proc/irq/default_smp_affinity specifies default affinity mask that applies to all non-active IRQs. Once IRQ is allocated/activated its affinity bitmask @@ -54,3 +55,11 @@ round-trip min/avg/max = 0.1/0.5/585.4 ms This time around IRQ44 was delivered only to the last four processors. i.e counters for the CPU0-3 did not change. +Here is an example of limiting that same irq (44) to cpus 1024 to 1031: + +[root@moon 44]# echo 1024-1031 > smp_affinity +[root@moon 44]# cat smp_affinity +1024-1031 + +Note that to do this with a bitmask would require 32 bitmasks of zero +to follow the pertinent one. diff --git a/Documentation/RCU/trace.txt b/Documentation/RCU/trace.txt index c078ad48f7a1..8173cec473aa 100644 --- a/Documentation/RCU/trace.txt +++ b/Documentation/RCU/trace.txt @@ -99,18 +99,11 @@ o "qp" indicates that RCU still expects a quiescent state from o "dt" is the current value of the dyntick counter that is incremented when entering or leaving dynticks idle state, either by the - scheduler or by irq. The number after the "/" is the interrupt - nesting depth when in dyntick-idle state, or one greater than - the interrupt-nesting depth otherwise. - - This field is displayed only for CONFIG_NO_HZ kernels. - -o "dn" is the current value of the dyntick counter that is incremented - when entering or leaving dynticks idle state via NMI. If both - the "dt" and "dn" values are even, then this CPU is in dynticks - idle mode and may be ignored by RCU. If either of these two - counters is odd, then RCU must be alert to the possibility of - an RCU read-side critical section running on this CPU. + scheduler or by irq. This number is even if the CPU is in + dyntick idle mode and odd otherwise. The number after the first + "/" is the interrupt nesting depth when in dyntick-idle state, + or one greater than the interrupt-nesting depth otherwise. + The number after the second "/" is the NMI nesting depth. This field is displayed only for CONFIG_NO_HZ kernels. diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches index e439cd0d3375..569f3532e138 100644 --- a/Documentation/SubmittingPatches +++ b/Documentation/SubmittingPatches @@ -714,10 +714,11 @@ Jeff Garzik, "Linux kernel patch submission format". <http://linux.yyz.us/patch-format.html> Greg Kroah-Hartman, "How to piss off a kernel subsystem maintainer". - <http://www.kroah.com/log/2005/03/31/> - <http://www.kroah.com/log/2005/07/08/> - <http://www.kroah.com/log/2005/10/19/> - <http://www.kroah.com/log/2006/01/11/> + <http://www.kroah.com/log/linux/maintainer.html> + <http://www.kroah.com/log/linux/maintainer-02.html> + <http://www.kroah.com/log/linux/maintainer-03.html> + <http://www.kroah.com/log/linux/maintainer-04.html> + <http://www.kroah.com/log/linux/maintainer-05.html> NO!!!! No more huge patch bombs to linux-kernel@vger.kernel.org people! <http://marc.theaimsgroup.com/?l=linux-kernel&m=112112749912944&w=2> diff --git a/Documentation/accounting/cgroupstats.txt b/Documentation/accounting/cgroupstats.txt index eda40fd39cad..d16a9849e60e 100644 --- a/Documentation/accounting/cgroupstats.txt +++ b/Documentation/accounting/cgroupstats.txt @@ -21,7 +21,7 @@ information will not be available. To extract cgroup statistics a utility very similar to getdelays.c has been developed, the sample output of the utility is shown below -~/balbir/cgroupstats # ./getdelays -C "/cgroup/a" +~/balbir/cgroupstats # ./getdelays -C "/sys/fs/cgroup/a" sleeping 1, blocked 0, running 1, stopped 0, uninterruptible 0 -~/balbir/cgroupstats # ./getdelays -C "/cgroup" +~/balbir/cgroupstats # ./getdelays -C "/sys/fs/cgroup" sleeping 155, blocked 0, running 1, stopped 0, uninterruptible 2 diff --git a/Documentation/accounting/getdelays.c b/Documentation/accounting/getdelays.c index e9c77788a39d..f6318f6d7baf 100644 --- a/Documentation/accounting/getdelays.c +++ b/Documentation/accounting/getdelays.c @@ -177,6 +177,8 @@ static int get_family_id(int sd) rc = send_cmd(sd, GENL_ID_CTRL, getpid(), CTRL_CMD_GETFAMILY, CTRL_ATTR_FAMILY_NAME, (void *)name, strlen(TASKSTATS_GENL_NAME)+1); + if (rc < 0) + return 0; /* sendto() failure? */ rep_len = recv(sd, &ans, sizeof(ans), 0); if (ans.n.nlmsg_type == NLMSG_ERROR || @@ -191,30 +193,37 @@ static int get_family_id(int sd) return id; } +#define average_ms(t, c) (t / 1000000ULL / (c ? c : 1)) + static void print_delayacct(struct taskstats *t) { - printf("\n\nCPU %15s%15s%15s%15s\n" - " %15llu%15llu%15llu%15llu\n" - "IO %15s%15s\n" - " %15llu%15llu\n" - "SWAP %15s%15s\n" - " %15llu%15llu\n" - "RECLAIM %12s%15s\n" - " %15llu%15llu\n", - "count", "real total", "virtual total", "delay total", + printf("\n\nCPU %15s%15s%15s%15s%15s\n" + " %15llu%15llu%15llu%15llu%15.3fms\n" + "IO %15s%15s%15s\n" + " %15llu%15llu%15llums\n" + "SWAP %15s%15s%15s\n" + " %15llu%15llu%15llums\n" + "RECLAIM %12s%15s%15s\n" + " %15llu%15llu%15llums\n", + "count", "real total", "virtual total", + "delay total", "delay average", (unsigned long long)t->cpu_count, (unsigned long long)t->cpu_run_real_total, (unsigned long long)t->cpu_run_virtual_total, (unsigned long long)t->cpu_delay_total, - "count", "delay total", + average_ms((double)t->cpu_delay_total, t->cpu_count), + "count", "delay total", "delay average", (unsigned long long)t->blkio_count, (unsigned long long)t->blkio_delay_total, - "count", "delay total", + average_ms(t->blkio_delay_total, t->blkio_count), + "count", "delay total", "delay average", (unsigned long long)t->swapin_count, (unsigned long long)t->swapin_delay_total, - "count", "delay total", + average_ms(t->swapin_delay_total, t->swapin_count), + "count", "delay total", "delay average", (unsigned long long)t->freepages_count, - (unsigned long long)t->freepages_delay_total); + (unsigned long long)t->freepages_delay_total, + average_ms(t->freepages_delay_total, t->freepages_count)); } static void task_context_switch_counts(struct taskstats *t) @@ -433,8 +442,6 @@ int main(int argc, char *argv[]) } do { - int i; - rep_len = recv(nl_sd, &msg, sizeof(msg), 0); PRINTF("received %d bytes\n", rep_len); @@ -459,7 +466,6 @@ int main(int argc, char *argv[]) na = (struct nlattr *) GENLMSG_DATA(&msg); len = 0; - i = 0; while (len < rep_len) { len += NLA_ALIGN(na->nla_len); switch (na->nla_type) { diff --git a/Documentation/acpi/method-customizing.txt b/Documentation/acpi/method-customizing.txt index 3e1d25aee3fb..5f55373dd53b 100644 --- a/Documentation/acpi/method-customizing.txt +++ b/Documentation/acpi/method-customizing.txt @@ -66,3 +66,8 @@ Note: We can use a kernel with multiple custom ACPI method running, But each individual write to debugfs can implement a SINGLE method override. i.e. if we want to insert/override multiple ACPI methods, we need to redo step c) ~ g) for multiple times. + +Note: Be aware that root can mis-use this driver to modify arbitrary + memory and gain additional rights, if root's privileges got + restricted (for example if root is not allowed to load additional + modules after boot). diff --git a/Documentation/arm/Booting b/Documentation/arm/Booting index 76850295af8f..4e686a2ed91e 100644 --- a/Documentation/arm/Booting +++ b/Documentation/arm/Booting @@ -65,13 +65,19 @@ looks at the connected hardware is beyond the scope of this document. The boot loader must ultimately be able to provide a MACH_TYPE_xxx value to the kernel. (see linux/arch/arm/tools/mach-types). - -4. Setup the kernel tagged list -------------------------------- +4. Setup boot data +------------------ Existing boot loaders: OPTIONAL, HIGHLY RECOMMENDED New boot loaders: MANDATORY +The boot loader must provide either a tagged list or a dtb image for +passing configuration data to the kernel. The physical address of the +boot data is passed to the kernel in register r2. + +4a. Setup the kernel tagged list +-------------------------------- + The boot loader must create and initialise the kernel tagged list. A valid tagged list starts with ATAG_CORE and ends with ATAG_NONE. The ATAG_CORE tag may or may not be empty. An empty ATAG_CORE tag @@ -101,6 +107,24 @@ The tagged list must be placed in a region of memory where neither the kernel decompressor nor initrd 'bootp' program will overwrite it. The recommended placement is in the first 16KiB of RAM. +4b. Setup the device tree +------------------------- + +The boot loader must load a device tree image (dtb) into system ram +at a 64bit aligned address and initialize it with the boot data. The +dtb format is documented in Documentation/devicetree/booting-without-of.txt. +The kernel will look for the dtb magic value of 0xd00dfeed at the dtb +physical address to determine if a dtb has been passed instead of a +tagged list. + +The boot loader must pass at a minimum the size and location of the +system memory, and the root filesystem location. The dtb must be +placed in a region of memory where the kernel decompressor will not +overwrite it. The recommended placement is in the first 16KiB of RAM +with the caveat that it may not be located at physical address 0 since +the kernel interprets a value of 0 in r2 to mean neither a tagged list +nor a dtb were passed. + 5. Calling the kernel image --------------------------- @@ -125,7 +149,8 @@ In either case, the following conditions must be met: - CPU register settings r0 = 0, r1 = machine type number discovered in (3) above. - r2 = physical address of tagged list in system RAM. + r2 = physical address of tagged list in system RAM, or + physical address of device tree block (dtb) in system RAM - CPU mode All forms of interrupts must be disabled (IRQs and FIQs) diff --git a/Documentation/arm/Samsung/Overview.txt b/Documentation/arm/Samsung/Overview.txt index c3094ea51aa7..658abb258cef 100644 --- a/Documentation/arm/Samsung/Overview.txt +++ b/Documentation/arm/Samsung/Overview.txt @@ -14,7 +14,6 @@ Introduction - S3C24XX: See Documentation/arm/Samsung-S3C24XX/Overview.txt for full list - S3C64XX: S3C6400 and S3C6410 - S5P6440 - - S5P6442 - S5PC100 - S5PC110 / S5PV210 @@ -36,7 +35,6 @@ Configuration unifying all the SoCs into one kernel. s5p6440_defconfig - S5P6440 specific default configuration - s5p6442_defconfig - S5P6442 specific default configuration s5pc100_defconfig - S5PC100 specific default configuration s5pc110_defconfig - S5PC110 specific default configuration s5pv210_defconfig - S5PV210 specific default configuration diff --git a/Documentation/atomic_ops.txt b/Documentation/atomic_ops.txt index ac4d47187122..3bd585b44927 100644 --- a/Documentation/atomic_ops.txt +++ b/Documentation/atomic_ops.txt @@ -12,7 +12,7 @@ Also, it should be made opaque such that any kind of cast to a normal C integer type will fail. Something like the following should suffice: - typedef struct { volatile int counter; } atomic_t; + typedef struct { int counter; } atomic_t; Historically, counter has been declared volatile. This is now discouraged. See Documentation/volatile-considered-harmful.txt for the complete rationale. diff --git a/Documentation/blockdev/cciss.txt b/Documentation/blockdev/cciss.txt index 89698e8df7d4..c00c6a5ab21f 100644 --- a/Documentation/blockdev/cciss.txt +++ b/Documentation/blockdev/cciss.txt @@ -169,3 +169,18 @@ is issued which positions the tape to a known position. Typically you must rewind the tape (by issuing "mt -f /dev/st0 rewind" for example) before i/o can proceed again to a tape drive which was reset. +There is a cciss_tape_cmds module parameter which can be used to make cciss +allocate more commands for use by tape drives. Ordinarily only a few commands +(6) are allocated for tape drives because tape drives are slow and +infrequently used and the primary purpose of Smart Array controllers is to +act as a RAID controller for disk drives, so the vast majority of commands +are allocated for disk devices. However, if you have more than a few tape +drives attached to a smart array, the default number of commands may not be +enought (for example, if you have 8 tape drives, you could only rewind 6 +at one time with the default number of commands.) The cciss_tape_cmds module +parameter allows more commands (up to 16 more) to be allocated for use by +tape drives. For example: + + insmod cciss.ko cciss_tape_cmds=16 + +Or, as a kernel boot parameter passed in via grub: cciss.cciss_tape_cmds=8 diff --git a/Documentation/cachetlb.txt b/Documentation/cachetlb.txt index 9164ae3b83bc..9b728dc17535 100644 --- a/Documentation/cachetlb.txt +++ b/Documentation/cachetlb.txt @@ -16,7 +16,7 @@ on all processors in the system. Don't let this scare you into thinking SMP cache/tlb flushing must be so inefficient, this is in fact an area where many optimizations are possible. For example, if it can be proven that a user address space has never executed -on a cpu (see vma->cpu_vm_mask), one need not perform a flush +on a cpu (see mm_cpumask()), one need not perform a flush for this address space on that cpu. First, the TLB flushing interfaces, since they are the simplest. The diff --git a/Documentation/cgroups/blkio-controller.txt b/Documentation/cgroups/blkio-controller.txt index 465351d4cf85..84f0a15fc210 100644 --- a/Documentation/cgroups/blkio-controller.txt +++ b/Documentation/cgroups/blkio-controller.txt @@ -28,16 +28,19 @@ cgroups. Here is what you can do. - Enable group scheduling in CFQ CONFIG_CFQ_GROUP_IOSCHED=y -- Compile and boot into kernel and mount IO controller (blkio). +- Compile and boot into kernel and mount IO controller (blkio); see + cgroups.txt, Why are cgroups needed?. - mount -t cgroup -o blkio none /cgroup + mount -t tmpfs cgroup_root /sys/fs/cgroup + mkdir /sys/fs/cgroup/blkio + mount -t cgroup -o blkio none /sys/fs/cgroup/blkio - Create two cgroups - mkdir -p /cgroup/test1/ /cgroup/test2 + mkdir -p /sys/fs/cgroup/blkio/test1/ /sys/fs/cgroup/blkio/test2 - Set weights of group test1 and test2 - echo 1000 > /cgroup/test1/blkio.weight - echo 500 > /cgroup/test2/blkio.weight + echo 1000 > /sys/fs/cgroup/blkio/test1/blkio.weight + echo 500 > /sys/fs/cgroup/blkio/test2/blkio.weight - Create two same size files (say 512MB each) on same disk (file1, file2) and launch two dd threads in different cgroup to read those files. @@ -46,12 +49,12 @@ cgroups. Here is what you can do. echo 3 > /proc/sys/vm/drop_caches dd if=/mnt/sdb/zerofile1 of=/dev/null & - echo $! > /cgroup/test1/tasks - cat /cgroup/test1/tasks + echo $! > /sys/fs/cgroup/blkio/test1/tasks + cat /sys/fs/cgroup/blkio/test1/tasks dd if=/mnt/sdb/zerofile2 of=/dev/null & - echo $! > /cgroup/test2/tasks - cat /cgroup/test2/tasks + echo $! > /sys/fs/cgroup/blkio/test2/tasks + cat /sys/fs/cgroup/blkio/test2/tasks - At macro level, first dd should finish first. To get more precise data, keep on looking at (with the help of script), at blkio.disk_time and @@ -68,13 +71,13 @@ Throttling/Upper Limit policy - Enable throttling in block layer CONFIG_BLK_DEV_THROTTLING=y -- Mount blkio controller - mount -t cgroup -o blkio none /cgroup/blkio +- Mount blkio controller (see cgroups.txt, Why are cgroups needed?) + mount -t cgroup -o blkio none /sys/fs/cgroup/blkio - Specify a bandwidth rate on particular device for root group. The format for policy is "<major>:<minor> <byes_per_second>". - echo "8:16 1048576" > /cgroup/blkio/blkio.read_bps_device + echo "8:16 1048576" > /sys/fs/cgroup/blkio/blkio.throttle.read_bps_device Above will put a limit of 1MB/second on reads happening for root group on device having major/minor number 8:16. @@ -87,7 +90,7 @@ Throttling/Upper Limit policy 1024+0 records out 4194304 bytes (4.2 MB) copied, 4.0001 s, 1.0 MB/s - Limits for writes can be put using blkio.write_bps_device file. + Limits for writes can be put using blkio.throttle.write_bps_device file. Hierarchical Cgroups ==================== @@ -108,7 +111,7 @@ Hierarchical Cgroups CFQ and throttling will practically treat all groups at same level. pivot - / | \ \ + / / \ \ root test1 test2 test3 Down the line we can implement hierarchical accounting/control support @@ -149,7 +152,7 @@ Proportional weight policy files Following is the format. - #echo dev_maj:dev_minor weight > /path/to/cgroup/blkio.weight_device + # echo dev_maj:dev_minor weight > blkio.weight_device Configure weight=300 on /dev/sdb (8:16) in this cgroup # echo 8:16 300 > blkio.weight_device # cat blkio.weight_device @@ -283,28 +286,28 @@ Throttling/Upper limit policy files specified in bytes per second. Rules are per deivce. Following is the format. - echo "<major>:<minor> <rate_bytes_per_second>" > /cgrp/blkio.read_bps_device + echo "<major>:<minor> <rate_bytes_per_second>" > /cgrp/blkio.throttle.read_bps_device - blkio.throttle.write_bps_device - Specifies upper limit on WRITE rate to the device. IO rate is specified in bytes per second. Rules are per deivce. Following is the format. - echo "<major>:<minor> <rate_bytes_per_second>" > /cgrp/blkio.write_bps_device + echo "<major>:<minor> <rate_bytes_per_second>" > /cgrp/blkio.throttle.write_bps_device - blkio.throttle.read_iops_device - Specifies upper limit on READ rate from the device. IO rate is specified in IO per second. Rules are per deivce. Following is the format. - echo "<major>:<minor> <rate_io_per_second>" > /cgrp/blkio.read_iops_device + echo "<major>:<minor> <rate_io_per_second>" > /cgrp/blkio.throttle.read_iops_device - blkio.throttle.write_iops_device - Specifies upper limit on WRITE rate to the device. IO rate is specified in io per second. Rules are per deivce. Following is the format. - echo "<major>:<minor> <rate_io_per_second>" > /cgrp/blkio.write_iops_device + echo "<major>:<minor> <rate_io_per_second>" > /cgrp/blkio.throttle.write_iops_device Note: If both BW and IOPS rules are specified for a device, then IO is subjectd to both the constraints. diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt index aedf1bd02fdd..cd67e90003c0 100644 --- a/Documentation/cgroups/cgroups.txt +++ b/Documentation/cgroups/cgroups.txt @@ -138,11 +138,11 @@ With the ability to classify tasks differently for different resources the admin can easily set up a script which receives exec notifications and depending on who is launching the browser he can - # echo browser_pid > /mnt/<restype>/<userclass>/tasks + # echo browser_pid > /sys/fs/cgroup/<restype>/<userclass>/tasks With only a single hierarchy, he now would potentially have to create a separate cgroup for every browser launched and associate it with -approp network and other resource class. This may lead to +appropriate network and other resource class. This may lead to proliferation of such cgroups. Also lets say that the administrator would like to give enhanced network @@ -153,9 +153,9 @@ apps enhanced CPU power, With ability to write pids directly to resource classes, it's just a matter of : - # echo pid > /mnt/network/<new_class>/tasks + # echo pid > /sys/fs/cgroup/network/<new_class>/tasks (after some time) - # echo pid > /mnt/network/<orig_class>/tasks + # echo pid > /sys/fs/cgroup/network/<orig_class>/tasks Without this ability, he would have to split the cgroup into multiple separate ones and then associate the new cgroups with the @@ -236,7 +236,8 @@ containing the following files describing that cgroup: - cgroup.procs: list of tgids in the cgroup. This list is not guaranteed to be sorted or free of duplicate tgids, and userspace should sort/uniquify the list if this property is required. - This is a read-only file, for now. + Writing a thread group id into this file moves all threads in that + group into this cgroup. - notify_on_release flag: run the release agent on exit? - release_agent: the path to use for release notifications (this file exists in the top cgroup only) @@ -309,21 +310,24 @@ subsystem, this is the case for the cpuset. To start a new job that is to be contained within a cgroup, using the "cpuset" cgroup subsystem, the steps are something like: - 1) mkdir /dev/cgroup - 2) mount -t cgroup -ocpuset cpuset /dev/cgroup - 3) Create the new cgroup by doing mkdir's and write's (or echo's) in - the /dev/cgroup virtual file system. - 4) Start a task that will be the "founding father" of the new job. - 5) Attach that task to the new cgroup by writing its pid to the - /dev/cgroup tasks file for that cgroup. - 6) fork, exec or clone the job tasks from this founding father task. + 1) mount -t tmpfs cgroup_root /sys/fs/cgroup + 2) mkdir /sys/fs/cgroup/cpuset + 3) mount -t cgroup -ocpuset cpuset /sys/fs/cgroup/cpuset + 4) Create the new cgroup by doing mkdir's and write's (or echo's) in + the /sys/fs/cgroup virtual file system. + 5) Start a task that will be the "founding father" of the new job. + 6) Attach that task to the new cgroup by writing its pid to the + /sys/fs/cgroup/cpuset/tasks file for that cgroup. + 7) fork, exec or clone the job tasks from this founding father task. For example, the following sequence of commands will setup a cgroup named "Charlie", containing just CPUs 2 and 3, and Memory Node 1, and then start a subshell 'sh' in that cgroup: - mount -t cgroup cpuset -ocpuset /dev/cgroup - cd /dev/cgroup + mount -t tmpfs cgroup_root /sys/fs/cgroup + mkdir /sys/fs/cgroup/cpuset + mount -t cgroup cpuset -ocpuset /sys/fs/cgroup/cpuset + cd /sys/fs/cgroup/cpuset mkdir Charlie cd Charlie /bin/echo 2-3 > cpuset.cpus @@ -344,7 +348,7 @@ Creating, modifying, using the cgroups can be done through the cgroup virtual filesystem. To mount a cgroup hierarchy with all available subsystems, type: -# mount -t cgroup xxx /dev/cgroup +# mount -t cgroup xxx /sys/fs/cgroup The "xxx" is not interpreted by the cgroup code, but will appear in /proc/mounts so may be any useful identifying string that you like. @@ -353,23 +357,32 @@ Note: Some subsystems do not work without some user input first. For instance, if cpusets are enabled the user will have to populate the cpus and mems files for each new cgroup created before that group can be used. +As explained in section `1.2 Why are cgroups needed?' you should create +different hierarchies of cgroups for each single resource or group of +resources you want to control. Therefore, you should mount a tmpfs on +/sys/fs/cgroup and create directories for each cgroup resource or resource +group. + +# mount -t tmpfs cgroup_root /sys/fs/cgroup +# mkdir /sys/fs/cgroup/rg1 + To mount a cgroup hierarchy with just the cpuset and memory subsystems, type: -# mount -t cgroup -o cpuset,memory hier1 /dev/cgroup +# mount -t cgroup -o cpuset,memory hier1 /sys/fs/cgroup/rg1 To change the set of subsystems bound to a mounted hierarchy, just remount with different options: -# mount -o remount,cpuset,blkio hier1 /dev/cgroup +# mount -o remount,cpuset,blkio hier1 /sys/fs/cgroup/rg1 Now memory is removed from the hierarchy and blkio is added. Note this will add blkio to the hierarchy but won't remove memory or cpuset, because the new options are appended to the old ones: -# mount -o remount,blkio /dev/cgroup +# mount -o remount,blkio /sys/fs/cgroup/rg1 To Specify a hierarchy's release_agent: # mount -t cgroup -o cpuset,release_agent="/sbin/cpuset_release_agent" \ - xxx /dev/cgroup + xxx /sys/fs/cgroup/rg1 Note that specifying 'release_agent' more than once will return failure. @@ -378,17 +391,17 @@ when the hierarchy consists of a single (root) cgroup. Supporting the ability to arbitrarily bind/unbind subsystems from an existing cgroup hierarchy is intended to be implemented in the future. -Then under /dev/cgroup you can find a tree that corresponds to the -tree of the cgroups in the system. For instance, /dev/cgroup +Then under /sys/fs/cgroup/rg1 you can find a tree that corresponds to the +tree of the cgroups in the system. For instance, /sys/fs/cgroup/rg1 is the cgroup that holds the whole system. If you want to change the value of release_agent: -# echo "/sbin/new_release_agent" > /dev/cgroup/release_agent +# echo "/sbin/new_release_agent" > /sys/fs/cgroup/rg1/release_agent It can also be changed via remount. -If you want to create a new cgroup under /dev/cgroup: -# cd /dev/cgroup +If you want to create a new cgroup under /sys/fs/cgroup/rg1: +# cd /sys/fs/cgroup/rg1 # mkdir my_cgroup Now you want to do something with this cgroup. @@ -430,6 +443,12 @@ You can attach the current shell task by echoing 0: # echo 0 > tasks +You can use the cgroup.procs file instead of the tasks file to move all +threads in a threadgroup at once. Echoing the pid of any task in a +threadgroup to cgroup.procs causes all tasks in that threadgroup to be +be attached to the cgroup. Writing 0 to cgroup.procs moves all tasks +in the writing task's threadgroup. + Note: Since every task is always a member of exactly one cgroup in each mounted hierarchy, to remove a task from its current cgroup you must move it into a new cgroup (possibly the root cgroup) by writing to the @@ -575,7 +594,7 @@ rmdir() will fail with it. From this behavior, pre_destroy() can be called multiple times against a cgroup. int can_attach(struct cgroup_subsys *ss, struct cgroup *cgrp, - struct task_struct *task, bool threadgroup) + struct task_struct *task) (cgroup_mutex held by caller) Called prior to moving a task into a cgroup; if the subsystem @@ -584,9 +603,14 @@ task is passed, then a successful result indicates that *any* unspecified task can be moved into the cgroup. Note that this isn't called on a fork. If this method returns 0 (success) then this should remain valid while the caller holds cgroup_mutex and it is ensured that either -attach() or cancel_attach() will be called in future. If threadgroup is -true, then a successful result indicates that all threads in the given -thread's threadgroup can be moved together. +attach() or cancel_attach() will be called in future. + +int can_attach_task(struct cgroup *cgrp, struct task_struct *tsk); +(cgroup_mutex held by caller) + +As can_attach, but for operations that must be run once per task to be +attached (possibly many when using cgroup_attach_proc). Called after +can_attach. void cancel_attach(struct cgroup_subsys *ss, struct cgroup *cgrp, struct task_struct *task, bool threadgroup) @@ -598,15 +622,24 @@ function, so that the subsystem can implement a rollback. If not, not necessary. This will be called only about subsystems whose can_attach() operation have succeeded. +void pre_attach(struct cgroup *cgrp); +(cgroup_mutex held by caller) + +For any non-per-thread attachment work that needs to happen before +attach_task. Needed by cpuset. + void attach(struct cgroup_subsys *ss, struct cgroup *cgrp, - struct cgroup *old_cgrp, struct task_struct *task, - bool threadgroup) + struct cgroup *old_cgrp, struct task_struct *task) (cgroup_mutex held by caller) Called after the task has been attached to the cgroup, to allow any post-attachment activity that requires memory allocations or blocking. -If threadgroup is true, the subsystem should take care of all threads -in the specified thread's threadgroup. Currently does not support any + +void attach_task(struct cgroup *cgrp, struct task_struct *tsk); +(cgroup_mutex held by caller) + +As attach, but for operations that must be run once per task to be attached, +like can_attach_task. Called before attach. Currently does not support any subsystem that might need the old_cgrp for every thread in the group. void fork(struct cgroup_subsy *ss, struct task_struct *task) @@ -630,7 +663,7 @@ always handled well. void post_clone(struct cgroup_subsys *ss, struct cgroup *cgrp) (cgroup_mutex held by caller) -Called at the end of cgroup_clone() to do any parameter +Called during cgroup_create() to do any parameter initialization which might be required before a task could attach. For example in cpusets, no task may attach before 'cpus' and 'mems' are set up. diff --git a/Documentation/cgroups/cpuacct.txt b/Documentation/cgroups/cpuacct.txt index 8b930946c52a..9ad85df4b983 100644 --- a/Documentation/cgroups/cpuacct.txt +++ b/Documentation/cgroups/cpuacct.txt @@ -10,26 +10,25 @@ directly present in its group. Accounting groups can be created by first mounting the cgroup filesystem. -# mkdir /cgroups -# mount -t cgroup -ocpuacct none /cgroups - -With the above step, the initial or the parent accounting group -becomes visible at /cgroups. At bootup, this group includes all the -tasks in the system. /cgroups/tasks lists the tasks in this cgroup. -/cgroups/cpuacct.usage gives the CPU time (in nanoseconds) obtained by -this group which is essentially the CPU time obtained by all the tasks +# mount -t cgroup -ocpuacct none /sys/fs/cgroup + +With the above step, the initial or the parent accounting group becomes +visible at /sys/fs/cgroup. At bootup, this group includes all the tasks in +the system. /sys/fs/cgroup/tasks lists the tasks in this cgroup. +/sys/fs/cgroup/cpuacct.usage gives the CPU time (in nanoseconds) obtained +by this group which is essentially the CPU time obtained by all the tasks in the system. -New accounting groups can be created under the parent group /cgroups. +New accounting groups can be created under the parent group /sys/fs/cgroup. -# cd /cgroups +# cd /sys/fs/cgroup # mkdir g1 # echo $$ > g1 The above steps create a new group g1 and move the current shell process (bash) into it. CPU time consumed by this bash and its children can be obtained from g1/cpuacct.usage and the same is accumulated in -/cgroups/cpuacct.usage also. +/sys/fs/cgroup/cpuacct.usage also. cpuacct.stat file lists a few statistics which further divide the CPU time obtained by the cgroup into user and system times. Currently diff --git a/Documentation/cgroups/cpusets.txt b/Documentation/cgroups/cpusets.txt index 98a30829af7a..5b0d78e55ccc 100644 --- a/Documentation/cgroups/cpusets.txt +++ b/Documentation/cgroups/cpusets.txt @@ -661,21 +661,21 @@ than stress the kernel. To start a new job that is to be contained within a cpuset, the steps are: - 1) mkdir /dev/cpuset - 2) mount -t cgroup -ocpuset cpuset /dev/cpuset + 1) mkdir /sys/fs/cgroup/cpuset + 2) mount -t cgroup -ocpuset cpuset /sys/fs/cgroup/cpuset 3) Create the new cpuset by doing mkdir's and write's (or echo's) in - the /dev/cpuset virtual file system. + the /sys/fs/cgroup/cpuset virtual file system. 4) Start a task that will be the "founding father" of the new job. 5) Attach that task to the new cpuset by writing its pid to the - /dev/cpuset tasks file for that cpuset. + /sys/fs/cgroup/cpuset tasks file for that cpuset. 6) fork, exec or clone the job tasks from this founding father task. For example, the following sequence of commands will setup a cpuset named "Charlie", containing just CPUs 2 and 3, and Memory Node 1, and then start a subshell 'sh' in that cpuset: - mount -t cgroup -ocpuset cpuset /dev/cpuset - cd /dev/cpuset + mount -t cgroup -ocpuset cpuset /sys/fs/cgroup/cpuset + cd /sys/fs/cgroup/cpuset mkdir Charlie cd Charlie /bin/echo 2-3 > cpuset.cpus @@ -710,14 +710,14 @@ Creating, modifying, using the cpusets can be done through the cpuset virtual filesystem. To mount it, type: -# mount -t cgroup -o cpuset cpuset /dev/cpuset +# mount -t cgroup -o cpuset cpuset /sys/fs/cgroup/cpuset -Then under /dev/cpuset you can find a tree that corresponds to the -tree of the cpusets in the system. For instance, /dev/cpuset +Then under /sys/fs/cgroup/cpuset you can find a tree that corresponds to the +tree of the cpusets in the system. For instance, /sys/fs/cgroup/cpuset is the cpuset that holds the whole system. -If you want to create a new cpuset under /dev/cpuset: -# cd /dev/cpuset +If you want to create a new cpuset under /sys/fs/cgroup/cpuset: +# cd /sys/fs/cgroup/cpuset # mkdir my_cpuset Now you want to do something with this cpuset. @@ -765,12 +765,12 @@ wrapper around the cgroup filesystem. The command -mount -t cpuset X /dev/cpuset +mount -t cpuset X /sys/fs/cgroup/cpuset is equivalent to -mount -t cgroup -ocpuset,noprefix X /dev/cpuset -echo "/sbin/cpuset_release_agent" > /dev/cpuset/release_agent +mount -t cgroup -ocpuset,noprefix X /sys/fs/cgroup/cpuset +echo "/sbin/cpuset_release_agent" > /sys/fs/cgroup/cpuset/release_agent 2.2 Adding/removing cpus ------------------------ diff --git a/Documentation/cgroups/devices.txt b/Documentation/cgroups/devices.txt index 57ca4c89fe5c..16624a7f8222 100644 --- a/Documentation/cgroups/devices.txt +++ b/Documentation/cgroups/devices.txt @@ -22,16 +22,16 @@ removed from the child(ren). An entry is added using devices.allow, and removed using devices.deny. For instance - echo 'c 1:3 mr' > /cgroups/1/devices.allow + echo 'c 1:3 mr' > /sys/fs/cgroup/1/devices.allow allows cgroup 1 to read and mknod the device usually known as /dev/null. Doing - echo a > /cgroups/1/devices.deny + echo a > /sys/fs/cgroup/1/devices.deny will remove the default 'a *:* rwm' entry. Doing - echo a > /cgroups/1/devices.allow + echo a > /sys/fs/cgroup/1/devices.allow will add the 'a *:* rwm' entry to the whitelist. diff --git a/Documentation/cgroups/freezer-subsystem.txt b/Documentation/cgroups/freezer-subsystem.txt index 41f37fea1276..c21d77742a07 100644 --- a/Documentation/cgroups/freezer-subsystem.txt +++ b/Documentation/cgroups/freezer-subsystem.txt @@ -59,28 +59,28 @@ is non-freezable. * Examples of usage : - # mkdir /containers - # mount -t cgroup -ofreezer freezer /containers - # mkdir /containers/0 - # echo $some_pid > /containers/0/tasks + # mkdir /sys/fs/cgroup/freezer + # mount -t cgroup -ofreezer freezer /sys/fs/cgroup/freezer + # mkdir /sys/fs/cgroup/freezer/0 + # echo $some_pid > /sys/fs/cgroup/freezer/0/tasks to get status of the freezer subsystem : - # cat /containers/0/freezer.state + # cat /sys/fs/cgroup/freezer/0/freezer.state THAWED to freeze all tasks in the container : - # echo FROZEN > /containers/0/freezer.state - # cat /containers/0/freezer.state + # echo FROZEN > /sys/fs/cgroup/freezer/0/freezer.state + # cat /sys/fs/cgroup/freezer/0/freezer.state FREEZING - # cat /containers/0/freezer.state + # cat /sys/fs/cgroup/freezer/0/freezer.state FROZEN to unfreeze all tasks in the container : - # echo THAWED > /containers/0/freezer.state - # cat /containers/0/freezer.state + # echo THAWED > /sys/fs/cgroup/freezer/0/freezer.state + # cat /sys/fs/cgroup/freezer/0/freezer.state THAWED This is the basic mechanism which should do the right thing for user space task diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt index 7c163477fcd8..06eb6d957c83 100644 --- a/Documentation/cgroups/memory.txt +++ b/Documentation/cgroups/memory.txt @@ -1,8 +1,8 @@ Memory Resource Controller -NOTE: The Memory Resource Controller has been generically been referred - to as the memory controller in this document. Do not confuse memory - controller used here with the memory controller that is used in hardware. +NOTE: The Memory Resource Controller has generically been referred to as the + memory controller in this document. Do not confuse memory controller + used here with the memory controller that is used in hardware. (For editors) In this document: @@ -70,6 +70,7 @@ Brief summary of control files. (See sysctl's vm.swappiness) memory.move_charge_at_immigrate # set/show controls of moving charges memory.oom_control # set/show oom controls. + memory.numa_stat # show the number of memory usage per numa node 1. History @@ -181,7 +182,7 @@ behind this approach is that a cgroup that aggressively uses a shared page will eventually get charged for it (once it is uncharged from the cgroup that brought it in -- this will happen on memory pressure). -Exception: If CONFIG_CGROUP_CGROUP_MEM_RES_CTLR_SWAP is not used.. +Exception: If CONFIG_CGROUP_CGROUP_MEM_RES_CTLR_SWAP is not used. When you do swapoff and make swapped-out pages of shmem(tmpfs) to be backed into memory in force, charges for pages are accounted against the caller of swapoff rather than the users of shmem. @@ -213,7 +214,7 @@ affecting global LRU, memory+swap limit is better than just limiting swap from OS point of view. * What happens when a cgroup hits memory.memsw.limit_in_bytes -When a cgroup his memory.memsw.limit_in_bytes, it's useless to do swap-out +When a cgroup hits memory.memsw.limit_in_bytes, it's useless to do swap-out in this cgroup. Then, swap-out will not be done by cgroup routine and file caches are dropped. But as mentioned above, global LRU can do swapout memory from it for sanity of the system's memory management state. You can't forbid @@ -263,16 +264,17 @@ b. Enable CONFIG_RESOURCE_COUNTERS c. Enable CONFIG_CGROUP_MEM_RES_CTLR d. Enable CONFIG_CGROUP_MEM_RES_CTLR_SWAP (to use swap extension) -1. Prepare the cgroups -# mkdir -p /cgroups -# mount -t cgroup none /cgroups -o memory +1. Prepare the cgroups (see cgroups.txt, Why are cgroups needed?) +# mount -t tmpfs none /sys/fs/cgroup +# mkdir /sys/fs/cgroup/memory +# mount -t cgroup none /sys/fs/cgroup/memory -o memory 2. Make the new group and move bash into it -# mkdir /cgroups/0 -# echo $$ > /cgroups/0/tasks +# mkdir /sys/fs/cgroup/memory/0 +# echo $$ > /sys/fs/cgroup/memory/0/tasks Since now we're in the 0 cgroup, we can alter the memory limit: -# echo 4M > /cgroups/0/memory.limit_in_bytes +# echo 4M > /sys/fs/cgroup/memory/0/memory.limit_in_bytes NOTE: We can use a suffix (k, K, m, M, g or G) to indicate values in kilo, mega or gigabytes. (Here, Kilo, Mega, Giga are Kibibytes, Mebibytes, Gibibytes.) @@ -280,11 +282,11 @@ mega or gigabytes. (Here, Kilo, Mega, Giga are Kibibytes, Mebibytes, Gibibytes.) NOTE: We can write "-1" to reset the *.limit_in_bytes(unlimited). NOTE: We cannot set limits on the root cgroup any more. -# cat /cgroups/0/memory.limit_in_bytes +# cat /sys/fs/cgroup/memory/0/memory.limit_in_bytes 4194304 We can check the usage: -# cat /cgroups/0/memory.usage_in_bytes +# cat /sys/fs/cgroup/memory/0/memory.usage_in_bytes 1216512 A successful write to this file does not guarantee a successful set of @@ -464,6 +466,24 @@ value for efficient access. (Of course, when necessary, it's synchronized.) If you want to know more exact memory usage, you should use RSS+CACHE(+SWAP) value in memory.stat(see 5.2). +5.6 numa_stat + +This is similar to numa_maps but operates on a per-memcg basis. This is +useful for providing visibility into the numa locality information within +an memcg since the pages are allowed to be allocated from any physical +node. One of the usecases is evaluating application performance by +combining this information with the application's cpu allocation. + +We export "total", "file", "anon" and "unevictable" pages per-node for +each memcg. The ouput format of memory.numa_stat is: + +total=<total pages> N0=<node 0 pages> N1=<node 1 pages> ... +file=<total file pages> N0=<node 0 pages> N1=<node 1 pages> ... +anon=<total anon pages> N0=<node 0 pages> N1=<node 1 pages> ... +unevictable=<total anon pages> N0=<node 0 pages> N1=<node 1 pages> ... + +And we have total = file + anon + unevictable. + 6. Hierarchy support The memory controller supports a deep hierarchy and hierarchical accounting. @@ -471,13 +491,13 @@ The hierarchy is created by creating the appropriate cgroups in the cgroup filesystem. Consider for example, the following cgroup filesystem hierarchy - root + root / | \ - / | \ - a b c - | \ - | \ - d e + / | \ + a b c + | \ + | \ + d e In the diagram above, with hierarchical accounting enabled, all memory usage of e, is accounted to its ancestors up until the root (i.e, c and root), diff --git a/Documentation/devicetree/bindings/net/fsl-tsec-phy.txt b/Documentation/devicetree/bindings/net/fsl-tsec-phy.txt index edb7ae19e868..2c6be0377f55 100644 --- a/Documentation/devicetree/bindings/net/fsl-tsec-phy.txt +++ b/Documentation/devicetree/bindings/net/fsl-tsec-phy.txt @@ -74,3 +74,57 @@ Example: interrupt-parent = <&mpic>; phy-handle = <&phy0> }; + +* Gianfar PTP clock nodes + +General Properties: + + - compatible Should be "fsl,etsec-ptp" + - reg Offset and length of the register set for the device + - interrupts There should be at least two interrupts. Some devices + have as many as four PTP related interrupts. + +Clock Properties: + + - fsl,tclk-period Timer reference clock period in nanoseconds. + - fsl,tmr-prsc Prescaler, divides the output clock. + - fsl,tmr-add Frequency compensation value. + - fsl,tmr-fiper1 Fixed interval period pulse generator. + - fsl,tmr-fiper2 Fixed interval period pulse generator. + - fsl,max-adj Maximum frequency adjustment in parts per billion. + + These properties set the operational parameters for the PTP + clock. You must choose these carefully for the clock to work right. + Here is how to figure good values: + + TimerOsc = system clock MHz + tclk_period = desired clock period nanoseconds + NominalFreq = 1000 / tclk_period MHz + FreqDivRatio = TimerOsc / NominalFreq (must be greater that 1.0) + tmr_add = ceil(2^32 / FreqDivRatio) + OutputClock = NominalFreq / tmr_prsc MHz + PulseWidth = 1 / OutputClock microseconds + FiperFreq1 = desired frequency in Hz + FiperDiv1 = 1000000 * OutputClock / FiperFreq1 + tmr_fiper1 = tmr_prsc * tclk_period * FiperDiv1 - tclk_period + max_adj = 1000000000 * (FreqDivRatio - 1.0) - 1 + + The calculation for tmr_fiper2 is the same as for tmr_fiper1. The + driver expects that tmr_fiper1 will be correctly set to produce a 1 + Pulse Per Second (PPS) signal, since this will be offered to the PPS + subsystem to synchronize the Linux clock. + +Example: + + ptp_clock@24E00 { + compatible = "fsl,etsec-ptp"; + reg = <0x24E00 0xB0>; + interrupts = <12 0x8 13 0x8>; + interrupt-parent = < &ipic >; + fsl,tclk-period = <10>; + fsl,tmr-prsc = <100>; + fsl,tmr-add = <0x999999A4>; + fsl,tmr-fiper1 = <0x3B9AC9F6>; + fsl,tmr-fiper2 = <0x00018696>; + fsl,max-adj = <659999998>; + }; diff --git a/Documentation/devicetree/booting-without-of.txt b/Documentation/devicetree/booting-without-of.txt index 50619a0720a8..7c1329de0596 100644 --- a/Documentation/devicetree/booting-without-of.txt +++ b/Documentation/devicetree/booting-without-of.txt @@ -12,8 +12,9 @@ Table of Contents ================= I - Introduction - 1) Entry point for arch/powerpc - 2) Entry point for arch/x86 + 1) Entry point for arch/arm + 2) Entry point for arch/powerpc + 3) Entry point for arch/x86 II - The DT block format 1) Header @@ -148,7 +149,46 @@ upgrades without significantly impacting the kernel code or cluttering it with special cases. -1) Entry point for arch/powerpc +1) Entry point for arch/arm +--------------------------- + + There is one single entry point to the kernel, at the start + of the kernel image. That entry point supports two calling + conventions. A summary of the interface is described here. A full + description of the boot requirements is documented in + Documentation/arm/Booting + + a) ATAGS interface. Minimal information is passed from firmware + to the kernel with a tagged list of predefined parameters. + + r0 : 0 + + r1 : Machine type number + + r2 : Physical address of tagged list in system RAM + + b) Entry with a flattened device-tree block. Firmware loads the + physical address of the flattened device tree block (dtb) into r2, + r1 is not used, but it is considered good practise to use a valid + machine number as described in Documentation/arm/Booting. + + r0 : 0 + + r1 : Valid machine type number. When using a device tree, + a single machine type number will often be assigned to + represent a class or family of SoCs. + + r2 : physical pointer to the device-tree block + (defined in chapter II) in RAM. Device tree can be located + anywhere in system RAM, but it should be aligned on a 64 bit + boundary. + + The kernel will differentiate between ATAGS and device tree booting by + reading the memory pointed to by r2 and looking for either the flattened + device tree block magic value (0xd00dfeed) or the ATAG_CORE value at + offset 0x4 from r2 (0x54410001). + +2) Entry point for arch/powerpc ------------------------------- There is one single entry point to the kernel, at the start @@ -226,7 +266,7 @@ it with special cases. cannot support both configurations with Book E and configurations with classic Powerpc architectures. -2) Entry point for arch/x86 +3) Entry point for arch/x86 ------------------------------- There is one single 32bit entry point to the kernel at code32_start, diff --git a/Documentation/dmaengine.txt b/Documentation/dmaengine.txt index 0c1c2f63c0a9..5a0cb1ef6164 100644 --- a/Documentation/dmaengine.txt +++ b/Documentation/dmaengine.txt @@ -1 +1,96 @@ -See Documentation/crypto/async-tx-api.txt + DMA Engine API Guide + ==================== + + Vinod Koul <vinod dot koul at intel.com> + +NOTE: For DMA Engine usage in async_tx please see: + Documentation/crypto/async-tx-api.txt + + +Below is a guide to device driver writers on how to use the Slave-DMA API of the +DMA Engine. This is applicable only for slave DMA usage only. + +The slave DMA usage consists of following steps +1. Allocate a DMA slave channel +2. Set slave and controller specific parameters +3. Get a descriptor for transaction +4. Submit the transaction and wait for callback notification + +1. Allocate a DMA slave channel +Channel allocation is slightly different in the slave DMA context, client +drivers typically need a channel from a particular DMA controller only and even +in some cases a specific channel is desired. To request a channel +dma_request_channel() API is used. + +Interface: +struct dma_chan *dma_request_channel(dma_cap_mask_t mask, + dma_filter_fn filter_fn, + void *filter_param); +where dma_filter_fn is defined as: +typedef bool (*dma_filter_fn)(struct dma_chan *chan, void *filter_param); + +When the optional 'filter_fn' parameter is set to NULL dma_request_channel +simply returns the first channel that satisfies the capability mask. Otherwise, +when the mask parameter is insufficient for specifying the necessary channel, +the filter_fn routine can be used to disposition the available channels in the +system. The filter_fn routine is called once for each free channel in the +system. Upon seeing a suitable channel filter_fn returns DMA_ACK which flags +that channel to be the return value from dma_request_channel. A channel +allocated via this interface is exclusive to the caller, until +dma_release_channel() is called. + +2. Set slave and controller specific parameters +Next step is always to pass some specific information to the DMA driver. Most of +the generic information which a slave DMA can use is in struct dma_slave_config. +It allows the clients to specify DMA direction, DMA addresses, bus widths, DMA +burst lengths etc. If some DMA controllers have more parameters to be sent then +they should try to embed struct dma_slave_config in their controller specific +structure. That gives flexibility to client to pass more parameters, if +required. + +Interface: +int dmaengine_slave_config(struct dma_chan *chan, + struct dma_slave_config *config) + +3. Get a descriptor for transaction +For slave usage the various modes of slave transfers supported by the +DMA-engine are: +slave_sg - DMA a list of scatter gather buffers from/to a peripheral +dma_cyclic - Perform a cyclic DMA operation from/to a peripheral till the + operation is explicitly stopped. +The non NULL return of this transfer API represents a "descriptor" for the given +transaction. + +Interface: +struct dma_async_tx_descriptor *(*chan->device->device_prep_dma_sg)( + struct dma_chan *chan, + struct scatterlist *dst_sg, unsigned int dst_nents, + struct scatterlist *src_sg, unsigned int src_nents, + unsigned long flags); +struct dma_async_tx_descriptor *(*chan->device->device_prep_dma_cyclic)( + struct dma_chan *chan, dma_addr_t buf_addr, size_t buf_len, + size_t period_len, enum dma_data_direction direction); + +4. Submit the transaction and wait for callback notification +To schedule the transaction to be scheduled by dma device, the "descriptor" +returned in above (3) needs to be submitted. +To tell the dma driver that a transaction is ready to be serviced, the +descriptor->submit() callback needs to be invoked. This chains the descriptor to +the pending queue. +The transactions in the pending queue can be activated by calling the +issue_pending API. If channel is idle then the first transaction in queue is +started and subsequent ones queued up. +On completion of the DMA operation the next in queue is submitted and a tasklet +triggered. The tasklet would then call the client driver completion callback +routine for notification, if set. +Interface: +void dma_async_issue_pending(struct dma_chan *chan); + +============================================================================== + +Additional usage notes for dma driver writers +1/ Although DMA engine specifies that completion callback routines cannot submit +any new operations, but typically for slave DMA subsequent transaction may not +be available for submit prior to callback routine being called. This requirement +is not a requirement for DMA-slave devices. But they should take care to drop +the spin-lock they might be holding before calling the callback routine diff --git a/Documentation/dontdiff b/Documentation/dontdiff index 470d3dba1a69..dfa6fc6e4b28 100644 --- a/Documentation/dontdiff +++ b/Documentation/dontdiff @@ -1,6 +1,8 @@ *.a *.aux *.bin +*.bz2 +*.cis *.cpio *.csp *.dsp @@ -8,6 +10,8 @@ *.elf *.eps *.fw +*.gcno +*.gcov *.gen.S *.gif *.grep @@ -19,14 +23,20 @@ *.ko *.log *.lst +*.lzma +*.lzo +*.mo *.moc *.mod.c *.o *.o.* +*.order *.orig *.out +*.patch *.pdf *.png +*.pot *.ps *.rej *.s @@ -39,16 +49,22 @@ *.tex *.ver *.xml +*.xz *_MODULES *_vga16.c *~ +\#*# *.9 -*.9.gz .* +.*.d .mm 53c700_d.h CVS ChangeSet +GPATH +GRTAGS +GSYMS +GTAGS Image Kerntypes Module.markers @@ -57,15 +73,14 @@ PENDING SCCS System.map* TAGS +aconf +af_names.h aic7*reg.h* aic7*reg_print.c* aic7*seq.h* aicasm aicdb.h* -altivec1.c -altivec2.c -altivec4.c -altivec8.c +altivec*.c asm-offsets.h asm_offsets.h autoconf.h* @@ -80,6 +95,7 @@ btfixupprep build bvmlinux bzImage* +capability_names.h capflags.c classlist.h* comp*.log @@ -88,7 +104,8 @@ conf config config-* config_data.h* -config_data.gz* +config.mak +config.mak.autogen conmakehash consolemap_deftbl.c* cpustr.h @@ -96,7 +113,9 @@ crc32table.h* cscope.* defkeymap.c devlist.h* +dnotify_test docproc +dslm elf2ecoff elfconfig.h* evergreen_reg_safe.h @@ -105,6 +124,7 @@ flask.h fore200e_mkfirm fore200e_pca_fw.c* gconf +gconf.glade.h gen-devlist gen_crc32table gen_init_cpio @@ -112,11 +132,12 @@ generated genheaders genksyms *_gray256.c +hpet_example +hugepage-mmap +hugepage-shm ihex2fw ikconfig.h* inat-tables.c -initramfs_data.cpio -initramfs_data.cpio.gz initramfs_list int16.c int1.c @@ -133,15 +154,19 @@ kxgettext lkc_defs.h lex.c lex.*.c +linux logo_*.c logo_*_clut224.c logo_*_mono.c lxdialog +mach mach-types mach-types.h machtypes.h map +map_hugetlb maui_boot.h +media mconf miboot* mk_elfconfig @@ -150,23 +175,29 @@ mkbugboot mkcpustr mkdep mkprep +mkregtable mktables mktree modpost modules.builtin modules.order modversions.h* +nconf ncscope.* offset.h offsets.h oui.c* +page-types parse.c parse.h patches* pca200e.bin pca200e_ecd.bin2 -piggy.gz +perf.data +perf.data.old +perf-archive piggyback +piggy.gzip piggy.S pnmtologo ppc_defs.h* @@ -177,10 +208,9 @@ r200_reg_safe.h r300_reg_safe.h r420_reg_safe.h r600_reg_safe.h -raid6altivec*.c -raid6int*.c -raid6tables.c +recordmcount relocs +rlim_names.h rn50_reg_safe.h rs600_reg_safe.h rv515_reg_safe.h @@ -194,6 +224,7 @@ split-include syscalltab.h tables.c tags +test_get_len tftpboot.img timeconst.h times.h* @@ -210,10 +241,13 @@ vdso32.so.dbg vdso64.lds vdso64.so.dbg version.h* +vmImage vmlinux vmlinux-* vmlinux.aout +vmlinux.bin.all vmlinux.lds +vmlinuz voffset.h vsyscall.lds vsyscall_32.lds diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index 19132cadc18d..b1c921c27519 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt @@ -6,6 +6,42 @@ be removed from this file. --------------------------- +What: x86 floppy disable_hlt +When: 2012 +Why: ancient workaround of dubious utility clutters the + code used by everybody else. +Who: Len Brown <len.brown@intel.com> + +--------------------------- + +What: CONFIG_APM_CPU_IDLE, and its ability to call APM BIOS in idle +When: 2012 +Why: This optional sub-feature of APM is of dubious reliability, + and ancient APM laptops are likely better served by calling HLT. + Deleting CONFIG_APM_CPU_IDLE allows x86 to stop exporting + the pm_idle function pointer to modules. +Who: Len Brown <len.brown@intel.com> + +---------------------------- + +What: x86_32 "no-hlt" cmdline param +When: 2012 +Why: remove a branch from idle path, simplify code used by everybody. + This option disabled the use of HLT in idle and machine_halt() + for hardware that was flakey 15-years ago. Today we have + "idle=poll" that removed HLT from idle, and so if such a machine + is still running the upstream kernel, "idle=poll" is likely sufficient. +Who: Len Brown <len.brown@intel.com> + +---------------------------- + +What: x86 "idle=mwait" cmdline param +When: 2012 +Why: simplify x86 idle code +Who: Len Brown <len.brown@intel.com> + +---------------------------- + What: PRISM54 When: 2.6.34 @@ -262,16 +298,6 @@ Who: Michael Buesch <mb@bu3sch.de> --------------------------- -What: /sys/o2cb symlink -When: January 2010 -Why: /sys/fs/o2cb is the proper location for this information - /sys/o2cb - exists as a symlink for backwards compatibility for old versions of - ocfs2-tools. 2 years should be sufficient time to phase in new versions - which know to look in /sys/fs/o2cb. -Who: ocfs2-devel@oss.oracle.com - ---------------------------- - What: Ability for non root users to shm_get hugetlb pages based on mlock resource limits When: 2.6.31 @@ -455,23 +481,6 @@ Who: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> ---------------------------- -What: namespace cgroup (ns_cgroup) -When: 2.6.38 -Why: The ns_cgroup leads to some problems: - * cgroup creation is out-of-control - * cgroup name can conflict when pids are looping - * it is not possible to have a single process handling - a lot of namespaces without falling in a exponential creation time - * we may want to create a namespace without creating a cgroup - - The ns_cgroup is replaced by a compatibility flag 'clone_children', - where a newly created cgroup will copy the parent cgroup values. - The userspace has to manually create a cgroup and add a task to - the 'tasks' file. -Who: Daniel Lezcano <daniel.lezcano@free.fr> - ----------------------------- - What: iwlwifi disable_hw_scan module parameters When: 2.6.40 Why: Hareware scan is the prefer method for iwlwifi devices for @@ -551,3 +560,48 @@ Why: These legacy callbacks should no longer be used as i2c-core offers Who: Jean Delvare <khali@linux-fr.org> ---------------------------- + +What: Support for UVCIOC_CTRL_ADD in the uvcvideo driver +When: 2.6.42 +Why: The information passed to the driver by this ioctl is now queried + dynamically from the device. +Who: Laurent Pinchart <laurent.pinchart@ideasonboard.com> + +---------------------------- + +What: Support for UVCIOC_CTRL_MAP_OLD in the uvcvideo driver +When: 2.6.42 +Why: Used only by applications compiled against older driver versions. + Superseded by UVCIOC_CTRL_MAP which supports V4L2 menu controls. +Who: Laurent Pinchart <laurent.pinchart@ideasonboard.com> + +---------------------------- + +What: Support for UVCIOC_CTRL_GET and UVCIOC_CTRL_SET in the uvcvideo driver +When: 2.6.42 +Why: Superseded by the UVCIOC_CTRL_QUERY ioctl. +Who: Laurent Pinchart <laurent.pinchart@ideasonboard.com> + +---------------------------- + +What: For VIDIOC_S_FREQUENCY the type field must match the device node's type. + If not, return -EINVAL. +When: 3.2 +Why: It makes no sense to switch the tuner to radio mode by calling + VIDIOC_S_FREQUENCY on a video node, or to switch the tuner to tv mode by + calling VIDIOC_S_FREQUENCY on a radio node. This is the first step of a + move to more consistent handling of tv and radio tuners. +Who: Hans Verkuil <hans.verkuil@cisco.com> + +---------------------------- + +What: Opening a radio device node will no longer automatically switch the + tuner mode from tv to radio. +When: 3.3 +Why: Just opening a V4L device should not change the state of the hardware + like that. It's very unexpected and against the V4L spec. Instead, you + switch to radio mode by calling VIDIOC_S_FREQUENCY. This is the second + and last step of the move to consistent handling of tv and radio tuners. +Who: Hans Verkuil <hans.verkuil@cisco.com> + +---------------------------- diff --git a/Documentation/filesystems/9p.txt b/Documentation/filesystems/9p.txt index b22abba78fed..13de64c7f0ab 100644 --- a/Documentation/filesystems/9p.txt +++ b/Documentation/filesystems/9p.txt @@ -25,6 +25,8 @@ Other applications are described in the following papers: http://xcpu.org/papers/cellfs-talk.pdf * PROSE I/O: Using 9p to enable Application Partitions http://plan9.escet.urjc.es/iwp9/cready/PROSE_iwp9_2006.pdf + * VirtFS: A Virtualization Aware File System pass-through + http://goo.gl/3WPDg USAGE ===== @@ -130,31 +132,20 @@ OPTIONS RESOURCES ========= -Our current recommendation is to use Inferno (http://www.vitanuova.com/nferno/index.html) -as the 9p server. You can start a 9p server under Inferno by issuing the -following command: - ; styxlisten -A tcp!*!564 export '#U*' +Protocol specifications are maintained on github: +http://ericvh.github.com/9p-rfc/ -The -A specifies an unauthenticated export. The 564 is the port # (you may -have to choose a higher port number if running as a normal user). The '#U*' -specifies exporting the root of the Linux name space. You may specify a -subset of the namespace by extending the path: '#U*'/tmp would just export -/tmp. For more information, see the Inferno manual pages covering styxlisten -and export. +9p client and server implementations are listed on +http://9p.cat-v.org/implementations -A Linux version of the 9p server is now maintained under the npfs project -on sourceforge (http://sourceforge.net/projects/npfs). The currently -maintained version is the single-threaded version of the server (named spfs) -available from the same SVN repository. +A 9p2000.L server is being developed by LLNL and can be found +at http://code.google.com/p/diod/ There are user and developer mailing lists available through the v9fs project on sourceforge (http://sourceforge.net/projects/v9fs). -A stand-alone version of the module (which should build for any 2.6 kernel) -is available via (http://github.com/ericvh/9p-sac/tree/master) - -News and other information is maintained on SWiK (http://swik.net/v9fs) -and the Wiki (http://sf.net/apps/mediawiki/v9fs/index.php). +News and other information is maintained on a Wiki. +(http://sf.net/apps/mediawiki/v9fs/index.php). Bug reports may be issued through the kernel.org bugzilla (http://bugzilla.kernel.org) diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking index 61b31acb9176..57d827d6071d 100644 --- a/Documentation/filesystems/Locking +++ b/Documentation/filesystems/Locking @@ -104,7 +104,7 @@ of the locking scheme for directory operations. prototypes: struct inode *(*alloc_inode)(struct super_block *sb); void (*destroy_inode)(struct inode *); - void (*dirty_inode) (struct inode *); + void (*dirty_inode) (struct inode *, int flags); int (*write_inode) (struct inode *, struct writeback_control *wbc); int (*drop_inode) (struct inode *); void (*evict_inode) (struct inode *); @@ -126,7 +126,7 @@ locking rules: s_umount alloc_inode: destroy_inode: -dirty_inode: (must not sleep) +dirty_inode: write_inode: drop_inode: !!!inode->i_lock!!! evict_inode: diff --git a/Documentation/filesystems/caching/netfs-api.txt b/Documentation/filesystems/caching/netfs-api.txt index a167ab876c35..7cc6bf2871eb 100644 --- a/Documentation/filesystems/caching/netfs-api.txt +++ b/Documentation/filesystems/caching/netfs-api.txt @@ -673,6 +673,22 @@ storage request to complete, or it may attempt to cancel the storage request - in which case the page will not be stored in the cache this time. +BULK INODE PAGE UNCACHE +----------------------- + +A convenience routine is provided to perform an uncache on all the pages +attached to an inode. This assumes that the pages on the inode correspond on a +1:1 basis with the pages in the cache. + + void fscache_uncache_all_inode_pages(struct fscache_cookie *cookie, + struct inode *inode); + +This takes the netfs cookie that the pages were cached with and the inode that +the pages are attached to. This function will wait for pages to finish being +written to the cache and for the cache to finish with the page generally. No +error is returned. + + ========================== INDEX AND DATA FILE UPDATE ========================== diff --git a/Documentation/filesystems/configfs/configfs_example_explicit.c b/Documentation/filesystems/configfs/configfs_example_explicit.c index fd53869f5633..1420233dfa55 100644 --- a/Documentation/filesystems/configfs/configfs_example_explicit.c +++ b/Documentation/filesystems/configfs/configfs_example_explicit.c @@ -464,9 +464,8 @@ static int __init configfs_example_init(void) return 0; out_unregister: - for (; i >= 0; i--) { + for (i--; i >= 0; i--) configfs_unregister_subsystem(example_subsys[i]); - } return ret; } @@ -475,9 +474,8 @@ static void __exit configfs_example_exit(void) { int i; - for (i = 0; example_subsys[i]; i++) { + for (i = 0; example_subsys[i]; i++) configfs_unregister_subsystem(example_subsys[i]); - } } module_init(configfs_example_init); diff --git a/Documentation/filesystems/configfs/configfs_example_macros.c b/Documentation/filesystems/configfs/configfs_example_macros.c index d8e30a0378aa..327dfbc640a9 100644 --- a/Documentation/filesystems/configfs/configfs_example_macros.c +++ b/Documentation/filesystems/configfs/configfs_example_macros.c @@ -427,9 +427,8 @@ static int __init configfs_example_init(void) return 0; out_unregister: - for (; i >= 0; i--) { + for (i--; i >= 0; i--) configfs_unregister_subsystem(example_subsys[i]); - } return ret; } @@ -438,9 +437,8 @@ static void __exit configfs_example_exit(void) { int i; - for (i = 0; example_subsys[i]; i++) { + for (i = 0; example_subsys[i]; i++) configfs_unregister_subsystem(example_subsys[i]); - } } module_init(configfs_example_init); diff --git a/Documentation/filesystems/ext4.txt b/Documentation/filesystems/ext4.txt index c79ec58fd7f6..3ae9bc94352a 100644 --- a/Documentation/filesystems/ext4.txt +++ b/Documentation/filesystems/ext4.txt @@ -226,10 +226,6 @@ acl Enables POSIX Access Control Lists support. noacl This option disables POSIX Access Control List support. -reservation - -noreservation - bsddf (*) Make 'df' act like BSD. minixdf Make 'df' act like Minix. diff --git a/Documentation/filesystems/nfs/idmapper.txt b/Documentation/filesystems/nfs/idmapper.txt index b9b4192ea8b5..9c8fd6148656 100644 --- a/Documentation/filesystems/nfs/idmapper.txt +++ b/Documentation/filesystems/nfs/idmapper.txt @@ -47,8 +47,8 @@ request-key will find the first matching line and corresponding program. In this case, /some/other/program will handle all uid lookups and /usr/sbin/nfs.idmap will handle gid, user, and group lookups. -See <file:Documentation/keys-request-keys.txt> for more information about the -request-key function. +See <file:Documentation/security/keys-request-keys.txt> for more information +about the request-key function. ========= diff --git a/Documentation/filesystems/ocfs2.txt b/Documentation/filesystems/ocfs2.txt index 9ed920a8cd79..7618a287aa41 100644 --- a/Documentation/filesystems/ocfs2.txt +++ b/Documentation/filesystems/ocfs2.txt @@ -46,9 +46,15 @@ errors=panic Panic and halt the machine if an error occurs. intr (*) Allow signals to interrupt cluster operations. nointr Do not allow signals to interrupt cluster operations. +noatime Do not update access time. +relatime(*) Update atime if the previous atime is older than + mtime or ctime +strictatime Always update atime, but the minimum update interval + is specified by atime_quantum. atime_quantum=60(*) OCFS2 will not update atime unless this number of seconds has passed since the last update. - Set to zero to always update atime. + Set to zero to always update atime. This option need + work with strictatime. data=ordered (*) All data are forced directly out to the main file system prior to its metadata being committed to the journal. diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index 60740e8ecb37..db3b1aba32a3 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -574,6 +574,12 @@ The contents of each smp_affinity file is the same by default: > cat /proc/irq/0/smp_affinity ffffffff +There is an alternate interface, smp_affinity_list which allows specifying +a cpu range instead of a bitmask: + + > cat /proc/irq/0/smp_affinity_list + 1024-1031 + The default_smp_affinity mask applies to all non-active IRQs, which are the IRQs which have not yet been allocated/activated, and hence which lack a /proc/irq/[0-9]* directory. @@ -583,12 +589,13 @@ reports itself as being attached. This hardware locality information does not include information about any possible driver locality preference. prof_cpu_mask specifies which CPUs are to be profiled by the system wide -profiler. Default value is ffffffff (all cpus). +profiler. Default value is ffffffff (all cpus if there are only 32 of them). The way IRQs are routed is handled by the IO-APIC, and it's Round Robin between all the CPUs which are allowed to handle it. As usual the kernel has more info than you and does a better job than you, so the defaults are the -best choice for almost everyone. +best choice for almost everyone. [Note this applies only to those IO-APIC's +that support "Round Robin" interrupt distribution.] There are three more important subdirectories in /proc: net, scsi, and sys. The general rule is that the contents, or even the existence of these @@ -836,6 +843,7 @@ Provides counts of softirq handlers serviced since boot time, for each cpu. TASKLET: 0 0 0 290 SCHED: 27035 26983 26971 26746 HRTIMER: 0 0 0 0 + RCU: 1678 1769 2178 2250 1.3 IDE devices in /proc/ide diff --git a/Documentation/filesystems/ubifs.txt b/Documentation/filesystems/ubifs.txt index d7b13b01e980..8e4fab639d9c 100644 --- a/Documentation/filesystems/ubifs.txt +++ b/Documentation/filesystems/ubifs.txt @@ -115,28 +115,8 @@ ubi.mtd=0 root=ubi0:rootfs rootfstype=ubifs Module Parameters for Debugging =============================== -When UBIFS has been compiled with debugging enabled, there are 3 module +When UBIFS has been compiled with debugging enabled, there are 2 module parameters that are available to control aspects of testing and debugging. -The parameters are unsigned integers where each bit controls an option. -The parameters are: - -debug_msgs Selects which debug messages to display, as follows: - - Message Type Flag value - - General messages 1 - Journal messages 2 - Mount messages 4 - Commit messages 8 - LEB search messages 16 - Budgeting messages 32 - Garbage collection messages 64 - Tree Node Cache (TNC) messages 128 - LEB properties (lprops) messages 256 - Input/output messages 512 - Log messages 1024 - Scan messages 2048 - Recovery messages 4096 debug_chks Selects extra checks that UBIFS can do while running: @@ -154,11 +134,9 @@ debug_tsts Selects a mode of testing, as follows: Test mode Flag value - Force in-the-gaps method 2 Failure mode for recovery testing 4 -For example, set debug_msgs to 5 to display General messages and Mount -messages. +For example, set debug_chks to 3 to enable general and TNC checks. References diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt index 21a7dc467bba..88b9f5519af9 100644 --- a/Documentation/filesystems/vfs.txt +++ b/Documentation/filesystems/vfs.txt @@ -211,7 +211,7 @@ struct super_operations { struct inode *(*alloc_inode)(struct super_block *sb); void (*destroy_inode)(struct inode *); - void (*dirty_inode) (struct inode *); + void (*dirty_inode) (struct inode *, int flags); int (*write_inode) (struct inode *, int); void (*drop_inode) (struct inode *); void (*delete_inode) (struct inode *); diff --git a/Documentation/filesystems/xfs.txt b/Documentation/filesystems/xfs.txt index 7bff3e4f35df..3fc0c31a6f5d 100644 --- a/Documentation/filesystems/xfs.txt +++ b/Documentation/filesystems/xfs.txt @@ -39,6 +39,12 @@ When mounting an XFS filesystem, the following options are accepted. drive level write caching to be enabled, for devices that support write barriers. + discard + Issue command to let the block device reclaim space freed by the + filesystem. This is useful for SSD devices, thinly provisioned + LUNs and virtual machine images, but may have a performance + impact. This option is incompatible with the nodelaylog option. + dmapi Enable the DMAPI (Data Management API) event callouts. Use with the "mtpt" option. diff --git a/Documentation/hwmon/emc6w201 b/Documentation/hwmon/emc6w201 new file mode 100644 index 000000000000..32f355aaf56b --- /dev/null +++ b/Documentation/hwmon/emc6w201 @@ -0,0 +1,42 @@ +Kernel driver emc6w201 +====================== + +Supported chips: + * SMSC EMC6W201 + Prefix: 'emc6w201' + Addresses scanned: I2C 0x2c, 0x2d, 0x2e + Datasheet: Not public + +Author: Jean Delvare <khali@linux-fr.org> + + +Description +----------- + +From the datasheet: + +"The EMC6W201 is an environmental monitoring device with automatic fan +control capability and enhanced system acoustics for noise suppression. +This ACPI compliant device provides hardware monitoring for up to six +voltages (including its own VCC) and five external thermal sensors, +measures the speed of up to five fans, and controls the speed of +multiple DC fans using three Pulse Width Modulator (PWM) outputs. Note +that it is possible to control more than three fans by connecting two +fans to one PWM output. The EMC6W201 will be available in a 36-pin +QFN package." + +The device is functionally close to the EMC6D100 series, but is +register-incompatible. + +The driver currently only supports the monitoring of the voltages, +temperatures and fan speeds. Limits can be changed. Alarms are not +supported, and neither is fan speed control. + + +Known Systems With EMC6W201 +--------------------------- + +The EMC6W201 is a rare device, only found on a few systems, made in +2005 and 2006. Known systems with this device: +* Dell Precision 670 workstation +* Gigabyte 2CEWH mainboard diff --git a/Documentation/hwmon/f71882fg b/Documentation/hwmon/f71882fg index df02245d1419..de91c0db5846 100644 --- a/Documentation/hwmon/f71882fg +++ b/Documentation/hwmon/f71882fg @@ -6,6 +6,10 @@ Supported chips: Prefix: 'f71808e' Addresses scanned: none, address read from Super I/O config space Datasheet: Not public + * Fintek F71808A + Prefix: 'f71808a' + Addresses scanned: none, address read from Super I/O config space + Datasheet: Not public * Fintek F71858FG Prefix: 'f71858fg' Addresses scanned: none, address read from Super I/O config space @@ -18,6 +22,10 @@ Supported chips: Prefix: 'f71869' Addresses scanned: none, address read from Super I/O config space Datasheet: Available from the Fintek website + * Fintek F71869A + Prefix: 'f71869a' + Addresses scanned: none, address read from Super I/O config space + Datasheet: Not public * Fintek F71882FG and F71883FG Prefix: 'f71882fg' Addresses scanned: none, address read from Super I/O config space diff --git a/Documentation/hwmon/fam15h_power b/Documentation/hwmon/fam15h_power new file mode 100644 index 000000000000..a92918e0bd69 --- /dev/null +++ b/Documentation/hwmon/fam15h_power @@ -0,0 +1,37 @@ +Kernel driver fam15h_power +========================== + +Supported chips: +* AMD Family 15h Processors + + Prefix: 'fam15h_power' + Addresses scanned: PCI space + Datasheets: + BIOS and Kernel Developer's Guide (BKDG) For AMD Family 15h Processors + (not yet published) + +Author: Andreas Herrmann <andreas.herrmann3@amd.com> + +Description +----------- + +This driver permits reading of registers providing power information +of AMD Family 15h processors. + +For AMD Family 15h processors the following power values can be +calculated using different processor northbridge function registers: + +* BasePwrWatts: Specifies in watts the maximum amount of power + consumed by the processor for NB and logic external to the core. +* ProcessorPwrWatts: Specifies in watts the maximum amount of power + the processor can support. +* CurrPwrWatts: Specifies in watts the current amount of power being + consumed by the processor. + +This driver provides ProcessorPwrWatts and CurrPwrWatts: +* power1_crit (ProcessorPwrWatts) +* power1_input (CurrPwrWatts) + +On multi-node processors the calculated value is for the entire +package and not for a single node. Thus the driver creates sysfs +attributes only for internal node0 of a multi-node processor. diff --git a/Documentation/hwmon/k10temp b/Documentation/hwmon/k10temp index d2b56a4fd1f5..a10f73624ad3 100644 --- a/Documentation/hwmon/k10temp +++ b/Documentation/hwmon/k10temp @@ -9,8 +9,9 @@ Supported chips: Socket S1G3: Athlon II, Sempron, Turion II * AMD Family 11h processors: Socket S1G2: Athlon (X2), Sempron (X2), Turion X2 (Ultra) -* AMD Family 12h processors: "Llano" -* AMD Family 14h processors: "Brazos" (C/E/G-Series) +* AMD Family 12h processors: "Llano" (E2/A4/A6/A8-Series) +* AMD Family 14h processors: "Brazos" (C/E/G/Z-Series) +* AMD Family 15h processors: "Bulldozer" Prefix: 'k10temp' Addresses scanned: PCI space @@ -19,12 +20,16 @@ Supported chips: http://support.amd.com/us/Processor_TechDocs/31116.pdf BIOS and Kernel Developer's Guide (BKDG) for AMD Family 11h Processors: http://support.amd.com/us/Processor_TechDocs/41256.pdf + BIOS and Kernel Developer's Guide (BKDG) for AMD Family 12h Processors: + http://support.amd.com/us/Processor_TechDocs/41131.pdf BIOS and Kernel Developer's Guide (BKDG) for AMD Family 14h Models 00h-0Fh Processors: http://support.amd.com/us/Processor_TechDocs/43170.pdf Revision Guide for AMD Family 10h Processors: http://support.amd.com/us/Processor_TechDocs/41322.pdf Revision Guide for AMD Family 11h Processors: http://support.amd.com/us/Processor_TechDocs/41788.pdf + Revision Guide for AMD Family 12h Processors: + http://support.amd.com/us/Processor_TechDocs/44739.pdf Revision Guide for AMD Family 14h Models 00h-0Fh Processors: http://support.amd.com/us/Processor_TechDocs/47534.pdf AMD Family 11h Processor Power and Thermal Data Sheet for Notebooks: @@ -40,7 +45,7 @@ Description ----------- This driver permits reading of the internal temperature sensor of AMD -Family 10h/11h/12h/14h processors. +Family 10h/11h/12h/14h/15h processors. All these processors have a sensor, but on those for Socket F or AM2+, the sensor may return inconsistent values (erratum 319). The driver diff --git a/Documentation/hwmon/max6650 b/Documentation/hwmon/max6650 index c565650fcfc6..58d9644a2bde 100644 --- a/Documentation/hwmon/max6650 +++ b/Documentation/hwmon/max6650 @@ -2,9 +2,13 @@ Kernel driver max6650 ===================== Supported chips: - * Maxim 6650 / 6651 + * Maxim MAX6650 Prefix: 'max6650' - Addresses scanned: I2C 0x1b, 0x1f, 0x48, 0x4b + Addresses scanned: none + Datasheet: http://pdfserv.maxim-ic.com/en/ds/MAX6650-MAX6651.pdf + * Maxim MAX6651 + Prefix: 'max6651' + Addresses scanned: none Datasheet: http://pdfserv.maxim-ic.com/en/ds/MAX6650-MAX6651.pdf Authors: @@ -15,10 +19,10 @@ Authors: Description ----------- -This driver implements support for the Maxim 6650/6651 +This driver implements support for the Maxim MAX6650 and MAX6651. -The 2 devices are very similar, but the Maxim 6550 has a reduced feature -set, e.g. only one fan-input, instead of 4 for the 6651. +The 2 devices are very similar, but the MAX6550 has a reduced feature +set, e.g. only one fan-input, instead of 4 for the MAX6651. The driver is not able to distinguish between the 2 devices. @@ -36,6 +40,13 @@ fan1_div rw sets the speed range the inputs can handle. Legal values are 1, 2, 4, and 8. Use lower values for faster fans. +Usage notes +----------- + +This driver does not auto-detect devices. You will have to instantiate the +devices explicitly. Please see Documentation/i2c/instantiating-devices for +details. + Module parameters ----------------- diff --git a/Documentation/i2c/busses/i2c-i801 b/Documentation/i2c/busses/i2c-i801 index 6df69765ccb7..2871fd500349 100644 --- a/Documentation/i2c/busses/i2c-i801 +++ b/Documentation/i2c/busses/i2c-i801 @@ -19,6 +19,7 @@ Supported adapters: * Intel 6 Series (PCH) * Intel Patsburg (PCH) * Intel DH89xxCC (PCH) + * Intel Panther Point (PCH) Datasheets: Publicly available at the Intel website On Intel Patsburg and later chipsets, both the normal host SMBus controller diff --git a/Documentation/i2c/writing-clients b/Documentation/i2c/writing-clients index 5ebf5af1d716..5aa53374ea2a 100644 --- a/Documentation/i2c/writing-clients +++ b/Documentation/i2c/writing-clients @@ -38,7 +38,7 @@ static struct i2c_driver foo_driver = { .name = "foo", }, - .id_table = foo_ids, + .id_table = foo_idtable, .probe = foo_probe, .remove = foo_remove, /* if device autodetection is needed: */ diff --git a/Documentation/input/elantech.txt b/Documentation/input/elantech.txt index 56941ae1f5db..db798af5ef98 100644 --- a/Documentation/input/elantech.txt +++ b/Documentation/input/elantech.txt @@ -34,7 +34,8 @@ Contents Currently the Linux Elantech touchpad driver is aware of two different hardware versions unimaginatively called version 1 and version 2. Version 1 is found in "older" laptops and uses 4 bytes per packet. Version 2 seems to -be introduced with the EeePC and uses 6 bytes per packet. +be introduced with the EeePC and uses 6 bytes per packet, and provides +additional features such as position of two fingers, and width of the touch. The driver tries to support both hardware versions and should be compatible with the Xorg Synaptics touchpad driver and its graphical configuration @@ -94,18 +95,44 @@ Currently the Linux Elantech touchpad driver provides two extra knobs under can check these bits and reject any packet that appears corrupted. Using this knob you can bypass that check. - It is not known yet whether hardware version 2 provides the same parity - bits. Hence checking is disabled by default. Currently even turning it on - will do nothing. - + Hardware version 2 does not provide the same parity bits. Only some basic + data consistency checking can be done. For now checking is disabled by + default. Currently even turning it on will do nothing. ///////////////////////////////////////////////////////////////////////////// +3. Differentiating hardware versions + ================================= + +To detect the hardware version, read the version number as param[0].param[1].param[2] + + 4 bytes version: (after the arrow is the name given in the Dell-provided driver) + 02.00.22 => EF013 + 02.06.00 => EF019 +In the wild, there appear to be more versions, such as 00.01.64, 01.00.21, +02.00.00, 02.00.04, 02.00.06. + + 6 bytes: + 02.00.30 => EF113 + 02.08.00 => EF023 + 02.08.XX => EF123 + 02.0B.00 => EF215 + 04.01.XX => Scroll_EF051 + 04.02.XX => EF051 +In the wild, there appear to be more versions, such as 04.03.01, 04.04.11. There +appears to be almost no difference, except for EF113, which does not report +pressure/width and has different data consistency checks. + +Probably all the versions with param[0] <= 01 can be considered as +4 bytes/firmware 1. The versions < 02.08.00, with the exception of 02.00.30, as +4 bytes/firmware 2. Everything >= 02.08.00 can be considered as 6 bytes. + +///////////////////////////////////////////////////////////////////////////// -3. Hardware version 1 +4. Hardware version 1 ================== -3.1 Registers +4.1 Registers ~~~~~~~~~ By echoing a hexadecimal value to a register it contents can be altered. @@ -168,7 +195,7 @@ For example: smart edge activation area width? -3.2 Native relative mode 4 byte packet format +4.2 Native relative mode 4 byte packet format ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ byte 0: @@ -226,9 +253,13 @@ byte 3: positive = down -3.3 Native absolute mode 4 byte packet format +4.3 Native absolute mode 4 byte packet format ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +EF013 and EF019 have a special behaviour (due to a bug in the firmware?), and +when 1 finger is touching, the first 2 position reports must be discarded. +This counting is reset whenever a different number of fingers is reported. + byte 0: firmware version 1.x: @@ -279,11 +310,11 @@ byte 3: ///////////////////////////////////////////////////////////////////////////// -4. Hardware version 2 +5. Hardware version 2 ================== -4.1 Registers +5.1 Registers ~~~~~~~~~ By echoing a hexadecimal value to a register it contents can be altered. @@ -316,16 +347,41 @@ For example: 0x7f = never i.e. tap again to release) -4.2 Native absolute mode 6 byte packet format +5.2 Native absolute mode 6 byte packet format ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -4.2.1 One finger touch +5.2.1 Parity checking and packet re-synchronization +There is no parity checking, however some consistency checks can be performed. + +For instance for EF113: + SA1= packet[0]; + A1 = packet[1]; + B1 = packet[2]; + SB1= packet[3]; + C1 = packet[4]; + D1 = packet[5]; + if( (((SA1 & 0x3C) != 0x3C) && ((SA1 & 0xC0) != 0x80)) || // check Byte 1 + (((SA1 & 0x0C) != 0x0C) && ((SA1 & 0xC0) == 0x80)) || // check Byte 1 (one finger pressed) + (((SA1 & 0xC0) != 0x80) && (( A1 & 0xF0) != 0x00)) || // check Byte 2 + (((SB1 & 0x3E) != 0x38) && ((SA1 & 0xC0) != 0x80)) || // check Byte 4 + (((SB1 & 0x0E) != 0x08) && ((SA1 & 0xC0) == 0x80)) || // check Byte 4 (one finger pressed) + (((SA1 & 0xC0) != 0x80) && (( C1 & 0xF0) != 0x00)) ) // check Byte 5 + // error detected + +For all the other ones, there are just a few constant bits: + if( ((packet[0] & 0x0C) != 0x04) || + ((packet[3] & 0x0f) != 0x02) ) + // error detected + + +In case an error is detected, all the packets are shifted by one (and packet[0] is discarded). + +5.2.1 One/Three finger touch ~~~~~~~~~~~~~~~~ byte 0: bit 7 6 5 4 3 2 1 0 - n1 n0 . . . . R L + n1 n0 w3 w2 . . R L L, R = 1 when Left, Right mouse button pressed n1..n0 = numbers of fingers on touchpad @@ -333,24 +389,40 @@ byte 0: byte 1: bit 7 6 5 4 3 2 1 0 - . . . . . x10 x9 x8 + p7 p6 p5 p4 . x10 x9 x8 byte 2: bit 7 6 5 4 3 2 1 0 - x7 x6 x5 x4 x4 x2 x1 x0 + x7 x6 x5 x4 x3 x2 x1 x0 x10..x0 = absolute x value (horizontal) byte 3: bit 7 6 5 4 3 2 1 0 - . . . . . . . . + n4 vf w1 w0 . . . b2 + + n4 = set if more than 3 fingers (only in 3 fingers mode) + vf = a kind of flag ? (only on EF123, 0 when finger is over one + of the buttons, 1 otherwise) + w3..w0 = width of the finger touch (not EF113) + b2 (on EF113 only, 0 otherwise), b2.R.L indicates one button pressed: + 0 = none + 1 = Left + 2 = Right + 3 = Middle (Left and Right) + 4 = Forward + 5 = Back + 6 = Another one + 7 = Another one byte 4: bit 7 6 5 4 3 2 1 0 - . . . . . . y9 y8 + p3 p1 p2 p0 . . y9 y8 + + p7..p0 = pressure (not EF113) byte 5: @@ -363,6 +435,11 @@ byte 5: 4.2.2 Two finger touch ~~~~~~~~~~~~~~~~ +Note that the two pairs of coordinates are not exactly the coordinates of the +two fingers, but only the pair of the lower-left and upper-right coordinates. +So the actual fingers might be situated on the other diagonal of the square +defined by these two points. + byte 0: bit 7 6 5 4 3 2 1 0 @@ -376,14 +453,14 @@ byte 1: bit 7 6 5 4 3 2 1 0 ax7 ax6 ax5 ax4 ax3 ax2 ax1 ax0 - ax8..ax0 = first finger absolute x value + ax8..ax0 = lower-left finger absolute x value byte 2: bit 7 6 5 4 3 2 1 0 ay7 ay6 ay5 ay4 ay3 ay2 ay1 ay0 - ay8..ay0 = first finger absolute y value + ay8..ay0 = lower-left finger absolute y value byte 3: @@ -395,11 +472,11 @@ byte 4: bit 7 6 5 4 3 2 1 0 bx7 bx6 bx5 bx4 bx3 bx2 bx1 bx0 - bx8..bx0 = second finger absolute x value + bx8..bx0 = upper-right finger absolute x value byte 5: bit 7 6 5 4 3 2 1 0 by7 by8 by5 by4 by3 by2 by1 by0 - by8..by0 = second finger absolute y value + by8..by0 = upper-right finger absolute y value diff --git a/Documentation/input/rotary-encoder.txt b/Documentation/input/rotary-encoder.txt index 943e8f6f2b15..92e68bce13a4 100644 --- a/Documentation/input/rotary-encoder.txt +++ b/Documentation/input/rotary-encoder.txt @@ -9,6 +9,9 @@ peripherals with two wires. The outputs are phase-shifted by 90 degrees and by triggering on falling and rising edges, the turn direction can be determined. +Some encoders have both outputs low in stable states, whereas others also have +a stable state with both outputs high (half-period mode). + The phase diagram of these two outputs look like this: _____ _____ _____ @@ -26,6 +29,8 @@ The phase diagram of these two outputs look like this: |<-------->| one step + |<-->| + one step (half-period mode) For more information, please see http://en.wikipedia.org/wiki/Rotary_encoder @@ -34,6 +39,13 @@ For more information, please see 1. Events / state machine ------------------------- +In half-period mode, state a) and c) above are used to determine the +rotational direction based on the last stable state. Events are reported in +states b) and d) given that the new stable state is different from the last +(i.e. the rotation was not reversed half-way). + +Otherwise, the following apply: + a) Rising edge on channel A, channel B in low state This state is used to recognize a clockwise turn @@ -96,6 +108,7 @@ static struct rotary_encoder_platform_data my_rotary_encoder_info = { .gpio_b = GPIO_ROTARY_B, .inverted_a = 0, .inverted_b = 0, + .half_period = false, }; static struct platform_device rotary_encoder_device = { diff --git a/Documentation/ioctl/ioctl-number.txt b/Documentation/ioctl/ioctl-number.txt index a0a5d82b6b0b..3a46e360496d 100644 --- a/Documentation/ioctl/ioctl-number.txt +++ b/Documentation/ioctl/ioctl-number.txt @@ -166,7 +166,6 @@ Code Seq#(hex) Include File Comments 'T' all arch/x86/include/asm/ioctls.h conflict! 'T' C0-DF linux/if_tun.h conflict! 'U' all sound/asound.h conflict! -'U' 00-0F drivers/media/video/uvc/uvcvideo.h conflict! 'U' 00-CF linux/uinput.h conflict! 'U' 00-EF linux/usbdevice_fs.h 'U' C0-CF drivers/bluetooth/hci_uart.h @@ -259,6 +258,7 @@ Code Seq#(hex) Include File Comments 't' 80-8F linux/isdn_ppp.h 't' 90 linux/toshiba.h 'u' 00-1F linux/smb_fs.h gone +'u' 20-3F linux/uvcvideo.h USB video class host driver 'v' 00-1F linux/ext2_fs.h conflict! 'v' 00-1F linux/fs.h conflict! 'v' 00-0F linux/sonypi.h conflict! @@ -304,6 +304,7 @@ Code Seq#(hex) Include File Comments 0xB0 all RATIO devices in development: <mailto:vgo@ratio.de> 0xB1 00-1F PPPoX <mailto:mostrows@styx.uwaterloo.ca> +0xB3 00 linux/mmc/ioctl.h 0xC0 00-0F linux/usb/iowarrior.h 0xCB 00-1F CBM serial IEC bus in development: <mailto:michael.klein@puffin.lb.shuttle.de> diff --git a/Documentation/kbuild/kbuild.txt b/Documentation/kbuild/kbuild.txt index 7c2a89ba674c..68e32bb6bd80 100644 --- a/Documentation/kbuild/kbuild.txt +++ b/Documentation/kbuild/kbuild.txt @@ -201,3 +201,16 @@ KBUILD_ENABLE_EXTRA_GCC_CHECKS -------------------------------------------------- If enabled over the make command line with "W=1", it turns on additional gcc -W... options for more extensive build-time checking. + +KBUILD_BUILD_TIMESTAMP +-------------------------------------------------- +Setting this to a date string overrides the timestamp used in the +UTS_VERSION definition (uname -v in the running kernel). The value has to +be a string that can be passed to date -d. The default value +is the output of the date command at one point during build. + +KBUILD_BUILD_USER, KBUILD_BUILD_HOST +-------------------------------------------------- +These two variables allow to override the user@host string displayed during +boot and in /proc/version. The default value is the output of the commands +whoami and host, respectively. diff --git a/Documentation/kbuild/kconfig-language.txt b/Documentation/kbuild/kconfig-language.txt index b507d61fd41c..44e2649fbb29 100644 --- a/Documentation/kbuild/kconfig-language.txt +++ b/Documentation/kbuild/kconfig-language.txt @@ -113,6 +113,13 @@ applicable everywhere (see syntax). That will limit the usefulness but on the other hand avoid the illegal configurations all over. +- limiting menu display: "visible if" <expr> + This attribute is only applicable to menu blocks, if the condition is + false, the menu block is not displayed to the user (the symbols + contained there can still be selected by other symbols, though). It is + similar to a conditional "prompt" attribude for individual menu + entries. Default value of "visible" is true. + - numerical ranges: "range" <symbol> <symbol> ["if" <expr>] This allows to limit the range of possible input values for int and hex symbols. The user can only input a value which is larger than @@ -303,7 +310,8 @@ menu: "endmenu" This defines a menu block, see "Menu structure" above for more -information. The only possible options are dependencies. +information. The only possible options are dependencies and "visible" +attributes. if: @@ -381,3 +389,25 @@ config FOO limits FOO to module (=m) or disabled (=n). +Kconfig symbol existence +~~~~~~~~~~~~~~~~~~~~~~~~ +The following two methods produce the same kconfig symbol dependencies +but differ greatly in kconfig symbol existence (production) in the +generated config file. + +case 1: + +config FOO + tristate "about foo" + depends on BAR + +vs. case 2: + +if BAR +config FOO + tristate "about foo" +endif + +In case 1, the symbol FOO will always exist in the config file (given +no other dependencies). In case 2, the symbol FOO will only exist in +the config file if BAR is enabled. diff --git a/Documentation/kbuild/kconfig.txt b/Documentation/kbuild/kconfig.txt index cca46b1a0f6c..c313d71324b4 100644 --- a/Documentation/kbuild/kconfig.txt +++ b/Documentation/kbuild/kconfig.txt @@ -48,11 +48,6 @@ KCONFIG_OVERWRITECONFIG If you set KCONFIG_OVERWRITECONFIG in the environment, Kconfig will not break symlinks when .config is a symlink to somewhere else. -KCONFIG_NOTIMESTAMP --------------------------------------------------- -If this environment variable exists and is non-null, the timestamp line -in generated .config files is omitted. - ______________________________________________________________________ Environment variables for '{allyes/allmod/allno/rand}config' diff --git a/Documentation/kbuild/makefiles.txt b/Documentation/kbuild/makefiles.txt index 5d145bb443c0..47435e56c5da 100644 --- a/Documentation/kbuild/makefiles.txt +++ b/Documentation/kbuild/makefiles.txt @@ -40,11 +40,13 @@ This document describes the Linux kernel Makefiles. --- 6.6 Commands useful for building a boot image --- 6.7 Custom kbuild commands --- 6.8 Preprocessing linker scripts + --- 6.9 Generic header files === 7 Kbuild syntax for exported headers --- 7.1 header-y --- 7.2 objhdr-y --- 7.3 destination-y + --- 7.4 generic-y === 8 Kbuild Variables === 9 Makefile language @@ -499,6 +501,18 @@ more details, with real examples. gcc >= 3.00. For gcc < 3.00, -malign-functions=4 is used. Note: cc-option-align uses KBUILD_CFLAGS for $(CC) options + cc-disable-warning + cc-disable-warning checks if gcc supports a given warning and returns + the commandline switch to disable it. This special function is needed, + because gcc 4.4 and later accept any unknown -Wno-* option and only + warn about it if there is another warning in the source file. + + Example: + KBUILD_CFLAGS += $(call cc-disable-warning, unused-but-set-variable) + + In the above example, -Wno-unused-but-set-variable will be added to + KBUILD_CFLAGS only if gcc really accepts it. + cc-version cc-version returns a numerical version of the $(CC) compiler version. The format is <major><minor> where both are two digits. So for example @@ -955,6 +969,11 @@ When kbuild executes, the following steps are followed (roughly): used when linking modules. This is often a linker script. From commandline LDFLAGS_MODULE shall be used (see kbuild.txt). + KBUILD_ARFLAGS Options for $(AR) when creating archives + + $(KBUILD_ARFLAGS) set by the top level Makefile to "D" (deterministic + mode) if this option is supported by $(AR). + --- 6.2 Add prerequisites to archprepare: The archprepare: rule is used to list prerequisites that need to be @@ -1209,6 +1228,14 @@ When kbuild executes, the following steps are followed (roughly): The kbuild infrastructure for *lds file are used in several architecture-specific files. +--- 6.9 Generic header files + + The directory include/asm-generic contains the header files + that may be shared between individual architectures. + The recommended approach how to use a generic header file is + to list the file in the Kbuild file. + See "7.4 generic-y" for further info on syntax etc. + === 7 Kbuild syntax for exported headers The kernel include a set of headers that is exported to userspace. @@ -1265,6 +1292,32 @@ See subsequent chapter for the syntax of the Kbuild file. In the example above all exported headers in the Kbuild file will be located in the directory "include/linux" when exported. + --- 7.4 generic-y + + If an architecture uses a verbatim copy of a header from + include/asm-generic then this is listed in the file + arch/$(ARCH)/include/asm/Kbuild like this: + + Example: + #arch/x86/include/asm/Kbuild + generic-y += termios.h + generic-y += rtc.h + + During the prepare phase of the build a wrapper include + file is generated in the directory: + + arch/$(ARCH)/include/generated/asm + + When a header is exported where the architecture uses + the generic header a similar wrapper is generated as part + of the set of exported headers in the directory: + + usr/include/asm + + The generated wrapper will in both cases look like the following: + + Example: termios.h + #include <asm-generic/termios.h> === 8 Kbuild Variables diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index c603ef7b0568..aa47be71df4c 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -999,7 +999,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted. With this option on every unmap_single operation will result in a hardware IOTLB flush operation as opposed to batching them for performance. - + sp_off [Default Off] + By default, super page will be supported if Intel IOMMU + has the capability. With this option, super page will + not be supported. intremap= [X86-64, Intel-IOMMU] Format: { on (default) | off | nosid } on enable Interrupt Remapping (default) @@ -1777,9 +1780,6 @@ bytes respectively. Such letter suffixes can also be entirely omitted. nosoftlockup [KNL] Disable the soft-lockup detector. - noswapaccount [KNL] Disable accounting of swap in memory resource - controller. (See Documentation/cgroups/memory.txt) - nosync [HW,M68K] Disables sync negotiation for all devices. notsc [BUGS=X86-32] Disable Time Stamp Counter @@ -2015,6 +2015,8 @@ bytes respectively. Such letter suffixes can also be entirely omitted. the default. off: Turn ECRC off on: Turn ECRC on. + realloc reallocate PCI resources if allocations done by BIOS + are erroneous. pcie_aspm= [PCIE] Forcibly enable or disable PCIe Active State Power Management. @@ -2585,6 +2587,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted. bytes of sense data); c = FIX_CAPACITY (decrease the reported device capacity by one sector); + d = NO_READ_DISC_INFO (don't use + READ_DISC_INFO command); + e = NO_READ_CAPACITY_16 (don't use + READ_CAPACITY_16 command); h = CAPACITY_HEURISTICS (decrease the reported device capacity by one sector if the number is odd); @@ -2594,6 +2600,8 @@ bytes respectively. Such letter suffixes can also be entirely omitted. unlock ejectable media); m = MAX_SECTORS_64 (don't transfer more than 64 sectors = 32 KB at a time); + n = INITIAL_READ10 (force a retry of the + initial READ(10) command); o = CAPACITY_OK (accept the capacity reported by the device); r = IGNORE_RESIDUE (the device reports diff --git a/Documentation/kmemleak.txt b/Documentation/kmemleak.txt index 090e6ee04536..51063e681ca4 100644 --- a/Documentation/kmemleak.txt +++ b/Documentation/kmemleak.txt @@ -11,7 +11,9 @@ with the difference that the orphan objects are not freed but only reported via /sys/kernel/debug/kmemleak. A similar method is used by the Valgrind tool (memcheck --leak-check) to detect the memory leaks in user-space applications. -Kmemleak is supported on x86, arm, powerpc, sparc, sh, microblaze and tile. + +Please check DEBUG_KMEMLEAK dependencies in lib/Kconfig.debug for supported +architectures. Usage ----- diff --git a/Documentation/laptops/acer-wmi.txt b/Documentation/laptops/acer-wmi.txt deleted file mode 100644 index 4beafa663dd6..000000000000 --- a/Documentation/laptops/acer-wmi.txt +++ /dev/null @@ -1,184 +0,0 @@ -Acer Laptop WMI Extras Driver -http://code.google.com/p/aceracpi -Version 0.3 -4th April 2009 - -Copyright 2007-2009 Carlos Corbacho <carlos@strangeworlds.co.uk> - -acer-wmi is a driver to allow you to control various parts of your Acer laptop -hardware under Linux which are exposed via ACPI-WMI. - -This driver completely replaces the old out-of-tree acer_acpi, which I am -currently maintaining for bug fixes only on pre-2.6.25 kernels. All development -work is now focused solely on acer-wmi. - -Disclaimer -********** - -Acer and Wistron have provided nothing towards the development acer_acpi or -acer-wmi. All information we have has been through the efforts of the developers -and the users to discover as much as possible about the hardware. - -As such, I do warn that this could break your hardware - this is extremely -unlikely of course, but please bear this in mind. - -Background -********** - -acer-wmi is derived from acer_acpi, originally developed by Mark -Smith in 2005, then taken over by Carlos Corbacho in 2007, in order to activate -the wireless LAN card under a 64-bit version of Linux, as acerhk[1] (the -previous solution to the problem) relied on making 32 bit BIOS calls which are -not possible in kernel space from a 64 bit OS. - -[1] acerhk: http://www.cakey.de/acerhk/ - -Supported Hardware -****************** - -NOTE: The Acer Aspire One is not supported hardware. It cannot work with -acer-wmi until Acer fix their ACPI-WMI implementation on them, so has been -blacklisted until that happens. - -Please see the website for the current list of known working hardware: - -http://code.google.com/p/aceracpi/wiki/SupportedHardware - -If your laptop is not listed, or listed as unknown, and works with acer-wmi, -please contact me with a copy of the DSDT. - -If your Acer laptop doesn't work with acer-wmi, I would also like to see the -DSDT. - -To send me the DSDT, as root/sudo: - -cat /sys/firmware/acpi/tables/DSDT > dsdt - -And send me the resulting 'dsdt' file. - -Usage -***** - -On Acer laptops, acer-wmi should already be autoloaded based on DMI matching. -For non-Acer laptops, until WMI based autoloading support is added, you will -need to manually load acer-wmi. - -acer-wmi creates /sys/devices/platform/acer-wmi, and fills it with various -files whose usage is detailed below, which enables you to control some of the -following (varies between models): - -* the wireless LAN card radio -* inbuilt Bluetooth adapter -* inbuilt 3G card -* mail LED of your laptop -* brightness of the LCD panel - -Wireless -******** - -With regards to wireless, all acer-wmi does is enable the radio on the card. It -is not responsible for the wireless LED - once the radio is enabled, this is -down to the wireless driver for your card. So the behaviour of the wireless LED, -once you enable the radio, will depend on your hardware and driver combination. - -e.g. With the BCM4318 on the Acer Aspire 5020 series: - -ndiswrapper: Light blinks on when transmitting -b43: Solid light, blinks off when transmitting - -Wireless radio control is unconditionally enabled - all Acer laptops that support -acer-wmi come with built-in wireless. However, should you feel so inclined to -ever wish to remove the card, or swap it out at some point, please get in touch -with me, as we may well be able to gain some data on wireless card detection. - -The wireless radio is exposed through rfkill. - -Bluetooth -********* - -For bluetooth, this is an internal USB dongle, so once enabled, you will get -a USB device connection event, and a new USB device appears. When you disable -bluetooth, you get the reverse - a USB device disconnect event, followed by the -device disappearing again. - -Bluetooth is autodetected by acer-wmi, so if you do not have a bluetooth module -installed in your laptop, this file won't exist (please be aware that it is -quite common for Acer not to fit bluetooth to their laptops - so just because -you have a bluetooth button on the laptop, doesn't mean that bluetooth is -installed). - -For the adventurously minded - if you want to buy an internal bluetooth -module off the internet that is compatible with your laptop and fit it, then -it will work just fine with acer-wmi. - -Bluetooth is exposed through rfkill. - -3G -** - -3G is currently not autodetected, so the 'threeg' file is always created under -sysfs. So far, no-one in possession of an Acer laptop with 3G built-in appears to -have tried Linux, or reported back, so we don't have any information on this. - -If you have an Acer laptop that does have a 3G card in, please contact me so we -can properly detect these, and find out a bit more about them. - -To read the status of the 3G card (0=off, 1=on): -cat /sys/devices/platform/acer-wmi/threeg - -To enable the 3G card: -echo 1 > /sys/devices/platform/acer-wmi/threeg - -To disable the 3G card: -echo 0 > /sys/devices/platform/acer-wmi/threeg - -To set the state of the 3G card when loading acer-wmi, pass: -threeg=X (where X is 0 or 1) - -Mail LED -******** - -This can be found in most older Acer laptops supported by acer-wmi, and many -newer ones - it is built into the 'mail' button, and blinks when active. - -On newer (WMID) laptops though, we have no way of detecting the mail LED. If -your laptop identifies itself in dmesg as a WMID model, then please try loading -acer_acpi with: - -force_series=2490 - -This will use a known alternative method of reading/ writing the mail LED. If -it works, please report back to me with the DMI data from your laptop so this -can be added to acer-wmi. - -The LED is exposed through the LED subsystem, and can be found in: - -/sys/devices/platform/acer-wmi/leds/acer-wmi::mail/ - -The mail LED is autodetected, so if you don't have one, the LED device won't -be registered. - -Backlight -********* - -The backlight brightness control is available on all acer-wmi supported -hardware. The maximum brightness level is usually 15, but on some newer laptops -it's 10 (this is again autodetected). - -The backlight is exposed through the backlight subsystem, and can be found in: - -/sys/devices/platform/acer-wmi/backlight/acer-wmi/ - -Credits -******* - -Olaf Tauber, who did the real hard work when he developed acerhk -http://www.cakey.de/acerhk/ -All the authors of laptop ACPI modules in the kernel, whose work -was an inspiration in the early days of acer_acpi -Mathieu Segaud, who solved the problem with having to modprobe the driver -twice in acer_acpi 0.2. -Jim Ramsay, who added support for the WMID interface -Mark Smith, who started the original acer_acpi - -And the many people who have used both acer_acpi and acer-wmi. diff --git a/Documentation/laptops/thinkpad-acpi.txt b/Documentation/laptops/thinkpad-acpi.txt index 1565eefd6fd5..61815483efa3 100644 --- a/Documentation/laptops/thinkpad-acpi.txt +++ b/Documentation/laptops/thinkpad-acpi.txt @@ -534,6 +534,8 @@ Events that are never propagated by the driver: 0x2404 System is waking up from hibernation to undock 0x2405 System is waking up from hibernation to eject bay 0x5010 Brightness level changed/control event +0x6000 KEYBOARD: Numlock key pressed +0x6005 KEYBOARD: Fn key pressed (TO BE VERIFIED) Events that are propagated by the driver to userspace: @@ -545,6 +547,8 @@ Events that are propagated by the driver to userspace: 0x3006 Bay hotplug request (hint to power up SATA link when the optical drive tray is ejected) 0x4003 Undocked (see 0x2x04), can sleep again +0x4010 Docked into hotplug port replicator (non-ACPI dock) +0x4011 Undocked from hotplug port replicator (non-ACPI dock) 0x500B Tablet pen inserted into its storage bay 0x500C Tablet pen removed from its storage bay 0x6011 ALARM: battery is too hot @@ -552,6 +556,7 @@ Events that are propagated by the driver to userspace: 0x6021 ALARM: a sensor is too hot 0x6022 ALARM: a sensor is extremely hot 0x6030 System thermal table changed +0x6040 Nvidia Optimus/AC adapter related (TO BE VERIFIED) Battery nearly empty alarms are a last resort attempt to get the operating system to hibernate or shutdown cleanly (0x2313), or shutdown diff --git a/Documentation/lockstat.txt b/Documentation/lockstat.txt index 65f4c795015d..cef00d42ed5b 100644 --- a/Documentation/lockstat.txt +++ b/Documentation/lockstat.txt @@ -12,8 +12,9 @@ Because things like lock contention can severely impact performance. - HOW Lockdep already has hooks in the lock functions and maps lock instances to -lock classes. We build on that. The graph below shows the relation between -the lock functions and the various hooks therein. +lock classes. We build on that (see Documentation/lockdep-design.txt). +The graph below shows the relation between the lock functions and the various +hooks therein. __acquire | @@ -128,6 +129,37 @@ points are the points we're contending with. The integer part of the time values is in us. +Dealing with nested locks, subclasses may appear: + +32............................................................................................................................................................................................... +33 +34 &rq->lock: 13128 13128 0.43 190.53 103881.26 97454 3453404 0.00 401.11 13224683.11 +35 --------- +36 &rq->lock 645 [<ffffffff8103bfc4>] task_rq_lock+0x43/0x75 +37 &rq->lock 297 [<ffffffff8104ba65>] try_to_wake_up+0x127/0x25a +38 &rq->lock 360 [<ffffffff8103c4c5>] select_task_rq_fair+0x1f0/0x74a +39 &rq->lock 428 [<ffffffff81045f98>] scheduler_tick+0x46/0x1fb +40 --------- +41 &rq->lock 77 [<ffffffff8103bfc4>] task_rq_lock+0x43/0x75 +42 &rq->lock 174 [<ffffffff8104ba65>] try_to_wake_up+0x127/0x25a +43 &rq->lock 4715 [<ffffffff8103ed4b>] double_rq_lock+0x42/0x54 +44 &rq->lock 893 [<ffffffff81340524>] schedule+0x157/0x7b8 +45 +46............................................................................................................................................................................................... +47 +48 &rq->lock/1: 11526 11488 0.33 388.73 136294.31 21461 38404 0.00 37.93 109388.53 +49 ----------- +50 &rq->lock/1 11526 [<ffffffff8103ed58>] double_rq_lock+0x4f/0x54 +51 ----------- +52 &rq->lock/1 5645 [<ffffffff8103ed4b>] double_rq_lock+0x42/0x54 +53 &rq->lock/1 1224 [<ffffffff81340524>] schedule+0x157/0x7b8 +54 &rq->lock/1 4336 [<ffffffff8103ed58>] double_rq_lock+0x4f/0x54 +55 &rq->lock/1 181 [<ffffffff8104ba65>] try_to_wake_up+0x127/0x25a + +Line 48 shows statistics for the second subclass (/1) of &rq->lock class +(subclass starts from 0), since in this case, as line 50 suggests, +double_rq_lock actually acquires a nested lock of two spinlocks. + View the top contending locks: # grep : /proc/lock_stat | head @@ -136,7 +168,7 @@ View the top contending locks: dcache_lock: 1037 1161 0.38 45.32 774.51 6611 243371 0.15 306.48 77387.24 &inode->i_mutex: 161 286 18446744073709 62882.54 1244614.55 3653 20598 18446744073709 62318.60 1693822.74 &zone->lru_lock: 94 94 0.53 7.33 92.10 4366 32690 0.29 59.81 16350.06 - &inode->i_data.i_mmap_lock: 79 79 0.40 3.77 53.03 11779 87755 0.28 116.93 29898.44 + &inode->i_data.i_mmap_mutex: 79 79 0.40 3.77 53.03 11779 87755 0.28 116.93 29898.44 &q->__queue_lock: 48 50 0.52 31.62 86.31 774 13131 0.17 113.08 12277.52 &rq->rq_lock_key: 43 47 0.74 68.50 170.63 3706 33929 0.22 107.99 17460.62 &rq->rq_lock_key#2: 39 46 0.75 6.68 49.03 2979 32292 0.17 125.17 17137.63 diff --git a/Documentation/md.txt b/Documentation/md.txt index 2366b1c8cf19..f0eee83ff78a 100644 --- a/Documentation/md.txt +++ b/Documentation/md.txt @@ -555,7 +555,7 @@ also have sync_min sync_max The two values, given as numbers of sectors, indicate a range - withing the array where 'check'/'repair' will operate. Must be + within the array where 'check'/'repair' will operate. Must be a multiple of chunk_size. When it reaches "sync_max" it will pause, rather than complete. You can use 'select' or 'poll' on "sync_completed" to wait for diff --git a/Documentation/mmc/00-INDEX b/Documentation/mmc/00-INDEX index fca586f5b853..93dd7a714075 100644 --- a/Documentation/mmc/00-INDEX +++ b/Documentation/mmc/00-INDEX @@ -2,3 +2,5 @@ - this file mmc-dev-attrs.txt - info on SD and MMC device attributes +mmc-dev-parts.txt + - info on SD and MMC device partitions diff --git a/Documentation/mmc/mmc-dev-attrs.txt b/Documentation/mmc/mmc-dev-attrs.txt index ff2bd685bced..8898a95b41e5 100644 --- a/Documentation/mmc/mmc-dev-attrs.txt +++ b/Documentation/mmc/mmc-dev-attrs.txt @@ -1,3 +1,13 @@ +SD and MMC Block Device Attributes +================================== + +These attributes are defined for the block devices associated with the +SD or MMC device. + +The following attributes are read/write. + + force_ro Enforce read-only access even if write protect switch is off. + SD and MMC Device Attributes ============================ diff --git a/Documentation/mmc/mmc-dev-parts.txt b/Documentation/mmc/mmc-dev-parts.txt new file mode 100644 index 000000000000..2db28b8e662f --- /dev/null +++ b/Documentation/mmc/mmc-dev-parts.txt @@ -0,0 +1,27 @@ +SD and MMC Device Partitions +============================ + +Device partitions are additional logical block devices present on the +SD/MMC device. + +As of this writing, MMC boot partitions as supported and exposed as +/dev/mmcblkXboot0 and /dev/mmcblkXboot1, where X is the index of the +parent /dev/mmcblkX. + +MMC Boot Partitions +=================== + +Read and write access is provided to the two MMC boot partitions. Due to +the sensitive nature of the boot partition contents, which often store +a bootloader or bootloader configuration tables crucial to booting the +platform, write access is disabled by default to reduce the chance of +accidental bricking. + +To enable write access to /dev/mmcblkXbootY, disable the forced read-only +access with: + +echo 0 > /sys/block/mmcblkXbootY/force_ro + +To re-enable read-only access: + +echo 1 > /sys/block/mmcblkXbootY/force_ro diff --git a/Documentation/networking/dns_resolver.txt b/Documentation/networking/dns_resolver.txt index 04ca06325b08..7f531ad83285 100644 --- a/Documentation/networking/dns_resolver.txt +++ b/Documentation/networking/dns_resolver.txt @@ -139,8 +139,8 @@ the key will be discarded and recreated when the data it holds has expired. dns_query() returns a copy of the value attached to the key, or an error if that is indicated instead. -See <file:Documentation/keys-request-key.txt> for further information about -request-key function. +See <file:Documentation/security/keys-request-key.txt> for further +information about request-key function. ========= diff --git a/Documentation/networking/ifenslave.c b/Documentation/networking/ifenslave.c index 2bac9618c345..65968fbf1e49 100644 --- a/Documentation/networking/ifenslave.c +++ b/Documentation/networking/ifenslave.c @@ -260,7 +260,7 @@ int main(int argc, char *argv[]) case 'V': opt_V++; exclusive++; break; case '?': - fprintf(stderr, usage_msg); + fprintf(stderr, "%s", usage_msg); res = 2; goto out; } @@ -268,13 +268,13 @@ int main(int argc, char *argv[]) /* options check */ if (exclusive > 1) { - fprintf(stderr, usage_msg); + fprintf(stderr, "%s", usage_msg); res = 2; goto out; } if (opt_v || opt_V) { - printf(version); + printf("%s", version); if (opt_V) { res = 0; goto out; @@ -282,14 +282,14 @@ int main(int argc, char *argv[]) } if (opt_u) { - printf(usage_msg); + printf("%s", usage_msg); res = 0; goto out; } if (opt_h) { - printf(usage_msg); - printf(help_msg); + printf("%s", usage_msg); + printf("%s", help_msg); res = 0; goto out; } @@ -309,7 +309,7 @@ int main(int argc, char *argv[]) goto out; } else { /* Just show usage */ - fprintf(stderr, usage_msg); + fprintf(stderr, "%s", usage_msg); res = 2; goto out; } @@ -320,7 +320,7 @@ int main(int argc, char *argv[]) master_ifname = *spp++; if (master_ifname == NULL) { - fprintf(stderr, usage_msg); + fprintf(stderr, "%s", usage_msg); res = 2; goto out; } @@ -339,7 +339,7 @@ int main(int argc, char *argv[]) if (slave_ifname == NULL) { if (opt_d || opt_c) { - fprintf(stderr, usage_msg); + fprintf(stderr, "%s", usage_msg); res = 2; goto out; } diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index d3d653a5f9b9..db2a4067013c 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -106,16 +106,6 @@ inet_peer_maxttl - INTEGER when the number of entries in the pool is very small). Measured in seconds. -inet_peer_gc_mintime - INTEGER - Minimum interval between garbage collection passes. This interval is - in effect under high memory pressure on the pool. - Measured in seconds. - -inet_peer_gc_maxtime - INTEGER - Minimum interval between garbage collection passes. This interval is - in effect under low (or absent) memory pressure on the pool. - Measured in seconds. - TCP variables: somaxconn - INTEGER @@ -346,7 +336,7 @@ tcp_orphan_retries - INTEGER when RTO retransmissions remain unacknowledged. See tcp_retries2 for more details. - The default value is 7. + The default value is 8. If your machine is a loaded WEB server, you should think about lowering this value, such sockets may consume significant resources. Cf. tcp_max_orphans. @@ -394,7 +384,7 @@ tcp_rmem - vector of 3 INTEGERs: min, default, max min: Minimal size of receive buffer used by TCP sockets. It is guaranteed to each TCP socket, even under moderate memory pressure. - Default: 8K + Default: 1 page default: initial size of receive buffer used by TCP sockets. This value overrides net.core.rmem_default used by other protocols. @@ -483,7 +473,7 @@ tcp_window_scaling - BOOLEAN tcp_wmem - vector of 3 INTEGERs: min, default, max min: Amount of memory reserved for send buffers for TCP sockets. Each TCP socket has rights to use it due to fact of its birth. - Default: 4K + Default: 1 page default: initial size of send buffer used by TCP sockets. This value overrides net.core.wmem_default used by other protocols. @@ -553,13 +543,13 @@ udp_rmem_min - INTEGER Minimal size of receive buffer used by UDP sockets in moderation. Each UDP socket is able to use the size for receiving data, even if total pages of UDP sockets exceed udp_mem pressure. The unit is byte. - Default: 4096 + Default: 1 page udp_wmem_min - INTEGER Minimal size of send buffer used by UDP sockets in moderation. Each UDP socket is able to use the size for sending data, even if total pages of UDP sockets exceed udp_mem pressure. The unit is byte. - Default: 4096 + Default: 1 page CIPSOv4 Variables: @@ -1465,10 +1455,17 @@ sctp_mem - vector of 3 INTEGERs: min, pressure, max Default is calculated at boot time from amount of available memory. sctp_rmem - vector of 3 INTEGERs: min, default, max - See tcp_rmem for a description. + Only the first value ("min") is used, "default" and "max" are + ignored. + + min: Minimal size of receive buffer used by SCTP socket. + It is guaranteed to each SCTP socket (but not association) even + under moderate memory pressure. + + Default: 1 page sctp_wmem - vector of 3 INTEGERs: min, default, max - See tcp_wmem for a description. + Currently this tunable has no effect. addr_scope_policy - INTEGER Control IPv4 address scoping - draft-stewart-tsvwg-sctp-ipv4-00 diff --git a/Documentation/networking/netdev-features.txt b/Documentation/networking/netdev-features.txt new file mode 100644 index 000000000000..4b1c0dcef84c --- /dev/null +++ b/Documentation/networking/netdev-features.txt @@ -0,0 +1,154 @@ +Netdev features mess and how to get out from it alive +===================================================== + +Author: + MichaÅ‚ MirosÅ‚aw <mirq-linux@rere.qmqm.pl> + + + + Part I: Feature sets +====================== + +Long gone are the days when a network card would just take and give packets +verbatim. Today's devices add multiple features and bugs (read: offloads) +that relieve an OS of various tasks like generating and checking checksums, +splitting packets, classifying them. Those capabilities and their state +are commonly referred to as netdev features in Linux kernel world. + +There are currently three sets of features relevant to the driver, and +one used internally by network core: + + 1. netdev->hw_features set contains features whose state may possibly + be changed (enabled or disabled) for a particular device by user's + request. This set should be initialized in ndo_init callback and not + changed later. + + 2. netdev->features set contains features which are currently enabled + for a device. This should be changed only by network core or in + error paths of ndo_set_features callback. + + 3. netdev->vlan_features set contains features whose state is inherited + by child VLAN devices (limits netdev->features set). This is currently + used for all VLAN devices whether tags are stripped or inserted in + hardware or software. + + 4. netdev->wanted_features set contains feature set requested by user. + This set is filtered by ndo_fix_features callback whenever it or + some device-specific conditions change. This set is internal to + networking core and should not be referenced in drivers. + + + + Part II: Controlling enabled features +======================================= + +When current feature set (netdev->features) is to be changed, new set +is calculated and filtered by calling ndo_fix_features callback +and netdev_fix_features(). If the resulting set differs from current +set, it is passed to ndo_set_features callback and (if the callback +returns success) replaces value stored in netdev->features. +NETDEV_FEAT_CHANGE notification is issued after that whenever current +set might have changed. + +The following events trigger recalculation: + 1. device's registration, after ndo_init returned success + 2. user requested changes in features state + 3. netdev_update_features() is called + +ndo_*_features callbacks are called with rtnl_lock held. Missing callbacks +are treated as always returning success. + +A driver that wants to trigger recalculation must do so by calling +netdev_update_features() while holding rtnl_lock. This should not be done +from ndo_*_features callbacks. netdev->features should not be modified by +driver except by means of ndo_fix_features callback. + + + + Part III: Implementation hints +================================ + + * ndo_fix_features: + +All dependencies between features should be resolved here. The resulting +set can be reduced further by networking core imposed limitations (as coded +in netdev_fix_features()). For this reason it is safer to disable a feature +when its dependencies are not met instead of forcing the dependency on. + +This callback should not modify hardware nor driver state (should be +stateless). It can be called multiple times between successive +ndo_set_features calls. + +Callback must not alter features contained in NETIF_F_SOFT_FEATURES or +NETIF_F_NEVER_CHANGE sets. The exception is NETIF_F_VLAN_CHALLENGED but +care must be taken as the change won't affect already configured VLANs. + + * ndo_set_features: + +Hardware should be reconfigured to match passed feature set. The set +should not be altered unless some error condition happens that can't +be reliably detected in ndo_fix_features. In this case, the callback +should update netdev->features to match resulting hardware state. +Errors returned are not (and cannot be) propagated anywhere except dmesg. +(Note: successful return is zero, >0 means silent error.) + + + + Part IV: Features +=================== + +For current list of features, see include/linux/netdev_features.h. +This section describes semantics of some of them. + + * Transmit checksumming + +For complete description, see comments near the top of include/linux/skbuff.h. + +Note: NETIF_F_HW_CSUM is a superset of NETIF_F_IP_CSUM + NETIF_F_IPV6_CSUM. +It means that device can fill TCP/UDP-like checksum anywhere in the packets +whatever headers there might be. + + * Transmit TCP segmentation offload + +NETIF_F_TSO_ECN means that hardware can properly split packets with CWR bit +set, be it TCPv4 (when NETIF_F_TSO is enabled) or TCPv6 (NETIF_F_TSO6). + + * Transmit DMA from high memory + +On platforms where this is relevant, NETIF_F_HIGHDMA signals that +ndo_start_xmit can handle skbs with frags in high memory. + + * Transmit scatter-gather + +Those features say that ndo_start_xmit can handle fragmented skbs: +NETIF_F_SG --- paged skbs (skb_shinfo()->frags), NETIF_F_FRAGLIST --- +chained skbs (skb->next/prev list). + + * Software features + +Features contained in NETIF_F_SOFT_FEATURES are features of networking +stack. Driver should not change behaviour based on them. + + * LLTX driver (deprecated for hardware drivers) + +NETIF_F_LLTX should be set in drivers that implement their own locking in +transmit path or don't need locking at all (e.g. software tunnels). +In ndo_start_xmit, it is recommended to use a try_lock and return +NETDEV_TX_LOCKED when the spin lock fails. The locking should also properly +protect against other callbacks (the rules you need to find out). + +Don't use it for new drivers. + + * netns-local device + +NETIF_F_NETNS_LOCAL is set for devices that are not allowed to move between +network namespaces (e.g. loopback). + +Don't use it in drivers. + + * VLAN challenged + +NETIF_F_VLAN_CHALLENGED should be set for devices which can't cope with VLAN +headers. Some drivers set this because the cards can't handle the bigger MTU. +[FIXME: Those cases could be fixed in VLAN code by allowing only reduced-MTU +VLANs. This may be not useful, though.] diff --git a/Documentation/power/devices.txt b/Documentation/power/devices.txt index 88880839ece4..64565aac6e40 100644 --- a/Documentation/power/devices.txt +++ b/Documentation/power/devices.txt @@ -520,59 +520,20 @@ Support for power domains is provided through the pwr_domain field of struct device. This field is a pointer to an object of type struct dev_power_domain, defined in include/linux/pm.h, providing a set of power management callbacks analogous to the subsystem-level and device driver callbacks that are executed -for the given device during all power transitions, in addition to the respective -subsystem-level callbacks. Specifically, the power domain "suspend" callbacks -(i.e. ->runtime_suspend(), ->suspend(), ->freeze(), ->poweroff(), etc.) are -executed after the analogous subsystem-level callbacks, while the power domain -"resume" callbacks (i.e. ->runtime_resume(), ->resume(), ->thaw(), ->restore, -etc.) are executed before the analogous subsystem-level callbacks. Error codes -returned by the "suspend" and "resume" power domain callbacks are ignored. - -Power domain ->runtime_idle() callback is executed before the subsystem-level -->runtime_idle() callback and the result returned by it is not ignored. Namely, -if it returns error code, the subsystem-level ->runtime_idle() callback will not -be called and the helper function rpm_idle() executing it will return error -code. This mechanism is intended to help platforms where saving device state -is a time consuming operation and should only be carried out if all devices -in the power domain are idle, before turning off the shared power resource(s). -Namely, the power domain ->runtime_idle() callback may return error code until -the pm_runtime_idle() helper (or its asychronous version) has been called for -all devices in the power domain (it is recommended that the returned error code -be -EBUSY in those cases), preventing the subsystem-level ->runtime_idle() -callback from being run prematurely. - -The support for device power domains is only relevant to platforms needing to -use the same subsystem-level (e.g. platform bus type) and device driver power -management callbacks in many different power domain configurations and wanting -to avoid incorporating the support for power domains into the subsystem-level -callbacks. The other platforms need not implement it or take it into account -in any way. - - -System Devices --------------- -System devices (sysdevs) follow a slightly different API, which can be found in - - include/linux/sysdev.h - drivers/base/sys.c - -System devices will be suspended with interrupts disabled, and after all other -devices have been suspended. On resume, they will be resumed before any other -devices, and also with interrupts disabled. These things occur in special -"sysdev_driver" phases, which affect only system devices. - -Thus, after the suspend_noirq (or freeze_noirq or poweroff_noirq) phase, when -the non-boot CPUs are all offline and IRQs are disabled on the remaining online -CPU, then a sysdev_driver.suspend phase is carried out, and the system enters a -sleep state (or a system image is created). During resume (or after the image -has been created or loaded) a sysdev_driver.resume phase is carried out, IRQs -are enabled on the only online CPU, the non-boot CPUs are enabled, and the -resume_noirq (or thaw_noirq or restore_noirq) phase begins. - -Code to actually enter and exit the system-wide low power state sometimes -involves hardware details that are only known to the boot firmware, and -may leave a CPU running software (from SRAM or flash memory) that monitors -the system and manages its wakeup sequence. +for the given device during all power transitions, instead of the respective +subsystem-level callbacks. Specifically, if a device's pm_domain pointer is +not NULL, the ->suspend() callback from the object pointed to by it will be +executed instead of its subsystem's (e.g. bus type's) ->suspend() callback and +anlogously for all of the remaining callbacks. In other words, power management +domain callbacks, if defined for the given device, always take precedence over +the callbacks provided by the device's subsystem (e.g. bus type). + +The support for device power management domains is only relevant to platforms +needing to use the same device driver power management callbacks in many +different power domain configurations and wanting to avoid incorporating the +support for power domains into subsystem-level callbacks, for example by +modifying the platform bus type. Other platforms need not implement it or take +it into account in any way. Device Low Power (suspend) States diff --git a/Documentation/power/regulator/machine.txt b/Documentation/power/regulator/machine.txt index bdec39b9bd75..b42419b52e44 100644 --- a/Documentation/power/regulator/machine.txt +++ b/Documentation/power/regulator/machine.txt @@ -53,11 +53,11 @@ static struct regulator_init_data regulator1_data = { Regulator-1 supplies power to Regulator-2. This relationship must be registered with the core so that Regulator-1 is also enabled when Consumer A enables its -supply (Regulator-2). The supply regulator is set by the supply_regulator_dev +supply (Regulator-2). The supply regulator is set by the supply_regulator field below:- static struct regulator_init_data regulator2_data = { - .supply_regulator_dev = &platform_regulator1_device.dev, + .supply_regulator = "regulator_name", .constraints = { .min_uV = 1800000, .max_uV = 2000000, diff --git a/Documentation/power/runtime_pm.txt b/Documentation/power/runtime_pm.txt index 654097b130b4..b24875b1ced5 100644 --- a/Documentation/power/runtime_pm.txt +++ b/Documentation/power/runtime_pm.txt @@ -501,13 +501,29 @@ helper functions described in Section 4. In that case, pm_runtime_resume() should be used. Of course, for this purpose the device's run-time PM has to be enabled earlier by calling pm_runtime_enable(). -If the device bus type's or driver's ->probe() or ->remove() callback runs +If the device bus type's or driver's ->probe() callback runs pm_runtime_suspend() or pm_runtime_idle() or their asynchronous counterparts, they will fail returning -EAGAIN, because the device's usage counter is -incremented by the core before executing ->probe() and ->remove(). Still, it -may be desirable to suspend the device as soon as ->probe() or ->remove() has -finished, so the PM core uses pm_runtime_idle_sync() to invoke the -subsystem-level idle callback for the device at that time. +incremented by the driver core before executing ->probe(). Still, it may be +desirable to suspend the device as soon as ->probe() has finished, so the driver +core uses pm_runtime_put_sync() to invoke the subsystem-level idle callback for +the device at that time. + +Moreover, the driver core prevents runtime PM callbacks from racing with the bus +notifier callback in __device_release_driver(), which is necessary, because the +notifier is used by some subsystems to carry out operations affecting the +runtime PM functionality. It does so by calling pm_runtime_get_sync() before +driver_sysfs_remove() and the BUS_NOTIFY_UNBIND_DRIVER notifications. This +resumes the device if it's in the suspended state and prevents it from +being suspended again while those routines are being executed. + +To allow bus types and drivers to put devices into the suspended state by +calling pm_runtime_suspend() from their ->remove() routines, the driver core +executes pm_runtime_put_sync() after running the BUS_NOTIFY_UNBIND_DRIVER +notifications in __device_release_driver(). This requires bus types and +drivers to make their ->remove() callbacks avoid races with runtime PM directly, +but also it allows of more flexibility in the handling of devices during the +removal of their drivers. The user space can effectively disallow the driver of the device to power manage it at run time by changing the value of its /sys/devices/.../power/control @@ -566,11 +582,6 @@ to do this is: pm_runtime_set_active(dev); pm_runtime_enable(dev); -The PM core always increments the run-time usage counter before calling the -->prepare() callback and decrements it after calling the ->complete() callback. -Hence disabling run-time PM temporarily like this will not cause any run-time -suspend callbacks to be lost. - 7. Generic subsystem callbacks Subsystems may wish to conserve code space by using the set of generic power diff --git a/Documentation/printk-formats.txt b/Documentation/printk-formats.txt index 1b5a5ddbc3ef..5df176ed59b8 100644 --- a/Documentation/printk-formats.txt +++ b/Documentation/printk-formats.txt @@ -9,7 +9,121 @@ If variable is of Type, use printk format specifier: size_t %zu or %zx ssize_t %zd or %zx -Raw pointer value SHOULD be printed with %p. +Raw pointer value SHOULD be printed with %p. The kernel supports +the following extended format specifiers for pointer types: + +Symbols/Function Pointers: + + %pF versatile_init+0x0/0x110 + %pf versatile_init + %pS versatile_init+0x0/0x110 + %ps versatile_init + %pB prev_fn_of_versatile_init+0x88/0x88 + + For printing symbols and function pointers. The 'S' and 's' specifiers + result in the symbol name with ('S') or without ('s') offsets. Where + this is used on a kernel without KALLSYMS - the symbol address is + printed instead. + + The 'B' specifier results in the symbol name with offsets and should be + used when printing stack backtraces. The specifier takes into + consideration the effect of compiler optimisations which may occur + when tail-call's are used and marked with the noreturn GCC attribute. + + On ia64, ppc64 and parisc64 architectures function pointers are + actually function descriptors which must first be resolved. The 'F' and + 'f' specifiers perform this resolution and then provide the same + functionality as the 'S' and 's' specifiers. + +Kernel Pointers: + + %pK 0x01234567 or 0x0123456789abcdef + + For printing kernel pointers which should be hidden from unprivileged + users. The behaviour of %pK depends on the kptr_restrict sysctl - see + Documentation/sysctl/kernel.txt for more details. + +Struct Resources: + + %pr [mem 0x60000000-0x6fffffff flags 0x2200] or + [mem 0x0000000060000000-0x000000006fffffff flags 0x2200] + %pR [mem 0x60000000-0x6fffffff pref] or + [mem 0x0000000060000000-0x000000006fffffff pref] + + For printing struct resources. The 'R' and 'r' specifiers result in a + printed resource with ('R') or without ('r') a decoded flags member. + +MAC/FDDI addresses: + + %pM 00:01:02:03:04:05 + %pMF 00-01-02-03-04-05 + %pm 000102030405 + + For printing 6-byte MAC/FDDI addresses in hex notation. The 'M' and 'm' + specifiers result in a printed address with ('M') or without ('m') byte + separators. The default byte separator is the colon (':'). + + Where FDDI addresses are concerned the 'F' specifier can be used after + the 'M' specifier to use dash ('-') separators instead of the default + separator. + +IPv4 addresses: + + %pI4 1.2.3.4 + %pi4 001.002.003.004 + %p[Ii][hnbl] + + For printing IPv4 dot-separated decimal addresses. The 'I4' and 'i4' + specifiers result in a printed address with ('i4') or without ('I4') + leading zeros. + + The additional 'h', 'n', 'b', and 'l' specifiers are used to specify + host, network, big or little endian order addresses respectively. Where + no specifier is provided the default network/big endian order is used. + +IPv6 addresses: + + %pI6 0001:0002:0003:0004:0005:0006:0007:0008 + %pi6 00010002000300040005000600070008 + %pI6c 1:2:3:4:5:6:7:8 + + For printing IPv6 network-order 16-bit hex addresses. The 'I6' and 'i6' + specifiers result in a printed address with ('I6') or without ('i6') + colon-separators. Leading zeros are always used. + + The additional 'c' specifier can be used with the 'I' specifier to + print a compressed IPv6 address as described by + http://tools.ietf.org/html/rfc5952 + +UUID/GUID addresses: + + %pUb 00010203-0405-0607-0809-0a0b0c0d0e0f + %pUB 00010203-0405-0607-0809-0A0B0C0D0E0F + %pUl 03020100-0504-0706-0809-0a0b0c0e0e0f + %pUL 03020100-0504-0706-0809-0A0B0C0E0E0F + + For printing 16-byte UUID/GUIDs addresses. The additional 'l', 'L', + 'b' and 'B' specifiers are used to specify a little endian order in + lower ('l') or upper case ('L') hex characters - and big endian order + in lower ('b') or upper case ('B') hex characters. + + Where no additional specifiers are used the default little endian + order with lower case hex characters will be printed. + +struct va_format: + + %pV + + For printing struct va_format structures. These contain a format string + and va_list as follows: + + struct va_format { + const char *fmt; + va_list *va; + }; + + Do not use this feature without some mechanism to verify the + correctness of the format string and va_list arguments. u64 SHOULD be printed with %llu/%llx, (unsigned long long): @@ -32,4 +146,5 @@ Reminder: sizeof() result is of type size_t. Thank you for your cooperation and attention. -By Randy Dunlap <rdunlap@xenotime.net> +By Randy Dunlap <rdunlap@xenotime.net> and +Andrew Murray <amurray@mpc-data.co.uk> diff --git a/Documentation/pti/pti_intel_mid.txt b/Documentation/pti/pti_intel_mid.txt new file mode 100644 index 000000000000..e7a5b6d1f7a9 --- /dev/null +++ b/Documentation/pti/pti_intel_mid.txt @@ -0,0 +1,99 @@ +The Intel MID PTI project is HW implemented in Intel Atom +system-on-a-chip designs based on the Parallel Trace +Interface for MIPI P1149.7 cJTAG standard. The kernel solution +for this platform involves the following files: + +./include/linux/pti.h +./drivers/.../n_tracesink.h +./drivers/.../n_tracerouter.c +./drivers/.../n_tracesink.c +./drivers/.../pti.c + +pti.c is the driver that enables various debugging features +popular on platforms from certain mobile manufacturers. +n_tracerouter.c and n_tracesink.c allow extra system information to +be collected and routed to the pti driver, such as trace +debugging data from a modem. Although n_tracerouter +and n_tracesink are a part of the complete PTI solution, +these two line disciplines can work separately from +pti.c and route any data stream from one /dev/tty node +to another /dev/tty node via kernel-space. This provides +a stable, reliable connection that will not break unless +the user-space application shuts down (plus avoids +kernel->user->kernel context switch overheads of routing +data). + +An example debugging usage for this driver system: + *Hook /dev/ttyPTI0 to syslogd. Opening this port will also start + a console device to further capture debugging messages to PTI. + *Hook /dev/ttyPTI1 to modem debugging data to write to PTI HW. + This is where n_tracerouter and n_tracesink are used. + *Hook /dev/pti to a user-level debugging application for writing + to PTI HW. + *Use mipi_* Kernel Driver API in other device drivers for + debugging to PTI by first requesting a PTI write address via + mipi_request_masterchannel(1). + +Below is example pseudo-code on how a 'privileged' application +can hook up n_tracerouter and n_tracesink to any tty on +a system. 'Privileged' means the application has enough +privileges to successfully manipulate the ldisc drivers +but is not just blindly executing as 'root'. Keep in mind +the use of ioctl(,TIOCSETD,) is not specific to the n_tracerouter +and n_tracesink line discpline drivers but is a generic +operation for a program to use a line discpline driver +on a tty port other than the default n_tty. + +/////////// To hook up n_tracerouter and n_tracesink ///////// + +// Note that n_tracerouter depends on n_tracesink. +#include <errno.h> +#define ONE_TTY "/dev/ttyOne" +#define TWO_TTY "/dev/ttyTwo" + +// needed global to hand onto ldisc connection +static int g_fd_source = -1; +static int g_fd_sink = -1; + +// these two vars used to grab LDISC values from loaded ldisc drivers +// in OS. Look at /proc/tty/ldiscs to get the right numbers from +// the ldiscs loaded in the system. +int source_ldisc_num, sink_ldisc_num = -1; +int retval; + +g_fd_source = open(ONE_TTY, O_RDWR); // must be R/W +g_fd_sink = open(TWO_TTY, O_RDWR); // must be R/W + +if (g_fd_source <= 0) || (g_fd_sink <= 0) { + // doubt you'll want to use these exact error lines of code + printf("Error on open(). errno: %d\n",errno); + return errno; +} + +retval = ioctl(g_fd_sink, TIOCSETD, &sink_ldisc_num); +if (retval < 0) { + printf("Error on ioctl(). errno: %d\n", errno); + return errno; +} + +retval = ioctl(g_fd_source, TIOCSETD, &source_ldisc_num); +if (retval < 0) { + printf("Error on ioctl(). errno: %d\n", errno); + return errno; +} + +/////////// To disconnect n_tracerouter and n_tracesink //////// + +// First make sure data through the ldiscs has stopped. + +// Second, disconnect ldiscs. This provides a +// little cleaner shutdown on tty stack. +sink_ldisc_num = 0; +source_ldisc_num = 0; +ioctl(g_fd_uart, TIOCSETD, &sink_ldisc_num); +ioctl(g_fd_gadget, TIOCSETD, &source_ldisc_num); + +// Three, program closes connection, and cleanup: +close(g_fd_uart); +close(g_fd_gadget); +g_fd_uart = g_fd_gadget = NULL; diff --git a/Documentation/ptp/ptp.txt b/Documentation/ptp/ptp.txt new file mode 100644 index 000000000000..ae8fef86b832 --- /dev/null +++ b/Documentation/ptp/ptp.txt @@ -0,0 +1,89 @@ + +* PTP hardware clock infrastructure for Linux + + This patch set introduces support for IEEE 1588 PTP clocks in + Linux. Together with the SO_TIMESTAMPING socket options, this + presents a standardized method for developing PTP user space + programs, synchronizing Linux with external clocks, and using the + ancillary features of PTP hardware clocks. + + A new class driver exports a kernel interface for specific clock + drivers and a user space interface. The infrastructure supports a + complete set of PTP hardware clock functionality. + + + Basic clock operations + - Set time + - Get time + - Shift the clock by a given offset atomically + - Adjust clock frequency + + + Ancillary clock features + - One short or periodic alarms, with signal delivery to user program + - Time stamp external events + - Period output signals configurable from user space + - Synchronization of the Linux system time via the PPS subsystem + +** PTP hardware clock kernel API + + A PTP clock driver registers itself with the class driver. The + class driver handles all of the dealings with user space. The + author of a clock driver need only implement the details of + programming the clock hardware. The clock driver notifies the class + driver of asynchronous events (alarms and external time stamps) via + a simple message passing interface. + + The class driver supports multiple PTP clock drivers. In normal use + cases, only one PTP clock is needed. However, for testing and + development, it can be useful to have more than one clock in a + single system, in order to allow performance comparisons. + +** PTP hardware clock user space API + + The class driver also creates a character device for each + registered clock. User space can use an open file descriptor from + the character device as a POSIX clock id and may call + clock_gettime, clock_settime, and clock_adjtime. These calls + implement the basic clock operations. + + User space programs may control the clock using standardized + ioctls. A program may query, enable, configure, and disable the + ancillary clock features. User space can receive time stamped + events via blocking read() and poll(). One shot and periodic + signals may be configured via the POSIX timer_settime() system + call. + +** Writing clock drivers + + Clock drivers include include/linux/ptp_clock_kernel.h and register + themselves by presenting a 'struct ptp_clock_info' to the + registration method. Clock drivers must implement all of the + functions in the interface. If a clock does not offer a particular + ancillary feature, then the driver should just return -EOPNOTSUPP + from those functions. + + Drivers must ensure that all of the methods in interface are + reentrant. Since most hardware implementations treat the time value + as a 64 bit integer accessed as two 32 bit registers, drivers + should use spin_lock_irqsave/spin_unlock_irqrestore to protect + against concurrent access. This locking cannot be accomplished in + class driver, since the lock may also be needed by the clock + driver's interrupt service routine. + +** Supported hardware + + + Freescale eTSEC gianfar + - 2 Time stamp external triggers, programmable polarity (opt. interrupt) + - 2 Alarm registers (optional interrupt) + - 3 Periodic signals (optional interrupt) + + + National DP83640 + - 6 GPIOs programmable as inputs or outputs + - 6 GPIOs with dedicated functions (LED/JTAG/clock) can also be + used as general inputs or outputs + - GPIO inputs can time stamp external triggers + - GPIO outputs can produce periodic signals + - 1 interrupt pin + + + Intel IXP465 + - Auxiliary Slave/Master Mode Snapshot (optional interrupt) + - Target Time (optional interrupt) diff --git a/Documentation/ptp/testptp.c b/Documentation/ptp/testptp.c new file mode 100644 index 000000000000..f59ded066108 --- /dev/null +++ b/Documentation/ptp/testptp.c @@ -0,0 +1,381 @@ +/* + * PTP 1588 clock support - User space test program + * + * Copyright (C) 2010 OMICRON electronics GmbH + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. + */ +#include <errno.h> +#include <fcntl.h> +#include <math.h> +#include <signal.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <sys/ioctl.h> +#include <sys/mman.h> +#include <sys/stat.h> +#include <sys/time.h> +#include <sys/timex.h> +#include <sys/types.h> +#include <time.h> +#include <unistd.h> + +#include <linux/ptp_clock.h> + +#define DEVICE "/dev/ptp0" + +#ifndef ADJ_SETOFFSET +#define ADJ_SETOFFSET 0x0100 +#endif + +#ifndef CLOCK_INVALID +#define CLOCK_INVALID -1 +#endif + +/* When glibc offers the syscall, this will go away. */ +#include <sys/syscall.h> +static int clock_adjtime(clockid_t id, struct timex *tx) +{ + return syscall(__NR_clock_adjtime, id, tx); +} + +static clockid_t get_clockid(int fd) +{ +#define CLOCKFD 3 +#define FD_TO_CLOCKID(fd) ((~(clockid_t) (fd) << 3) | CLOCKFD) + + return FD_TO_CLOCKID(fd); +} + +static void handle_alarm(int s) +{ + printf("received signal %d\n", s); +} + +static int install_handler(int signum, void (*handler)(int)) +{ + struct sigaction action; + sigset_t mask; + + /* Unblock the signal. */ + sigemptyset(&mask); + sigaddset(&mask, signum); + sigprocmask(SIG_UNBLOCK, &mask, NULL); + + /* Install the signal handler. */ + action.sa_handler = handler; + action.sa_flags = 0; + sigemptyset(&action.sa_mask); + sigaction(signum, &action, NULL); + + return 0; +} + +static long ppb_to_scaled_ppm(int ppb) +{ + /* + * The 'freq' field in the 'struct timex' is in parts per + * million, but with a 16 bit binary fractional field. + * Instead of calculating either one of + * + * scaled_ppm = (ppb / 1000) << 16 [1] + * scaled_ppm = (ppb << 16) / 1000 [2] + * + * we simply use double precision math, in order to avoid the + * truncation in [1] and the possible overflow in [2]. + */ + return (long) (ppb * 65.536); +} + +static void usage(char *progname) +{ + fprintf(stderr, + "usage: %s [options]\n" + " -a val request a one-shot alarm after 'val' seconds\n" + " -A val request a periodic alarm every 'val' seconds\n" + " -c query the ptp clock's capabilities\n" + " -d name device to open\n" + " -e val read 'val' external time stamp events\n" + " -f val adjust the ptp clock frequency by 'val' ppb\n" + " -g get the ptp clock time\n" + " -h prints this message\n" + " -p val enable output with a period of 'val' nanoseconds\n" + " -P val enable or disable (val=1|0) the system clock PPS\n" + " -s set the ptp clock time from the system time\n" + " -S set the system time from the ptp clock time\n" + " -t val shift the ptp clock time by 'val' seconds\n", + progname); +} + +int main(int argc, char *argv[]) +{ + struct ptp_clock_caps caps; + struct ptp_extts_event event; + struct ptp_extts_request extts_request; + struct ptp_perout_request perout_request; + struct timespec ts; + struct timex tx; + + static timer_t timerid; + struct itimerspec timeout; + struct sigevent sigevent; + + char *progname; + int c, cnt, fd; + + char *device = DEVICE; + clockid_t clkid; + int adjfreq = 0x7fffffff; + int adjtime = 0; + int capabilities = 0; + int extts = 0; + int gettime = 0; + int oneshot = 0; + int periodic = 0; + int perout = -1; + int pps = -1; + int settime = 0; + + progname = strrchr(argv[0], '/'); + progname = progname ? 1+progname : argv[0]; + while (EOF != (c = getopt(argc, argv, "a:A:cd:e:f:ghp:P:sSt:v"))) { + switch (c) { + case 'a': + oneshot = atoi(optarg); + break; + case 'A': + periodic = atoi(optarg); + break; + case 'c': + capabilities = 1; + break; + case 'd': + device = optarg; + break; + case 'e': + extts = atoi(optarg); + break; + case 'f': + adjfreq = atoi(optarg); + break; + case 'g': + gettime = 1; + break; + case 'p': + perout = atoi(optarg); + break; + case 'P': + pps = atoi(optarg); + break; + case 's': + settime = 1; + break; + case 'S': + settime = 2; + break; + case 't': + adjtime = atoi(optarg); + break; + case 'h': + usage(progname); + return 0; + case '?': + default: + usage(progname); + return -1; + } + } + + fd = open(device, O_RDWR); + if (fd < 0) { + fprintf(stderr, "opening %s: %s\n", device, strerror(errno)); + return -1; + } + + clkid = get_clockid(fd); + if (CLOCK_INVALID == clkid) { + fprintf(stderr, "failed to read clock id\n"); + return -1; + } + + if (capabilities) { + if (ioctl(fd, PTP_CLOCK_GETCAPS, &caps)) { + perror("PTP_CLOCK_GETCAPS"); + } else { + printf("capabilities:\n" + " %d maximum frequency adjustment (ppb)\n" + " %d programmable alarms\n" + " %d external time stamp channels\n" + " %d programmable periodic signals\n" + " %d pulse per second\n", + caps.max_adj, + caps.n_alarm, + caps.n_ext_ts, + caps.n_per_out, + caps.pps); + } + } + + if (0x7fffffff != adjfreq) { + memset(&tx, 0, sizeof(tx)); + tx.modes = ADJ_FREQUENCY; + tx.freq = ppb_to_scaled_ppm(adjfreq); + if (clock_adjtime(clkid, &tx)) { + perror("clock_adjtime"); + } else { + puts("frequency adjustment okay"); + } + } + + if (adjtime) { + memset(&tx, 0, sizeof(tx)); + tx.modes = ADJ_SETOFFSET; + tx.time.tv_sec = adjtime; + tx.time.tv_usec = 0; + if (clock_adjtime(clkid, &tx) < 0) { + perror("clock_adjtime"); + } else { + puts("time shift okay"); + } + } + + if (gettime) { + if (clock_gettime(clkid, &ts)) { + perror("clock_gettime"); + } else { + printf("clock time: %ld.%09ld or %s", + ts.tv_sec, ts.tv_nsec, ctime(&ts.tv_sec)); + } + } + + if (settime == 1) { + clock_gettime(CLOCK_REALTIME, &ts); + if (clock_settime(clkid, &ts)) { + perror("clock_settime"); + } else { + puts("set time okay"); + } + } + + if (settime == 2) { + clock_gettime(clkid, &ts); + if (clock_settime(CLOCK_REALTIME, &ts)) { + perror("clock_settime"); + } else { + puts("set time okay"); + } + } + + if (extts) { + memset(&extts_request, 0, sizeof(extts_request)); + extts_request.index = 0; + extts_request.flags = PTP_ENABLE_FEATURE; + if (ioctl(fd, PTP_EXTTS_REQUEST, &extts_request)) { + perror("PTP_EXTTS_REQUEST"); + extts = 0; + } else { + puts("external time stamp request okay"); + } + for (; extts; extts--) { + cnt = read(fd, &event, sizeof(event)); + if (cnt != sizeof(event)) { + perror("read"); + break; + } + printf("event index %u at %lld.%09u\n", event.index, + event.t.sec, event.t.nsec); + fflush(stdout); + } + /* Disable the feature again. */ + extts_request.flags = 0; + if (ioctl(fd, PTP_EXTTS_REQUEST, &extts_request)) { + perror("PTP_EXTTS_REQUEST"); + } + } + + if (oneshot) { + install_handler(SIGALRM, handle_alarm); + /* Create a timer. */ + sigevent.sigev_notify = SIGEV_SIGNAL; + sigevent.sigev_signo = SIGALRM; + if (timer_create(clkid, &sigevent, &timerid)) { + perror("timer_create"); + return -1; + } + /* Start the timer. */ + memset(&timeout, 0, sizeof(timeout)); + timeout.it_value.tv_sec = oneshot; + if (timer_settime(timerid, 0, &timeout, NULL)) { + perror("timer_settime"); + return -1; + } + pause(); + timer_delete(timerid); + } + + if (periodic) { + install_handler(SIGALRM, handle_alarm); + /* Create a timer. */ + sigevent.sigev_notify = SIGEV_SIGNAL; + sigevent.sigev_signo = SIGALRM; + if (timer_create(clkid, &sigevent, &timerid)) { + perror("timer_create"); + return -1; + } + /* Start the timer. */ + memset(&timeout, 0, sizeof(timeout)); + timeout.it_interval.tv_sec = periodic; + timeout.it_value.tv_sec = periodic; + if (timer_settime(timerid, 0, &timeout, NULL)) { + perror("timer_settime"); + return -1; + } + while (1) { + pause(); + } + timer_delete(timerid); + } + + if (perout >= 0) { + if (clock_gettime(clkid, &ts)) { + perror("clock_gettime"); + return -1; + } + memset(&perout_request, 0, sizeof(perout_request)); + perout_request.index = 0; + perout_request.start.sec = ts.tv_sec + 2; + perout_request.start.nsec = 0; + perout_request.period.sec = 0; + perout_request.period.nsec = perout; + if (ioctl(fd, PTP_PEROUT_REQUEST, &perout_request)) { + perror("PTP_PEROUT_REQUEST"); + } else { + puts("periodic output request okay"); + } + } + + if (pps != -1) { + int enable = pps ? 1 : 0; + if (ioctl(fd, PTP_ENABLE_PPS, enable)) { + perror("PTP_ENABLE_PPS"); + } else { + puts("pps for system time request okay"); + } + } + + close(fd); + return 0; +} diff --git a/Documentation/ptp/testptp.mk b/Documentation/ptp/testptp.mk new file mode 100644 index 000000000000..4ef2d9755421 --- /dev/null +++ b/Documentation/ptp/testptp.mk @@ -0,0 +1,33 @@ +# PTP 1588 clock support - User space test program +# +# Copyright (C) 2010 OMICRON electronics GmbH +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 2 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write to the Free Software +# Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. + +CC = $(CROSS_COMPILE)gcc +INC = -I$(KBUILD_OUTPUT)/usr/include +CFLAGS = -Wall $(INC) +LDLIBS = -lrt +PROGS = testptp + +all: $(PROGS) + +testptp: testptp.o + +clean: + rm -f testptp.o + +distclean: clean + rm -f $(PROGS) diff --git a/Documentation/scheduler/sched-design-CFS.txt b/Documentation/scheduler/sched-design-CFS.txt index 99961993257a..91ecff07cede 100644 --- a/Documentation/scheduler/sched-design-CFS.txt +++ b/Documentation/scheduler/sched-design-CFS.txt @@ -223,9 +223,10 @@ When CONFIG_FAIR_GROUP_SCHED is defined, a "cpu.shares" file is created for each group created using the pseudo filesystem. See example steps below to create task groups and modify their CPU share using the "cgroups" pseudo filesystem. - # mkdir /dev/cpuctl - # mount -t cgroup -ocpu none /dev/cpuctl - # cd /dev/cpuctl + # mount -t tmpfs cgroup_root /sys/fs/cgroup + # mkdir /sys/fs/cgroup/cpu + # mount -t cgroup -ocpu none /sys/fs/cgroup/cpu + # cd /sys/fs/cgroup/cpu # mkdir multimedia # create "multimedia" group of tasks # mkdir browser # create "browser" group of tasks diff --git a/Documentation/scheduler/sched-rt-group.txt b/Documentation/scheduler/sched-rt-group.txt index 605b0d40329d..71b54d549987 100644 --- a/Documentation/scheduler/sched-rt-group.txt +++ b/Documentation/scheduler/sched-rt-group.txt @@ -129,9 +129,8 @@ priority! Enabling CONFIG_RT_GROUP_SCHED lets you explicitly allocate real CPU bandwidth to task groups. -This uses the /cgroup virtual file system and -"/cgroup/<cgroup>/cpu.rt_runtime_us" to control the CPU time reserved for each -control group. +This uses the cgroup virtual file system and "<cgroup>/cpu.rt_runtime_us" +to control the CPU time reserved for each control group. For more information on working with control groups, you should read Documentation/cgroups/cgroups.txt as well. @@ -150,7 +149,7 @@ For now, this can be simplified to just the following (but see Future plans): =============== There is work in progress to make the scheduling period for each group -("/cgroup/<cgroup>/cpu.rt_period_us") configurable as well. +("<cgroup>/cpu.rt_period_us") configurable as well. The constraint on the period is that a subgroup must have a smaller or equal period to its parent. But realistically its not very useful _yet_ diff --git a/Documentation/scsi/ChangeLog.megaraid_sas b/Documentation/scsi/ChangeLog.megaraid_sas index 4d9ce73ff730..9ed1d9d96783 100644 --- a/Documentation/scsi/ChangeLog.megaraid_sas +++ b/Documentation/scsi/ChangeLog.megaraid_sas @@ -1,3 +1,17 @@ +Release Date : Wed. May 11, 2011 17:00:00 PST 2010 - + (emaild-id:megaraidlinux@lsi.com) + Adam Radford +Current Version : 00.00.05.38-rc1 +Old Version : 00.00.05.34-rc1 + 1. Remove MSI-X black list, use MFI_REG_STATE.ready.msiEnable. + 2. Remove un-used function megasas_return_cmd_for_smid(). + 3. Check MFI_REG_STATE.fault.resetAdapter in megasas_reset_fusion(). + 4. Disable interrupts/free_irq() in megasas_shutdown(). + 5. Fix bug where AENs could be lost in probe() and resume(). + 6. Convert 6,10,12 byte CDB's to 16 byte CDB for large LBA's for FastPath + IO. + 7. Add 1078 OCR support. +------------------------------------------------------------------------------- Release Date : Thu. Feb 24, 2011 17:00:00 PST 2010 - (emaild-id:megaraidlinux@lsi.com) Adam Radford diff --git a/Documentation/security/00-INDEX b/Documentation/security/00-INDEX new file mode 100644 index 000000000000..19bc49439cac --- /dev/null +++ b/Documentation/security/00-INDEX @@ -0,0 +1,18 @@ +00-INDEX + - this file. +SELinux.txt + - how to get started with the SELinux security enhancement. +Smack.txt + - documentation on the Smack Linux Security Module. +apparmor.txt + - documentation on the AppArmor security extension. +credentials.txt + - documentation about credentials in Linux. +keys-request-key.txt + - description of the kernel key request service. +keys-trusted-encrypted.txt + - info on the Trusted and Encrypted keys in the kernel key ring service. +keys.txt + - description of the kernel key retention service. +tomoyo.txt + - documentation on the TOMOYO Linux Security Module. diff --git a/Documentation/SELinux.txt b/Documentation/security/SELinux.txt index 07eae00f3314..07eae00f3314 100644 --- a/Documentation/SELinux.txt +++ b/Documentation/security/SELinux.txt diff --git a/Documentation/Smack.txt b/Documentation/security/Smack.txt index e9dab41c0fe0..e9dab41c0fe0 100644 --- a/Documentation/Smack.txt +++ b/Documentation/security/Smack.txt diff --git a/Documentation/apparmor.txt b/Documentation/security/apparmor.txt index 93c1fd7d0635..93c1fd7d0635 100644 --- a/Documentation/apparmor.txt +++ b/Documentation/security/apparmor.txt diff --git a/Documentation/credentials.txt b/Documentation/security/credentials.txt index 995baf379c07..fc0366cbd7ce 100644 --- a/Documentation/credentials.txt +++ b/Documentation/security/credentials.txt @@ -216,7 +216,7 @@ The Linux kernel supports the following types of credentials: When a process accesses a key, if not already present, it will normally be cached on one of these keyrings for future accesses to find. - For more information on using keys, see Documentation/keys.txt. + For more information on using keys, see Documentation/security/keys.txt. (5) LSM diff --git a/Documentation/keys-request-key.txt b/Documentation/security/keys-request-key.txt index 69686ad12c66..51987bfecfed 100644 --- a/Documentation/keys-request-key.txt +++ b/Documentation/security/keys-request-key.txt @@ -3,8 +3,8 @@ =================== The key request service is part of the key retention service (refer to -Documentation/keys.txt). This document explains more fully how the requesting -algorithm works. +Documentation/security/keys.txt). This document explains more fully how +the requesting algorithm works. The process starts by either the kernel requesting a service by calling request_key*(): diff --git a/Documentation/keys-trusted-encrypted.txt b/Documentation/security/keys-trusted-encrypted.txt index 8fb79bc1ac4b..8fb79bc1ac4b 100644 --- a/Documentation/keys-trusted-encrypted.txt +++ b/Documentation/security/keys-trusted-encrypted.txt diff --git a/Documentation/keys.txt b/Documentation/security/keys.txt index 6523a9e6f293..4d75931d2d79 100644 --- a/Documentation/keys.txt +++ b/Documentation/security/keys.txt @@ -434,7 +434,7 @@ The main syscalls are: /sbin/request-key will be invoked in an attempt to obtain a key. The callout_info string will be passed as an argument to the program. - See also Documentation/keys-request-key.txt. + See also Documentation/security/keys-request-key.txt. The keyctl syscall functions are: @@ -864,7 +864,7 @@ payload contents" for more information. If successful, the key will have been attached to the default keyring for implicitly obtained request-key keys, as set by KEYCTL_SET_REQKEY_KEYRING. - See also Documentation/keys-request-key.txt. + See also Documentation/security/keys-request-key.txt. (*) To search for a key, passing auxiliary data to the upcaller, call: diff --git a/Documentation/tomoyo.txt b/Documentation/security/tomoyo.txt index 200a2d37cbc8..200a2d37cbc8 100644 --- a/Documentation/tomoyo.txt +++ b/Documentation/security/tomoyo.txt diff --git a/Documentation/spinlocks.txt b/Documentation/spinlocks.txt index 2e3c64b1a6a5..9dbe885ecd8d 100644 --- a/Documentation/spinlocks.txt +++ b/Documentation/spinlocks.txt @@ -13,18 +13,8 @@ static DEFINE_SPINLOCK(xxx_lock); The above is always safe. It will disable interrupts _locally_, but the spinlock itself will guarantee the global lock, so it will guarantee that there is only one thread-of-control within the region(s) protected by that -lock. This works well even under UP. The above sequence under UP -essentially is just the same as doing - - unsigned long flags; - - save_flags(flags); cli(); - ... critical section ... - restore_flags(flags); - -so the code does _not_ need to worry about UP vs SMP issues: the spinlocks -work correctly under both (and spinlocks are actually more efficient on -architectures that allow doing the "save_flags + cli" in one operation). +lock. This works well even under UP also, so the code does _not_ need to +worry about UP vs SMP issues: the spinlocks work correctly under both. NOTE! Implications of spin_locks for memory are further described in: @@ -36,27 +26,7 @@ The above is usually pretty simple (you usually need and want only one spinlock for most things - using more than one spinlock can make things a lot more complex and even slower and is usually worth it only for sequences that you _know_ need to be split up: avoid it at all cost if you -aren't sure). HOWEVER, it _does_ mean that if you have some code that does - - cli(); - .. critical section .. - sti(); - -and another sequence that does - - spin_lock_irqsave(flags); - .. critical section .. - spin_unlock_irqrestore(flags); - -then they are NOT mutually exclusive, and the critical regions can happen -at the same time on two different CPU's. That's fine per se, but the -critical regions had better be critical for different things (ie they -can't stomp on each other). - -The above is a problem mainly if you end up mixing code - for example the -routines in ll_rw_block() tend to use cli/sti to protect the atomicity of -their actions, and if a driver uses spinlocks instead then you should -think about issues like the above. +aren't sure). This is really the only really hard part about spinlocks: once you start using spinlocks they tend to expand to areas you might not have noticed @@ -120,11 +90,10 @@ Lesson 3: spinlocks revisited. The single spin-lock primitives above are by no means the only ones. They are the most safe ones, and the ones that work under all circumstances, -but partly _because_ they are safe they are also fairly slow. They are -much faster than a generic global cli/sti pair, but slower than they'd -need to be, because they do have to disable interrupts (which is just a -single instruction on a x86, but it's an expensive one - and on other -architectures it can be worse). +but partly _because_ they are safe they are also fairly slow. They are slower +than they'd need to be, because they do have to disable interrupts +(which is just a single instruction on a x86, but it's an expensive one - +and on other architectures it can be worse). If you have a case where you have to protect a data structure across several CPU's and you want to use spinlocks you can potentially use diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt index 4af0614147ef..88fd7f5c8dcd 100644 --- a/Documentation/sysctl/fs.txt +++ b/Documentation/sysctl/fs.txt @@ -231,13 +231,6 @@ its creation). This directory contains configuration options for the epoll(7) interface. -max_user_instances ------------------- - -This is the maximum number of epoll file descriptors that a single user can -have open at a given time. The default value is 128, and should be enough -for normal users. - max_user_watches ---------------- diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt index 36f007514db3..5e7cb39ad195 100644 --- a/Documentation/sysctl/kernel.txt +++ b/Documentation/sysctl/kernel.txt @@ -161,7 +161,8 @@ core_pattern is used to specify a core dumpfile pattern name. %s signal number %t UNIX time of dump %h hostname - %e executable filename + %e executable filename (may be shortened) + %E executable path %<OTHER> both are dropped . If the first character of the pattern is a '|', the kernel will treat the rest of the pattern as a command to run. The core dump will be diff --git a/Documentation/usb/callbacks.txt b/Documentation/usb/callbacks.txt index bfb36b34b79e..9e85846bdb98 100644 --- a/Documentation/usb/callbacks.txt +++ b/Documentation/usb/callbacks.txt @@ -95,9 +95,11 @@ pre_reset int (*pre_reset)(struct usb_interface *intf); -Another driver or user space is triggering a reset on the device which -contains the interface passed as an argument. Cease IO and save any -device state you need to restore. +A driver or user space is triggering a reset on the device which +contains the interface passed as an argument. Cease IO, wait for all +outstanding URBs to complete, and save any device state you need to +restore. No more URBs may be submitted until the post_reset method +is called. If you need to allocate memory here, use GFP_NOIO or GFP_ATOMIC, if you are in atomic context. diff --git a/Documentation/usb/error-codes.txt b/Documentation/usb/error-codes.txt index d83703ea74b2..b3f606b81a03 100644 --- a/Documentation/usb/error-codes.txt +++ b/Documentation/usb/error-codes.txt @@ -76,6 +76,13 @@ A transfer's actual_length may be positive even when an error has been reported. That's because transfers often involve several packets, so that one or more packets could finish before an error stops further endpoint I/O. +For isochronous URBs, the urb status value is non-zero only if the URB is +unlinked, the device is removed, the host controller is disabled, or the total +transferred length is less than the requested length and the URB_SHORT_NOT_OK +flag is set. Completion handlers for isochronous URBs should only see +urb->status set to zero, -ENOENT, -ECONNRESET, -ESHUTDOWN, or -EREMOTEIO. +Individual frame descriptor status fields may report more status codes. + 0 Transfer completed successfully @@ -132,7 +139,7 @@ one or more packets could finish before an error stops further endpoint I/O. device removal events immediately. -EXDEV ISO transfer only partially completed - look at individual frame status for details + (only set in iso_frame_desc[n].status, not urb->status) -EINVAL ISO madness, if this happens: Log off and go home diff --git a/Documentation/usb/linux-cdc-acm.inf b/Documentation/usb/linux-cdc-acm.inf index 612e7220fb29..37a02ce54841 100644 --- a/Documentation/usb/linux-cdc-acm.inf +++ b/Documentation/usb/linux-cdc-acm.inf @@ -90,10 +90,10 @@ ServiceBinary=%12%\USBSER.sys [SourceDisksFiles] [SourceDisksNames] [DeviceList] -%DESCRIPTION%=DriverInstall, USB\VID_0525&PID_A4A7, USB\VID_0525&PID_A4AB&MI_02 +%DESCRIPTION%=DriverInstall, USB\VID_0525&PID_A4A7, USB\VID_1D6B&PID_0104&MI_02 [DeviceList.NTamd64] -%DESCRIPTION%=DriverInstall, USB\VID_0525&PID_A4A7, USB\VID_0525&PID_A4AB&MI_02 +%DESCRIPTION%=DriverInstall, USB\VID_0525&PID_A4A7, USB\VID_1D6B&PID_0104&MI_02 ;------------------------------------------------------------------------------ diff --git a/Documentation/usb/linux.inf b/Documentation/usb/linux.inf index 4dee95851224..4ffa715b0ae8 100644 --- a/Documentation/usb/linux.inf +++ b/Documentation/usb/linux.inf @@ -18,15 +18,15 @@ DriverVer = 06/21/2006,6.0.6000.16384 ; Decoration for x86 architecture [LinuxDevices.NTx86] -%LinuxDevice% = RNDIS.NT.5.1, USB\VID_0525&PID_a4a2, USB\VID_0525&PID_a4ab&MI_00 +%LinuxDevice% = RNDIS.NT.5.1, USB\VID_0525&PID_a4a2, USB\VID_1d6b&PID_0104&MI_00 ; Decoration for x64 architecture [LinuxDevices.NTamd64] -%LinuxDevice% = RNDIS.NT.5.1, USB\VID_0525&PID_a4a2, USB\VID_0525&PID_a4ab&MI_00 +%LinuxDevice% = RNDIS.NT.5.1, USB\VID_0525&PID_a4a2, USB\VID_1d6b&PID_0104&MI_00 ; Decoration for ia64 architecture [LinuxDevices.NTia64] -%LinuxDevice% = RNDIS.NT.5.1, USB\VID_0525&PID_a4a2, USB\VID_0525&PID_a4ab&MI_00 +%LinuxDevice% = RNDIS.NT.5.1, USB\VID_0525&PID_a4a2, USB\VID_1d6b&PID_0104&MI_00 ;@@@ This is the common setting for setup [ControlFlags] diff --git a/Documentation/vgaarbiter.txt b/Documentation/vgaarbiter.txt index 43a9b0694fdd..b7d401e0eae9 100644 --- a/Documentation/vgaarbiter.txt +++ b/Documentation/vgaarbiter.txt @@ -14,11 +14,10 @@ the legacy VGA arbitration task (besides other bus management tasks) when more than one legacy device co-exists on the same machine. But the problem happens when these devices are trying to be accessed by different userspace clients (e.g. two server in parallel). Their address assignments conflict. Moreover, -ideally, being an userspace application, it is not the role of the the X -server to control bus resources. Therefore an arbitration scheme outside of -the X server is needed to control the sharing of these resources. This -document introduces the operation of the VGA arbiter implemented for Linux -kernel. +ideally, being a userspace application, it is not the role of the X server to +control bus resources. Therefore an arbitration scheme outside of the X server +is needed to control the sharing of these resources. This document introduces +the operation of the VGA arbiter implemented for the Linux kernel. ---------------------------------------------------------------------------- @@ -39,7 +38,7 @@ I.1 vgaarb The vgaarb is a module of the Linux Kernel. When it is initially loaded, it scans all PCI devices and adds the VGA ones inside the arbitration. The arbiter then enables/disables the decoding on different devices of the VGA -legacy instructions. Device which do not want/need to use the arbiter may +legacy instructions. Devices which do not want/need to use the arbiter may explicitly tell it by calling vga_set_legacy_decoding(). The kernel exports a char device interface (/dev/vga_arbiter) to the clients, @@ -95,8 +94,8 @@ In the case of devices hot-{un,}plugged, there is a hook - pci_notify() - to notify them being added/removed in the system and automatically added/removed in the arbiter. -There's also a in-kernel API of the arbiter in the case of DRM, vgacon and -others which may use the arbiter. +There is also an in-kernel API of the arbiter in case DRM, vgacon, or other +drivers want to use it. I.2 libpciaccess @@ -117,9 +116,8 @@ Besides it, in pci_system were added: struct pci_device *vga_default_dev; -The vga_count is usually need to keep informed how many cards are being -arbitrated, so for instance if there's only one then it can totally escape the -scheme. +The vga_count is used to track how many cards are being arbitrated, so for +instance, if there is only one card, then it can completely escape arbitration. These functions below acquire VGA resources for the given card and mark those diff --git a/Documentation/video4linux/CARDLIST.em28xx b/Documentation/video4linux/CARDLIST.em28xx index 31b485723bc5..9aae449440dc 100644 --- a/Documentation/video4linux/CARDLIST.em28xx +++ b/Documentation/video4linux/CARDLIST.em28xx @@ -54,7 +54,7 @@ 53 -> Pinnacle Hybrid Pro (em2881) 54 -> Kworld VS-DVB-T 323UR (em2882) [eb1a:e323] 55 -> Terratec Cinnergy Hybrid T USB XS (em2882) (em2882) [0ccd:005e,0ccd:0042] - 56 -> Pinnacle Hybrid Pro (2) (em2882) [2304:0226] + 56 -> Pinnacle Hybrid Pro (330e) (em2882) [2304:0226] 57 -> Kworld PlusTV HD Hybrid 330 (em2883) [eb1a:a316] 58 -> Compro VideoMate ForYou/Stereo (em2820/em2840) [185b:2041] 60 -> Hauppauge WinTV HVR 850 (em2883) [2040:651f] diff --git a/Documentation/video4linux/Zoran b/Documentation/video4linux/Zoran index c40e3bab08fa..9ed629d4874b 100644 --- a/Documentation/video4linux/Zoran +++ b/Documentation/video4linux/Zoran @@ -130,7 +130,6 @@ Card number: 4 Note: No module for the mse3000 is available yet Note: No module for the vpx3224 is available yet -Note: use encoder=X or decoder=X for non-default i2c chips =========================== diff --git a/Documentation/video4linux/gspca.txt b/Documentation/video4linux/gspca.txt index 5c542e60f51d..5bfa9a777d26 100644 --- a/Documentation/video4linux/gspca.txt +++ b/Documentation/video4linux/gspca.txt @@ -275,6 +275,7 @@ pac7302 093a:2629 Genious iSlim 300 pac7302 093a:262a Webcam 300k pac7302 093a:262c Philips SPC 230 NC jeilinj 0979:0280 Sakar 57379 +jeilinj 0979:0280 Sportscam DV15 zc3xx 0ac8:0302 Z-star Vimicro zc0302 vc032x 0ac8:0321 Vimicro generic vc0321 vc032x 0ac8:0323 Vimicro Vc0323 diff --git a/Documentation/video4linux/uvcvideo.txt b/Documentation/video4linux/uvcvideo.txt new file mode 100644 index 000000000000..848d620dcc5c --- /dev/null +++ b/Documentation/video4linux/uvcvideo.txt @@ -0,0 +1,239 @@ +Linux USB Video Class (UVC) driver +================================== + +This file documents some driver-specific aspects of the UVC driver, such as +driver-specific ioctls and implementation notes. + +Questions and remarks can be sent to the Linux UVC development mailing list at +linux-uvc-devel@lists.berlios.de. + + +Extension Unit (XU) support +--------------------------- + +1. Introduction + +The UVC specification allows for vendor-specific extensions through extension +units (XUs). The Linux UVC driver supports extension unit controls (XU controls) +through two separate mechanisms: + + - through mappings of XU controls to V4L2 controls + - through a driver-specific ioctl interface + +The first one allows generic V4L2 applications to use XU controls by mapping +certain XU controls onto V4L2 controls, which then show up during ordinary +control enumeration. + +The second mechanism requires uvcvideo-specific knowledge for the application to +access XU controls but exposes the entire UVC XU concept to user space for +maximum flexibility. + +Both mechanisms complement each other and are described in more detail below. + + +2. Control mappings + +The UVC driver provides an API for user space applications to define so-called +control mappings at runtime. These allow for individual XU controls or byte +ranges thereof to be mapped to new V4L2 controls. Such controls appear and +function exactly like normal V4L2 controls (i.e. the stock controls, such as +brightness, contrast, etc.). However, reading or writing of such a V4L2 controls +triggers a read or write of the associated XU control. + +The ioctl used to create these control mappings is called UVCIOC_CTRL_MAP. +Previous driver versions (before 0.2.0) required another ioctl to be used +beforehand (UVCIOC_CTRL_ADD) to pass XU control information to the UVC driver. +This is no longer necessary as newer uvcvideo versions query the information +directly from the device. + +For details on the UVCIOC_CTRL_MAP ioctl please refer to the section titled +"IOCTL reference" below. + + +3. Driver specific XU control interface + +For applications that need to access XU controls directly, e.g. for testing +purposes, firmware upload, or accessing binary controls, a second mechanism to +access XU controls is provided in the form of a driver-specific ioctl, namely +UVCIOC_CTRL_QUERY. + +A call to this ioctl allows applications to send queries to the UVC driver that +directly map to the low-level UVC control requests. + +In order to make such a request the UVC unit ID of the control's extension unit +and the control selector need to be known. This information either needs to be +hardcoded in the application or queried using other ways such as by parsing the +UVC descriptor or, if available, using the media controller API to enumerate a +device's entities. + +Unless the control size is already known it is necessary to first make a +UVC_GET_LEN requests in order to be able to allocate a sufficiently large buffer +and set the buffer size to the correct value. Similarly, to find out whether +UVC_GET_CUR or UVC_SET_CUR are valid requests for a given control, a +UVC_GET_INFO request should be made. The bits 0 (GET supported) and 1 (SET +supported) of the resulting byte indicate which requests are valid. + +With the addition of the UVCIOC_CTRL_QUERY ioctl the UVCIOC_CTRL_GET and +UVCIOC_CTRL_SET ioctls have become obsolete since their functionality is a +subset of the former ioctl. For the time being they are still supported but +application developers are encouraged to use UVCIOC_CTRL_QUERY instead. + +For details on the UVCIOC_CTRL_QUERY ioctl please refer to the section titled +"IOCTL reference" below. + + +4. Security + +The API doesn't currently provide a fine-grained access control facility. The +UVCIOC_CTRL_ADD and UVCIOC_CTRL_MAP ioctls require super user permissions. + +Suggestions on how to improve this are welcome. + + +5. Debugging + +In order to debug problems related to XU controls or controls in general it is +recommended to enable the UVC_TRACE_CONTROL bit in the module parameter 'trace'. +This causes extra output to be written into the system log. + + +6. IOCTL reference + +---- UVCIOC_CTRL_MAP - Map a UVC control to a V4L2 control ---- + +Argument: struct uvc_xu_control_mapping + +Description: + This ioctl creates a mapping between a UVC control or part of a UVC + control and a V4L2 control. Once mappings are defined, userspace + applications can access vendor-defined UVC control through the V4L2 + control API. + + To create a mapping, applications fill the uvc_xu_control_mapping + structure with information about an existing UVC control defined with + UVCIOC_CTRL_ADD and a new V4L2 control. + + A UVC control can be mapped to several V4L2 controls. For instance, + a UVC pan/tilt control could be mapped to separate pan and tilt V4L2 + controls. The UVC control is divided into non overlapping fields using + the 'size' and 'offset' fields and are then independantly mapped to + V4L2 control. + + For signed integer V4L2 controls the data_type field should be set to + UVC_CTRL_DATA_TYPE_SIGNED. Other values are currently ignored. + +Return value: + On success 0 is returned. On error -1 is returned and errno is set + appropriately. + + ENOMEM + Not enough memory to perform the operation. + EPERM + Insufficient privileges (super user privileges are required). + EINVAL + No such UVC control. + EOVERFLOW + The requested offset and size would overflow the UVC control. + EEXIST + Mapping already exists. + +Data types: + * struct uvc_xu_control_mapping + + __u32 id V4L2 control identifier + __u8 name[32] V4L2 control name + __u8 entity[16] UVC extension unit GUID + __u8 selector UVC control selector + __u8 size V4L2 control size (in bits) + __u8 offset V4L2 control offset (in bits) + enum v4l2_ctrl_type + v4l2_type V4L2 control type + enum uvc_control_data_type + data_type UVC control data type + struct uvc_menu_info + *menu_info Array of menu entries (for menu controls only) + __u32 menu_count Number of menu entries (for menu controls only) + + * struct uvc_menu_info + + __u32 value Menu entry value used by the device + __u8 name[32] Menu entry name + + + * enum uvc_control_data_type + + UVC_CTRL_DATA_TYPE_RAW Raw control (byte array) + UVC_CTRL_DATA_TYPE_SIGNED Signed integer + UVC_CTRL_DATA_TYPE_UNSIGNED Unsigned integer + UVC_CTRL_DATA_TYPE_BOOLEAN Boolean + UVC_CTRL_DATA_TYPE_ENUM Enumeration + UVC_CTRL_DATA_TYPE_BITMASK Bitmask + + +---- UVCIOC_CTRL_QUERY - Query a UVC XU control ---- + +Argument: struct uvc_xu_control_query + +Description: + This ioctl queries a UVC XU control identified by its extension unit ID + and control selector. + + There are a number of different queries available that closely + correspond to the low-level control requests described in the UVC + specification. These requests are: + + UVC_GET_CUR + Obtain the current value of the control. + UVC_GET_MIN + Obtain the minimum value of the control. + UVC_GET_MAX + Obtain the maximum value of the control. + UVC_GET_DEF + Obtain the default value of the control. + UVC_GET_RES + Query the resolution of the control, i.e. the step size of the + allowed control values. + UVC_GET_LEN + Query the size of the control in bytes. + UVC_GET_INFO + Query the control information bitmap, which indicates whether + get/set requests are supported. + UVC_SET_CUR + Update the value of the control. + + Applications must set the 'size' field to the correct length for the + control. Exceptions are the UVC_GET_LEN and UVC_GET_INFO queries, for + which the size must be set to 2 and 1, respectively. The 'data' field + must point to a valid writable buffer big enough to hold the indicated + number of data bytes. + + Data is copied directly from the device without any driver-side + processing. Applications are responsible for data buffer formatting, + including little-endian/big-endian conversion. This is particularly + important for the result of the UVC_GET_LEN requests, which is always + returned as a little-endian 16-bit integer by the device. + +Return value: + On success 0 is returned. On error -1 is returned and errno is set + appropriately. + + ENOENT + The device does not support the given control or the specified + extension unit could not be found. + ENOBUFS + The specified buffer size is incorrect (too big or too small). + EINVAL + An invalid request code was passed. + EBADRQC + The given request is not supported by the given control. + EFAULT + The data pointer references an inaccessible memory area. + +Data types: + * struct uvc_xu_control_query + + __u8 unit Extension unit ID + __u8 selector Control selector + __u8 query Request code to send to the device + __u16 size Control data size (in bytes) + __u8 *data Control value diff --git a/Documentation/virtual/lguest/Makefile b/Documentation/virtual/lguest/Makefile index bebac6b4f332..0ac34206f7a7 100644 --- a/Documentation/virtual/lguest/Makefile +++ b/Documentation/virtual/lguest/Makefile @@ -1,5 +1,5 @@ # This creates the demonstration utility "lguest" which runs a Linux guest. -# Missing headers? Add "-I../../include -I../../arch/x86/include" +# Missing headers? Add "-I../../../include -I../../../arch/x86/include" CFLAGS:=-m32 -Wall -Wmissing-declarations -Wmissing-prototypes -O3 -U_FORTIFY_SOURCE all: lguest diff --git a/Documentation/virtual/lguest/lguest.c b/Documentation/virtual/lguest/lguest.c index d9da7e148538..cd9d6af61d07 100644 --- a/Documentation/virtual/lguest/lguest.c +++ b/Documentation/virtual/lguest/lguest.c @@ -49,7 +49,7 @@ #include <linux/virtio_rng.h> #include <linux/virtio_ring.h> #include <asm/bootparam.h> -#include "../../include/linux/lguest_launcher.h" +#include "../../../include/linux/lguest_launcher.h" /*L:110 * We can ignore the 42 include files we need for this program, but I do want * to draw attention to the use of kernel-style types. @@ -135,9 +135,6 @@ struct device { /* Is it operational */ bool running; - /* Does Guest want an intrrupt on empty? */ - bool irq_on_empty; - /* Device-specific data. */ void *priv; }; @@ -637,10 +634,7 @@ static void trigger_irq(struct virtqueue *vq) /* If they don't want an interrupt, don't send one... */ if (vq->vring.avail->flags & VRING_AVAIL_F_NO_INTERRUPT) { - /* ... unless they've asked us to force one on empty. */ - if (!vq->dev->irq_on_empty - || lg_last_avail(vq) != vq->vring.avail->idx) - return; + return; } /* Send the Guest an interrupt tell them we used something up. */ @@ -1057,15 +1051,6 @@ static void create_thread(struct virtqueue *vq) close(vq->eventfd); } -static bool accepted_feature(struct device *dev, unsigned int bit) -{ - const u8 *features = get_feature_bits(dev) + dev->feature_len; - - if (dev->feature_len < bit / CHAR_BIT) - return false; - return features[bit / CHAR_BIT] & (1 << (bit % CHAR_BIT)); -} - static void start_device(struct device *dev) { unsigned int i; @@ -1079,8 +1064,6 @@ static void start_device(struct device *dev) verbose(" %02x", get_feature_bits(dev) [dev->feature_len+i]); - dev->irq_on_empty = accepted_feature(dev, VIRTIO_F_NOTIFY_ON_EMPTY); - for (vq = dev->vq; vq; vq = vq->next) { if (vq->service) create_thread(vq); @@ -1564,7 +1547,6 @@ static void setup_tun_net(char *arg) /* Set up the tun device. */ configure_device(ipfd, tapif, ip); - add_feature(dev, VIRTIO_F_NOTIFY_ON_EMPTY); /* Expect Guest to handle everything except UFO */ add_feature(dev, VIRTIO_NET_F_CSUM); add_feature(dev, VIRTIO_NET_F_GUEST_CSUM); diff --git a/Documentation/virtual/uml/UserModeLinux-HOWTO.txt b/Documentation/virtual/uml/UserModeLinux-HOWTO.txt index 9b7e1904db1c..5d0fc8bfcdb9 100644 --- a/Documentation/virtual/uml/UserModeLinux-HOWTO.txt +++ b/Documentation/virtual/uml/UserModeLinux-HOWTO.txt @@ -1182,6 +1182,16 @@ forge.net/> and explains these in detail, as well as some other issues. + There is also a related point-to-point only "ucast" transport. + This is useful when your network does not support multicast, and + all network connections are simple point to point links. + + The full set of command line options for this transport are + + + ethn=ucast,ethernet address,remote address,listen port,remote port + + 66..66.. TTUUNN//TTAAPP wwiitthh tthhee uummll__nneett hheellppeerr diff --git a/Documentation/vm/cleancache.txt b/Documentation/vm/cleancache.txt new file mode 100644 index 000000000000..36c367c73084 --- /dev/null +++ b/Documentation/vm/cleancache.txt @@ -0,0 +1,278 @@ +MOTIVATION + +Cleancache is a new optional feature provided by the VFS layer that +potentially dramatically increases page cache effectiveness for +many workloads in many environments at a negligible cost. + +Cleancache can be thought of as a page-granularity victim cache for clean +pages that the kernel's pageframe replacement algorithm (PFRA) would like +to keep around, but can't since there isn't enough memory. So when the +PFRA "evicts" a page, it first attempts to use cleancache code to +put the data contained in that page into "transcendent memory", memory +that is not directly accessible or addressable by the kernel and is +of unknown and possibly time-varying size. + +Later, when a cleancache-enabled filesystem wishes to access a page +in a file on disk, it first checks cleancache to see if it already +contains it; if it does, the page of data is copied into the kernel +and a disk access is avoided. + +Transcendent memory "drivers" for cleancache are currently implemented +in Xen (using hypervisor memory) and zcache (using in-kernel compressed +memory) and other implementations are in development. + +FAQs are included below. + +IMPLEMENTATION OVERVIEW + +A cleancache "backend" that provides transcendent memory registers itself +to the kernel's cleancache "frontend" by calling cleancache_register_ops, +passing a pointer to a cleancache_ops structure with funcs set appropriately. +Note that cleancache_register_ops returns the previous settings so that +chaining can be performed if desired. The functions provided must conform to +certain semantics as follows: + +Most important, cleancache is "ephemeral". Pages which are copied into +cleancache have an indefinite lifetime which is completely unknowable +by the kernel and so may or may not still be in cleancache at any later time. +Thus, as its name implies, cleancache is not suitable for dirty pages. +Cleancache has complete discretion over what pages to preserve and what +pages to discard and when. + +Mounting a cleancache-enabled filesystem should call "init_fs" to obtain a +pool id which, if positive, must be saved in the filesystem's superblock; +a negative return value indicates failure. A "put_page" will copy a +(presumably about-to-be-evicted) page into cleancache and associate it with +the pool id, a file key, and a page index into the file. (The combination +of a pool id, a file key, and an index is sometimes called a "handle".) +A "get_page" will copy the page, if found, from cleancache into kernel memory. +A "flush_page" will ensure the page no longer is present in cleancache; +a "flush_inode" will flush all pages associated with the specified file; +and, when a filesystem is unmounted, a "flush_fs" will flush all pages in +all files specified by the given pool id and also surrender the pool id. + +An "init_shared_fs", like init_fs, obtains a pool id but tells cleancache +to treat the pool as shared using a 128-bit UUID as a key. On systems +that may run multiple kernels (such as hard partitioned or virtualized +systems) that may share a clustered filesystem, and where cleancache +may be shared among those kernels, calls to init_shared_fs that specify the +same UUID will receive the same pool id, thus allowing the pages to +be shared. Note that any security requirements must be imposed outside +of the kernel (e.g. by "tools" that control cleancache). Or a +cleancache implementation can simply disable shared_init by always +returning a negative value. + +If a get_page is successful on a non-shared pool, the page is flushed (thus +making cleancache an "exclusive" cache). On a shared pool, the page +is NOT flushed on a successful get_page so that it remains accessible to +other sharers. The kernel is responsible for ensuring coherency between +cleancache (shared or not), the page cache, and the filesystem, using +cleancache flush operations as required. + +Note that cleancache must enforce put-put-get coherency and get-get +coherency. For the former, if two puts are made to the same handle but +with different data, say AAA by the first put and BBB by the second, a +subsequent get can never return the stale data (AAA). For get-get coherency, +if a get for a given handle fails, subsequent gets for that handle will +never succeed unless preceded by a successful put with that handle. + +Last, cleancache provides no SMP serialization guarantees; if two +different Linux threads are simultaneously putting and flushing a page +with the same handle, the results are indeterminate. Callers must +lock the page to ensure serial behavior. + +CLEANCACHE PERFORMANCE METRICS + +Cleancache monitoring is done by sysfs files in the +/sys/kernel/mm/cleancache directory. The effectiveness of cleancache +can be measured (across all filesystems) with: + +succ_gets - number of gets that were successful +failed_gets - number of gets that failed +puts - number of puts attempted (all "succeed") +flushes - number of flushes attempted + +A backend implementatation may provide additional metrics. + +FAQ + +1) Where's the value? (Andrew Morton) + +Cleancache provides a significant performance benefit to many workloads +in many environments with negligible overhead by improving the +effectiveness of the pagecache. Clean pagecache pages are +saved in transcendent memory (RAM that is otherwise not directly +addressable to the kernel); fetching those pages later avoids "refaults" +and thus disk reads. + +Cleancache (and its sister code "frontswap") provide interfaces for +this transcendent memory (aka "tmem"), which conceptually lies between +fast kernel-directly-addressable RAM and slower DMA/asynchronous devices. +Disallowing direct kernel or userland reads/writes to tmem +is ideal when data is transformed to a different form and size (such +as with compression) or secretly moved (as might be useful for write- +balancing for some RAM-like devices). Evicted page-cache pages (and +swap pages) are a great use for this kind of slower-than-RAM-but-much- +faster-than-disk transcendent memory, and the cleancache (and frontswap) +"page-object-oriented" specification provides a nice way to read and +write -- and indirectly "name" -- the pages. + +In the virtual case, the whole point of virtualization is to statistically +multiplex physical resources across the varying demands of multiple +virtual machines. This is really hard to do with RAM and efforts to +do it well with no kernel change have essentially failed (except in some +well-publicized special-case workloads). Cleancache -- and frontswap -- +with a fairly small impact on the kernel, provide a huge amount +of flexibility for more dynamic, flexible RAM multiplexing. +Specifically, the Xen Transcendent Memory backend allows otherwise +"fallow" hypervisor-owned RAM to not only be "time-shared" between multiple +virtual machines, but the pages can be compressed and deduplicated to +optimize RAM utilization. And when guest OS's are induced to surrender +underutilized RAM (e.g. with "self-ballooning"), page cache pages +are the first to go, and cleancache allows those pages to be +saved and reclaimed if overall host system memory conditions allow. + +And the identical interface used for cleancache can be used in +physical systems as well. The zcache driver acts as a memory-hungry +device that stores pages of data in a compressed state. And +the proposed "RAMster" driver shares RAM across multiple physical +systems. + +2) Why does cleancache have its sticky fingers so deep inside the + filesystems and VFS? (Andrew Morton and Christoph Hellwig) + +The core hooks for cleancache in VFS are in most cases a single line +and the minimum set are placed precisely where needed to maintain +coherency (via cleancache_flush operations) between cleancache, +the page cache, and disk. All hooks compile into nothingness if +cleancache is config'ed off and turn into a function-pointer- +compare-to-NULL if config'ed on but no backend claims the ops +functions, or to a compare-struct-element-to-negative if a +backend claims the ops functions but a filesystem doesn't enable +cleancache. + +Some filesystems are built entirely on top of VFS and the hooks +in VFS are sufficient, so don't require an "init_fs" hook; the +initial implementation of cleancache didn't provide this hook. +But for some filesystems (such as btrfs), the VFS hooks are +incomplete and one or more hooks in fs-specific code are required. +And for some other filesystems, such as tmpfs, cleancache may +be counterproductive. So it seemed prudent to require a filesystem +to "opt in" to use cleancache, which requires adding a hook in +each filesystem. Not all filesystems are supported by cleancache +only because they haven't been tested. The existing set should +be sufficient to validate the concept, the opt-in approach means +that untested filesystems are not affected, and the hooks in the +existing filesystems should make it very easy to add more +filesystems in the future. + +The total impact of the hooks to existing fs and mm files is only +about 40 lines added (not counting comments and blank lines). + +3) Why not make cleancache asynchronous and batched so it can + more easily interface with real devices with DMA instead + of copying each individual page? (Minchan Kim) + +The one-page-at-a-time copy semantics simplifies the implementation +on both the frontend and backend and also allows the backend to +do fancy things on-the-fly like page compression and +page deduplication. And since the data is "gone" (copied into/out +of the pageframe) before the cleancache get/put call returns, +a great deal of race conditions and potential coherency issues +are avoided. While the interface seems odd for a "real device" +or for real kernel-addressable RAM, it makes perfect sense for +transcendent memory. + +4) Why is non-shared cleancache "exclusive"? And where is the + page "flushed" after a "get"? (Minchan Kim) + +The main reason is to free up space in transcendent memory and +to avoid unnecessary cleancache_flush calls. If you want inclusive, +the page can be "put" immediately following the "get". If +put-after-get for inclusive becomes common, the interface could +be easily extended to add a "get_no_flush" call. + +The flush is done by the cleancache backend implementation. + +5) What's the performance impact? + +Performance analysis has been presented at OLS'09 and LCA'10. +Briefly, performance gains can be significant on most workloads, +especially when memory pressure is high (e.g. when RAM is +overcommitted in a virtual workload); and because the hooks are +invoked primarily in place of or in addition to a disk read/write, +overhead is negligible even in worst case workloads. Basically +cleancache replaces I/O with memory-copy-CPU-overhead; on older +single-core systems with slow memory-copy speeds, cleancache +has little value, but in newer multicore machines, especially +consolidated/virtualized machines, it has great value. + +6) How do I add cleancache support for filesystem X? (Boaz Harrash) + +Filesystems that are well-behaved and conform to certain +restrictions can utilize cleancache simply by making a call to +cleancache_init_fs at mount time. Unusual, misbehaving, or +poorly layered filesystems must either add additional hooks +and/or undergo extensive additional testing... or should just +not enable the optional cleancache. + +Some points for a filesystem to consider: + +- The FS should be block-device-based (e.g. a ram-based FS such + as tmpfs should not enable cleancache) +- To ensure coherency/correctness, the FS must ensure that all + file removal or truncation operations either go through VFS or + add hooks to do the equivalent cleancache "flush" operations +- To ensure coherency/correctness, either inode numbers must + be unique across the lifetime of the on-disk file OR the + FS must provide an "encode_fh" function. +- The FS must call the VFS superblock alloc and deactivate routines + or add hooks to do the equivalent cleancache calls done there. +- To maximize performance, all pages fetched from the FS should + go through the do_mpag_readpage routine or the FS should add + hooks to do the equivalent (cf. btrfs) +- Currently, the FS blocksize must be the same as PAGESIZE. This + is not an architectural restriction, but no backends currently + support anything different. +- A clustered FS should invoke the "shared_init_fs" cleancache + hook to get best performance for some backends. + +7) Why not use the KVA of the inode as the key? (Christoph Hellwig) + +If cleancache would use the inode virtual address instead of +inode/filehandle, the pool id could be eliminated. But, this +won't work because cleancache retains pagecache data pages +persistently even when the inode has been pruned from the +inode unused list, and only flushes the data page if the file +gets removed/truncated. So if cleancache used the inode kva, +there would be potential coherency issues if/when the inode +kva is reused for a different file. Alternately, if cleancache +flushed the pages when the inode kva was freed, much of the value +of cleancache would be lost because the cache of pages in cleanache +is potentially much larger than the kernel pagecache and is most +useful if the pages survive inode cache removal. + +8) Why is a global variable required? + +The cleancache_enabled flag is checked in all of the frequently-used +cleancache hooks. The alternative is a function call to check a static +variable. Since cleancache is enabled dynamically at runtime, systems +that don't enable cleancache would suffer thousands (possibly +tens-of-thousands) of unnecessary function calls per second. So the +global variable allows cleancache to be enabled by default at compile +time, but have insignificant performance impact when cleancache remains +disabled at runtime. + +9) Does cleanache work with KVM? + +The memory model of KVM is sufficiently different that a cleancache +backend may have less value for KVM. This remains to be tested, +especially in an overcommitted system. + +10) Does cleancache work in userspace? It sounds useful for + memory hungry caches like web browsers. (Jamie Lokier) + +No plans yet, though we agree it sounds useful, at least for +apps that bypass the page cache (e.g. O_DIRECT). + +Last updated: Dan Magenheimer, April 13 2011 diff --git a/Documentation/vm/hwpoison.txt b/Documentation/vm/hwpoison.txt index 12f9ba20ccb7..550068466605 100644 --- a/Documentation/vm/hwpoison.txt +++ b/Documentation/vm/hwpoison.txt @@ -129,12 +129,12 @@ Limit injection to pages owned by memgroup. Specified by inode number of the memcg. Example: - mkdir /cgroup/hwpoison + mkdir /sys/fs/cgroup/mem/hwpoison usemem -m 100 -s 1000 & - echo `jobs -p` > /cgroup/hwpoison/tasks + echo `jobs -p` > /sys/fs/cgroup/mem/hwpoison/tasks - memcg_ino=$(ls -id /cgroup/hwpoison | cut -f1 -d' ') + memcg_ino=$(ls -id /sys/fs/cgroup/mem/hwpoison | cut -f1 -d' ') echo $memcg_ino > /debug/hwpoison/corrupt-filter-memcg page-types -p `pidof init` --hwpoison # shall do nothing diff --git a/Documentation/vm/locking b/Documentation/vm/locking index 25fadb448760..f61228bd6395 100644 --- a/Documentation/vm/locking +++ b/Documentation/vm/locking @@ -66,7 +66,7 @@ in some cases it is not really needed. Eg, vm_start is modified by expand_stack(), it is hard to come up with a destructive scenario without having the vmlist protection in this case. -The page_table_lock nests with the inode i_mmap_lock and the kmem cache +The page_table_lock nests with the inode i_mmap_mutex and the kmem cache c_spinlock spinlocks. This is okay, since the kmem code asks for pages after dropping c_spinlock. The page_table_lock also nests with pagecache_lock and pagemap_lru_lock spinlocks, and no code asks for memory with these locks |