diff options
Diffstat (limited to 'Documentation')
139 files changed, 4742 insertions, 2753 deletions
diff --git a/Documentation/ABI/stable/sysfs-bus-xen-backend b/Documentation/ABI/stable/sysfs-bus-xen-backend new file mode 100644 index 000000000000..3d5951c8bf5f --- /dev/null +++ b/Documentation/ABI/stable/sysfs-bus-xen-backend @@ -0,0 +1,75 @@ +What: /sys/bus/xen-backend/devices/*/devtype +Date: Feb 2009 +KernelVersion: 2.6.38 +Contact: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> +Description: + The type of the device. e.g., one of: 'vbd' (block), + 'vif' (network), or 'vfb' (framebuffer). + +What: /sys/bus/xen-backend/devices/*/nodename +Date: Feb 2009 +KernelVersion: 2.6.38 +Contact: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> +Description: + XenStore node (under /local/domain/NNN/) for this + backend device. + +What: /sys/bus/xen-backend/devices/vbd-*/physical_device +Date: April 2011 +KernelVersion: 3.0 +Contact: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> +Description: + The major:minor number (in hexidecimal) of the + physical device providing the storage for this backend + block device. + +What: /sys/bus/xen-backend/devices/vbd-*/mode +Date: April 2011 +KernelVersion: 3.0 +Contact: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> +Description: + Whether the block device is read-only ('r') or + read-write ('w'). + +What: /sys/bus/xen-backend/devices/vbd-*/statistics/f_req +Date: April 2011 +KernelVersion: 3.0 +Contact: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> +Description: + Number of flush requests from the frontend. + +What: /sys/bus/xen-backend/devices/vbd-*/statistics/oo_req +Date: April 2011 +KernelVersion: 3.0 +Contact: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> +Description: + Number of requests delayed because the backend was too + busy processing previous requests. + +What: /sys/bus/xen-backend/devices/vbd-*/statistics/rd_req +Date: April 2011 +KernelVersion: 3.0 +Contact: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> +Description: + Number of read requests from the frontend. + +What: /sys/bus/xen-backend/devices/vbd-*/statistics/rd_sect +Date: April 2011 +KernelVersion: 3.0 +Contact: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> +Description: + Number of sectors read by the frontend. + +What: /sys/bus/xen-backend/devices/vbd-*/statistics/wr_req +Date: April 2011 +KernelVersion: 3.0 +Contact: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> +Description: + Number of write requests from the frontend. + +What: /sys/bus/xen-backend/devices/vbd-*/statistics/wr_sect +Date: April 2011 +KernelVersion: 3.0 +Contact: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> +Description: + Number of sectors written by the frontend. diff --git a/Documentation/ABI/stable/sysfs-devices-system-xen_memory b/Documentation/ABI/stable/sysfs-devices-system-xen_memory new file mode 100644 index 000000000000..caa311d59ac1 --- /dev/null +++ b/Documentation/ABI/stable/sysfs-devices-system-xen_memory @@ -0,0 +1,77 @@ +What: /sys/devices/system/xen_memory/xen_memory0/max_retry_count +Date: May 2011 +KernelVersion: 2.6.39 +Contact: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> +Description: + The maximum number of times the balloon driver will + attempt to increase the balloon before giving up. See + also 'retry_count' below. + A value of zero means retry forever and is the default one. + +What: /sys/devices/system/xen_memory/xen_memory0/max_schedule_delay +Date: May 2011 +KernelVersion: 2.6.39 +Contact: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> +Description: + The limit that 'schedule_delay' (see below) will be + increased to. The default value is 32 seconds. + +What: /sys/devices/system/xen_memory/xen_memory0/retry_count +Date: May 2011 +KernelVersion: 2.6.39 +Contact: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> +Description: + The current number of times that the balloon driver + has attempted to increase the size of the balloon. + The default value is one. With max_retry_count being + zero (unlimited), this means that the driver will attempt + to retry with a 'schedule_delay' delay. + +What: /sys/devices/system/xen_memory/xen_memory0/schedule_delay +Date: May 2011 +KernelVersion: 2.6.39 +Contact: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> +Description: + The time (in seconds) to wait between attempts to + increase the balloon. Each time the balloon cannot be + increased, 'schedule_delay' is increased (until + 'max_schedule_delay' is reached at which point it + will use the max value). + +What: /sys/devices/system/xen_memory/xen_memory0/target +Date: April 2008 +KernelVersion: 2.6.26 +Contact: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> +Description: + The target number of pages to adjust this domain's + memory reservation to. + +What: /sys/devices/system/xen_memory/xen_memory0/target_kb +Date: April 2008 +KernelVersion: 2.6.26 +Contact: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> +Description: + As target above, except the value is in KiB. + +What: /sys/devices/system/xen_memory/xen_memory0/info/current_kb +Date: April 2008 +KernelVersion: 2.6.26 +Contact: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> +Description: + Current size (in KiB) of this domain's memory + reservation. + +What: /sys/devices/system/xen_memory/xen_memory0/info/high_kb +Date: April 2008 +KernelVersion: 2.6.26 +Contact: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> +Description: + Amount (in KiB) of high memory in the balloon. + +What: /sys/devices/system/xen_memory/xen_memory0/info/low_kb +Date: April 2008 +KernelVersion: 2.6.26 +Contact: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> +Description: + Amount (in KiB) of low (or normal) memory in the + balloon. diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci index 349ecf26ce10..34f51100f029 100644 --- a/Documentation/ABI/testing/sysfs-bus-pci +++ b/Documentation/ABI/testing/sysfs-bus-pci @@ -66,6 +66,24 @@ Description: re-discover previously removed devices. Depends on CONFIG_HOTPLUG. +What: /sys/bus/pci/devices/.../msi_irqs/ +Date: September, 2011 +Contact: Neil Horman <nhorman@tuxdriver.com> +Description: + The /sys/devices/.../msi_irqs directory contains a variable set + of sub-directories, with each sub-directory being named after a + corresponding msi irq vector allocated to that device. Each + numbered sub-directory N contains attributes of that irq. + Note that this directory is not created for device drivers which + do not support msi irqs + +What: /sys/bus/pci/devices/.../msi_irqs/<N>/mode +Date: September 2011 +Contact: Neil Horman <nhorman@tuxdriver.com> +Description: + This attribute indicates the mode that the irq vector named by + the parent directory is in (msi vs. msix) + What: /sys/bus/pci/devices/.../remove Date: January 2009 Contact: Linux PCI developers <linux-pci@vger.kernel.org> diff --git a/Documentation/ABI/testing/sysfs-bus-usb b/Documentation/ABI/testing/sysfs-bus-usb index e647378e9e88..b4f548792e32 100644 --- a/Documentation/ABI/testing/sysfs-bus-usb +++ b/Documentation/ABI/testing/sysfs-bus-usb @@ -119,6 +119,31 @@ Description: Write a 1 to force the device to disconnect (equivalent to unplugging a wired USB device). +What: /sys/bus/usb/drivers/.../new_id +Date: October 2011 +Contact: linux-usb@vger.kernel.org +Description: + Writing a device ID to this file will attempt to + dynamically add a new device ID to a USB device driver. + This may allow the driver to support more hardware than + was included in the driver's static device ID support + table at compile time. The format for the device ID is: + idVendor idProduct bInterfaceClass. + The vendor ID and device ID fields are required, the + interface class is optional. + Upon successfully adding an ID, the driver will probe + for the device and attempt to bind to it. For example: + # echo "8086 10f5" > /sys/bus/usb/drivers/foo/new_id + +What: /sys/bus/usb-serial/drivers/.../new_id +Date: October 2011 +Contact: linux-usb@vger.kernel.org +Description: + For serial USB drivers, this attribute appears under the + extra bus folder "usb-serial" in sysfs; apart from that + difference, all descriptions from the entry + "/sys/bus/usb/drivers/.../new_id" apply. + What: /sys/bus/usb/drivers/.../remove_id Date: November 2009 Contact: CHENG Renquan <rqcheng@smu.edu.sg> diff --git a/Documentation/ABI/testing/sysfs-class-rtc-rtc0-device-rtc_calibration b/Documentation/ABI/testing/sysfs-class-rtc-rtc0-device-rtc_calibration new file mode 100644 index 000000000000..4cf1e72222d9 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-class-rtc-rtc0-device-rtc_calibration @@ -0,0 +1,12 @@ +What: Attribute for calibrating ST-Ericsson AB8500 Real Time Clock +Date: Oct 2011 +KernelVersion: 3.0 +Contact: Mark Godfrey <mark.godfrey@stericsson.com> +Description: The rtc_calibration attribute allows the userspace to + calibrate the AB8500.s 32KHz Real Time Clock. + Every 60 seconds the AB8500 will correct the RTC's value + by adding to it the value of this attribute. + The range of the attribute is -127 to +127 in units of + 30.5 micro-seconds (half-parts-per-million of the 32KHz clock) +Users: The /vendor/st-ericsson/base_utilities/core/rtc_calibration + daemon uses this interface. diff --git a/Documentation/ABI/testing/sysfs-devices-platform-docg3 b/Documentation/ABI/testing/sysfs-devices-platform-docg3 new file mode 100644 index 000000000000..8aa36716882f --- /dev/null +++ b/Documentation/ABI/testing/sysfs-devices-platform-docg3 @@ -0,0 +1,34 @@ +What: /sys/devices/platform/docg3/f[0-3]_dps[01]_is_keylocked +Date: November 2011 +KernelVersion: 3.3 +Contact: Robert Jarzmik <robert.jarzmik@free.fr> +Description: + Show whether the floor (0 to 4), protection area (0 or 1) is + keylocked. Each docg3 chip (or floor) has 2 protection areas, + which can cover any part of it, block aligned, called DPS. + The protection has information embedded whether it blocks reads, + writes or both. + The result is: + 0 -> the DPS is not keylocked + 1 -> the DPS is keylocked +Users: None identified so far. + +What: /sys/devices/platform/docg3/f[0-3]_dps[01]_protection_key +Date: November 2011 +KernelVersion: 3.3 +Contact: Robert Jarzmik <robert.jarzmik@free.fr> +Description: + Enter the protection key for the floor (0 to 4), protection area + (0 or 1). Each docg3 chip (or floor) has 2 protection areas, + which can cover any part of it, block aligned, called DPS. + The protection has information embedded whether it blocks reads, + writes or both. + The protection key is a string of 8 bytes (value 0-255). + Entering the correct value toggle the lock, and can be observed + through f[0-3]_dps[01]_is_keylocked. + Possible values are: + - 8 bytes + Typical values are: + - "00000000" + - "12345678" +Users: None identified so far. diff --git a/Documentation/ABI/testing/sysfs-driver-hid-logitech-lg4ff b/Documentation/ABI/testing/sysfs-driver-hid-logitech-lg4ff index 9aec8ef228b0..167d9032b970 100644 --- a/Documentation/ABI/testing/sysfs-driver-hid-logitech-lg4ff +++ b/Documentation/ABI/testing/sysfs-driver-hid-logitech-lg4ff @@ -1,7 +1,7 @@ What: /sys/module/hid_logitech/drivers/hid:logitech/<dev>/range. Date: July 2011 KernelVersion: 3.2 -Contact: Michal Malý <madcatxster@gmail.com> +Contact: Michal Malý <madcatxster@gmail.com> Description: Display minimum, maximum and current range of the steering wheel. Writing a value within min and max boundaries sets the range of the wheel. diff --git a/Documentation/ABI/testing/sysfs-driver-hid-multitouch b/Documentation/ABI/testing/sysfs-driver-hid-multitouch new file mode 100644 index 000000000000..f79839d1af37 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-driver-hid-multitouch @@ -0,0 +1,9 @@ +What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/quirks +Date: November 2011 +Contact: Benjamin Tissoires <benjamin.tissoires@gmail.com> +Description: The integer value of this attribute corresponds to the + quirks actually in place to handle the device's protocol. + When read, this attribute returns the current settings (see + MT_QUIRKS_* in hid-multitouch.c). + When written this attribute change on the fly the quirks, then + the protocol to handle the device. diff --git a/Documentation/ABI/testing/sysfs-driver-hid-roccat-isku b/Documentation/ABI/testing/sysfs-driver-hid-roccat-isku new file mode 100644 index 000000000000..189dc43891bf --- /dev/null +++ b/Documentation/ABI/testing/sysfs-driver-hid-roccat-isku @@ -0,0 +1,135 @@ +What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/isku/roccatisku<minor>/actual_profile +Date: June 2011 +Contact: Stefan Achatz <erazor_de@users.sourceforge.net> +Description: The integer value of this attribute ranges from 0-4. + When read, this attribute returns the number of the actual + profile. This value is persistent, so its equivalent to the + profile that's active when the device is powered on next time. + When written, this file sets the number of the startup profile + and the device activates this profile immediately. +Users: http://roccat.sourceforge.net + +What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/isku/roccatisku<minor>/info +Date: June 2011 +Contact: Stefan Achatz <erazor_de@users.sourceforge.net> +Description: When read, this file returns general data like firmware version. + The data is 6 bytes long. + This file is readonly. +Users: http://roccat.sourceforge.net + +What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/isku/roccatisku<minor>/key_mask +Date: June 2011 +Contact: Stefan Achatz <erazor_de@users.sourceforge.net> +Description: When written, this file lets one deactivate certain keys like + windows and application keys, to prevent accidental presses. + Profile number for which this settings occur is included in + written data. The data has to be 6 bytes long. + Before reading this file, control has to be written to select + which profile to read. +Users: http://roccat.sourceforge.net + +What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/isku/roccatisku<minor>/keys_capslock +Date: June 2011 +Contact: Stefan Achatz <erazor_de@users.sourceforge.net> +Description: When written, this file lets one set the function of the + capslock key for a specific profile. Profile number is included + in written data. The data has to be 6 bytes long. + Before reading this file, control has to be written to select + which profile to read. +Users: http://roccat.sourceforge.net + +What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/isku/roccatisku<minor>/keys_easyzone +Date: June 2011 +Contact: Stefan Achatz <erazor_de@users.sourceforge.net> +Description: When written, this file lets one set the function of the + easyzone keys for a specific profile. Profile number is included + in written data. The data has to be 65 bytes long. + Before reading this file, control has to be written to select + which profile to read. +Users: http://roccat.sourceforge.net + +What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/isku/roccatisku<minor>/keys_function +Date: June 2011 +Contact: Stefan Achatz <erazor_de@users.sourceforge.net> +Description: When written, this file lets one set the function of the + function keys for a specific profile. Profile number is included + in written data. The data has to be 41 bytes long. + Before reading this file, control has to be written to select + which profile to read. +Users: http://roccat.sourceforge.net + +What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/isku/roccatisku<minor>/keys_macro +Date: June 2011 +Contact: Stefan Achatz <erazor_de@users.sourceforge.net> +Description: When written, this file lets one set the function of the macro + keys for a specific profile. Profile number is included in + written data. The data has to be 35 bytes long. + Before reading this file, control has to be written to select + which profile to read. +Users: http://roccat.sourceforge.net + +What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/isku/roccatisku<minor>/keys_media +Date: June 2011 +Contact: Stefan Achatz <erazor_de@users.sourceforge.net> +Description: When written, this file lets one set the function of the media + keys for a specific profile. Profile number is included in + written data. The data has to be 29 bytes long. + Before reading this file, control has to be written to select + which profile to read. +Users: http://roccat.sourceforge.net + +What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/isku/roccatisku<minor>/keys_thumbster +Date: June 2011 +Contact: Stefan Achatz <erazor_de@users.sourceforge.net> +Description: When written, this file lets one set the function of the + thumbster keys for a specific profile. Profile number is included + in written data. The data has to be 23 bytes long. + Before reading this file, control has to be written to select + which profile to read. +Users: http://roccat.sourceforge.net + +What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/isku/roccatisku<minor>/last_set +Date: June 2011 +Contact: Stefan Achatz <erazor_de@users.sourceforge.net> +Description: When written, this file lets one set the time in secs since + epoch in which the last configuration took place. + The data has to be 20 bytes long. +Users: http://roccat.sourceforge.net + +What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/isku/roccatisku<minor>/light +Date: June 2011 +Contact: Stefan Achatz <erazor_de@users.sourceforge.net> +Description: When written, this file lets one set the backlight intensity for + a specific profile. Profile number is included in written data. + The data has to be 10 bytes long. + Before reading this file, control has to be written to select + which profile to read. +Users: http://roccat.sourceforge.net + +What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/isku/roccatisku<minor>/macro +Date: June 2011 +Contact: Stefan Achatz <erazor_de@users.sourceforge.net> +Description: When written, this file lets one store macros with max 500 + keystrokes for a specific button for a specific profile. + Button and profile numbers are included in written data. + The data has to be 2083 bytes long. + Before reading this file, control has to be written to select + which profile and key to read. +Users: http://roccat.sourceforge.net + +What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/isku/roccatisku<minor>/control +Date: June 2011 +Contact: Stefan Achatz <erazor_de@users.sourceforge.net> +Description: When written, this file lets one select which data from which + profile will be read next. The data has to be 3 bytes long. + This file is writeonly. +Users: http://roccat.sourceforge.net + +What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/isku/roccatisku<minor>/talk +Date: June 2011 +Contact: Stefan Achatz <erazor_de@users.sourceforge.net> +Description: When written, this file lets one trigger easyshift functionality + from the host. + The data has to be 16 bytes long. + This file is writeonly. +Users: http://roccat.sourceforge.net diff --git a/Documentation/ABI/testing/sysfs-driver-hid-wiimote b/Documentation/ABI/testing/sysfs-driver-hid-wiimote index 5d5a16ea57c6..3d98009f447a 100644 --- a/Documentation/ABI/testing/sysfs-driver-hid-wiimote +++ b/Documentation/ABI/testing/sysfs-driver-hid-wiimote @@ -8,3 +8,15 @@ Contact: David Herrmann <dh.herrmann@googlemail.com> Description: Make it possible to set/get current led state. Reading from it returns 0 if led is off and 1 if it is on. Writing 0 to it disables the led, writing 1 enables it. + +What: /sys/bus/hid/drivers/wiimote/<dev>/extension +Date: August 2011 +KernelVersion: 3.2 +Contact: David Herrmann <dh.herrmann@googlemail.com> +Description: This file contains the currently connected and initialized + extensions. It can be one of: none, motionp, nunchuck, classic, + motionp+nunchuck, motionp+classic + motionp is the official Nintendo Motion+ extension, nunchuck is + the official Nintendo Nunchuck extension and classic is the + Nintendo Classic Controller extension. The motionp extension can + be combined with the other two. diff --git a/Documentation/ABI/testing/sysfs-driver-wacom b/Documentation/ABI/testing/sysfs-driver-wacom index 82d4df136444..0130d6683c14 100644 --- a/Documentation/ABI/testing/sysfs-driver-wacom +++ b/Documentation/ABI/testing/sysfs-driver-wacom @@ -15,9 +15,9 @@ Contact: linux-input@vger.kernel.org Description: Attribute group for control of the status LEDs and the OLEDs. This attribute group is only available for Intuos 4 M, L, - and XL (with LEDs and OLEDs) and Cintiq 21UX2 (LEDs only). - Therefore its presence implicitly signifies the presence of - said LEDs and OLEDs on the tablet device. + and XL (with LEDs and OLEDs) and Cintiq 21UX2 and Cintiq 24HD + (LEDs only). Therefore its presence implicitly signifies the + presence of said LEDs and OLEDs on the tablet device. What: /sys/bus/usb/devices/<busnum>-<devnum>:<cfg>.<intf>/wacom_led/status0_luminance Date: August 2011 @@ -41,16 +41,17 @@ Date: August 2011 Contact: linux-input@vger.kernel.org Description: Writing to this file sets which one of the four (for Intuos 4) - or of the right four (for Cintiq 21UX2) status LEDs is active (0..3). - The other three LEDs on the same side are always inactive. + or of the right four (for Cintiq 21UX2 and Cintiq 24HD) status + LEDs is active (0..3). The other three LEDs on the same side are + always inactive. What: /sys/bus/usb/devices/<busnum>-<devnum>:<cfg>.<intf>/wacom_led/status_led1_select Date: September 2011 Contact: linux-input@vger.kernel.org Description: - Writing to this file sets which one of the left four (for Cintiq 21UX2) - status LEDs is active (0..3). The other three LEDs on the left are always - inactive. + Writing to this file sets which one of the left four (for Cintiq 21UX2 + and Cintiq 24HD) status LEDs is active (0..3). The other three LEDs on + the left are always inactive. What: /sys/bus/usb/devices/<busnum>-<devnum>:<cfg>.<intf>/wacom_led/buttons_luminance Date: August 2011 diff --git a/Documentation/ABI/testing/sysfs-kernel-slab b/Documentation/ABI/testing/sysfs-kernel-slab index 8b093f8222d3..91bd6ca5440f 100644 --- a/Documentation/ABI/testing/sysfs-kernel-slab +++ b/Documentation/ABI/testing/sysfs-kernel-slab @@ -346,6 +346,10 @@ Description: number of objects per slab. If a slab cannot be allocated because of fragmentation, SLUB will retry with the minimum order possible depending on its characteristics. + When debug_guardpage_minorder=N (N > 0) parameter is specified + (see Documentation/kernel-parameters.txt), the minimum possible + order is used and this sysfs entry can not be used to change + the order at run time. What: /sys/kernel/slab/cache/order_fallback Date: April 2008 diff --git a/Documentation/ABI/testing/sysfs-module b/Documentation/ABI/testing/sysfs-module index 9489ea8e294c..47064c2b1f79 100644 --- a/Documentation/ABI/testing/sysfs-module +++ b/Documentation/ABI/testing/sysfs-module @@ -33,3 +33,19 @@ Description: Maximum time allowed for periodic transfers per microframe (μs) Beware, non-standard modes are usually not thoroughly tested by hardware designers, and the hardware can malfunction when this setting differ from default 100. + +What: /sys/module/*/{coresize,initsize} +Date: Jan 2012 +KernelVersion:»·3.3 +Contact: Kay Sievers <kay.sievers@vrfy.org> +Description: Module size in bytes. + +What: /sys/module/*/taint +Date: Jan 2012 +KernelVersion:»·3.3 +Contact: Kay Sievers <kay.sievers@vrfy.org> +Description: Module taint flags: + P - proprietary module + O - out-of-tree module + F - force-loaded module + C - staging driver module diff --git a/Documentation/DocBook/debugobjects.tmpl b/Documentation/DocBook/debugobjects.tmpl index 08ff908aa7a2..24979f691e3e 100644 --- a/Documentation/DocBook/debugobjects.tmpl +++ b/Documentation/DocBook/debugobjects.tmpl @@ -96,6 +96,7 @@ <listitem><para>debug_object_deactivate</para></listitem> <listitem><para>debug_object_destroy</para></listitem> <listitem><para>debug_object_free</para></listitem> + <listitem><para>debug_object_assert_init</para></listitem> </itemizedlist> Each of these functions takes the address of the real object and a pointer to the object type specific debug description @@ -273,6 +274,26 @@ debug checks. </para> </sect1> + + <sect1 id="debug_object_assert_init"> + <title>debug_object_assert_init</title> + <para> + This function is called to assert that an object has been + initialized. + </para> + <para> + When the real object is not tracked by debugobjects, it calls + fixup_assert_init of the object type description structure + provided by the caller, with the hardcoded object state + ODEBUG_NOT_AVAILABLE. The fixup function can correct the problem + by calling debug_object_init and other specific initializing + functions. + </para> + <para> + When the real object is already tracked by debugobjects it is + ignored. + </para> + </sect1> </chapter> <chapter id="fixupfunctions"> <title>Fixup functions</title> @@ -381,6 +402,35 @@ statistics. </para> </sect1> + <sect1 id="fixup_assert_init"> + <title>fixup_assert_init</title> + <para> + This function is called from the debug code whenever a problem + in debug_object_assert_init is detected. + </para> + <para> + Called from debug_object_assert_init() with a hardcoded state + ODEBUG_STATE_NOTAVAILABLE when the object is not found in the + debug bucket. + </para> + <para> + The function returns 1 when the fixup was successful, + otherwise 0. The return value is used to update the + statistics. + </para> + <para> + Note, this function should make sure debug_object_init() is + called before returning. + </para> + <para> + The handling of statically initialized objects is a special + case. The fixup function should check if this is a legitimate + case of a statically initialized object or not. In this case only + debug_object_init() should be called to make the object known to + the tracker. Then the function should return 0 because this is not + a real fixup. + </para> + </sect1> </chapter> <chapter id="bugs"> <title>Known Bugs And Assumptions</title> diff --git a/Documentation/DocBook/writing-an-alsa-driver.tmpl b/Documentation/DocBook/writing-an-alsa-driver.tmpl index 5de23c007078..cab4ec58e46e 100644 --- a/Documentation/DocBook/writing-an-alsa-driver.tmpl +++ b/Documentation/DocBook/writing-an-alsa-driver.tmpl @@ -404,7 +404,7 @@ /* SNDRV_CARDS: maximum number of cards supported by this module */ static int index[SNDRV_CARDS] = SNDRV_DEFAULT_IDX; static char *id[SNDRV_CARDS] = SNDRV_DEFAULT_STR; - static int enable[SNDRV_CARDS] = SNDRV_DEFAULT_ENABLE_PNP; + static bool enable[SNDRV_CARDS] = SNDRV_DEFAULT_ENABLE_PNP; /* definition of the chip-specific record */ struct mychip { diff --git a/Documentation/HOWTO b/Documentation/HOWTO index 81bc1a9ab9d8..f7ade3b3b40d 100644 --- a/Documentation/HOWTO +++ b/Documentation/HOWTO @@ -275,8 +275,8 @@ versions. If no 2.6.x.y kernel is available, then the highest numbered 2.6.x kernel is the current stable kernel. -2.6.x.y are maintained by the "stable" team <stable@kernel.org>, and are -released as needs dictate. The normal release period is approximately +2.6.x.y are maintained by the "stable" team <stable@vger.kernel.org>, and +are released as needs dictate. The normal release period is approximately two weeks, but it can be longer if there are no pressing problems. A security-related problem, instead, can cause a release to happen almost instantly. diff --git a/Documentation/RCU/checklist.txt b/Documentation/RCU/checklist.txt index 0c134f8afc6f..bff2d8be1e18 100644 --- a/Documentation/RCU/checklist.txt +++ b/Documentation/RCU/checklist.txt @@ -328,6 +328,12 @@ over a rather long period of time, but improvements are always welcome! RCU rather than SRCU, because RCU is almost always faster and easier to use than is SRCU. + If you need to enter your read-side critical section in a + hardirq or exception handler, and then exit that same read-side + critical section in the task that was interrupted, then you need + to srcu_read_lock_raw() and srcu_read_unlock_raw(), which avoid + the lockdep checking that would otherwise this practice illegal. + Also unlike other forms of RCU, explicit initialization and cleanup is required via init_srcu_struct() and cleanup_srcu_struct(). These are passed a "struct srcu_struct" diff --git a/Documentation/RCU/rcu.txt b/Documentation/RCU/rcu.txt index 31852705b586..bf778332a28f 100644 --- a/Documentation/RCU/rcu.txt +++ b/Documentation/RCU/rcu.txt @@ -38,11 +38,11 @@ o How can the updater tell when a grace period has completed Preemptible variants of RCU (CONFIG_TREE_PREEMPT_RCU) get the same effect, but require that the readers manipulate CPU-local - counters. These counters allow limited types of blocking - within RCU read-side critical sections. SRCU also uses - CPU-local counters, and permits general blocking within - RCU read-side critical sections. These two variants of - RCU detect grace periods by sampling these counters. + counters. These counters allow limited types of blocking within + RCU read-side critical sections. SRCU also uses CPU-local + counters, and permits general blocking within RCU read-side + critical sections. These variants of RCU detect grace periods + by sampling these counters. o If I am running on a uniprocessor kernel, which can only do one thing at a time, why should I wait for a grace period? diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt index 4e959208f736..083d88cbc089 100644 --- a/Documentation/RCU/stallwarn.txt +++ b/Documentation/RCU/stallwarn.txt @@ -101,6 +101,11 @@ o A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that CONFIG_TREE_PREEMPT_RCU case, you might see stall-warning messages. +o A hardware or software issue shuts off the scheduler-clock + interrupt on a CPU that is not in dyntick-idle mode. This + problem really has happened, and seems to be most likely to + result in RCU CPU stall warnings for CONFIG_NO_HZ=n kernels. + o A bug in the RCU implementation. o A hardware failure. This is quite unlikely, but has occurred @@ -109,12 +114,11 @@ o A hardware failure. This is quite unlikely, but has occurred This resulted in a series of RCU CPU stall warnings, eventually leading the realization that the CPU had failed. -The RCU, RCU-sched, and RCU-bh implementations have CPU stall -warning. SRCU does not have its own CPU stall warnings, but its -calls to synchronize_sched() will result in RCU-sched detecting -RCU-sched-related CPU stalls. Please note that RCU only detects -CPU stalls when there is a grace period in progress. No grace period, -no CPU stall warnings. +The RCU, RCU-sched, and RCU-bh implementations have CPU stall warning. +SRCU does not have its own CPU stall warnings, but its calls to +synchronize_sched() will result in RCU-sched detecting RCU-sched-related +CPU stalls. Please note that RCU only detects CPU stalls when there is +a grace period in progress. No grace period, no CPU stall warnings. To diagnose the cause of the stall, inspect the stack traces. The offending function will usually be near the top of the stack. diff --git a/Documentation/RCU/torture.txt b/Documentation/RCU/torture.txt index 783d6c134d3f..d67068d0d2b9 100644 --- a/Documentation/RCU/torture.txt +++ b/Documentation/RCU/torture.txt @@ -61,11 +61,24 @@ nreaders This is the number of RCU reading threads supported. To properly exercise RCU implementations with preemptible read-side critical sections. +onoff_interval + The number of seconds between each attempt to execute a + randomly selected CPU-hotplug operation. Defaults to + zero, which disables CPU hotplugging. In HOTPLUG_CPU=n + kernels, rcutorture will silently refuse to do any + CPU-hotplug operations regardless of what value is + specified for onoff_interval. + shuffle_interval The number of seconds to keep the test threads affinitied to a particular subset of the CPUs, defaults to 3 seconds. Used in conjunction with test_no_idle_hz. +shutdown_secs The number of seconds to run the test before terminating + the test and powering off the system. The default is + zero, which disables test termination and system shutdown. + This capability is useful for automated testing. + stat_interval The number of seconds between output of torture statistics (via printk()). Regardless of the interval, statistics are printed when the module is unloaded. diff --git a/Documentation/RCU/trace.txt b/Documentation/RCU/trace.txt index aaf65f6c6cd7..49587abfc2f7 100644 --- a/Documentation/RCU/trace.txt +++ b/Documentation/RCU/trace.txt @@ -105,14 +105,10 @@ o "dt" is the current value of the dyntick counter that is incremented or one greater than the interrupt-nesting depth otherwise. The number after the second "/" is the NMI nesting depth. - This field is displayed only for CONFIG_NO_HZ kernels. - o "df" is the number of times that some other CPU has forced a quiescent state on behalf of this CPU due to this CPU being in dynticks-idle state. - This field is displayed only for CONFIG_NO_HZ kernels. - o "of" is the number of times that some other CPU has forced a quiescent state on behalf of this CPU due to this CPU being offline. In a perfect world, this might never happen, but it diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt index 6ef692667e2f..6bbe8dcdc3da 100644 --- a/Documentation/RCU/whatisRCU.txt +++ b/Documentation/RCU/whatisRCU.txt @@ -4,6 +4,7 @@ to start learning about RCU: 1. What is RCU, Fundamentally? http://lwn.net/Articles/262464/ 2. What is RCU? Part 2: Usage http://lwn.net/Articles/263130/ 3. RCU part 3: the RCU API http://lwn.net/Articles/264090/ +4. The RCU API, 2010 Edition http://lwn.net/Articles/418853/ What is RCU? @@ -834,6 +835,8 @@ SRCU: Critical sections Grace period Barrier srcu_read_lock synchronize_srcu N/A srcu_read_unlock synchronize_srcu_expedited + srcu_read_lock_raw + srcu_read_unlock_raw srcu_dereference SRCU: Initialization/cleanup @@ -855,27 +858,33 @@ list can be helpful: a. Will readers need to block? If so, you need SRCU. -b. What about the -rt patchset? If readers would need to block +b. Is it necessary to start a read-side critical section in a + hardirq handler or exception handler, and then to complete + this read-side critical section in the task that was + interrupted? If so, you need SRCU's srcu_read_lock_raw() and + srcu_read_unlock_raw() primitives. + +c. What about the -rt patchset? If readers would need to block in an non-rt kernel, you need SRCU. If readers would block in a -rt kernel, but not in a non-rt kernel, SRCU is not necessary. -c. Do you need to treat NMI handlers, hardirq handlers, +d. Do you need to treat NMI handlers, hardirq handlers, and code segments with preemption disabled (whether via preempt_disable(), local_irq_save(), local_bh_disable(), or some other mechanism) as if they were explicit RCU readers? If so, you need RCU-sched. -d. Do you need RCU grace periods to complete even in the face +e. Do you need RCU grace periods to complete even in the face of softirq monopolization of one or more of the CPUs? For example, is your code subject to network-based denial-of-service attacks? If so, you need RCU-bh. -e. Is your workload too update-intensive for normal use of +f. Is your workload too update-intensive for normal use of RCU, but inappropriate for other synchronization mechanisms? If so, consider SLAB_DESTROY_BY_RCU. But please be careful! -f. Otherwise, use RCU. +g. Otherwise, use RCU. Of course, this all assumes that you have determined that RCU is in fact the right tool for your job. diff --git a/Documentation/arm/memory.txt b/Documentation/arm/memory.txt index 771d48d3b335..208a2d465b92 100644 --- a/Documentation/arm/memory.txt +++ b/Documentation/arm/memory.txt @@ -51,15 +51,14 @@ ffc00000 ffefffff DMA memory mapping region. Memory returned ff000000 ffbfffff Reserved for future expansion of DMA mapping region. -VMALLOC_END feffffff Free for platform use, recommended. - VMALLOC_END must be aligned to a 2MB - boundary. - VMALLOC_START VMALLOC_END-1 vmalloc() / ioremap() space. Memory returned by vmalloc/ioremap will be dynamically placed in this region. - VMALLOC_START may be based upon the value - of the high_memory variable. + Machine specific static mappings are also + located here through iotable_init(). + VMALLOC_START is based upon the value + of the high_memory variable, and VMALLOC_END + is equal to 0xff000000. PAGE_OFFSET high_memory-1 Kernel direct-mapped RAM region. This maps the platforms RAM, and typically diff --git a/Documentation/atomic_ops.txt b/Documentation/atomic_ops.txt index 3bd585b44927..27f2b21a9d5c 100644 --- a/Documentation/atomic_ops.txt +++ b/Documentation/atomic_ops.txt @@ -84,6 +84,93 @@ compiler optimizes the section accessing atomic_t variables. *** YOU HAVE BEEN WARNED! *** +Properly aligned pointers, longs, ints, and chars (and unsigned +equivalents) may be atomically loaded from and stored to in the same +sense as described for atomic_read() and atomic_set(). The ACCESS_ONCE() +macro should be used to prevent the compiler from using optimizations +that might otherwise optimize accesses out of existence on the one hand, +or that might create unsolicited accesses on the other. + +For example consider the following code: + + while (a > 0) + do_something(); + +If the compiler can prove that do_something() does not store to the +variable a, then the compiler is within its rights transforming this to +the following: + + tmp = a; + if (a > 0) + for (;;) + do_something(); + +If you don't want the compiler to do this (and you probably don't), then +you should use something like the following: + + while (ACCESS_ONCE(a) < 0) + do_something(); + +Alternatively, you could place a barrier() call in the loop. + +For another example, consider the following code: + + tmp_a = a; + do_something_with(tmp_a); + do_something_else_with(tmp_a); + +If the compiler can prove that do_something_with() does not store to the +variable a, then the compiler is within its rights to manufacture an +additional load as follows: + + tmp_a = a; + do_something_with(tmp_a); + tmp_a = a; + do_something_else_with(tmp_a); + +This could fatally confuse your code if it expected the same value +to be passed to do_something_with() and do_something_else_with(). + +The compiler would be likely to manufacture this additional load if +do_something_with() was an inline function that made very heavy use +of registers: reloading from variable a could save a flush to the +stack and later reload. To prevent the compiler from attacking your +code in this manner, write the following: + + tmp_a = ACCESS_ONCE(a); + do_something_with(tmp_a); + do_something_else_with(tmp_a); + +For a final example, consider the following code, assuming that the +variable a is set at boot time before the second CPU is brought online +and never changed later, so that memory barriers are not needed: + + if (a) + b = 9; + else + b = 42; + +The compiler is within its rights to manufacture an additional store +by transforming the above code into the following: + + b = 42; + if (a) + b = 9; + +This could come as a fatal surprise to other code running concurrently +that expected b to never have the value 42 if a was zero. To prevent +the compiler from doing this, write something like: + + if (a) + ACCESS_ONCE(b) = 9; + else + ACCESS_ONCE(b) = 42; + +Don't even -think- about doing this without proper use of memory barriers, +locks, or atomic operations if variable a can change at runtime! + +*** WARNING: ACCESS_ONCE() DOES NOT IMPLY A BARRIER! *** + Now, we move onto the atomic operation interfaces typically implemented with the help of assembly code. diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt index 9c452ef2328c..a7c96ae5557c 100644 --- a/Documentation/cgroups/cgroups.txt +++ b/Documentation/cgroups/cgroups.txt @@ -594,53 +594,44 @@ rmdir() will fail with it. From this behavior, pre_destroy() can be called multiple times against a cgroup. int can_attach(struct cgroup_subsys *ss, struct cgroup *cgrp, - struct task_struct *task) + struct cgroup_taskset *tset) (cgroup_mutex held by caller) -Called prior to moving a task into a cgroup; if the subsystem -returns an error, this will abort the attach operation. If a NULL -task is passed, then a successful result indicates that *any* -unspecified task can be moved into the cgroup. Note that this isn't -called on a fork. If this method returns 0 (success) then this should -remain valid while the caller holds cgroup_mutex and it is ensured that either +Called prior to moving one or more tasks into a cgroup; if the +subsystem returns an error, this will abort the attach operation. +@tset contains the tasks to be attached and is guaranteed to have at +least one task in it. + +If there are multiple tasks in the taskset, then: + - it's guaranteed that all are from the same thread group + - @tset contains all tasks from the thread group whether or not + they're switching cgroups + - the first task is the leader + +Each @tset entry also contains the task's old cgroup and tasks which +aren't switching cgroup can be skipped easily using the +cgroup_taskset_for_each() iterator. Note that this isn't called on a +fork. If this method returns 0 (success) then this should remain valid +while the caller holds cgroup_mutex and it is ensured that either attach() or cancel_attach() will be called in future. -int can_attach_task(struct cgroup *cgrp, struct task_struct *tsk); -(cgroup_mutex held by caller) - -As can_attach, but for operations that must be run once per task to be -attached (possibly many when using cgroup_attach_proc). Called after -can_attach. - void cancel_attach(struct cgroup_subsys *ss, struct cgroup *cgrp, - struct task_struct *task, bool threadgroup) + struct cgroup_taskset *tset) (cgroup_mutex held by caller) Called when a task attach operation has failed after can_attach() has succeeded. A subsystem whose can_attach() has some side-effects should provide this function, so that the subsystem can implement a rollback. If not, not necessary. This will be called only about subsystems whose can_attach() operation have -succeeded. - -void pre_attach(struct cgroup *cgrp); -(cgroup_mutex held by caller) - -For any non-per-thread attachment work that needs to happen before -attach_task. Needed by cpuset. +succeeded. The parameters are identical to can_attach(). void attach(struct cgroup_subsys *ss, struct cgroup *cgrp, - struct cgroup *old_cgrp, struct task_struct *task) + struct cgroup_taskset *tset) (cgroup_mutex held by caller) Called after the task has been attached to the cgroup, to allow any post-attachment activity that requires memory allocations or blocking. - -void attach_task(struct cgroup *cgrp, struct task_struct *tsk); -(cgroup_mutex held by caller) - -As attach, but for operations that must be run once per task to be attached, -like can_attach_task. Called before attach. Currently does not support any -subsystem that might need the old_cgrp for every thread in the group. +The parameters are identical to can_attach(). void fork(struct cgroup_subsy *ss, struct task_struct *task) diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt index cc0ebc5241b3..4c95c0034a4b 100644 --- a/Documentation/cgroups/memory.txt +++ b/Documentation/cgroups/memory.txt @@ -44,8 +44,8 @@ Features: - oom-killer disable knob and oom-notifier - Root cgroup has no limit controls. - Kernel memory and Hugepages are not under control yet. We just manage - pages on LRU. To add more controls, we have to take care of performance. + Kernel memory support is work in progress, and the current version provides + basically functionality. (See Section 2.7) Brief summary of control files. @@ -61,7 +61,7 @@ Brief summary of control files. memory.failcnt # show the number of memory usage hits limits memory.memsw.failcnt # show the number of memory+Swap hits limits memory.max_usage_in_bytes # show max memory usage recorded - memory.memsw.usage_in_bytes # show max memory+Swap usage recorded + memory.memsw.max_usage_in_bytes # show max memory+Swap usage recorded memory.soft_limit_in_bytes # set/show soft limit of memory usage memory.stat # show various statistics memory.use_hierarchy # set/show hierarchical account enabled @@ -72,6 +72,9 @@ Brief summary of control files. memory.oom_control # set/show oom controls. memory.numa_stat # show the number of memory usage per numa node + memory.kmem.tcp.limit_in_bytes # set/show hard limit for tcp buf memory + memory.kmem.tcp.usage_in_bytes # show current tcp buf memory allocation + 1. History The memory controller has a long history. A request for comments for the memory @@ -255,6 +258,27 @@ When oom event notifier is registered, event will be delivered. per-zone-per-cgroup LRU (cgroup's private LRU) is just guarded by zone->lru_lock, it has no lock of its own. +2.7 Kernel Memory Extension (CONFIG_CGROUP_MEM_RES_CTLR_KMEM) + +With the Kernel memory extension, the Memory Controller is able to limit +the amount of kernel memory used by the system. Kernel memory is fundamentally +different than user memory, since it can't be swapped out, which makes it +possible to DoS the system by consuming too much of this precious resource. + +Kernel memory limits are not imposed for the root cgroup. Usage for the root +cgroup may or may not be accounted. + +Currently no soft limit is implemented for kernel memory. It is future work +to trigger slab reclaim when those limits are reached. + +2.7.1 Current Kernel Memory resources accounted + +* sockets memory pressure: some sockets protocols have memory pressure +thresholds. The Memory Controller allows them to be controlled individually +per cgroup, instead of globally. + +* tcp memory pressure: sockets memory pressure for the tcp protocol. + 3. User Interface 0. Configuration @@ -386,8 +410,11 @@ memory.stat file includes following statistics cache - # of bytes of page cache memory. rss - # of bytes of anonymous and swap cache memory. mapped_file - # of bytes of mapped file (includes tmpfs/shmem) -pgpgin - # of pages paged in (equivalent to # of charging events). -pgpgout - # of pages paged out (equivalent to # of uncharging events). +pgpgin - # of charging events to the memory cgroup. The charging + event happens each time a page is accounted as either mapped + anon page(RSS) or cache page(Page Cache) to the cgroup. +pgpgout - # of uncharging events to the memory cgroup. The uncharging + event happens each time a page is unaccounted from the cgroup. swap - # of bytes of swap usage inactive_anon - # of bytes of anonymous memory and swap cache memory on LRU list. diff --git a/Documentation/cgroups/net_prio.txt b/Documentation/cgroups/net_prio.txt new file mode 100644 index 000000000000..01b322635591 --- /dev/null +++ b/Documentation/cgroups/net_prio.txt @@ -0,0 +1,53 @@ +Network priority cgroup +------------------------- + +The Network priority cgroup provides an interface to allow an administrator to +dynamically set the priority of network traffic generated by various +applications + +Nominally, an application would set the priority of its traffic via the +SO_PRIORITY socket option. This however, is not always possible because: + +1) The application may not have been coded to set this value +2) The priority of application traffic is often a site-specific administrative + decision rather than an application defined one. + +This cgroup allows an administrator to assign a process to a group which defines +the priority of egress traffic on a given interface. Network priority groups can +be created by first mounting the cgroup filesystem. + +# mount -t cgroup -onet_prio none /sys/fs/cgroup/net_prio + +With the above step, the initial group acting as the parent accounting group +becomes visible at '/sys/fs/cgroup/net_prio'. This group includes all tasks in +the system. '/sys/fs/cgroup/net_prio/tasks' lists the tasks in this cgroup. + +Each net_prio cgroup contains two files that are subsystem specific + +net_prio.prioidx +This file is read-only, and is simply informative. It contains a unique integer +value that the kernel uses as an internal representation of this cgroup. + +net_prio.ifpriomap +This file contains a map of the priorities assigned to traffic originating from +processes in this group and egressing the system on various interfaces. It +contains a list of tuples in the form <ifname priority>. Contents of this file +can be modified by echoing a string into the file using the same tuple format. +for example: + +echo "eth0 5" > /sys/fs/cgroups/net_prio/iscsi/net_prio.ifpriomap + +This command would force any traffic originating from processes belonging to the +iscsi net_prio cgroup and egressing on interface eth0 to have the priority of +said traffic set to the value 5. The parent accounting group also has a +writeable 'net_prio.ifpriomap' file that can be used to set a system default +priority. + +Priorities are set immediately prior to queueing a frame to the device +queueing discipline (qdisc) so priorities will be assigned prior to the hardware +queue selection being made. + +One usage for the net_prio cgroup is with mqprio qdisc allowing application +traffic to be steered to hardware/driver based traffic classes. These mappings +can then be managed by administrators or other networking protocols such as +DCBX. diff --git a/Documentation/cpu-freq/governors.txt b/Documentation/cpu-freq/governors.txt index d221781dabaa..c7a2eb8450c2 100644 --- a/Documentation/cpu-freq/governors.txt +++ b/Documentation/cpu-freq/governors.txt @@ -127,7 +127,7 @@ in the bash (as said, 1000 is default), do: echo `$(($(cat cpuinfo_transition_latency) * 750 / 1000)) \ >ondemand/sampling_rate -show_sampling_rate_min: +sampling_rate_min: The sampling rate is limited by the HW transition latency: transition_latency * 100 Or by kernel restrictions: @@ -140,8 +140,6 @@ HZ=100: min=200000us (200ms) The highest value of kernel and HW latency restrictions is shown and used as the minimum sampling rate. -show_sampling_rate_max: THIS INTERFACE IS DEPRECATED, DON'T USE IT. - up_threshold: defines what the average CPU usage between the samplings of 'sampling_rate' needs to be for the kernel to make a decision on whether it should increase the frequency. For example when it is set diff --git a/Documentation/development-process/5.Posting b/Documentation/development-process/5.Posting index 903a2546f138..8a48c9b62864 100644 --- a/Documentation/development-process/5.Posting +++ b/Documentation/development-process/5.Posting @@ -271,10 +271,10 @@ copies should go to: the linux-kernel list. - If you are fixing a bug, think about whether the fix should go into the - next stable update. If so, stable@kernel.org should get a copy of the - patch. Also add a "Cc: stable@kernel.org" to the tags within the patch - itself; that will cause the stable team to get a notification when your - fix goes into the mainline. + next stable update. If so, stable@vger.kernel.org should get a copy of + the patch. Also add a "Cc: stable@vger.kernel.org" to the tags within + the patch itself; that will cause the stable team to get a notification + when your fix goes into the mainline. When selecting recipients for a patch, it is good to have an idea of who you think will eventually accept the patch and get it merged. While it diff --git a/Documentation/devices.txt b/Documentation/devices.txt index eccffe715229..cec8864ce4e8 100644 --- a/Documentation/devices.txt +++ b/Documentation/devices.txt @@ -379,7 +379,7 @@ Your cooperation is appreciated. 162 = /dev/smbus System Management Bus 163 = /dev/lik Logitech Internet Keyboard 164 = /dev/ipmo Intel Intelligent Platform Management - 165 = /dev/vmmon VMWare virtual machine monitor + 165 = /dev/vmmon VMware virtual machine monitor 166 = /dev/i2o/ctl I2O configuration manager 167 = /dev/specialix_sxctl Specialix serial control 168 = /dev/tcldrv Technology Concepts serial control diff --git a/Documentation/devicetree/bindings/arm/fsl.txt b/Documentation/devicetree/bindings/arm/fsl.txt index c9848ad0e2e3..54bdddadf1cf 100644 --- a/Documentation/devicetree/bindings/arm/fsl.txt +++ b/Documentation/devicetree/bindings/arm/fsl.txt @@ -21,6 +21,10 @@ i.MX53 Smart Mobile Reference Design Board Required root node properties: - compatible = "fsl,imx53-smd", "fsl,imx53"; -i.MX6 Quad SABRE Automotive Board +i.MX6 Quad Armadillo2 Board Required root node properties: - - compatible = "fsl,imx6q-sabreauto", "fsl,imx6q"; + - compatible = "fsl,imx6q-arm2", "fsl,imx6q"; + +i.MX6 Quad SABRE Lite Board +Required root node properties: + - compatible = "fsl,imx6q-sabrelite", "fsl,imx6q"; diff --git a/Documentation/devicetree/bindings/arm/gic.txt b/Documentation/devicetree/bindings/arm/gic.txt index 52916b4aa1fe..9b4b82a721b6 100644 --- a/Documentation/devicetree/bindings/arm/gic.txt +++ b/Documentation/devicetree/bindings/arm/gic.txt @@ -42,6 +42,10 @@ Optional - interrupts : Interrupt source of the parent interrupt controller. Only present on secondary GICs. +- cpu-offset : per-cpu offset within the distributor and cpu interface + regions, used when the GIC doesn't have banked registers. The offset is + cpu-offset * cpu-nr. + Example: intc: interrupt-controller@fff11000 { diff --git a/Documentation/devicetree/bindings/arm/insignal-boards.txt b/Documentation/devicetree/bindings/arm/insignal-boards.txt new file mode 100644 index 000000000000..524c3dc5d808 --- /dev/null +++ b/Documentation/devicetree/bindings/arm/insignal-boards.txt @@ -0,0 +1,8 @@ +* Insignal's Exynos4210 based Origen evaluation board + +Origen low-cost evaluation board is based on Samsung's Exynos4210 SoC. + +Required root node properties: + - compatible = should be one or more of the following. + (a) "samsung,smdkv310" - for Samsung's SMDKV310 eval board. + (b) "samsung,exynos4210" - for boards based on Exynos4210 SoC. diff --git a/Documentation/devicetree/bindings/arm/samsung-boards.txt b/Documentation/devicetree/bindings/arm/samsung-boards.txt new file mode 100644 index 000000000000..0bf68be56fd1 --- /dev/null +++ b/Documentation/devicetree/bindings/arm/samsung-boards.txt @@ -0,0 +1,8 @@ +* Samsung's Exynos4210 based SMDKV310 evaluation board + +SMDKV310 evaluation board is based on Samsung's Exynos4210 SoC. + +Required root node properties: + - compatible = should be one or more of the following. + (a) "samsung,smdkv310" - for Samsung's SMDKV310 eval board. + (b) "samsung,exynos4210" - for boards based on Exynos4210 SoC. diff --git a/Documentation/devicetree/bindings/arm/tegra.txt b/Documentation/devicetree/bindings/arm/tegra.txt new file mode 100644 index 000000000000..6e69d2e5e766 --- /dev/null +++ b/Documentation/devicetree/bindings/arm/tegra.txt @@ -0,0 +1,14 @@ +NVIDIA Tegra device tree bindings +------------------------------------------- + +Boards with the tegra20 SoC shall have the following properties: + +Required root node property: + +compatible = "nvidia,tegra20"; + +Boards with the tegra30 SoC shall have the following properties: + +Required root node property: + +compatible = "nvidia,tegra30"; diff --git a/Documentation/devicetree/bindings/arm/vic.txt b/Documentation/devicetree/bindings/arm/vic.txt new file mode 100644 index 000000000000..266716b23437 --- /dev/null +++ b/Documentation/devicetree/bindings/arm/vic.txt @@ -0,0 +1,29 @@ +* ARM Vectored Interrupt Controller + +One or more Vectored Interrupt Controllers (VIC's) can be connected in an ARM +system for interrupt routing. For multiple controllers they can either be +nested or have the outputs wire-OR'd together. + +Required properties: + +- compatible : should be one of + "arm,pl190-vic" + "arm,pl192-vic" +- interrupt-controller : Identifies the node as an interrupt controller +- #interrupt-cells : The number of cells to define the interrupts. Must be 1 as + the VIC has no configuration options for interrupt sources. The cell is a u32 + and defines the interrupt number. +- reg : The register bank for the VIC. + +Optional properties: + +- interrupts : Interrupt source for parent controllers if the VIC is nested. + +Example: + + vic0: interrupt-controller@60000 { + compatible = "arm,pl192-vic"; + interrupt-controller; + #interrupt-cells = <1>; + reg = <0x60000 0x1000>; + }; diff --git a/Documentation/devicetree/bindings/c6x/clocks.txt b/Documentation/devicetree/bindings/c6x/clocks.txt new file mode 100644 index 000000000000..a04f5fd30122 --- /dev/null +++ b/Documentation/devicetree/bindings/c6x/clocks.txt @@ -0,0 +1,40 @@ +C6X PLL Clock Controllers +------------------------- + +This is a first-cut support for the SoC clock controllers. This is still +under development and will probably change as the common device tree +clock support is added to the kernel. + +Required properties: + +- compatible: "ti,c64x+pll" + May also have SoC-specific value to support SoC-specific initialization + in the driver. One of: + "ti,c6455-pll" + "ti,c6457-pll" + "ti,c6472-pll" + "ti,c6474-pll" + +- reg: base address and size of register area +- clock-frequency: input clock frequency in hz + + +Optional properties: + +- ti,c64x+pll-bypass-delay: CPU cycles to delay when entering bypass mode + +- ti,c64x+pll-reset-delay: CPU cycles to delay after PLL reset + +- ti,c64x+pll-lock-delay: CPU cycles to delay after PLL frequency change + +Example: + + clock-controller@29a0000 { + compatible = "ti,c6472-pll", "ti,c64x+pll"; + reg = <0x029a0000 0x200>; + clock-frequency = <25000000>; + + ti,c64x+pll-bypass-delay = <200>; + ti,c64x+pll-reset-delay = <12000>; + ti,c64x+pll-lock-delay = <80000>; + }; diff --git a/Documentation/devicetree/bindings/c6x/dscr.txt b/Documentation/devicetree/bindings/c6x/dscr.txt new file mode 100644 index 000000000000..d847758f2b20 --- /dev/null +++ b/Documentation/devicetree/bindings/c6x/dscr.txt @@ -0,0 +1,127 @@ +Device State Configuration Registers +------------------------------------ + +TI C6X SoCs contain a region of miscellaneous registers which provide various +function for SoC control or status. Details vary considerably among from SoC +to SoC with no two being alike. + +In general, the Device State Configuraion Registers (DSCR) will provide one or +more configuration registers often protected by a lock register where one or +more key values must be written to a lock register in order to unlock the +configuration register for writes. These configuration register may be used to +enable (and disable in some cases) SoC pin drivers, select peripheral clock +sources (internal or pin), etc. In some cases, a configuration register is +write once or the individual bits are write once. In addition to device config, +the DSCR block may provide registers which which are used to reset peripherals, +provide device ID information, provide ethernet MAC addresses, as well as other +miscellaneous functions. + +For device state control (enable/disable), each device control is assigned an +id which is used by individual device drivers to control the state as needed. + +Required properties: + +- compatible: must be "ti,c64x+dscr" +- reg: register area base and size + +Optional properties: + + NOTE: These are optional in that not all SoCs will have all properties. For + SoCs which do support a given property, leaving the property out of the + device tree will result in reduced functionality or possibly driver + failure. + +- ti,dscr-devstat + offset of the devstat register + +- ti,dscr-silicon-rev + offset, start bit, and bitsize of silicon revision field + +- ti,dscr-rmii-resets + offset and bitmask of RMII reset field. May have multiple tuples if more + than one ethernet port is available. + +- ti,dscr-locked-regs + possibly multiple tuples describing registers which are write protected by + a lock register. Each tuple consists of the register offset, lock register + offsset, and the key value used to unlock the register. + +- ti,dscr-kick-regs + offset and key values of two "kick" registers used to write protect other + registers in DSCR. On SoCs using kick registers, the first key must be + written to the first kick register and the second key must be written to + the second register before other registers in the area are write-enabled. + +- ti,dscr-mac-fuse-regs + MAC addresses are contained in two registers. Each element of a MAC address + is contained in a single byte. This property has two tuples. Each tuple has + a register offset and four cells representing bytes in the register from + most significant to least. The value of these four cells is the MAC byte + index (1-6) of the byte within the register. A value of 0 means the byte + is unused in the MAC address. + +- ti,dscr-devstate-ctl-regs + This property describes the bitfields used to control the state of devices. + Each tuple describes a range of identical bitfields used to control one or + more devices (one bitfield per device). The layout of each tuple is: + + start_id num_ids reg enable disable start_bit nbits + + Where: + start_id is device id for the first device control in the range + num_ids is the number of device controls in the range + reg is the offset of the register holding the control bits + enable is the value to enable a device + disable is the value to disable a device (0xffffffff if cannot disable) + start_bit is the bit number of the first bit in the range + nbits is the number of bits per device control + +- ti,dscr-devstate-stat-regs + This property describes the bitfields used to provide device state status + for device states controlled by the DSCR. Each tuple describes a range of + identical bitfields used to provide status for one or more devices (one + bitfield per device). The layout of each tuple is: + + start_id num_ids reg enable disable start_bit nbits + + Where: + start_id is device id for the first device status in the range + num_ids is the number of devices covered by the range + reg is the offset of the register holding the status bits + enable is the value indicating device is enabled + disable is the value indicating device is disabled + start_bit is the bit number of the first bit in the range + nbits is the number of bits per device status + +- ti,dscr-privperm + Offset and default value for register used to set access privilege for + some SoC devices. + + +Example: + + device-state-config-regs@2a80000 { + compatible = "ti,c64x+dscr"; + reg = <0x02a80000 0x41000>; + + ti,dscr-devstat = <0>; + ti,dscr-silicon-rev = <8 28 0xf>; + ti,dscr-rmii-resets = <0x40020 0x00040000>; + + ti,dscr-locked-regs = <0x40008 0x40004 0x0f0a0b00>; + ti,dscr-devstate-ctl-regs = + <0 12 0x40008 1 0 0 2 + 12 1 0x40008 3 0 30 2 + 13 2 0x4002c 1 0xffffffff 0 1>; + ti,dscr-devstate-stat-regs = + <0 10 0x40014 1 0 0 3 + 10 2 0x40018 1 0 0 3>; + + ti,dscr-mac-fuse-regs = <0x700 1 2 3 4 + 0x704 5 6 0 0>; + + ti,dscr-privperm = <0x41c 0xaaaaaaaa>; + + ti,dscr-kick-regs = <0x38 0x83E70B13 + 0x3c 0x95A4F1E0>; + }; diff --git a/Documentation/devicetree/bindings/c6x/emifa.txt b/Documentation/devicetree/bindings/c6x/emifa.txt new file mode 100644 index 000000000000..0ff6e9b9a13f --- /dev/null +++ b/Documentation/devicetree/bindings/c6x/emifa.txt @@ -0,0 +1,62 @@ +External Memory Interface +------------------------- + +The emifa node describes a simple external bus controller found on some C6X +SoCs. This interface provides external busses with a number of chip selects. + +Required properties: + +- compatible: must be "ti,c64x+emifa", "simple-bus" +- reg: register area base and size +- #address-cells: must be 2 (chip-select + offset) +- #size-cells: must be 1 +- ranges: mapping from EMIFA space to parent space + + +Optional properties: + +- ti,dscr-dev-enable: Device ID if EMIF is enabled/disabled from DSCR + +- ti,emifa-burst-priority: + Number of memory transfers after which the EMIF will elevate the priority + of the oldest command in the command FIFO. Setting this field to 255 + disables this feature, thereby allowing old commands to stay in the FIFO + indefinitely. + +- ti,emifa-ce-config: + Configuration values for each of the supported chip selects. + +Example: + + emifa@70000000 { + compatible = "ti,c64x+emifa", "simple-bus"; + #address-cells = <2>; + #size-cells = <1>; + reg = <0x70000000 0x100>; + ranges = <0x2 0x0 0xa0000000 0x00000008 + 0x3 0x0 0xb0000000 0x00400000 + 0x4 0x0 0xc0000000 0x10000000 + 0x5 0x0 0xD0000000 0x10000000>; + + ti,dscr-dev-enable = <13>; + ti,emifa-burst-priority = <255>; + ti,emifa-ce-config = <0x00240120 + 0x00240120 + 0x00240122 + 0x00240122>; + + flash@3,0 { + #address-cells = <1>; + #size-cells = <1>; + compatible = "cfi-flash"; + reg = <0x3 0x0 0x400000>; + bank-width = <1>; + device-width = <1>; + partition@0 { + reg = <0x0 0x400000>; + label = "NOR"; + }; + }; + }; + +This shows a flash chip attached to chip select 3. diff --git a/Documentation/devicetree/bindings/c6x/interrupt.txt b/Documentation/devicetree/bindings/c6x/interrupt.txt new file mode 100644 index 000000000000..42bb796cc4ad --- /dev/null +++ b/Documentation/devicetree/bindings/c6x/interrupt.txt @@ -0,0 +1,104 @@ +C6X Interrupt Chips +------------------- + +* C64X+ Core Interrupt Controller + + The core interrupt controller provides 16 prioritized interrupts to the + C64X+ core. Priority 0 and 1 are used for reset and NMI respectively. + Priority 2 and 3 are reserved. Priority 4-15 are used for interrupt + sources coming from outside the core. + + Required properties: + -------------------- + - compatible: Should be "ti,c64x+core-pic"; + - #interrupt-cells: <1> + + Interrupt Specifier Definition + ------------------------------ + Single cell specifying the core interrupt priority level (4-15) where + 4 is highest priority and 15 is lowest priority. + + Example + ------- + core_pic: interrupt-controller@0 { + interrupt-controller; + #interrupt-cells = <1>; + compatible = "ti,c64x+core-pic"; + }; + + + +* C64x+ Megamodule Interrupt Controller + + The megamodule PIC consists of four interrupt mupliplexers each of which + combine up to 32 interrupt inputs into a single interrupt output which + may be cascaded into the core interrupt controller. The megamodule PIC + has a total of 12 outputs cascading into the core interrupt controller. + One for each core interrupt priority level. In addition to the combined + interrupt sources, individual megamodule interrupts may be cascaded to + the core interrupt controller. When an individual interrupt is cascaded, + it is no longer handled through a megamodule interrupt combiner and is + considered to have the core interrupt controller as the parent. + + Required properties: + -------------------- + - compatible: "ti,c64x+megamod-pic" + - interrupt-controller + - #interrupt-cells: <1> + - reg: base address and size of register area + - interrupt-parent: must be core interrupt controller + - interrupts: This should have four cells; one for each interrupt combiner. + The cells contain the core priority interrupt to which the + corresponding combiner output is wired. + + Optional properties: + -------------------- + - ti,c64x+megamod-pic-mux: Array of 12 cells correspnding to the 12 core + priority interrupts. The first cell corresponds to + core priority 4 and the last cell corresponds to + core priority 15. The value of each cell is the + megamodule interrupt source which is MUXed to + the core interrupt corresponding to the cell + position. Allowed values are 4 - 127. Mapping for + interrupts 0 - 3 (combined interrupt sources) are + ignored. + + Interrupt Specifier Definition + ------------------------------ + Single cell specifying the megamodule interrupt source (4-127). Note that + interrupts mapped directly to the core with "ti,c64x+megamod-pic-mux" will + use the core interrupt controller as their parent and the specifier will + be the core priority level, not the megamodule interrupt number. + + Examples + -------- + megamod_pic: interrupt-controller@1800000 { + compatible = "ti,c64x+megamod-pic"; + interrupt-controller; + #interrupt-cells = <1>; + reg = <0x1800000 0x1000>; + interrupt-parent = <&core_pic>; + interrupts = < 12 13 14 15 >; + }; + + This is a minimal example where all individual interrupts go through a + combiner. Combiner-0 is mapped to core interrupt 12, combiner-1 is mapped + to interrupt 13, etc. + + + megamod_pic: interrupt-controller@1800000 { + compatible = "ti,c64x+megamod-pic"; + interrupt-controller; + #interrupt-cells = <1>; + reg = <0x1800000 0x1000>; + interrupt-parent = <&core_pic>; + interrupts = < 12 13 14 15 >; + ti,c64x+megamod-pic-mux = < 0 0 0 0 + 32 0 0 0 + 0 0 0 0 >; + }; + + This the same as the first example except that megamodule interrupt 32 is + mapped directly to core priority interrupt 8. The node using this interrupt + must set the core controller as its interrupt parent and use 8 in the + interrupt specifier value. diff --git a/Documentation/devicetree/bindings/c6x/soc.txt b/Documentation/devicetree/bindings/c6x/soc.txt new file mode 100644 index 000000000000..b1e4973b5769 --- /dev/null +++ b/Documentation/devicetree/bindings/c6x/soc.txt @@ -0,0 +1,28 @@ +C6X System-on-Chip +------------------ + +Required properties: + +- compatible: "simple-bus" +- #address-cells: must be 1 +- #size-cells: must be 1 +- ranges + +Optional properties: + +- model: specific SoC model + +- nodes for IP blocks within SoC + + +Example: + + soc { + compatible = "simple-bus"; + model = "tms320c6455"; + #address-cells = <1>; + #size-cells = <1>; + ranges; + + ... + }; diff --git a/Documentation/devicetree/bindings/c6x/timer64.txt b/Documentation/devicetree/bindings/c6x/timer64.txt new file mode 100644 index 000000000000..95911fe70224 --- /dev/null +++ b/Documentation/devicetree/bindings/c6x/timer64.txt @@ -0,0 +1,26 @@ +Timer64 +------- + +The timer64 node describes C6X event timers. + +Required properties: + +- compatible: must be "ti,c64x+timer64" +- reg: base address and size of register region +- interrupt-parent: interrupt controller +- interrupts: interrupt id + +Optional properties: + +- ti,dscr-dev-enable: Device ID used to enable timer IP through DSCR interface. + +- ti,core-mask: on multi-core SoCs, bitmask of cores allowed to use this timer. + +Example: + timer0: timer@25e0000 { + compatible = "ti,c64x+timer64"; + ti,core-mask = < 0x01 >; + reg = <0x25e0000 0x40>; + interrupt-parent = <&megamod_pic>; + interrupts = < 16 >; + }; diff --git a/Documentation/devicetree/bindings/dma/arm-pl330.txt b/Documentation/devicetree/bindings/dma/arm-pl330.txt new file mode 100644 index 000000000000..a4cd273b2a67 --- /dev/null +++ b/Documentation/devicetree/bindings/dma/arm-pl330.txt @@ -0,0 +1,30 @@ +* ARM PrimeCell PL330 DMA Controller + +The ARM PrimeCell PL330 DMA controller can move blocks of memory contents +between memory and peripherals or memory to memory. + +Required properties: + - compatible: should include both "arm,pl330" and "arm,primecell". + - reg: physical base address of the controller and length of memory mapped + region. + - interrupts: interrupt number to the cpu. + +Example: + + pdma0: pdma@12680000 { + compatible = "arm,pl330", "arm,primecell"; + reg = <0x12680000 0x1000>; + interrupts = <99>; + }; + +Client drivers (device nodes requiring dma transfers from dev-to-mem or +mem-to-dev) should specify the DMA channel numbers using a two-value pair +as shown below. + + [property name] = <[phandle of the dma controller] [dma request id]>; + + where 'dma request id' is the dma request number which is connected + to the client controller. The 'property name' is recommended to be + of the form <name>-dma-channel. + + Example: tx-dma-channel = <&pdma0 12>; diff --git a/Documentation/devicetree/bindings/gpio/gpio-samsung.txt b/Documentation/devicetree/bindings/gpio/gpio-samsung.txt new file mode 100644 index 000000000000..8f50fe5e6c42 --- /dev/null +++ b/Documentation/devicetree/bindings/gpio/gpio-samsung.txt @@ -0,0 +1,40 @@ +Samsung Exynos4 GPIO Controller + +Required properties: +- compatible: Compatible property value should be "samsung,exynos4-gpio>". + +- reg: Physical base address of the controller and length of memory mapped + region. + +- #gpio-cells: Should be 4. The syntax of the gpio specifier used by client nodes + should be the following with values derived from the SoC user manual. + <[phandle of the gpio controller node] + [pin number within the gpio controller] + [mux function] + [pull up/down] + [drive strength]> + + Values for gpio specifier: + - Pin number: is a value between 0 to 7. + - Pull Up/Down: 0 - Pull Up/Down Disabled. + 1 - Pull Down Enabled. + 3 - Pull Up Enabled. + - Drive Strength: 0 - 1x, + 1 - 3x, + 2 - 2x, + 3 - 4x + +- gpio-controller: Specifies that the node is a gpio controller. +- #address-cells: should be 1. +- #size-cells: should be 1. + +Example: + + gpa0: gpio-controller@11400000 { + #address-cells = <1>; + #size-cells = <1>; + compatible = "samsung,exynos4-gpio"; + reg = <0x11400000 0x20>; + #gpio-cells = <4>; + gpio-controller; + }; diff --git a/Documentation/devicetree/bindings/input/samsung-keypad.txt b/Documentation/devicetree/bindings/input/samsung-keypad.txt new file mode 100644 index 000000000000..ce3e394c0e64 --- /dev/null +++ b/Documentation/devicetree/bindings/input/samsung-keypad.txt @@ -0,0 +1,88 @@ +* Samsung's Keypad Controller device tree bindings + +Samsung's Keypad controller is used to interface a SoC with a matrix-type +keypad device. The keypad controller supports multiple row and column lines. +A key can be placed at each intersection of a unique row and a unique column. +The keypad controller can sense a key-press and key-release and report the +event using a interrupt to the cpu. + +Required SoC Specific Properties: +- compatible: should be one of the following + - "samsung,s3c6410-keypad": For controllers compatible with s3c6410 keypad + controller. + - "samsung,s5pv210-keypad": For controllers compatible with s5pv210 keypad + controller. + +- reg: physical base address of the controller and length of memory mapped + region. + +- interrupts: The interrupt number to the cpu. + +Required Board Specific Properties: +- samsung,keypad-num-rows: Number of row lines connected to the keypad + controller. + +- samsung,keypad-num-columns: Number of column lines connected to the + keypad controller. + +- row-gpios: List of gpios used as row lines. The gpio specifier for + this property depends on the gpio controller to which these row lines + are connected. + +- col-gpios: List of gpios used as column lines. The gpio specifier for + this property depends on the gpio controller to which these column + lines are connected. + +- Keys represented as child nodes: Each key connected to the keypad + controller is represented as a child node to the keypad controller + device node and should include the following properties. + - keypad,row: the row number to which the key is connected. + - keypad,column: the column number to which the key is connected. + - linux,code: the key-code to be reported when the key is pressed + and released. + +Optional Properties specific to linux: +- linux,keypad-no-autorepeat: do no enable autorepeat feature. +- linux,keypad-wakeup: use any event on keypad as wakeup event. + + +Example: + keypad@100A0000 { + compatible = "samsung,s5pv210-keypad"; + reg = <0x100A0000 0x100>; + interrupts = <173>; + samsung,keypad-num-rows = <2>; + samsung,keypad-num-columns = <8>; + linux,input-no-autorepeat; + linux,input-wakeup; + + row-gpios = <&gpx2 0 3 3 0 + &gpx2 1 3 3 0>; + + col-gpios = <&gpx1 0 3 0 0 + &gpx1 1 3 0 0 + &gpx1 2 3 0 0 + &gpx1 3 3 0 0 + &gpx1 4 3 0 0 + &gpx1 5 3 0 0 + &gpx1 6 3 0 0 + &gpx1 7 3 0 0>; + + key_1 { + keypad,row = <0>; + keypad,column = <3>; + linux,code = <2>; + }; + + key_2 { + keypad,row = <0>; + keypad,column = <4>; + linux,code = <3>; + }; + + key_3 { + keypad,row = <0>; + keypad,column = <5>; + linux,code = <4>; + }; + }; diff --git a/Documentation/devicetree/bindings/input/tegra-kbc.txt b/Documentation/devicetree/bindings/input/tegra-kbc.txt new file mode 100644 index 000000000000..5ecfa99089b4 --- /dev/null +++ b/Documentation/devicetree/bindings/input/tegra-kbc.txt @@ -0,0 +1,18 @@ +* Tegra keyboard controller + +Required properties: +- compatible: "nvidia,tegra20-kbc" + +Optional properties: +- debounce-delay: delay in milliseconds per row scan for debouncing +- repeat-delay: delay in milliseconds before repeat starts +- ghost-filter: enable ghost filtering for this device +- wakeup-source: configure keyboard as a wakeup source for suspend/resume + +Example: + +keyboard: keyboard { + compatible = "nvidia,tegra20-kbc"; + reg = <0x7000e200 0x100>; + ghost-filter; +}; diff --git a/Documentation/devicetree/bindings/mfd/mc13xxx.txt b/Documentation/devicetree/bindings/mfd/mc13xxx.txt new file mode 100644 index 000000000000..19f6af47a792 --- /dev/null +++ b/Documentation/devicetree/bindings/mfd/mc13xxx.txt @@ -0,0 +1,78 @@ +* Freescale MC13783/MC13892 Power Management Integrated Circuit (PMIC) + +Required properties: +- compatible : Should be "fsl,mc13783" or "fsl,mc13892" + +Optional properties: +- fsl,mc13xxx-uses-adc : Indicate the ADC is being used +- fsl,mc13xxx-uses-codec : Indicate the Audio Codec is being used +- fsl,mc13xxx-uses-rtc : Indicate the RTC is being used +- fsl,mc13xxx-uses-touch : Indicate the touchscreen controller is being used + +Sub-nodes: +- regulators : Contain the regulator nodes. The MC13892 regulators are + bound using their names as listed below with their registers and bits + for enabling. + + vcoincell : regulator VCOINCELL (register 13, bit 23) + sw1 : regulator SW1 (register 24, bit 0) + sw2 : regulator SW2 (register 25, bit 0) + sw3 : regulator SW3 (register 26, bit 0) + sw4 : regulator SW4 (register 27, bit 0) + swbst : regulator SWBST (register 29, bit 20) + vgen1 : regulator VGEN1 (register 32, bit 0) + viohi : regulator VIOHI (register 32, bit 3) + vdig : regulator VDIG (register 32, bit 9) + vgen2 : regulator VGEN2 (register 32, bit 12) + vpll : regulator VPLL (register 32, bit 15) + vusb2 : regulator VUSB2 (register 32, bit 18) + vgen3 : regulator VGEN3 (register 33, bit 0) + vcam : regulator VCAM (register 33, bit 6) + vvideo : regulator VVIDEO (register 33, bit 12) + vaudio : regulator VAUDIO (register 33, bit 15) + vsd : regulator VSD (register 33, bit 18) + gpo1 : regulator GPO1 (register 34, bit 6) + gpo2 : regulator GPO2 (register 34, bit 8) + gpo3 : regulator GPO3 (register 34, bit 10) + gpo4 : regulator GPO4 (register 34, bit 12) + pwgt1spi : regulator PWGT1SPI (register 34, bit 15) + pwgt2spi : regulator PWGT2SPI (register 34, bit 16) + vusb : regulator VUSB (register 50, bit 3) + + The bindings details of individual regulator device can be found in: + Documentation/devicetree/bindings/regulator/regulator.txt + +Examples: + +ecspi@70010000 { /* ECSPI1 */ + fsl,spi-num-chipselects = <2>; + cs-gpios = <&gpio3 24 0>, /* GPIO4_24 */ + <&gpio3 25 0>; /* GPIO4_25 */ + status = "okay"; + + pmic: mc13892@0 { + #address-cells = <1>; + #size-cells = <0>; + compatible = "fsl,mc13892"; + spi-max-frequency = <6000000>; + reg = <0>; + interrupt-parent = <&gpio0>; + interrupts = <8>; + + regulators { + sw1_reg: mc13892__sw1 { + regulator-min-microvolt = <600000>; + regulator-max-microvolt = <1375000>; + regulator-boot-on; + regulator-always-on; + }; + + sw2_reg: mc13892__sw2 { + regulator-min-microvolt = <900000>; + regulator-max-microvolt = <1850000>; + regulator-boot-on; + regulator-always-on; + }; + }; + }; +}; diff --git a/Documentation/devicetree/bindings/mfd/twl-familly.txt b/Documentation/devicetree/bindings/mfd/twl-familly.txt new file mode 100644 index 000000000000..a66fcf946759 --- /dev/null +++ b/Documentation/devicetree/bindings/mfd/twl-familly.txt @@ -0,0 +1,47 @@ +Texas Instruments TWL family + +The TWLs are Integrated Power Management Chips. +Some version might contain much more analog function like +USB transceiver or Audio amplifier. +These chips are connected to an i2c bus. + + +Required properties: +- compatible : Must be "ti,twl4030"; + For Integrated power-management/audio CODEC device used in OMAP3 + based boards +- compatible : Must be "ti,twl6030"; + For Integrated power-management used in OMAP4 based boards +- interrupts : This i2c device has an IRQ line connected to the main SoC +- interrupt-controller : Since the twl support several interrupts internally, + it is considered as an interrupt controller cascaded to the SoC one. +- #interrupt-cells = <1>; +- interrupt-parent : The parent interrupt controller. + +Optional node: +- Child nodes contain in the twl. The twl family is made of several variants + that support a different number of features. + The children nodes will thus depend of the capability of the variant. + + +Example: +/* + * Integrated Power Management Chip + * http://www.ti.com/lit/ds/symlink/twl6030.pdf + */ +twl@48 { + compatible = "ti,twl6030"; + reg = <0x48>; + interrupts = <39>; /* IRQ_SYS_1N cascaded to gic */ + interrupt-controller; + #interrupt-cells = <1>; + interrupt-parent = <&gic>; + #address-cells = <1>; + #size-cells = <0>; + + twl_rtc { + compatible = "ti,twl_rtc"; + interrupts = <11>; + reg = <0>; + }; +}; diff --git a/Documentation/devicetree/bindings/mtd/gpio-control-nand.txt b/Documentation/devicetree/bindings/mtd/gpio-control-nand.txt new file mode 100644 index 000000000000..719f4dc58df7 --- /dev/null +++ b/Documentation/devicetree/bindings/mtd/gpio-control-nand.txt @@ -0,0 +1,44 @@ +GPIO assisted NAND flash + +The GPIO assisted NAND flash uses a memory mapped interface to +read/write the NAND commands and data and GPIO pins for the control +signals. + +Required properties: +- compatible : "gpio-control-nand" +- reg : should specify localbus chip select and size used for the chip. The + resource describes the data bus connected to the NAND flash and all accesses + are made in native endianness. +- #address-cells, #size-cells : Must be present if the device has sub-nodes + representing partitions. +- gpios : specifies the gpio pins to control the NAND device. nwp is an + optional gpio and may be set to 0 if not present. + +Optional properties: +- bank-width : Width (in bytes) of the device. If not present, the width + defaults to 1 byte. +- chip-delay : chip dependent delay for transferring data from array to + read registers (tR). If not present then a default of 20us is used. +- gpio-control-nand,io-sync-reg : A 64-bit physical address for a read + location used to guard against bus reordering with regards to accesses to + the GPIO's and the NAND flash data bus. If present, then after changing + GPIO state and before and after command byte writes, this register will be + read to ensure that the GPIO accesses have completed. + +Examples: + +gpio-nand@1,0 { + compatible = "gpio-control-nand"; + reg = <1 0x0000 0x2>; + #address-cells = <1>; + #size-cells = <1>; + gpios = <&banka 1 0 /* rdy */ + &banka 2 0 /* nce */ + &banka 3 0 /* ale */ + &banka 4 0 /* cle */ + 0 /* nwp */>; + + partition@0 { + ... + }; +}; diff --git a/Documentation/devicetree/bindings/net/calxeda-xgmac.txt b/Documentation/devicetree/bindings/net/calxeda-xgmac.txt new file mode 100644 index 000000000000..411727a3f82d --- /dev/null +++ b/Documentation/devicetree/bindings/net/calxeda-xgmac.txt @@ -0,0 +1,15 @@ +* Calxeda Highbank 10Gb XGMAC Ethernet + +Required properties: +- compatible : Should be "calxeda,hb-xgmac" +- reg : Address and length of the register set for the device +- interrupts : Should contain 3 xgmac interrupts. The 1st is main interrupt. + The 2nd is pwr mgt interrupt. The 3rd is low power state interrupt. + +Example: + +ethernet@fff50000 { + compatible = "calxeda,hb-xgmac"; + reg = <0xfff50000 0x1000>; + interrupts = <0 77 4 0 78 4 0 79 4>; +}; diff --git a/Documentation/devicetree/bindings/net/can/cc770.txt b/Documentation/devicetree/bindings/net/can/cc770.txt new file mode 100644 index 000000000000..77027bf6460a --- /dev/null +++ b/Documentation/devicetree/bindings/net/can/cc770.txt @@ -0,0 +1,53 @@ +Memory mapped Bosch CC770 and Intel AN82527 CAN controller + +Note: The CC770 is a CAN controller from Bosch, which is 100% +compatible with the old AN82527 from Intel, but with "bugs" being fixed. + +Required properties: + +- compatible : should be "bosch,cc770" for the CC770 and "intc,82527" + for the AN82527. + +- reg : should specify the chip select, address offset and size required + to map the registers of the controller. The size is usually 0x80. + +- interrupts : property with a value describing the interrupt source + (number and sensitivity) required for the controller. + +Optional properties: + +- bosch,external-clock-frequency : frequency of the external oscillator + clock in Hz. Note that the internal clock frequency used by the + controller is half of that value. If not specified, a default + value of 16000000 (16 MHz) is used. + +- bosch,clock-out-frequency : slock frequency in Hz on the CLKOUT pin. + If not specified or if the specified value is 0, the CLKOUT pin + will be disabled. + +- bosch,slew-rate : slew rate of the CLKOUT signal. If not specified, + a resonable value will be calculated. + +- bosch,disconnect-rx0-input : see data sheet. + +- bosch,disconnect-rx1-input : see data sheet. + +- bosch,disconnect-tx1-output : see data sheet. + +- bosch,polarity-dominant : see data sheet. + +- bosch,divide-memory-clock : see data sheet. + +- bosch,iso-low-speed-mux : see data sheet. + +For further information, please have a look to the CC770 or AN82527. + +Examples: + +can@3,100 { + compatible = "bosch,cc770"; + reg = <3 0x100 0x80>; + interrupts = <2 0>; + interrupt-parent = <&mpic>; + bosch,external-clock-frequency = <16000000>; +}; diff --git a/Documentation/devicetree/bindings/net/macb.txt b/Documentation/devicetree/bindings/net/macb.txt new file mode 100644 index 000000000000..44afa0e5057d --- /dev/null +++ b/Documentation/devicetree/bindings/net/macb.txt @@ -0,0 +1,25 @@ +* Cadence MACB/GEM Ethernet controller + +Required properties: +- compatible: Should be "cdns,[<chip>-]{macb|gem}" + Use "cdns,at91sam9260-macb" Atmel at91sam9260 and at91sam9263 SoCs. + Use "cdns,at32ap7000-macb" for other 10/100 usage or use the generic form: "cdns,macb". + Use "cnds,pc302-gem" for Picochip picoXcell pc302 and later devices based on + the Cadence GEM, or the generic form: "cdns,gem". +- reg: Address and length of the register set for the device +- interrupts: Should contain macb interrupt +- phy-mode: String, operation mode of the PHY interface. + Supported values are: "mii", "rmii", "gmii", "rgmii". + +Optional properties: +- local-mac-address: 6 bytes, mac address + +Examples: + + macb0: ethernet@fffc4000 { + compatible = "cdns,at32ap7000-macb"; + reg = <0xfffc4000 0x4000>; + interrupts = <21>; + phy-mode = "rmii"; + local-mac-address = [3a 0e 03 04 05 06]; + }; diff --git a/Documentation/devicetree/bindings/nvec/nvec_nvidia.txt b/Documentation/devicetree/bindings/nvec/nvec_nvidia.txt new file mode 100644 index 000000000000..5aeee53ff9f4 --- /dev/null +++ b/Documentation/devicetree/bindings/nvec/nvec_nvidia.txt @@ -0,0 +1,9 @@ +NVIDIA compliant embedded controller + +Required properties: +- compatible : should be "nvidia,nvec". +- reg : the iomem of the i2c slave controller +- interrupts : the interrupt line of the i2c slave controller +- clock-frequency : the frequency of the i2c bus +- gpios : the gpio used for ec request +- slave-addr: the i2c address of the slave controller diff --git a/Documentation/devicetree/bindings/power_supply/olpc_battery.txt b/Documentation/devicetree/bindings/power_supply/olpc_battery.txt new file mode 100644 index 000000000000..c8901b3992d9 --- /dev/null +++ b/Documentation/devicetree/bindings/power_supply/olpc_battery.txt @@ -0,0 +1,5 @@ +OLPC battery +~~~~~~~~~~~~ + +Required properties: + - compatible : "olpc,xo1-battery" diff --git a/Documentation/devicetree/bindings/power_supply/sbs_sbs-battery.txt b/Documentation/devicetree/bindings/power_supply/sbs_sbs-battery.txt new file mode 100644 index 000000000000..c40e8926facf --- /dev/null +++ b/Documentation/devicetree/bindings/power_supply/sbs_sbs-battery.txt @@ -0,0 +1,23 @@ +SBS sbs-battery +~~~~~~~~~~ + +Required properties : + - compatible : "sbs,sbs-battery" + +Optional properties : + - sbs,i2c-retry-count : The number of times to retry i2c transactions on i2c + IO failure. + - sbs,poll-retry-count : The number of times to try looking for new status + after an external change notification. + - sbs,battery-detect-gpios : The gpio which signals battery detection and + a flag specifying its polarity. + +Example: + + bq20z75@b { + compatible = "sbs,sbs-battery"; + reg = < 0xb >; + sbs,i2c-retry-count = <2>; + sbs,poll-retry-count = <10>; + sbs,battery-detect-gpios = <&gpio-controller 122 1>; + } diff --git a/Documentation/devicetree/bindings/powerpc/fsl/srio-rmu.txt b/Documentation/devicetree/bindings/powerpc/fsl/srio-rmu.txt new file mode 100644 index 000000000000..b9a8a2bcfae7 --- /dev/null +++ b/Documentation/devicetree/bindings/powerpc/fsl/srio-rmu.txt @@ -0,0 +1,163 @@ +Message unit node: + +For SRIO controllers that implement the message unit as part of the controller +this node is required. For devices with RMAN this node should NOT exist. The +node is composed of three types of sub-nodes ("fsl-srio-msg-unit", +"fsl-srio-dbell-unit" and "fsl-srio-port-write-unit"). + +See srio.txt for more details about generic SRIO controller details. + + - compatible + Usage: required + Value type: <string> + Definition: Must include "fsl,srio-rmu-vX.Y", "fsl,srio-rmu". + + The version X.Y should match the general SRIO controller's IP Block + revision register's Major(X) and Minor (Y) value. + + - reg + Usage: required + Value type: <prop-encoded-array> + Definition: A standard property. Specifies the physical address and + length of the SRIO configuration registers for message units + and doorbell units. + + - fsl,liodn + Usage: optional-but-recommended (for devices with PAMU) + Value type: <prop-encoded-array> + Definition: The logical I/O device number for the PAMU (IOMMU) to be + correctly configured for SRIO accesses. The property should + not exist on devices that do not support PAMU. + + The LIODN value is associated with all RMU transactions + (msg-unit, doorbell, port-write). + +Sub-Nodes for RMU: The RMU node is composed of multiple sub-nodes that +correspond to the actual sub-controllers in the RMU. The manual for a given +SoC will detail which and how many of these sub-controllers are implemented. + +Message Unit: + + - compatible + Usage: required + Value type: <string> + Definition: Must include "fsl,srio-msg-unit-vX.Y", "fsl,srio-msg-unit". + + The version X.Y should match the general SRIO controller's IP Block + revision register's Major(X) and Minor (Y) value. + + - reg + Usage: required + Value type: <prop-encoded-array> + Definition: A standard property. Specifies the physical address and + length of the SRIO configuration registers for message units + and doorbell units. + + - interrupts + Usage: required + Value type: <prop_encoded-array> + Definition: Specifies the interrupts generated by this device. The + value of the interrupts property consists of one interrupt + specifier. The format of the specifier is defined by the + binding document describing the node's interrupt parent. + + A pair of IRQs are specified in this property. The first + element is associated with the transmit (TX) interrupt and the + second element is associated with the receive (RX) interrupt. + +Doorbell Unit: + + - compatible + Usage: required + Value type: <string> + Definition: Must include: + "fsl,srio-dbell-unit-vX.Y", "fsl,srio-dbell-unit" + + The version X.Y should match the general SRIO controller's IP Block + revision register's Major(X) and Minor (Y) value. + + - reg + Usage: required + Value type: <prop-encoded-array> + Definition: A standard property. Specifies the physical address and + length of the SRIO configuration registers for message units + and doorbell units. + + - interrupts + Usage: required + Value type: <prop_encoded-array> + Definition: Specifies the interrupts generated by this device. The + value of the interrupts property consists of one interrupt + specifier. The format of the specifier is defined by the + binding document describing the node's interrupt parent. + + A pair of IRQs are specified in this property. The first + element is associated with the transmit (TX) interrupt and the + second element is associated with the receive (RX) interrupt. + +Port-Write Unit: + + - compatible + Usage: required + Value type: <string> + Definition: Must include: + "fsl,srio-port-write-unit-vX.Y", "fsl,srio-port-write-unit" + + The version X.Y should match the general SRIO controller's IP Block + revision register's Major(X) and Minor (Y) value. + + - reg + Usage: required + Value type: <prop-encoded-array> + Definition: A standard property. Specifies the physical address and + length of the SRIO configuration registers for message units + and doorbell units. + + - interrupts + Usage: required + Value type: <prop_encoded-array> + Definition: Specifies the interrupts generated by this device. The + value of the interrupts property consists of one interrupt + specifier. The format of the specifier is defined by the + binding document describing the node's interrupt parent. + + A single IRQ that handles port-write conditions is + specified by this property. (Typically shared with error). + + Note: All other standard properties (see the ePAPR) are allowed + but are optional. + +Example: + rmu: rmu@d3000 { + compatible = "fsl,srio-rmu"; + reg = <0xd3000 0x400>; + ranges = <0x0 0xd3000 0x400>; + fsl,liodn = <0xc8>; + + message-unit@0 { + compatible = "fsl,srio-msg-unit"; + reg = <0x0 0x100>; + interrupts = < + 60 2 0 0 /* msg1_tx_irq */ + 61 2 0 0>;/* msg1_rx_irq */ + }; + message-unit@100 { + compatible = "fsl,srio-msg-unit"; + reg = <0x100 0x100>; + interrupts = < + 62 2 0 0 /* msg2_tx_irq */ + 63 2 0 0>;/* msg2_rx_irq */ + }; + doorbell-unit@400 { + compatible = "fsl,srio-dbell-unit"; + reg = <0x400 0x80>; + interrupts = < + 56 2 0 0 /* bell_outb_irq */ + 57 2 0 0>;/* bell_inb_irq */ + }; + port-write-unit@4e0 { + compatible = "fsl,srio-port-write-unit"; + reg = <0x4e0 0x20>; + interrupts = <16 2 1 11>; + }; + }; diff --git a/Documentation/devicetree/bindings/powerpc/fsl/srio.txt b/Documentation/devicetree/bindings/powerpc/fsl/srio.txt new file mode 100644 index 000000000000..b039bcbee134 --- /dev/null +++ b/Documentation/devicetree/bindings/powerpc/fsl/srio.txt @@ -0,0 +1,103 @@ +* Freescale Serial RapidIO (SRIO) Controller + +RapidIO port node: +Properties: + - compatible + Usage: required + Value type: <string> + Definition: Must include "fsl,srio" for IP blocks with IP Block + Revision Register (SRIO IPBRR1) Major ID equal to 0x01c0. + + Optionally, a compatiable string of "fsl,srio-vX.Y" where X is Major + version in IP Block Revision Register and Y is Minor version. If this + compatiable is provided it should be ordered before "fsl,srio". + + - reg + Usage: required + Value type: <prop-encoded-array> + Definition: A standard property. Specifies the physical address and + length of the SRIO configuration registers. The size should + be set to 0x11000. + + - interrupts + Usage: required + Value type: <prop_encoded-array> + Definition: Specifies the interrupts generated by this device. The + value of the interrupts property consists of one interrupt + specifier. The format of the specifier is defined by the + binding document describing the node's interrupt parent. + + A single IRQ that handles error conditions is specified by this + property. (Typically shared with port-write). + + - fsl,srio-rmu-handle: + Usage: required if rmu node is defined + Value type: <phandle> + Definition: A single <phandle> value that points to the RMU. + (See srio-rmu.txt for more details on RMU node binding) + +Port Child Nodes: There should a port child node for each port that exists in +the controller. The ports are numbered starting at one (1) and should have +the following properties: + + - cell-index + Usage: required + Value type: <u32> + Definition: A standard property. Matches the port id. + + - ranges + Usage: required if local access windows preset + Value type: <prop-encoded-array> + Definition: A standard property. Utilized to describe the memory mapped + IO space utilized by the controller. This corresponds to the + setting of the local access windows that are targeted to this + SRIO port. + + - fsl,liodn + Usage: optional-but-recommended (for devices with PAMU) + Value type: <prop-encoded-array> + Definition: The logical I/O device number for the PAMU (IOMMU) to be + correctly configured for SRIO accesses. The property should + not exist on devices that do not support PAMU. + + For HW (ie, the P4080) that only supports a LIODN for both + memory and maintenance transactions then a single LIODN is + represented in the property for both transactions. + + For HW (ie, the P304x/P5020, etc) that supports an LIODN for + memory transactions and a unique LIODN for maintenance + transactions then a pair of LIODNs are represented in the + property. Within the pair, the first element represents the + LIODN associated with memory transactions and the second element + represents the LIODN associated with maintenance transactions + for the port. + +Note: All other standard properties (see ePAPR) are allowed but are optional. + +Example: + + rapidio: rapidio@ffe0c0000 { + #address-cells = <2>; + #size-cells = <2>; + reg = <0xf 0xfe0c0000 0 0x11000>; + compatible = "fsl,srio"; + interrupts = <16 2 1 11>; /* err_irq */ + fsl,srio-rmu-handle = <&rmu>; + ranges; + + port1 { + cell-index = <1>; + #address-cells = <2>; + #size-cells = <2>; + fsl,liodn = <34>; + ranges = <0 0 0xc 0x20000000 0 0x10000000>; + }; + + port2 { + cell-index = <2>; + #address-cells = <2>; + #size-cells = <2>; + fsl,liodn = <48>; + ranges = <0 0 0xc 0x30000000 0 0x10000000>; + }; + }; diff --git a/Documentation/devicetree/bindings/regulator/fixed-regulator.txt b/Documentation/devicetree/bindings/regulator/fixed-regulator.txt new file mode 100644 index 000000000000..9cf57fd042d2 --- /dev/null +++ b/Documentation/devicetree/bindings/regulator/fixed-regulator.txt @@ -0,0 +1,29 @@ +Fixed Voltage regulators + +Required properties: +- compatible: Must be "regulator-fixed"; + +Optional properties: +- gpio: gpio to use for enable control +- startup-delay-us: startup time in microseconds +- enable-active-high: Polarity of GPIO is Active high +If this property is missing, the default assumed is Active low. + +Any property defined as part of the core regulator +binding, defined in regulator.txt, can also be used. +However a fixed voltage regulator is expected to have the +regulator-min-microvolt and regulator-max-microvolt +to be the same. + +Example: + + abc: fixedregulator@0 { + compatible = "regulator-fixed"; + regulator-name = "fixed-supply"; + regulator-min-microvolt = <1800000>; + regulator-max-microvolt = <1800000>; + gpio = <&gpio1 16 0>; + startup-delay-us = <70000>; + enable-active-high; + regulator-boot-on + }; diff --git a/Documentation/devicetree/bindings/regulator/regulator.txt b/Documentation/devicetree/bindings/regulator/regulator.txt new file mode 100644 index 000000000000..5b7a408acdaa --- /dev/null +++ b/Documentation/devicetree/bindings/regulator/regulator.txt @@ -0,0 +1,54 @@ +Voltage/Current Regulators + +Optional properties: +- regulator-name: A string used as a descriptive name for regulator outputs +- regulator-min-microvolt: smallest voltage consumers may set +- regulator-max-microvolt: largest voltage consumers may set +- regulator-microvolt-offset: Offset applied to voltages to compensate for voltage drops +- regulator-min-microamp: smallest current consumers may set +- regulator-max-microamp: largest current consumers may set +- regulator-always-on: boolean, regulator should never be disabled +- regulator-boot-on: bootloader/firmware enabled regulator +- <name>-supply: phandle to the parent supply/regulator node + +Example: + + xyzreg: regulator@0 { + regulator-min-microvolt = <1000000>; + regulator-max-microvolt = <2500000>; + regulator-always-on; + vin-supply = <&vin>; + }; + +Regulator Consumers: +Consumer nodes can reference one or more of its supplies/ +regulators using the below bindings. + +- <name>-supply: phandle to the regulator node + +These are the same bindings that a regulator in the above +example used to reference its own supply, in which case +its just seen as a special case of a regulator being a +consumer itself. + +Example of a consumer device node (mmc) referencing two +regulators (twl_reg1 and twl_reg2), + + twl_reg1: regulator@0 { + ... + ... + ... + }; + + twl_reg2: regulator@1 { + ... + ... + ... + }; + + mmc: mmc@0x0 { + ... + ... + vmmc-supply = <&twl_reg1>; + vmmcaux-supply = <&twl_reg2>; + }; diff --git a/Documentation/devicetree/bindings/rtc/s3c-rtc.txt b/Documentation/devicetree/bindings/rtc/s3c-rtc.txt new file mode 100644 index 000000000000..90ec45fd33ec --- /dev/null +++ b/Documentation/devicetree/bindings/rtc/s3c-rtc.txt @@ -0,0 +1,20 @@ +* Samsung's S3C Real Time Clock controller + +Required properties: +- compatible: should be one of the following. + * "samsung,s3c2410-rtc" - for controllers compatible with s3c2410 rtc. + * "samsung,s3c6410-rtc" - for controllers compatible with s3c6410 rtc. +- reg: physical base address of the controller and length of memory mapped + region. +- interrupts: Two interrupt numbers to the cpu should be specified. First + interrupt number is the rtc alarm interupt and second interrupt number + is the rtc tick interrupt. The number of cells representing a interrupt + depends on the parent interrupt controller. + +Example: + + rtc@10070000 { + compatible = "samsung,s3c6410-rtc"; + reg = <0x10070000 0x100>; + interrupts = <44 0 45 0>; + }; diff --git a/Documentation/devicetree/bindings/rtc/twl-rtc.txt b/Documentation/devicetree/bindings/rtc/twl-rtc.txt new file mode 100644 index 000000000000..596e0c97be7a --- /dev/null +++ b/Documentation/devicetree/bindings/rtc/twl-rtc.txt @@ -0,0 +1,12 @@ +* TI twl RTC + +The TWL family (twl4030/6030) contains a RTC. + +Required properties: +- compatible : Should be twl4030-rtc + +Examples: + +rtc@0 { + compatible = "ti,twl4030-rtc"; +}; diff --git a/Documentation/devicetree/bindings/serial/omap_serial.txt b/Documentation/devicetree/bindings/serial/omap_serial.txt new file mode 100644 index 000000000000..342eedd10050 --- /dev/null +++ b/Documentation/devicetree/bindings/serial/omap_serial.txt @@ -0,0 +1,10 @@ +OMAP UART controller + +Required properties: +- compatible : should be "ti,omap2-uart" for OMAP2 controllers +- compatible : should be "ti,omap3-uart" for OMAP3 controllers +- compatible : should be "ti,omap4-uart" for OMAP4 controllers +- ti,hwmods : Must be "uart<n>", n being the instance number (1-based) + +Optional properties: +- clock-frequency : frequency of the clock input to the UART diff --git a/Documentation/devicetree/bindings/serial/samsung_uart.txt b/Documentation/devicetree/bindings/serial/samsung_uart.txt new file mode 100644 index 000000000000..2c8a17cf5cb5 --- /dev/null +++ b/Documentation/devicetree/bindings/serial/samsung_uart.txt @@ -0,0 +1,14 @@ +* Samsung's UART Controller + +The Samsung's UART controller is used for interfacing SoC with serial communicaion +devices. + +Required properties: +- compatible: should be + - "samsung,exynos4210-uart", for UART's compatible with Exynos4210 uart ports. + +- reg: base physical address of the controller and length of memory mapped + region. + +- interrupts: interrupt number to the cpu. The interrupt specifier format depends + on the interrupt controller parent. diff --git a/Documentation/devicetree/bindings/sound/tegra-audio-wm8903.txt b/Documentation/devicetree/bindings/sound/tegra-audio-wm8903.txt new file mode 100644 index 000000000000..d5b0da8bf1d8 --- /dev/null +++ b/Documentation/devicetree/bindings/sound/tegra-audio-wm8903.txt @@ -0,0 +1,71 @@ +NVIDIA Tegra audio complex + +Required properties: +- compatible : "nvidia,tegra-audio-wm8903" +- nvidia,model : The user-visible name of this sound complex. +- nvidia,audio-routing : A list of the connections between audio components. + Each entry is a pair of strings, the first being the connection's sink, + the second being the connection's source. Valid names for sources and + sinks are the WM8903's pins, and the jacks on the board: + + WM8903 pins: + + * IN1L + * IN1R + * IN2L + * IN2R + * IN3L + * IN3R + * DMICDAT + * HPOUTL + * HPOUTR + * LINEOUTL + * LINEOUTR + * LOP + * LON + * ROP + * RON + * MICBIAS + + Board connectors: + + * Headphone Jack + * Int Spk + * Mic Jack + +- nvidia,i2s-controller : The phandle of the Tegra I2S1 controller +- nvidia,audio-codec : The phandle of the WM8903 audio codec + +Optional properties: +- nvidia,spkr-en-gpios : The GPIO that enables the speakers +- nvidia,hp-mute-gpios : The GPIO that mutes the headphones +- nvidia,hp-det-gpios : The GPIO that detect headphones are plugged in +- nvidia,int-mic-en-gpios : The GPIO that enables the internal microphone +- nvidia,ext-mic-en-gpios : The GPIO that enables the external microphone + +Example: + +sound { + compatible = "nvidia,tegra-audio-wm8903-harmony", + "nvidia,tegra-audio-wm8903" + nvidia,model = "tegra-wm8903-harmony"; + + nvidia,audio-routing = + "Headphone Jack", "HPOUTR", + "Headphone Jack", "HPOUTL", + "Int Spk", "ROP", + "Int Spk", "RON", + "Int Spk", "LOP", + "Int Spk", "LON", + "Mic Jack", "MICBIAS", + "IN1L", "Mic Jack"; + + nvidia,i2s-controller = <&i2s1>; + nvidia,audio-codec = <&wm8903>; + + nvidia,spkr-en-gpios = <&codec 2 0>; + nvidia,hp-det-gpios = <&gpio 178 0>; /* gpio PW2 */ + nvidia,int-mic-en-gpios = <&gpio 184 0>; /*gpio PX0 */ + nvidia,ext-mic-en-gpios = <&gpio 185 0>; /* gpio PX1 */ +}; + diff --git a/Documentation/devicetree/bindings/sound/tegra20-das.txt b/Documentation/devicetree/bindings/sound/tegra20-das.txt new file mode 100644 index 000000000000..6de3a7ee4efb --- /dev/null +++ b/Documentation/devicetree/bindings/sound/tegra20-das.txt @@ -0,0 +1,12 @@ +NVIDIA Tegra 20 DAS (Digital Audio Switch) controller + +Required properties: +- compatible : "nvidia,tegra20-das" +- reg : Should contain DAS registers location and length + +Example: + +das@70000c00 { + compatible = "nvidia,tegra20-das"; + reg = <0x70000c00 0x80>; +}; diff --git a/Documentation/devicetree/bindings/sound/tegra20-i2s.txt b/Documentation/devicetree/bindings/sound/tegra20-i2s.txt new file mode 100644 index 000000000000..0df2b5c816e3 --- /dev/null +++ b/Documentation/devicetree/bindings/sound/tegra20-i2s.txt @@ -0,0 +1,17 @@ +NVIDIA Tegra 20 I2S controller + +Required properties: +- compatible : "nvidia,tegra20-i2s" +- reg : Should contain I2S registers location and length +- interrupts : Should contain I2S interrupt +- nvidia,dma-request-selector : The Tegra DMA controller's phandle and + request selector for this I2S controller + +Example: + +i2s@70002800 { + compatible = "nvidia,tegra20-i2s"; + reg = <0x70002800 0x200>; + interrupts = < 45 >; + nvidia,dma-request-selector = < &apbdma 2 >; +}; diff --git a/Documentation/devicetree/bindings/sound/wm8903.txt b/Documentation/devicetree/bindings/sound/wm8903.txt new file mode 100644 index 000000000000..f102cbc42694 --- /dev/null +++ b/Documentation/devicetree/bindings/sound/wm8903.txt @@ -0,0 +1,50 @@ +WM8903 audio CODEC + +This device supports I2C only. + +Required properties: + + - compatible : "wlf,wm8903" + + - reg : the I2C address of the device. + + - gpio-controller : Indicates this device is a GPIO controller. + + - #gpio-cells : Should be two. The first cell is the pin number and the + second cell is used to specify optional parameters (currently unused). + +Optional properties: + + - interrupts : The interrupt line the codec is connected to. + + - micdet-cfg : Default register value for R6 (Mic Bias). If absent, the + default is 0. + + - micdet-delay : The debounce delay for microphone detection in mS. If + absent, the default is 100. + + - gpio-cfg : A list of GPIO configuration register values. The list must + be 5 entries long. If absent, no configuration of these registers is + performed. If any entry has the value 0xffffffff, that GPIO's + configuration will not be modified. + +Example: + +codec: wm8903@1a { + compatible = "wlf,wm8903"; + reg = <0x1a>; + interrupts = < 347 >; + + gpio-controller; + #gpio-cells = <2>; + + micdet-cfg = <0>; + micdet-delay = <100>; + gpio-cfg = < + 0x0600 /* DMIC_LR, output */ + 0x0680 /* DMIC_DAT, input */ + 0x0000 /* GPIO, output, low */ + 0x0200 /* Interrupt, output */ + 0x01a0 /* BCLK, input, active high */ + >; +}; diff --git a/Documentation/devicetree/bindings/sound/wm8994.txt b/Documentation/devicetree/bindings/sound/wm8994.txt new file mode 100644 index 000000000000..7a7eb1e7bda6 --- /dev/null +++ b/Documentation/devicetree/bindings/sound/wm8994.txt @@ -0,0 +1,18 @@ +WM1811/WM8994/WM8958 audio CODEC + +These devices support both I2C and SPI (configured with pin strapping +on the board). + +Required properties: + + - compatible : "wlf,wm1811", "wlf,wm8994", "wlf,wm8958" + + - reg : the I2C address of the device for I2C, the chip select + number for SPI. + +Example: + +codec: wm8994@1a { + compatible = "wlf,wm8994"; + reg = <0x1a>; +}; diff --git a/Documentation/devicetree/bindings/usb/tegra-usb.txt b/Documentation/devicetree/bindings/usb/tegra-usb.txt new file mode 100644 index 000000000000..035d63d5646d --- /dev/null +++ b/Documentation/devicetree/bindings/usb/tegra-usb.txt @@ -0,0 +1,13 @@ +Tegra SOC USB controllers + +The device node for a USB controller that is part of a Tegra +SOC is as described in the document "Open Firmware Recommended +Practice : Universal Serial Bus" with the following modifications +and additions : + +Required properties : + - compatible : Should be "nvidia,tegra20-ehci" for USB controllers + used in host mode. + - phy_type : Should be one of "ulpi" or "utmi". + - nvidia,vbus-gpio : If present, specifies a gpio that needs to be + activated for the bus to be powered. diff --git a/Documentation/devicetree/bindings/vendor-prefixes.txt b/Documentation/devicetree/bindings/vendor-prefixes.txt index 18626965159e..ecc6a6cd26c1 100644 --- a/Documentation/devicetree/bindings/vendor-prefixes.txt +++ b/Documentation/devicetree/bindings/vendor-prefixes.txt @@ -34,6 +34,7 @@ powervr Imagination Technologies qcom Qualcomm, Inc. ramtron Ramtron International samsung Samsung Semiconductor +sbs Smart Battery System schindler Schindler sil Silicon Image simtek @@ -41,4 +42,5 @@ sirf SiRF Technology, Inc. st STMicroelectronics stericsson ST-Ericsson ti Texas Instruments +wlf Wolfson Microelectronics xlnx Xilinx diff --git a/Documentation/digsig.txt b/Documentation/digsig.txt new file mode 100644 index 000000000000..3f682889068b --- /dev/null +++ b/Documentation/digsig.txt @@ -0,0 +1,96 @@ +Digital Signature Verification API + +CONTENTS + +1. Introduction +2. API +3. User-space utilities + + +1. Introduction + +Digital signature verification API provides a method to verify digital signature. +Currently digital signatures are used by the IMA/EVM integrity protection subsystem. + +Digital signature verification is implemented using cut-down kernel port of +GnuPG multi-precision integers (MPI) library. The kernel port provides +memory allocation errors handling, has been refactored according to kernel +coding style, and checkpatch.pl reported errors and warnings have been fixed. + +Public key and signature consist of header and MPIs. + +struct pubkey_hdr { + uint8_t version; /* key format version */ + time_t timestamp; /* key made, always 0 for now */ + uint8_t algo; + uint8_t nmpi; + char mpi[0]; +} __packed; + +struct signature_hdr { + uint8_t version; /* signature format version */ + time_t timestamp; /* signature made */ + uint8_t algo; + uint8_t hash; + uint8_t keyid[8]; + uint8_t nmpi; + char mpi[0]; +} __packed; + +keyid equals to SHA1[12-19] over the total key content. +Signature header is used as an input to generate a signature. +Such approach insures that key or signature header could not be changed. +It protects timestamp from been changed and can be used for rollback +protection. + +2. API + +API currently includes only 1 function: + + digsig_verify() - digital signature verification with public key + + +/** + * digsig_verify() - digital signature verification with public key + * @keyring: keyring to search key in + * @sig: digital signature + * @sigen: length of the signature + * @data: data + * @datalen: length of the data + * @return: 0 on success, -EINVAL otherwise + * + * Verifies data integrity against digital signature. + * Currently only RSA is supported. + * Normally hash of the content is used as a data for this function. + * + */ +int digsig_verify(struct key *keyring, const char *sig, int siglen, + const char *data, int datalen); + +3. User-space utilities + +The signing and key management utilities evm-utils provide functionality +to generate signatures, to load keys into the kernel keyring. +Keys can be in PEM or converted to the kernel format. +When the key is added to the kernel keyring, the keyid defines the name +of the key: 5D2B05FC633EE3E8 in the example bellow. + +Here is example output of the keyctl utility. + +$ keyctl show +Session Keyring + -3 --alswrv 0 0 keyring: _ses +603976250 --alswrv 0 -1 \_ keyring: _uid.0 +817777377 --alswrv 0 0 \_ user: kmk +891974900 --alswrv 0 0 \_ encrypted: evm-key +170323636 --alswrv 0 0 \_ keyring: _module +548221616 --alswrv 0 0 \_ keyring: _ima +128198054 --alswrv 0 0 \_ keyring: _evm + +$ keyctl list 128198054 +1 key in keyring: +620789745: --alswrv 0 0 user: 5D2B05FC633EE3E8 + + +Dmitry Kasatkin +06.10.2011 diff --git a/Documentation/dma-buf-sharing.txt b/Documentation/dma-buf-sharing.txt new file mode 100644 index 000000000000..225f96d88f55 --- /dev/null +++ b/Documentation/dma-buf-sharing.txt @@ -0,0 +1,228 @@ + DMA Buffer Sharing API Guide + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + Sumit Semwal + <sumit dot semwal at linaro dot org> + <sumit dot semwal at ti dot com> + +This document serves as a guide to device-driver writers on what is the dma-buf +buffer sharing API, how to use it for exporting and using shared buffers. + +Any device driver which wishes to be a part of DMA buffer sharing, can do so as +either the 'exporter' of buffers, or the 'user' of buffers. + +Say a driver A wants to use buffers created by driver B, then we call B as the +exporter, and A as buffer-user. + +The exporter +- implements and manages operations[1] for the buffer +- allows other users to share the buffer by using dma_buf sharing APIs, +- manages the details of buffer allocation, +- decides about the actual backing storage where this allocation happens, +- takes care of any migration of scatterlist - for all (shared) users of this + buffer, + +The buffer-user +- is one of (many) sharing users of the buffer. +- doesn't need to worry about how the buffer is allocated, or where. +- needs a mechanism to get access to the scatterlist that makes up this buffer + in memory, mapped into its own address space, so it can access the same area + of memory. + +*IMPORTANT*: [see https://lkml.org/lkml/2011/12/20/211 for more details] +For this first version, A buffer shared using the dma_buf sharing API: +- *may* be exported to user space using "mmap" *ONLY* by exporter, outside of + this framework. +- may be used *ONLY* by importers that do not need CPU access to the buffer. + +The dma_buf buffer sharing API usage contains the following steps: + +1. Exporter announces that it wishes to export a buffer +2. Userspace gets the file descriptor associated with the exported buffer, and + passes it around to potential buffer-users based on use case +3. Each buffer-user 'connects' itself to the buffer +4. When needed, buffer-user requests access to the buffer from exporter +5. When finished with its use, the buffer-user notifies end-of-DMA to exporter +6. when buffer-user is done using this buffer completely, it 'disconnects' + itself from the buffer. + + +1. Exporter's announcement of buffer export + + The buffer exporter announces its wish to export a buffer. In this, it + connects its own private buffer data, provides implementation for operations + that can be performed on the exported dma_buf, and flags for the file + associated with this buffer. + + Interface: + struct dma_buf *dma_buf_export(void *priv, struct dma_buf_ops *ops, + size_t size, int flags) + + If this succeeds, dma_buf_export allocates a dma_buf structure, and returns a + pointer to the same. It also associates an anonymous file with this buffer, + so it can be exported. On failure to allocate the dma_buf object, it returns + NULL. + +2. Userspace gets a handle to pass around to potential buffer-users + + Userspace entity requests for a file-descriptor (fd) which is a handle to the + anonymous file associated with the buffer. It can then share the fd with other + drivers and/or processes. + + Interface: + int dma_buf_fd(struct dma_buf *dmabuf) + + This API installs an fd for the anonymous file associated with this buffer; + returns either 'fd', or error. + +3. Each buffer-user 'connects' itself to the buffer + + Each buffer-user now gets a reference to the buffer, using the fd passed to + it. + + Interface: + struct dma_buf *dma_buf_get(int fd) + + This API will return a reference to the dma_buf, and increment refcount for + it. + + After this, the buffer-user needs to attach its device with the buffer, which + helps the exporter to know of device buffer constraints. + + Interface: + struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf, + struct device *dev) + + This API returns reference to an attachment structure, which is then used + for scatterlist operations. It will optionally call the 'attach' dma_buf + operation, if provided by the exporter. + + The dma-buf sharing framework does the bookkeeping bits related to managing + the list of all attachments to a buffer. + +Until this stage, the buffer-exporter has the option to choose not to actually +allocate the backing storage for this buffer, but wait for the first buffer-user +to request use of buffer for allocation. + + +4. When needed, buffer-user requests access to the buffer + + Whenever a buffer-user wants to use the buffer for any DMA, it asks for + access to the buffer using dma_buf_map_attachment API. At least one attach to + the buffer must have happened before map_dma_buf can be called. + + Interface: + struct sg_table * dma_buf_map_attachment(struct dma_buf_attachment *, + enum dma_data_direction); + + This is a wrapper to dma_buf->ops->map_dma_buf operation, which hides the + "dma_buf->ops->" indirection from the users of this interface. + + In struct dma_buf_ops, map_dma_buf is defined as + struct sg_table * (*map_dma_buf)(struct dma_buf_attachment *, + enum dma_data_direction); + + It is one of the buffer operations that must be implemented by the exporter. + It should return the sg_table containing scatterlist for this buffer, mapped + into caller's address space. + + If this is being called for the first time, the exporter can now choose to + scan through the list of attachments for this buffer, collate the requirements + of the attached devices, and choose an appropriate backing storage for the + buffer. + + Based on enum dma_data_direction, it might be possible to have multiple users + accessing at the same time (for reading, maybe), or any other kind of sharing + that the exporter might wish to make available to buffer-users. + + map_dma_buf() operation can return -EINTR if it is interrupted by a signal. + + +5. When finished, the buffer-user notifies end-of-DMA to exporter + + Once the DMA for the current buffer-user is over, it signals 'end-of-DMA' to + the exporter using the dma_buf_unmap_attachment API. + + Interface: + void dma_buf_unmap_attachment(struct dma_buf_attachment *, + struct sg_table *); + + This is a wrapper to dma_buf->ops->unmap_dma_buf() operation, which hides the + "dma_buf->ops->" indirection from the users of this interface. + + In struct dma_buf_ops, unmap_dma_buf is defined as + void (*unmap_dma_buf)(struct dma_buf_attachment *, struct sg_table *); + + unmap_dma_buf signifies the end-of-DMA for the attachment provided. Like + map_dma_buf, this API also must be implemented by the exporter. + + +6. when buffer-user is done using this buffer, it 'disconnects' itself from the + buffer. + + After the buffer-user has no more interest in using this buffer, it should + disconnect itself from the buffer: + + - it first detaches itself from the buffer. + + Interface: + void dma_buf_detach(struct dma_buf *dmabuf, + struct dma_buf_attachment *dmabuf_attach); + + This API removes the attachment from the list in dmabuf, and optionally calls + dma_buf->ops->detach(), if provided by exporter, for any housekeeping bits. + + - Then, the buffer-user returns the buffer reference to exporter. + + Interface: + void dma_buf_put(struct dma_buf *dmabuf); + + This API then reduces the refcount for this buffer. + + If, as a result of this call, the refcount becomes 0, the 'release' file + operation related to this fd is called. It calls the dmabuf->ops->release() + operation in turn, and frees the memory allocated for dmabuf when exported. + +NOTES: +- Importance of attach-detach and {map,unmap}_dma_buf operation pairs + The attach-detach calls allow the exporter to figure out backing-storage + constraints for the currently-interested devices. This allows preferential + allocation, and/or migration of pages across different types of storage + available, if possible. + + Bracketing of DMA access with {map,unmap}_dma_buf operations is essential + to allow just-in-time backing of storage, and migration mid-way through a + use-case. + +- Migration of backing storage if needed + If after + - at least one map_dma_buf has happened, + - and the backing storage has been allocated for this buffer, + another new buffer-user intends to attach itself to this buffer, it might + be allowed, if possible for the exporter. + + In case it is allowed by the exporter: + if the new buffer-user has stricter 'backing-storage constraints', and the + exporter can handle these constraints, the exporter can just stall on the + map_dma_buf until all outstanding access is completed (as signalled by + unmap_dma_buf). + Once all users have finished accessing and have unmapped this buffer, the + exporter could potentially move the buffer to the stricter backing-storage, + and then allow further {map,unmap}_dma_buf operations from any buffer-user + from the migrated backing-storage. + + If the exporter cannot fulfil the backing-storage constraints of the new + buffer-user device as requested, dma_buf_attach() would return an error to + denote non-compatibility of the new buffer-sharing request with the current + buffer. + + If the exporter chooses not to allow an attach() operation once a + map_dma_buf() API has been called, it simply returns an error. + +Miscellaneous notes: +- Any exporters or users of the dma-buf buffer sharing framework must have + a 'select DMA_SHARED_BUFFER' in their respective Kconfigs. + +References: +[1] struct dma_buf_ops in include/linux/dma-buf.h +[2] All interfaces mentioned above defined in include/linux/dma-buf.h diff --git a/Documentation/dontdiff b/Documentation/dontdiff index dfa6fc6e4b28..0c083c5c2faa 100644 --- a/Documentation/dontdiff +++ b/Documentation/dontdiff @@ -66,7 +66,6 @@ GRTAGS GSYMS GTAGS Image -Kerntypes Module.markers Module.symvers PENDING diff --git a/Documentation/driver-model/devres.txt b/Documentation/driver-model/devres.txt index d79aead9418b..10c64c8a13d4 100644 --- a/Documentation/driver-model/devres.txt +++ b/Documentation/driver-model/devres.txt @@ -262,6 +262,7 @@ IOMAP devm_ioremap() devm_ioremap_nocache() devm_iounmap() + devm_request_and_ioremap() : checks resource, requests region, ioremaps pcim_iomap() pcim_iounmap() pcim_iomap_table() : array of mapped addresses indexed by BAR diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index 3d849122b5b1..d49c2ec72d12 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt @@ -85,17 +85,6 @@ Who: Robin Getz <rgetz@blackfin.uclinux.org> & Matt Mackall <mpm@selenic.com> --------------------------- -What: Deprecated snapshot ioctls -When: 2.6.36 - -Why: The ioctls in kernel/power/user.c were marked as deprecated long time - ago. Now they notify users about that so that they need to replace - their userspace. After some more time, remove them completely. - -Who: Jiri Slaby <jirislaby@gmail.com> - ---------------------------- - What: The ieee80211_regdom module parameter When: March 2010 / desktop catchup @@ -263,8 +252,7 @@ Who: Ravikiran Thirumalai <kiran@scalex86.org> What: Code that is now under CONFIG_WIRELESS_EXT_SYSFS (in net/core/net-sysfs.c) -When: After the only user (hal) has seen a release with the patches - for enough time, probably some time in 2010. +When: 3.5 Why: Over 1K .text/.data size reduction, data is available in other ways (ioctls) Who: Johannes Berg <johannes@sipsolutions.net> @@ -362,15 +350,6 @@ Who: anybody or Florian Mickler <florian@mickler.org> ---------------------------- -What: KVM paravirt mmu host support -When: January 2011 -Why: The paravirt mmu host support is slower than non-paravirt mmu, both - on newer and older hardware. It is already not exposed to the guest, - and kept only for live migration purposes. -Who: Avi Kivity <avi@redhat.com> - ----------------------------- - What: iwlwifi 50XX module parameters When: 3.0 Why: The "..50" modules parameters were used to configure 5000 series and @@ -535,6 +514,20 @@ Why: In 3.0, we can now autodetect internal 3G device and already have information log when acer-wmi initial. Who: Lee, Chun-Yi <jlee@novell.com> +--------------------------- + +What: /sys/devices/platform/_UDC_/udc/_UDC_/is_dualspeed file and + is_dualspeed line in /sys/devices/platform/ci13xxx_*/udc/device file. +When: 3.8 +Why: The is_dualspeed file is superseded by maximum_speed in the same + directory and is_dualspeed line in device file is superseded by + max_speed line in the same file. + + The maximum_speed/max_speed specifies maximum speed supported by UDC. + To check if dualspeeed is supported, check if the value is >= 3. + Various possible speeds are defined in <linux/usb/ch9.h>. +Who: Michal Nazarewicz <mina86@mina86.com> + ---------------------------- What: The XFS nodelaylog mount option @@ -551,3 +544,15 @@ When: 3.5 Why: The iwlagn module has been renamed iwlwifi. The alias will be around for backward compatibility for several cycles and then dropped. Who: Don Fry <donald.h.fry@intel.com> + +---------------------------- + +What: pci_scan_bus_parented() +When: 3.5 +Why: The pci_scan_bus_parented() interface creates a new root bus. The + bus is created with default resources (ioport_resource and + iomem_resource) that are always wrong, so we rely on arch code to + correct them later. Callers of pci_scan_bus_parented() should + convert to using pci_scan_root_bus() so they can supply a list of + bus resources when the bus is created. +Who: Bjorn Helgaas <bhelgaas@google.com> diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking index d819ba16a0c7..4fca82e5276e 100644 --- a/Documentation/filesystems/Locking +++ b/Documentation/filesystems/Locking @@ -37,15 +37,15 @@ d_manage: no no yes (ref-walk) maybe --------------------------- inode_operations --------------------------- prototypes: - int (*create) (struct inode *,struct dentry *,int, struct nameidata *); + int (*create) (struct inode *,struct dentry *,umode_t, struct nameidata *); struct dentry * (*lookup) (struct inode *,struct dentry *, struct nameid ata *); int (*link) (struct dentry *,struct inode *,struct dentry *); int (*unlink) (struct inode *,struct dentry *); int (*symlink) (struct inode *,struct dentry *,const char *); - int (*mkdir) (struct inode *,struct dentry *,int); + int (*mkdir) (struct inode *,struct dentry *,umode_t); int (*rmdir) (struct inode *,struct dentry *); - int (*mknod) (struct inode *,struct dentry *,int,dev_t); + int (*mknod) (struct inode *,struct dentry *,umode_t,dev_t); int (*rename) (struct inode *, struct dentry *, struct inode *, struct dentry *); int (*readlink) (struct dentry *, char __user *,int); @@ -117,7 +117,7 @@ prototypes: int (*statfs) (struct dentry *, struct kstatfs *); int (*remount_fs) (struct super_block *, int *, char *); void (*umount_begin) (struct super_block *); - int (*show_options)(struct seq_file *, struct vfsmount *); + int (*show_options)(struct seq_file *, struct dentry *); ssize_t (*quota_read)(struct super_block *, int, char *, size_t, loff_t); ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t); int (*bdev_try_to_free_page)(struct super_block*, struct page*, gfp_t); diff --git a/Documentation/filesystems/ceph.txt b/Documentation/filesystems/ceph.txt index 763d8ebbbebd..d6030aa33376 100644 --- a/Documentation/filesystems/ceph.txt +++ b/Documentation/filesystems/ceph.txt @@ -119,12 +119,20 @@ Mount Options must rely on TCP's error correction to detect data corruption in the data payload. - noasyncreaddir - Disable client's use its local cache to satisfy readdir - requests. (This does not change correctness; the client uses - cached metadata only when a lease or capability ensures it is - valid.) + dcache + Use the dcache contents to perform negative lookups and + readdir when the client has the entire directory contents in + its cache. (This does not change correctness; the client uses + cached metadata only when a lease or capability ensures it is + valid.) + + nodcache + Do not use the dcache as above. This avoids a significant amount of + complex code, sacrificing performance without affecting correctness, + and is useful for tracking down bugs. + noasyncreaddir + Do not use the dcache as above for readdir. More Information ================ diff --git a/Documentation/filesystems/configfs/configfs.txt b/Documentation/filesystems/configfs/configfs.txt index dd57bb6bb390..b40fec9d3f53 100644 --- a/Documentation/filesystems/configfs/configfs.txt +++ b/Documentation/filesystems/configfs/configfs.txt @@ -192,7 +192,7 @@ attribute value uses the store_attribute() method. struct configfs_attribute { char *ca_name; struct module *ca_owner; - mode_t ca_mode; + umode_t ca_mode; }; When a config_item wants an attribute to appear as a file in the item's diff --git a/Documentation/filesystems/debugfs.txt b/Documentation/filesystems/debugfs.txt index 742cc06e138f..6872c91bce35 100644 --- a/Documentation/filesystems/debugfs.txt +++ b/Documentation/filesystems/debugfs.txt @@ -35,7 +35,7 @@ described below will work. The most general way to create a file within a debugfs directory is with: - struct dentry *debugfs_create_file(const char *name, mode_t mode, + struct dentry *debugfs_create_file(const char *name, umode_t mode, struct dentry *parent, void *data, const struct file_operations *fops); @@ -53,13 +53,13 @@ actually necessary; the debugfs code provides a number of helper functions for simple situations. Files containing a single integer value can be created with any of: - struct dentry *debugfs_create_u8(const char *name, mode_t mode, + struct dentry *debugfs_create_u8(const char *name, umode_t mode, struct dentry *parent, u8 *value); - struct dentry *debugfs_create_u16(const char *name, mode_t mode, + struct dentry *debugfs_create_u16(const char *name, umode_t mode, struct dentry *parent, u16 *value); - struct dentry *debugfs_create_u32(const char *name, mode_t mode, + struct dentry *debugfs_create_u32(const char *name, umode_t mode, struct dentry *parent, u32 *value); - struct dentry *debugfs_create_u64(const char *name, mode_t mode, + struct dentry *debugfs_create_u64(const char *name, umode_t mode, struct dentry *parent, u64 *value); These files support both reading and writing the given value; if a specific @@ -67,13 +67,13 @@ file should not be written to, simply set the mode bits accordingly. The values in these files are in decimal; if hexadecimal is more appropriate, the following functions can be used instead: - struct dentry *debugfs_create_x8(const char *name, mode_t mode, + struct dentry *debugfs_create_x8(const char *name, umode_t mode, struct dentry *parent, u8 *value); - struct dentry *debugfs_create_x16(const char *name, mode_t mode, + struct dentry *debugfs_create_x16(const char *name, umode_t mode, struct dentry *parent, u16 *value); - struct dentry *debugfs_create_x32(const char *name, mode_t mode, + struct dentry *debugfs_create_x32(const char *name, umode_t mode, struct dentry *parent, u32 *value); - struct dentry *debugfs_create_x64(const char *name, mode_t mode, + struct dentry *debugfs_create_x64(const char *name, umode_t mode, struct dentry *parent, u64 *value); These functions are useful as long as the developer knows the size of the @@ -81,7 +81,7 @@ value to be exported. Some types can have different widths on different architectures, though, complicating the situation somewhat. There is a function meant to help out in one special case: - struct dentry *debugfs_create_size_t(const char *name, mode_t mode, + struct dentry *debugfs_create_size_t(const char *name, umode_t mode, struct dentry *parent, size_t *value); @@ -90,21 +90,22 @@ a variable of type size_t. Boolean values can be placed in debugfs with: - struct dentry *debugfs_create_bool(const char *name, mode_t mode, + struct dentry *debugfs_create_bool(const char *name, umode_t mode, struct dentry *parent, u32 *value); A read on the resulting file will yield either Y (for non-zero values) or N, followed by a newline. If written to, it will accept either upper- or lower-case values, or 1 or 0. Any other input will be silently ignored. -Finally, a block of arbitrary binary data can be exported with: +Another option is exporting a block of arbitrary binary data, with +this structure and function: struct debugfs_blob_wrapper { void *data; unsigned long size; }; - struct dentry *debugfs_create_blob(const char *name, mode_t mode, + struct dentry *debugfs_create_blob(const char *name, umode_t mode, struct dentry *parent, struct debugfs_blob_wrapper *blob); @@ -115,6 +116,35 @@ can be used to export binary information, but there does not appear to be any code which does so in the mainline. Note that all files created with debugfs_create_blob() are read-only. +If you want to dump a block of registers (something that happens quite +often during development, even if little such code reaches mainline. +Debugfs offers two functions: one to make a registers-only file, and +another to insert a register block in the middle of another sequential +file. + + struct debugfs_reg32 { + char *name; + unsigned long offset; + }; + + struct debugfs_regset32 { + struct debugfs_reg32 *regs; + int nregs; + void __iomem *base; + }; + + struct dentry *debugfs_create_regset32(const char *name, mode_t mode, + struct dentry *parent, + struct debugfs_regset32 *regset); + + int debugfs_print_regs32(struct seq_file *s, struct debugfs_reg32 *regs, + int nregs, void __iomem *base, char *prefix); + +The "base" argument may be 0, but you may want to build the reg32 array +using __stringify, and a number of register names (macros) are actually +byte offsets over a base for the register block. + + There are a couple of other directory-oriented helper functions: struct dentry *debugfs_rename(struct dentry *old_dir, diff --git a/Documentation/filesystems/ext4.txt b/Documentation/filesystems/ext4.txt index 4917cf24a5e0..10ec4639f152 100644 --- a/Documentation/filesystems/ext4.txt +++ b/Documentation/filesystems/ext4.txt @@ -581,6 +581,13 @@ Table of Ext4 specific ioctls behaviour may change in the future as it is not necessary and has been done this way only for sake of simplicity. + + EXT4_IOC_RESIZE_FS Resize the filesystem to a new size. The number + of blocks of resized filesystem is passed in via + 64 bit integer argument. The kernel allocates + bitmaps and inode table, the userspace tool thus + just passes the new number of blocks. + .............................................................................. References diff --git a/Documentation/filesystems/nfs/00-INDEX b/Documentation/filesystems/nfs/00-INDEX index a57e12411d2a..1716874a651e 100644 --- a/Documentation/filesystems/nfs/00-INDEX +++ b/Documentation/filesystems/nfs/00-INDEX @@ -2,6 +2,8 @@ - this file (nfs-related documentation). Exporting - explanation of how to make filesystems exportable. +fault_injection.txt + - information for using fault injection on the server knfsd-stats.txt - statistics which the NFS server makes available to user space. nfs.txt diff --git a/Documentation/filesystems/nfs/fault_injection.txt b/Documentation/filesystems/nfs/fault_injection.txt new file mode 100644 index 000000000000..426d166089a3 --- /dev/null +++ b/Documentation/filesystems/nfs/fault_injection.txt @@ -0,0 +1,69 @@ + +Fault Injection +=============== +Fault injection is a method for forcing errors that may not normally occur, or +may be difficult to reproduce. Forcing these errors in a controlled environment +can help the developer find and fix bugs before their code is shipped in a +production system. Injecting an error on the Linux NFS server will allow us to +observe how the client reacts and if it manages to recover its state correctly. + +NFSD_FAULT_INJECTION must be selected when configuring the kernel to use this +feature. + + +Using Fault Injection +===================== +On the client, mount the fault injection server through NFS v4.0+ and do some +work over NFS (open files, take locks, ...). + +On the server, mount the debugfs filesystem to <debug_dir> and ls +<debug_dir>/nfsd. This will show a list of files that will be used for +injecting faults on the NFS server. As root, write a number n to the file +corresponding to the action you want the server to take. The server will then +process the first n items it finds. So if you want to forget 5 locks, echo '5' +to <debug_dir>/nfsd/forget_locks. A value of 0 will tell the server to forget +all corresponding items. A log message will be created containing the number +of items forgotten (check dmesg). + +Go back to work on the client and check if the client recovered from the error +correctly. + + +Available Faults +================ +forget_clients: + The NFS server keeps a list of clients that have placed a mount call. If + this list is cleared, the server will have no knowledge of who the client + is, forcing the client to reauthenticate with the server. + +forget_openowners: + The NFS server keeps a list of what files are currently opened and who + they were opened by. Clearing this list will force the client to reopen + its files. + +forget_locks: + The NFS server keeps a list of what files are currently locked in the VFS. + Clearing this list will force the client to reclaim its locks (files are + unlocked through the VFS as they are cleared from this list). + +forget_delegations: + A delegation is used to assure the client that a file, or part of a file, + has not changed since the delegation was awarded. Clearing this list will + force the client to reaquire its delegation before accessing the file + again. + +recall_delegations: + Delegations can be recalled by the server when another client attempts to + access a file. This test will notify the client that its delegation has + been revoked, forcing the client to reaquire the delegation before using + the file again. + + +tools/nfs/inject_faults.sh script +================================= +This script has been created to ease the fault injection process. This script +will detect the mounted debugfs directory and write to the files located there +based on the arguments passed by the user. For example, running +`inject_faults.sh forget_locks 1` as root will instruct the server to forget +one lock. Running `inject_faults forget_locks` will instruct the server to +forgetall locks. diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index 0ec91f03422e..a76a26a1db8a 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -41,6 +41,8 @@ Table of Contents 3.5 /proc/<pid>/mountinfo - Information about mounts 3.6 /proc/<pid>/comm & /proc/<pid>/task/<tid>/comm + 4 Configuring procfs + 4.1 Mount options ------------------------------------------------------------------------------ Preface @@ -305,6 +307,9 @@ Table 1-4: Contents of the stat files (as of 2.6.30-rc7) blkio_ticks time spent waiting for block IO gtime guest time of the task in jiffies cgtime guest time of the task children in jiffies + start_data address above which program data+bss is placed + end_data address below which program data+bss is placed + start_brk address above which program heap can be expanded with brk() .............................................................................. The /proc/PID/maps file containing the currently mapped memory regions and @@ -1542,3 +1547,40 @@ a task to set its own or one of its thread siblings comm value. The comm value is limited in size compared to the cmdline value, so writing anything longer then the kernel's TASK_COMM_LEN (currently 16 chars) will result in a truncated comm value. + + +------------------------------------------------------------------------------ +Configuring procfs +------------------------------------------------------------------------------ + +4.1 Mount options +--------------------- + +The following mount options are supported: + + hidepid= Set /proc/<pid>/ access mode. + gid= Set the group authorized to learn processes information. + +hidepid=0 means classic mode - everybody may access all /proc/<pid>/ directories +(default). + +hidepid=1 means users may not access any /proc/<pid>/ directories but their +own. Sensitive files like cmdline, sched*, status are now protected against +other users. This makes it impossible to learn whether any user runs +specific program (given the program doesn't reveal itself by its behaviour). +As an additional bonus, as /proc/<pid>/cmdline is unaccessible for other users, +poorly written programs passing sensitive information via program arguments are +now protected against local eavesdroppers. + +hidepid=2 means hidepid=1 plus all /proc/<pid>/ will be fully invisible to other +users. It doesn't mean that it hides a fact whether a process with a specific +pid value exists (it can be learned by other means, e.g. by "kill -0 $PID"), +but it hides process' uid and gid, which may be learned by stat()'ing +/proc/<pid>/ otherwise. It greatly complicates an intruder's task of gathering +information about running processes, whether some daemon runs with elevated +privileges, whether other user runs some sensitive program, whether other users +run any program at all, etc. + +gid= defines a group authorized to learn processes information otherwise +prohibited by hidepid=. If you use some daemon like identd which needs to learn +information about processes information, just add identd to this group. diff --git a/Documentation/filesystems/squashfs.txt b/Documentation/filesystems/squashfs.txt index 7db3ebda5a4c..403c090aca39 100644 --- a/Documentation/filesystems/squashfs.txt +++ b/Documentation/filesystems/squashfs.txt @@ -93,8 +93,8 @@ byte alignment: Compressed data blocks are written to the filesystem as files are read from the source directory, and checked for duplicates. Once all file data has been -written the completed inode, directory, fragment, export and uid/gid lookup -tables are written. +written the completed inode, directory, fragment, export, uid/gid lookup and +xattr tables are written. 3.1 Compression options ----------------------- @@ -151,7 +151,7 @@ in each metadata block. Directories are sorted in alphabetical order, and at lookup the index is scanned linearly looking for the first filename alphabetically larger than the filename being looked up. At this point the location of the metadata block the filename is in has been found. -The general idea of the index is ensure only one metadata block needs to be +The general idea of the index is to ensure only one metadata block needs to be decompressed to do a lookup irrespective of the length of the directory. This scheme has the advantage that it doesn't require extra memory overhead and doesn't require much extra storage on disk. diff --git a/Documentation/filesystems/sysfs.txt b/Documentation/filesystems/sysfs.txt index 07235caec22c..a6619b7064b9 100644 --- a/Documentation/filesystems/sysfs.txt +++ b/Documentation/filesystems/sysfs.txt @@ -70,7 +70,7 @@ An attribute definition is simply: struct attribute { char * name; struct module *owner; - mode_t mode; + umode_t mode; }; diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt index 43cbd0821721..3d9393b845b8 100644 --- a/Documentation/filesystems/vfs.txt +++ b/Documentation/filesystems/vfs.txt @@ -225,7 +225,7 @@ struct super_operations { void (*clear_inode) (struct inode *); void (*umount_begin) (struct super_block *); - int (*show_options)(struct seq_file *, struct vfsmount *); + int (*show_options)(struct seq_file *, struct dentry *); ssize_t (*quota_read)(struct super_block *, int, char *, size_t, loff_t); ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t); @@ -341,14 +341,14 @@ This describes how the VFS can manipulate an inode in your filesystem. As of kernel 2.6.22, the following members are defined: struct inode_operations { - int (*create) (struct inode *,struct dentry *,int, struct nameidata *); + int (*create) (struct inode *,struct dentry *, umode_t, struct nameidata *); struct dentry * (*lookup) (struct inode *,struct dentry *, struct nameidata *); int (*link) (struct dentry *,struct inode *,struct dentry *); int (*unlink) (struct inode *,struct dentry *); int (*symlink) (struct inode *,struct dentry *,const char *); - int (*mkdir) (struct inode *,struct dentry *,int); + int (*mkdir) (struct inode *,struct dentry *,umode_t); int (*rmdir) (struct inode *,struct dentry *); - int (*mknod) (struct inode *,struct dentry *,int,dev_t); + int (*mknod) (struct inode *,struct dentry *,umode_t,dev_t); int (*rename) (struct inode *, struct dentry *, struct inode *, struct dentry *); int (*readlink) (struct dentry *, char __user *,int); diff --git a/Documentation/hwmon/pmbus b/Documentation/hwmon/pmbus index 15ac911ce51b..d28b591753d1 100644 --- a/Documentation/hwmon/pmbus +++ b/Documentation/hwmon/pmbus @@ -2,9 +2,8 @@ Kernel driver pmbus ==================== Supported chips: - * Ericsson BMR45X series - DC/DC Converter - Prefixes: 'bmr450', 'bmr451', 'bmr453', 'bmr454' + * Ericsson BMR453, BMR454 + Prefixes: 'bmr453', 'bmr454' Addresses scanned: - Datasheet: http://archive.ericsson.net/service/internet/picov/get?DocNo=28701-EN/LZT146395 diff --git a/Documentation/hwmon/zl6100 b/Documentation/hwmon/zl6100 index 7617798b5c97..51f76a189fee 100644 --- a/Documentation/hwmon/zl6100 +++ b/Documentation/hwmon/zl6100 @@ -6,6 +6,10 @@ Supported chips: Prefix: 'zl2004' Addresses scanned: - Datasheet: http://www.intersil.com/data/fn/fn6847.pdf + * Intersil / Zilker Labs ZL2005 + Prefix: 'zl2005' + Addresses scanned: - + Datasheet: http://www.intersil.com/data/fn/fn6848.pdf * Intersil / Zilker Labs ZL2006 Prefix: 'zl2006' Addresses scanned: - @@ -30,6 +34,17 @@ Supported chips: Prefix: 'zl6105' Addresses scanned: - Datasheet: http://www.intersil.com/data/fn/fn6906.pdf + * Ericsson BMR450, BMR451 + Prefix: 'bmr450', 'bmr451' + Addresses scanned: - + Datasheet: +http://archive.ericsson.net/service/internet/picov/get?DocNo=28701-EN/LZT146401 + * Ericsson BMR462, BMR463, BMR464 + Prefixes: 'bmr462', 'bmr463', 'bmr464' + Addresses scanned: - + Datasheet: +http://archive.ericsson.net/service/internet/picov/get?DocNo=28701-EN/LZT146256 + Author: Guenter Roeck <guenter.roeck@ericsson.com> diff --git a/Documentation/input/alps.txt b/Documentation/input/alps.txt new file mode 100644 index 000000000000..f274c28b5103 --- /dev/null +++ b/Documentation/input/alps.txt @@ -0,0 +1,188 @@ +ALPS Touchpad Protocol +---------------------- + +Introduction +------------ + +Currently the ALPS touchpad driver supports four protocol versions in use by +ALPS touchpads, called versions 1, 2, 3, and 4. Information about the various +protocol versions is contained in the following sections. + +Detection +--------- + +All ALPS touchpads should respond to the "E6 report" command sequence: +E8-E6-E6-E6-E9. An ALPS touchpad should respond with either 00-00-0A or +00-00-64. + +If the E6 report is successful, the touchpad model is identified using the "E7 +report" sequence: E8-E7-E7-E7-E9. The response is the model signature and is +matched against known models in the alps_model_data_array. + +With protocol versions 3 and 4, the E7 report model signature is always +73-02-64. To differentiate between these versions, the response from the +"Enter Command Mode" sequence must be inspected as described below. + +Command Mode +------------ + +Protocol versions 3 and 4 have a command mode that is used to read and write +one-byte device registers in a 16-bit address space. The command sequence +EC-EC-EC-E9 places the device in command mode, and the device will respond +with 88-07 followed by a third byte. This third byte can be used to determine +whether the devices uses the version 3 or 4 protocol. + +To exit command mode, PSMOUSE_CMD_SETSTREAM (EA) is sent to the touchpad. + +While in command mode, register addresses can be set by first sending a +specific command, either EC for v3 devices or F5 for v4 devices. Then the +address is sent one nibble at a time, where each nibble is encoded as a +command with optional data. This enoding differs slightly between the v3 and +v4 protocols. + +Once an address has been set, the addressed register can be read by sending +PSMOUSE_CMD_GETINFO (E9). The first two bytes of the response contains the +address of the register being read, and the third contains the value of the +register. Registers are written by writing the value one nibble at a time +using the same encoding used for addresses. + +Packet Format +------------- + +In the following tables, the following notation is used. + + CAPITALS = stick, miniscules = touchpad + +?'s can have different meanings on different models, such as wheel rotation, +extra buttons, stick buttons on a dualpoint, etc. + +PS/2 packet format +------------------ + + byte 0: 0 0 YSGN XSGN 1 M R L + byte 1: X7 X6 X5 X4 X3 X2 X1 X0 + byte 2: Y7 Y6 Y5 Y4 Y3 Y2 Y1 Y0 + +Note that the device never signals overflow condition. + +ALPS Absolute Mode - Protocol Verion 1 +-------------------------------------- + + byte 0: 1 0 0 0 1 x9 x8 x7 + byte 1: 0 x6 x5 x4 x3 x2 x1 x0 + byte 2: 0 ? ? l r ? fin ges + byte 3: 0 ? ? ? ? y9 y8 y7 + byte 4: 0 y6 y5 y4 y3 y2 y1 y0 + byte 5: 0 z6 z5 z4 z3 z2 z1 z0 + +ALPS Absolute Mode - Protocol Version 2 +--------------------------------------- + + byte 0: 1 ? ? ? 1 ? ? ? + byte 1: 0 x6 x5 x4 x3 x2 x1 x0 + byte 2: 0 x10 x9 x8 x7 ? fin ges + byte 3: 0 y9 y8 y7 1 M R L + byte 4: 0 y6 y5 y4 y3 y2 y1 y0 + byte 5: 0 z6 z5 z4 z3 z2 z1 z0 + +Dualpoint device -- interleaved packet format +--------------------------------------------- + + byte 0: 1 1 0 0 1 1 1 1 + byte 1: 0 x6 x5 x4 x3 x2 x1 x0 + byte 2: 0 x10 x9 x8 x7 0 fin ges + byte 3: 0 0 YSGN XSGN 1 1 1 1 + byte 4: X7 X6 X5 X4 X3 X2 X1 X0 + byte 5: Y7 Y6 Y5 Y4 Y3 Y2 Y1 Y0 + byte 6: 0 y9 y8 y7 1 m r l + byte 7: 0 y6 y5 y4 y3 y2 y1 y0 + byte 8: 0 z6 z5 z4 z3 z2 z1 z0 + +ALPS Absolute Mode - Protocol Version 3 +--------------------------------------- + +ALPS protocol version 3 has three different packet formats. The first two are +associated with touchpad events, and the third is associatd with trackstick +events. + +The first type is the touchpad position packet. + + byte 0: 1 ? x1 x0 1 1 1 1 + byte 1: 0 x10 x9 x8 x7 x6 x5 x4 + byte 2: 0 y10 y9 y8 y7 y6 y5 y4 + byte 3: 0 M R L 1 m r l + byte 4: 0 mt x3 x2 y3 y2 y1 y0 + byte 5: 0 z6 z5 z4 z3 z2 z1 z0 + +Note that for some devices the trackstick buttons are reported in this packet, +and on others it is reported in the trackstick packets. + +The second packet type contains bitmaps representing the x and y axes. In the +bitmaps a given bit is set if there is a finger covering that position on the +given axis. Thus the bitmap packet can be used for low-resolution multi-touch +data, although finger tracking is not possible. This packet also encodes the +number of contacts (f1 and f0 in the table below). + + byte 0: 1 1 x1 x0 1 1 1 1 + byte 1: 0 x8 x7 x6 x5 x4 x3 x2 + byte 2: 0 y7 y6 y5 y4 y3 y2 y1 + byte 3: 0 y10 y9 y8 1 1 1 1 + byte 4: 0 x14 x13 x12 x11 x10 x9 y0 + byte 5: 0 1 ? ? ? ? f1 f0 + +This packet only appears after a position packet with the mt bit set, and +ususally only appears when there are two or more contacts (although +ocassionally it's seen with only a single contact). + +The final v3 packet type is the trackstick packet. + + byte 0: 1 1 x7 y7 1 1 1 1 + byte 1: 0 x6 x5 x4 x3 x2 x1 x0 + byte 2: 0 y6 y5 y4 y3 y2 y1 y0 + byte 3: 0 1 0 0 1 0 0 0 + byte 4: 0 z4 z3 z2 z1 z0 ? ? + byte 5: 0 0 1 1 1 1 1 1 + +ALPS Absolute Mode - Protocol Version 4 +--------------------------------------- + +Protocol version 4 has an 8-byte packet format. + + byte 0: 1 ? x1 x0 1 1 1 1 + byte 1: 0 x10 x9 x8 x7 x6 x5 x4 + byte 2: 0 y10 y9 y8 y7 y6 y5 y4 + byte 3: 0 1 x3 x2 y3 y2 y1 y0 + byte 4: 0 ? ? ? 1 ? r l + byte 5: 0 z6 z5 z4 z3 z2 z1 z0 + byte 6: bitmap data (described below) + byte 7: bitmap data (described below) + +The last two bytes represent a partial bitmap packet, with 3 full packets +required to construct a complete bitmap packet. Once assembled, the 6-byte +bitmap packet has the following format: + + byte 0: 0 1 x7 x6 x5 x4 x3 x2 + byte 1: 0 x1 x0 y4 y3 y2 y1 y0 + byte 2: 0 0 ? x14 x13 x12 x11 x10 + byte 3: 0 x9 x8 y9 y8 y7 y6 y5 + byte 4: 0 0 0 0 0 0 0 0 + byte 5: 0 0 0 0 0 0 0 y10 + +There are several things worth noting here. + + 1) In the bitmap data, bit 6 of byte 0 serves as a sync byte to + identify the first fragment of a bitmap packet. + + 2) The bitmaps represent the same data as in the v3 bitmap packets, although + the packet layout is different. + + 3) There doesn't seem to be a count of the contact points anywhere in the v4 + protocol packets. Deriving a count of contact points must be done by + analyzing the bitmaps. + + 4) There is a 3 to 1 ratio of position packets to bitmap packets. Therefore + MT position can only be updated for every third ST position update, and + the count of contact points can only be updated every third packet as + well. + +So far no v4 devices with tracksticks have been encountered. diff --git a/Documentation/input/gpio-tilt.txt b/Documentation/input/gpio-tilt.txt new file mode 100644 index 000000000000..06d60c3ff5e7 --- /dev/null +++ b/Documentation/input/gpio-tilt.txt @@ -0,0 +1,103 @@ +Driver for tilt-switches connected via GPIOs +============================================ + +Generic driver to read data from tilt switches connected via gpios. +Orientation can be provided by one or more than one tilt switches, +i.e. each tilt switch providing one axis, and the number of axes +is also not limited. + + +Data structures: +---------------- + +The array of struct gpio in the gpios field is used to list the gpios +that represent the current tilt state. + +The array of struct gpio_tilt_axis describes the axes that are reported +to the input system. The values set therein are used for the +input_set_abs_params calls needed to init the axes. + +The array of struct gpio_tilt_state maps gpio states to the corresponding +values to report. The gpio state is represented as a bitfield where the +bit-index corresponds to the index of the gpio in the struct gpio array. +In the same manner the values stored in the axes array correspond to +the elements of the gpio_tilt_axis-array. + + +Example: +-------- + +Example configuration for a single TS1003 tilt switch that rotates around +one axis in 4 steps and emitts the current tilt via two GPIOs. + +static int sg060_tilt_enable(struct device *dev) { + /* code to enable the sensors */ +}; + +static void sg060_tilt_disable(struct device *dev) { + /* code to disable the sensors */ +}; + +static struct gpio sg060_tilt_gpios[] = { + { SG060_TILT_GPIO_SENSOR1, GPIOF_IN, "tilt_sensor1" }, + { SG060_TILT_GPIO_SENSOR2, GPIOF_IN, "tilt_sensor2" }, +}; + +static struct gpio_tilt_state sg060_tilt_states[] = { + { + .gpios = (0 << 1) | (0 << 0), + .axes = (int[]) { + 0, + }, + }, { + .gpios = (0 << 1) | (1 << 0), + .axes = (int[]) { + 1, /* 90 degrees */ + }, + }, { + .gpios = (1 << 1) | (1 << 0), + .axes = (int[]) { + 2, /* 180 degrees */ + }, + }, { + .gpios = (1 << 1) | (0 << 0), + .axes = (int[]) { + 3, /* 270 degrees */ + }, + }, +}; + +static struct gpio_tilt_axis sg060_tilt_axes[] = { + { + .axis = ABS_RY, + .min = 0, + .max = 3, + .fuzz = 0, + .flat = 0, + }, +}; + +static struct gpio_tilt_platform_data sg060_tilt_pdata= { + .gpios = sg060_tilt_gpios, + .nr_gpios = ARRAY_SIZE(sg060_tilt_gpios), + + .axes = sg060_tilt_axes, + .nr_axes = ARRAY_SIZE(sg060_tilt_axes), + + .states = sg060_tilt_states, + .nr_states = ARRAY_SIZE(sg060_tilt_states), + + .debounce_interval = 100, + + .poll_interval = 1000, + .enable = sg060_tilt_enable, + .disable = sg060_tilt_disable, +}; + +static struct platform_device sg060_device_tilt = { + .name = "gpio-tilt-polled", + .id = -1, + .dev = { + .platform_data = &sg060_tilt_pdata, + }, +}; diff --git a/Documentation/input/sentelic.txt b/Documentation/input/sentelic.txt index b2ef125b71f8..89251e2a3eba 100644 --- a/Documentation/input/sentelic.txt +++ b/Documentation/input/sentelic.txt @@ -1,5 +1,5 @@ -Copyright (C) 2002-2010 Sentelic Corporation. -Last update: Jan-13-2010 +Copyright (C) 2002-2011 Sentelic Corporation. +Last update: Dec-07-2011 ============================================================================== * Finger Sensing Pad Intellimouse Mode(scrolling wheel, 4th and 5th buttons) @@ -140,6 +140,7 @@ BYTE |---------------|BYTE |---------------|BYTE|---------------|BYTE|--------- Byte 1: Bit7~Bit6 => 00, Normal data packet => 01, Absolute coordination packet => 10, Notify packet + => 11, Normal data packet with on-pad click Bit5 => Valid bit, 0 means that the coordinate is invalid or finger up. When both fingers are up, the last two reports have zero valid bit. @@ -164,6 +165,7 @@ BYTE |---------------|BYTE |---------------|BYTE|---------------|BYTE|--------- Byte 1: Bit7~Bit6 => 00, Normal data packet => 01, Absolute coordinates packet => 10, Notify packet + => 11, Normal data packet with on-pad click Bit5 => Valid bit, 0 means that the coordinate is invalid or finger up. When both fingers are up, the last two reports have zero valid bit. @@ -188,6 +190,7 @@ BYTE |---------------|BYTE |---------------|BYTE|---------------|BYTE|--------- Byte 1: Bit7~Bit6 => 00, Normal data packet => 01, Absolute coordinates packet => 10, Notify packet + => 11, Normal data packet with on-pad click Bit5 => 1 Bit4 => when in absolute coordinates mode (valid when EN_PKT_GO is 1): 0: left button is generated by the on-pad command @@ -205,7 +208,7 @@ Byte 4: Bit7 => scroll right button Bit6 => scroll left button Bit5 => scroll down button Bit4 => scroll up button - * Note that if gesture and additional buttoni (Bit4~Bit7) + * Note that if gesture and additional button (Bit4~Bit7) happen at the same time, the button information will not be sent. Bit3~Bit0 => Reserved @@ -227,6 +230,7 @@ BYTE |---------------|BYTE |---------------|BYTE|---------------|BYTE|--------- Byte 1: Bit7~Bit6 => 00, Normal data packet => 01, Absolute coordinates packet => 10, Notify packet + => 11, Normal data packet with on-pad click Bit5 => Valid bit, 0 means that the coordinate is invalid or finger up. When both fingers are up, the last two reports have zero valid bit. @@ -253,6 +257,7 @@ BYTE |---------------|BYTE |---------------|BYTE|---------------|BYTE|--------- Byte 1: Bit7~Bit6 => 00, Normal data packet => 01, Absolute coordination packet => 10, Notify packet + => 11, Normal data packet with on-pad click Bit5 => Valid bit, 0 means that the coordinate is invalid or finger up. When both fingers are up, the last two reports have zero valid bit. @@ -279,8 +284,9 @@ BYTE |---------------|BYTE |---------------|BYTE|---------------|BYTE|--------- Byte 1: Bit7~Bit6 => 00, Normal data packet => 01, Absolute coordination packet => 10, Notify packet + => 11, Normal data packet with on-pad click Bit5 => 1 - Bit4 => when in absolute coordinate mode (valid when EN_PKT_GO is 1): + Bit4 => when in absolute coordinates mode (valid when EN_PKT_GO is 1): 0: left button is generated by the on-pad command 1: left button is generated by the external button Bit3 => 1 @@ -307,6 +313,110 @@ Sample sequence of Multi-finger, Multi-coordinate mode: abs pkt 2, ..., notify packet (valid bit == 0) ============================================================================== +* Absolute position for STL3888-Cx and STL3888-Dx. +============================================================================== +Single Finger, Absolute Coordinate Mode (SFAC) + Bit 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 +BYTE |---------------|BYTE |---------------|BYTE|---------------|BYTE|---------------| + 1 |0|1|0|P|1|M|R|L| 2 |X|X|X|X|X|X|X|X| 3 |Y|Y|Y|Y|Y|Y|Y|Y| 4 |r|l|B|F|X|X|Y|Y| + |---------------| |---------------| |---------------| |---------------| + +Byte 1: Bit7~Bit6 => 00, Normal data packet + => 01, Absolute coordinates packet + => 10, Notify packet + Bit5 => Coordinate mode(always 0 in SFAC mode): + 0: single-finger absolute coordinates (SFAC) mode + 1: multi-finger, multiple coordinates (MFMC) mode + Bit4 => 0: The LEFT button is generated by on-pad command (OPC) + 1: The LEFT button is generated by external button + Default is 1 even if the LEFT button is not pressed. + Bit3 => Always 1, as specified by PS/2 protocol. + Bit2 => Middle Button, 1 is pressed, 0 is not pressed. + Bit1 => Right Button, 1 is pressed, 0 is not pressed. + Bit0 => Left Button, 1 is pressed, 0 is not pressed. +Byte 2: X coordinate (xpos[9:2]) +Byte 3: Y coordinate (ypos[9:2]) +Byte 4: Bit1~Bit0 => Y coordinate (xpos[1:0]) + Bit3~Bit2 => X coordinate (ypos[1:0]) + Bit4 => 4th mouse button(forward one page) + Bit5 => 5th mouse button(backward one page) + Bit6 => scroll left button + Bit7 => scroll right button + +Multi Finger, Multiple Coordinates Mode (MFMC): + Bit 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 +BYTE |---------------|BYTE |---------------|BYTE|---------------|BYTE|---------------| + 1 |0|1|1|P|1|F|R|L| 2 |X|X|X|X|X|X|X|X| 3 |Y|Y|Y|Y|Y|Y|Y|Y| 4 |r|l|B|F|X|X|Y|Y| + |---------------| |---------------| |---------------| |---------------| + +Byte 1: Bit7~Bit6 => 00, Normal data packet + => 01, Absolute coordination packet + => 10, Notify packet + Bit5 => Coordinate mode (always 1 in MFMC mode): + 0: single-finger absolute coordinates (SFAC) mode + 1: multi-finger, multiple coordinates (MFMC) mode + Bit4 => 0: The LEFT button is generated by on-pad command (OPC) + 1: The LEFT button is generated by external button + Default is 1 even if the LEFT button is not pressed. + Bit3 => Always 1, as specified by PS/2 protocol. + Bit2 => Finger index, 0 is the first finger, 1 is the second finger. + If bit 1 and 0 are all 1 and bit 4 is 0, the middle external + button is pressed. + Bit1 => Right Button, 1 is pressed, 0 is not pressed. + Bit0 => Left Button, 1 is pressed, 0 is not pressed. +Byte 2: X coordinate (xpos[9:2]) +Byte 3: Y coordinate (ypos[9:2]) +Byte 4: Bit1~Bit0 => Y coordinate (xpos[1:0]) + Bit3~Bit2 => X coordinate (ypos[1:0]) + Bit4 => 4th mouse button(forward one page) + Bit5 => 5th mouse button(backward one page) + Bit6 => scroll left button + Bit7 => scroll right button + + When one of the two fingers is up, the device will output four consecutive +MFMC#0 report packets with zero X and Y to represent 1st finger is up or +four consecutive MFMC#1 report packets with zero X and Y to represent that +the 2nd finger is up. On the other hand, if both fingers are up, the device +will output four consecutive single-finger, absolute coordinate(SFAC) packets +with zero X and Y. + +Notify Packet for STL3888-Cx/Dx + Bit 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 +BYTE |---------------|BYTE |---------------|BYTE|---------------|BYTE|---------------| + 1 |1|0|0|P|1|M|R|L| 2 |C|C|C|C|C|C|C|C| 3 |0|0|F|F|0|0|0|i| 4 |r|l|u|d|0|0|0|0| + |---------------| |---------------| |---------------| |---------------| + +Byte 1: Bit7~Bit6 => 00, Normal data packet + => 01, Absolute coordinates packet + => 10, Notify packet + Bit5 => Always 0 + Bit4 => 0: The LEFT button is generated by on-pad command(OPC) + 1: The LEFT button is generated by external button + Default is 1 even if the LEFT button is not pressed. + Bit3 => 1 + Bit2 => Middle Button, 1 is pressed, 0 is not pressed. + Bit1 => Right Button, 1 is pressed, 0 is not pressed. + Bit0 => Left Button, 1 is pressed, 0 is not pressed. +Byte 2: Message type: + 0xba => gesture information + 0xc0 => one finger hold-rotating gesture +Byte 3: The first parameter for the received message: + 0xba => gesture ID (refer to the 'Gesture ID' section) + 0xc0 => region ID +Byte 4: The second parameter for the received message: + 0xba => N/A + 0xc0 => finger up/down information + +Sample sequence of Multi-finger, Multi-coordinates mode: + + notify packet (valid bit == 1), MFMC packet 1 (byte 1, bit 2 == 0), + MFMC packet 2 (byte 1, bit 2 == 1), MFMC packet 1, MFMC packet 2, + ..., notify packet (valid bit == 0) + + That is, when the device is in MFMC mode, the host will receive + interleaved absolute coordinate packets for each finger. + +============================================================================== * FSP Enable/Disable packet ============================================================================== Bit 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 @@ -348,9 +458,10 @@ http://www.computer-engineering.org/ps2mouse/ ============================================================================== 1. Identify FSP by reading device ID(0x00) and version(0x01) register -2. Determine number of buttons by reading status2 (0x0b) register +2a. For FSP version < STL3888 Cx, determine number of buttons by reading + the 'test mode status' (0x20) register: - buttons = reg[0x0b] & 0x30 + buttons = reg[0x20] & 0x30 if buttons == 0x30 or buttons == 0x20: # two/four buttons @@ -365,6 +476,10 @@ http://www.computer-engineering.org/ps2mouse/ Refer to 'Finger Sensing Pad PS/2 Mouse Intellimouse' section A for packet parsing detail +2b. For FSP version >= STL3888 Cx: + Refer to 'Finger Sensing Pad PS/2 Mouse Intellimouse' + section A for packet parsing detail (ignore byte 4, bit ~ 7) + ============================================================================== * Programming Sequence for Register Reading/Writing ============================================================================== @@ -374,7 +489,7 @@ Register inversion requirement: Following values needed to be inverted(the '~' operator in C) before being sent to FSP: - 0xe9, 0xee, 0xf2 and 0xff. + 0xe8, 0xe9, 0xee, 0xf2, 0xf3 and 0xff. Register swapping requirement: @@ -415,7 +530,18 @@ Register reading sequence: 8. send 0xe9(status request) PS/2 command to FSP; - 9. the response read from FSP should be the requested register value. + 9. the 4th byte of the response read from FSP should be the + requested register value(?? indicates don't care byte): + + host: 0xe9 + 3888: 0xfa (??) (??) (val) + + * Note that since the Cx release, the hardware will return 1's + complement of the register value at the 3rd byte of status request + result: + + host: 0xe9 + 3888: 0xfa (??) (~val) (val) Register writing sequence: @@ -465,71 +591,194 @@ Register writing sequence: 9. the register writing sequence is completed. + * Note that since the Cx release, the hardware will return 1's + complement of the register value at the 3rd byte of status request + result. Host can optionally send another 0xe9 (status request) PS/2 + command to FSP at the end of register writing to verify that the + register writing operation is successful (?? indicates don't care + byte): + + host: 0xe9 + 3888: 0xfa (??) (~val) (val) + +============================================================================== +* Programming Sequence for Page Register Reading/Writing +============================================================================== + + In order to overcome the limitation of maximum number of registers +supported, the hardware separates register into different groups called +'pages.' Each page is able to include up to 255 registers. + + The default page after power up is 0x82; therefore, if one has to get +access to register 0x8301, one has to use following sequence to switch +to page 0x83, then start reading/writing from/to offset 0x01 by using +the register read/write sequence described in previous section. + +Page register reading sequence: + + 1. send 0xf3 PS/2 command to FSP; + + 2. send 0x66 PS/2 command to FSP; + + 3. send 0x88 PS/2 command to FSP; + + 4. send 0xf3 PS/2 command to FSP; + + 5. send 0x83 PS/2 command to FSP; + + 6. send 0x88 PS/2 command to FSP; + + 7. send 0xe9(status request) PS/2 command to FSP; + + 8. the response read from FSP should be the requested page value. + +Page register writing sequence: + + 1. send 0xf3 PS/2 command to FSP; + + 2. send 0x38 PS/2 command to FSP; + + 3. send 0x88 PS/2 command to FSP; + + 4. send 0xf3 PS/2 command to FSP; + + 5. if the page address being written is not required to be + inverted(refer to the 'Register inversion requirement' section), + goto step 6 + + 5a. send 0x47 PS/2 command to FSP; + + 5b. send the inverted page address to FSP and goto step 9; + + 6. if the page address being written is not required to be + swapped(refer to the 'Register swapping requirement' section), + goto step 7 + + 6a. send 0x44 PS/2 command to FSP; + + 6b. send the swapped page address to FSP and goto step 9; + + 7. send 0x33 PS/2 command to FSP; + + 8. send the page address to FSP; + + 9. the page register writing sequence is completed. + +============================================================================== +* Gesture ID +============================================================================== + + Unlike other devices which sends multiple fingers' coordinates to host, +FSP processes multiple fingers' coordinates internally and convert them +into a 8 bits integer, namely 'Gesture ID.' Following is a list of +supported gesture IDs: + + ID Description + 0x86 2 finger straight up + 0x82 2 finger straight down + 0x80 2 finger straight right + 0x84 2 finger straight left + 0x8f 2 finger zoom in + 0x8b 2 finger zoom out + 0xc0 2 finger curve, counter clockwise + 0xc4 2 finger curve, clockwise + 0x2e 3 finger straight up + 0x2a 3 finger straight down + 0x28 3 finger straight right + 0x2c 3 finger straight left + 0x38 palm + ============================================================================== * Register Listing ============================================================================== + Registers are represented in 16 bits values. The higher 8 bits represent +the page address and the lower 8 bits represent the relative offset within +that particular page. Refer to the 'Programming Sequence for Page Register +Reading/Writing' section for instructions on how to change current page +address. + offset width default r/w name -0x00 bit7~bit0 0x01 RO device ID +0x8200 bit7~bit0 0x01 RO device ID -0x01 bit7~bit0 0xc0 RW version ID +0x8201 bit7~bit0 RW version ID + 0xc1: STL3888 Ax + 0xd0 ~ 0xd2: STL3888 Bx + 0xe0 ~ 0xe1: STL3888 Cx + 0xe2 ~ 0xe3: STL3888 Dx -0x02 bit7~bit0 0x01 RO vendor ID +0x8202 bit7~bit0 0x01 RO vendor ID -0x03 bit7~bit0 0x01 RO product ID +0x8203 bit7~bit0 0x01 RO product ID -0x04 bit3~bit0 0x01 RW revision ID +0x8204 bit3~bit0 0x01 RW revision ID -0x0b RO test mode status 1 - bit3 1 RO 0: rotate 180 degree, 1: no rotation +0x820b test mode status 1 + bit3 1 RO 0: rotate 180 degree + 1: no rotation + *only supported by H/W prior to Cx - bit5~bit4 RO number of buttons - 11 => 2, lbtn/rbtn - 10 => 4, lbtn/rbtn/scru/scrd - 01 => 6, lbtn/rbtn/scru/scrd/scrl/scrr - 00 => 6, lbtn/rbtn/scru/scrd/fbtn/bbtn +0x820f register file page control + bit2 0 RW 1: rotate 180 degree + 0: no rotation + *supported since Cx -0x0f RW register file page control bit0 0 RW 1 to enable page 1 register files + *only supported by H/W prior to Cx -0x10 RW system control 1 +0x8210 RW system control 1 bit0 1 RW Reserved, must be 1 bit1 0 RW Reserved, must be 0 - bit4 1 RW Reserved, must be 0 - bit5 0 RW register clock gating enable + bit4 0 RW Reserved, must be 0 + bit5 1 RW register clock gating enable 0: read only, 1: read/write enable (Note that following registers does not require clock gating being enabled prior to write: 05 06 07 08 09 0c 0f 10 11 12 16 17 18 23 2e 40 41 42 43. In addition to that, this bit must be 1 when gesture mode is enabled) -0x31 RW on-pad command detection +0x8220 test mode status + bit5~bit4 RO number of buttons + 11 => 2, lbtn/rbtn + 10 => 4, lbtn/rbtn/scru/scrd + 01 => 6, lbtn/rbtn/scru/scrd/scrl/scrr + 00 => 6, lbtn/rbtn/scru/scrd/fbtn/bbtn + *only supported by H/W prior to Cx + +0x8231 RW on-pad command detection bit7 0 RW on-pad command left button down tag enable 0: disable, 1: enable + *only supported by H/W prior to Cx -0x34 RW on-pad command control 5 +0x8234 RW on-pad command control 5 bit4~bit0 0x05 RW XLO in 0s/4/1, so 03h = 0010.1b = 2.5 (Note that position unit is in 0.5 scanline) + *only supported by H/W prior to Cx bit7 0 RW on-pad tap zone enable 0: disable, 1: enable + *only supported by H/W prior to Cx -0x35 RW on-pad command control 6 +0x8235 RW on-pad command control 6 bit4~bit0 0x1d RW XHI in 0s/4/1, so 19h = 1100.1b = 12.5 (Note that position unit is in 0.5 scanline) + *only supported by H/W prior to Cx -0x36 RW on-pad command control 7 +0x8236 RW on-pad command control 7 bit4~bit0 0x04 RW YLO in 0s/4/1, so 03h = 0010.1b = 2.5 (Note that position unit is in 0.5 scanline) + *only supported by H/W prior to Cx -0x37 RW on-pad command control 8 +0x8237 RW on-pad command control 8 bit4~bit0 0x13 RW YHI in 0s/4/1, so 11h = 1000.1b = 8.5 (Note that position unit is in 0.5 scanline) + *only supported by H/W prior to Cx -0x40 RW system control 5 +0x8240 RW system control 5 bit1 0 RW FSP Intellimouse mode enable 0: disable, 1: enable + *only supported by H/W prior to Cx bit2 0 RW movement + abs. coordinate mode enable 0: disable, 1: enable @@ -537,6 +786,7 @@ offset width default r/w name bit 1 is not set. However, the format is different from that of bit 1. In addition, when bit 1 and bit 2 are set at the same time, bit 2 will override bit 1.) + *only supported by H/W prior to Cx bit3 0 RW abs. coordinate only mode enable 0: disable, 1: enable @@ -544,9 +794,11 @@ offset width default r/w name bit 1 is not set. However, the format is different from that of bit 1. In addition, when bit 1, bit 2 and bit 3 are set at the same time, bit 3 will override bit 1 and 2.) + *only supported by H/W prior to Cx bit5 0 RW auto switch enable 0: disable, 1: enable + *only supported by H/W prior to Cx bit6 0 RW G0 abs. + notify packet format enable 0: disable, 1: enable @@ -554,18 +806,68 @@ offset width default r/w name bit 2 and 3. That is, if any of those bit is 1, host will receive absolute coordinates; otherwise, host only receives packets with relative coordinate.) + *only supported by H/W prior to Cx bit7 0 RW EN_PS2_F2: PS/2 gesture mode 2nd finger packet enable 0: disable, 1: enable + *only supported by H/W prior to Cx -0x43 RW on-pad control +0x8243 RW on-pad control bit0 0 RW on-pad control enable 0: disable, 1: enable (Note that if this bit is cleared, bit 3/5 will be ineffective) + *only supported by H/W prior to Cx bit3 0 RW on-pad fix vertical scrolling enable 0: disable, 1: enable + *only supported by H/W prior to Cx bit5 0 RW on-pad fix horizontal scrolling enable 0: disable, 1: enable + *only supported by H/W prior to Cx + +0x8290 RW software control register 1 + bit0 0 RW absolute coordination mode + 0: disable, 1: enable + *supported since Cx + + bit1 0 RW gesture ID output + 0: disable, 1: enable + *supported since Cx + + bit2 0 RW two fingers' coordinates output + 0: disable, 1: enable + *supported since Cx + + bit3 0 RW finger up one packet output + 0: disable, 1: enable + *supported since Cx + + bit4 0 RW absolute coordination continuous mode + 0: disable, 1: enable + *supported since Cx + + bit6~bit5 00 RW gesture group selection + 00: basic + 01: suite + 10: suite pro + 11: advanced + *supported since Cx + + bit7 0 RW Bx packet output compatible mode + 0: disable, 1: enable *supported since Cx + *supported since Cx + + +0x833d RW on-pad command control 1 + bit7 1 RW on-pad command detection enable + 0: disable, 1: enable + *supported since Cx + +0x833e RW on-pad command detection + bit7 0 RW on-pad command left button down tag + enable. Works only in H/W based PS/2 + data packet mode. + 0: disable, 1: enable + *supported since Cx diff --git a/Documentation/kdump/kdump.txt b/Documentation/kdump/kdump.txt index 7a9e0b4b2903..506c7390c2b9 100644 --- a/Documentation/kdump/kdump.txt +++ b/Documentation/kdump/kdump.txt @@ -17,8 +17,8 @@ You can use common commands, such as cp and scp, to copy the memory image to a dump file on the local disk, or across the network to a remote system. -Kdump and kexec are currently supported on the x86, x86_64, ppc64 and ia64 -architectures. +Kdump and kexec are currently supported on the x86, x86_64, ppc64, ia64, +and s390x architectures. When the system kernel boots, it reserves a small section of memory for the dump-capture kernel. This ensures that ongoing Direct Memory Access @@ -34,11 +34,18 @@ Similarly on PPC64 machines first 32KB of physical memory is needed for booting regardless of where the kernel is loaded and to support 64K page size kexec backs up the first 64KB memory. +For s390x, when kdump is triggered, the crashkernel region is exchanged +with the region [0, crashkernel region size] and then the kdump kernel +runs in [0, crashkernel region size]. Therefore no relocatable kernel is +needed for s390x. + All of the necessary information about the system kernel's core image is encoded in the ELF format, and stored in a reserved area of memory before a crash. The physical address of the start of the ELF header is passed to the dump-capture kernel through the elfcorehdr= boot -parameter. +parameter. Optionally the size of the ELF header can also be passed +when using the elfcorehdr=[size[KMG]@]offset[KMG] syntax. + With the dump-capture kernel, you can access the memory image, or "old memory," in two ways: @@ -291,6 +298,10 @@ Boot into System Kernel The region may be automatically placed on ia64, see the dump-capture kernel config option notes above. + On s390x, typically use "crashkernel=xxM". The value of xx is dependent + on the memory consumption of the kdump system. In general this is not + dependent on the memory size of the production system. + Load the Dump-capture Kernel ============================ @@ -308,6 +319,8 @@ For ppc64: - Use vmlinux For ia64: - Use vmlinux or vmlinuz.gz +For s390x: + - Use image or bzImage If you are using a uncompressed vmlinux image then use following command @@ -337,6 +350,8 @@ For i386, x86_64 and ia64: For ppc64: "1 maxcpus=1 noirqdistrib reset_devices" +For s390x: + "1 maxcpus=1 cgroup_disable=memory" Notes on loading the dump-capture kernel: @@ -362,6 +377,20 @@ Notes on loading the dump-capture kernel: dump. Hence generally it is useful either to build a UP dump-capture kernel or specify maxcpus=1 option while loading dump-capture kernel. +* For s390x there are two kdump modes: If a ELF header is specified with + the elfcorehdr= kernel parameter, it is used by the kdump kernel as it + is done on all other architectures. If no elfcorehdr= kernel parameter is + specified, the s390x kdump kernel dynamically creates the header. The + second mode has the advantage that for CPU and memory hotplug, kdump has + not to be reloaded with kexec_load(). + +* For s390x systems with many attached devices the "cio_ignore" kernel + parameter should be used for the kdump kernel in order to prevent allocation + of kernel memory for devices that are not relevant for kdump. The same + applies to systems that use SCSI/FCP devices. In that case the + "allow_lun_scan" zfcp module parameter should be set to zero before + setting FCP devices online. + Kernel Panic ============ diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 81c287fad79d..eb93fd0ec734 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -329,6 +329,11 @@ bytes respectively. Such letter suffixes can also be entirely omitted. is a lot of faster off - do not initialize any AMD IOMMU found in the system + force_isolation - Force device isolation for all + devices. The IOMMU driver is not + allowed anymore to lift isolation + requirements as needed. This option + does not override iommu=pt amijoy.map= [HW,JOY] Amiga joystick support Map of devices attached to JOY0DAT and JOY1DAT @@ -623,6 +628,25 @@ bytes respectively. Such letter suffixes can also be entirely omitted. no_debug_objects [KNL] Disable object debugging + debug_guardpage_minorder= + [KNL] When CONFIG_DEBUG_PAGEALLOC is set, this + parameter allows control of the order of pages that will + be intentionally kept free (and hence protected) by the + buddy allocator. Bigger value increase the probability + of catching random memory corruption, but reduce the + amount of memory for normal system use. The maximum + possible value is MAX_ORDER/2. Setting this parameter + to 1 or 2 should be enough to identify most random + memory corruption problems caused by bugs in kernel or + driver code when a CPU writes to (or reads from) a + random memory location. Note that there exists a class + of memory corruptions problems caused by buggy H/W or + F/W or by drivers badly programing DMA (basically when + memory is written at bus level and the CPU MMU is + bypassed) which are not detectable by + CONFIG_DEBUG_PAGEALLOC, hence this option will not help + tracking down these problems. + debugpat [X86] Enable PAT debugging decnet.addr= [HW,NET] @@ -1059,7 +1083,9 @@ bytes respectively. Such letter suffixes can also be entirely omitted. nomerge forcesac soft - pt [x86, IA-64] + pt [x86, IA-64] + group_mf [x86, IA-64] + io7= [HW] IO7 for Marvel based alpha systems See comment before marvel_specify_io7 in @@ -1178,9 +1204,6 @@ bytes respectively. Such letter suffixes can also be entirely omitted. kvm.ignore_msrs=[KVM] Ignore guest accesses to unhandled MSRs. Default is 0 (don't ignore, but inject #GP) - kvm.oos_shadow= [KVM] Disable out-of-sync shadow paging. - Default is 1 (enabled) - kvm.mmu_audit= [KVM] This is a R/W parameter which allows audit KVM MMU at runtime. Default is 0 (off) @@ -1630,12 +1653,17 @@ bytes respectively. Such letter suffixes can also be entirely omitted. The default is to return 64-bit inode numbers. nfs.nfs4_disable_idmapping= - [NFSv4] When set, this option disables the NFSv4 - idmapper on the client, but only if the mount - is using the 'sec=sys' security flavour. This may - make migration from legacy NFSv2/v3 systems easier - provided that the server has the appropriate support. - The default is to always enable NFSv4 idmapping. + [NFSv4] When set to the default of '1', this option + ensures that both the RPC level authentication + scheme and the NFS level operations agree to use + numeric uids/gids if the mount is using the + 'sec=sys' security flavour. In effect it is + disabling idmapping, which can make migration from + legacy NFSv2/v3 systems to NFSv4 easier. + Servers that do not support this mode of operation + will be autodetected by the client, and it will fall + back to using the idmapper. + To turn off this behaviour, set the value to '0'. nmi_debug= [KNL,AVR32,SH] Specify one or more actions to take when a NMI is triggered. @@ -1796,6 +1824,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted. nomfgpt [X86-32] Disable Multi-Function General Purpose Timer usage (for AMD Geode machines). + nonmi_ipi [X86] Disable using NMI IPIs during panic/reboot to + shutdown the other cpus. Instead use the REBOOT_VECTOR + irq. + nopat [X86] Disable PAT (page attribute table extension of pagetables) support. @@ -1885,6 +1917,11 @@ bytes respectively. Such letter suffixes can also be entirely omitted. arch_perfmon: [X86] Force use of architectural perfmon on Intel CPUs instead of the CPU specific event set. + timer: [X86] Force use of architectural NMI + timer mode (see also oprofile.timer + for generic hr timer mode) + [s390] Force legacy basic mode sampling + (report cpu_type "timer") oops=panic Always panic on oopses. Default is to just kill the process, but there is a small probability of @@ -2362,6 +2399,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted. slram= [HW,MTD] + slab_max_order= [MM, SLAB] + Determines the maximum allowed order for slabs. + A high setting may cause OOMs due to memory + fragmentation. Defaults to 1 for systems with + more than 32MB of RAM, 0 otherwise. + slub_debug[=options[,slabs]] [MM, SLUB] Enabling slub_debug allows one to determine the culprit if slab objects become corrupted. Enabling @@ -2632,6 +2675,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted. [USB] Start with the old device initialization scheme (default 0 = off). + usbcore.usbfs_memory_mb= + [USB] Memory limit (in MB) for buffers allocated by + usbfs (default = 16, 0 = max = 2047). + usbcore.use_both_schemes= [USB] Try the other device initialization scheme if the first one fails (default 1 = enabled). @@ -2750,11 +2797,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted. functions are at fixed addresses, they make nice targets for exploits that can control RIP. - emulate Vsyscalls turn into traps and are emulated - reasonably safely. + emulate [default] Vsyscalls turn into traps and are + emulated reasonably safely. - native [default] Vsyscalls are native syscall - instructions. + native Vsyscalls are native syscall instructions. This is a little bit faster than trapping and makes a few dynamic recompilers work better than they would in emulation mode. diff --git a/Documentation/lockdep-design.txt b/Documentation/lockdep-design.txt index abf768c681e2..5dbc99c04f6e 100644 --- a/Documentation/lockdep-design.txt +++ b/Documentation/lockdep-design.txt @@ -221,3 +221,66 @@ when the chain is validated for the first time, is then put into a hash table, which hash-table can be checked in a lockfree manner. If the locking chain occurs again later on, the hash table tells us that we dont have to validate the chain again. + +Troubleshooting: +---------------- + +The validator tracks a maximum of MAX_LOCKDEP_KEYS number of lock classes. +Exceeding this number will trigger the following lockdep warning: + + (DEBUG_LOCKS_WARN_ON(id >= MAX_LOCKDEP_KEYS)) + +By default, MAX_LOCKDEP_KEYS is currently set to 8191, and typical +desktop systems have less than 1,000 lock classes, so this warning +normally results from lock-class leakage or failure to properly +initialize locks. These two problems are illustrated below: + +1. Repeated module loading and unloading while running the validator + will result in lock-class leakage. The issue here is that each + load of the module will create a new set of lock classes for + that module's locks, but module unloading does not remove old + classes (see below discussion of reuse of lock classes for why). + Therefore, if that module is loaded and unloaded repeatedly, + the number of lock classes will eventually reach the maximum. + +2. Using structures such as arrays that have large numbers of + locks that are not explicitly initialized. For example, + a hash table with 8192 buckets where each bucket has its own + spinlock_t will consume 8192 lock classes -unless- each spinlock + is explicitly initialized at runtime, for example, using the + run-time spin_lock_init() as opposed to compile-time initializers + such as __SPIN_LOCK_UNLOCKED(). Failure to properly initialize + the per-bucket spinlocks would guarantee lock-class overflow. + In contrast, a loop that called spin_lock_init() on each lock + would place all 8192 locks into a single lock class. + + The moral of this story is that you should always explicitly + initialize your locks. + +One might argue that the validator should be modified to allow +lock classes to be reused. However, if you are tempted to make this +argument, first review the code and think through the changes that would +be required, keeping in mind that the lock classes to be removed are +likely to be linked into the lock-dependency graph. This turns out to +be harder to do than to say. + +Of course, if you do run out of lock classes, the next thing to do is +to find the offending lock classes. First, the following command gives +you the number of lock classes currently in use along with the maximum: + + grep "lock-classes" /proc/lockdep_stats + +This command produces the following output on a modest system: + + lock-classes: 748 [max: 8191] + +If the number allocated (748 above) increases continually over time, +then there is likely a leak. The following command can be used to +identify the leaking lock classes: + + grep "BD" /proc/lockdep + +Run the command and save the output, then compare against the output from +a later run of this command to identify the leakers. This same output +can also help you find situations where runtime lock initialization has +been omitted. diff --git a/Documentation/md.txt b/Documentation/md.txt index fc94770f44ab..993fba37b7d1 100644 --- a/Documentation/md.txt +++ b/Documentation/md.txt @@ -357,14 +357,14 @@ Each directory contains: written to, that device. state - A file recording the current state of the device in the array + A file recording the current state of the device in the array which can be a comma separated list of faulty - device has been kicked from active use due to - a detected fault or it has unacknowledged bad - blocks + a detected fault, or it has unacknowledged bad + blocks in_sync - device is a fully in-sync member of the array writemostly - device will only be subject to read - requests if there are no other options. + requests if there are no other options. This applies only to raid1 arrays. blocked - device has failed, and the failure hasn't been acknowledged yet by the metadata handler. @@ -374,6 +374,13 @@ Each directory contains: This includes spares that are in the process of being recovered to write_error - device has ever seen a write error. + want_replacement - device is (mostly) working but probably + should be replaced, either due to errors or + due to user request. + replacement - device is a replacement for another active + device with same raid_disk. + + This list may grow in future. This can be written to. Writing "faulty" simulates a failure on the device. @@ -386,6 +393,13 @@ Each directory contains: Writing "in_sync" sets the in_sync flag. Writing "write_error" sets writeerrorseen flag. Writing "-write_error" clears writeerrorseen flag. + Writing "want_replacement" is allowed at any time except to a + replacement device or a spare. It sets the flag. + Writing "-want_replacement" is allowed at any time. It clears + the flag. + Writing "replacement" or "-replacement" is only allowed before + starting the array. It sets or clears the flag. + This file responds to select/poll. Any change to 'faulty' or 'blocked' causes an event. diff --git a/Documentation/mmc/mmc-dev-attrs.txt b/Documentation/mmc/mmc-dev-attrs.txt index 8898a95b41e5..22ae8441489f 100644 --- a/Documentation/mmc/mmc-dev-attrs.txt +++ b/Documentation/mmc/mmc-dev-attrs.txt @@ -64,3 +64,13 @@ Note on Erase Size and Preferred Erase Size: size specified by the card. "preferred_erase_size" is in bytes. + +SD/MMC/SDIO Clock Gating Attribute +================================== + +Read and write access is provided to following attribute. +This attribute appears only if CONFIG_MMC_CLKGATE is enabled. + + clkgate_delay Tune the clock gating delay with desired value in milliseconds. + +echo <desired delay> > /sys/class/mmc_host/mmcX/clkgate_delay diff --git a/Documentation/mmc/mmc-dev-parts.txt b/Documentation/mmc/mmc-dev-parts.txt index 2db28b8e662f..f08d078d43cf 100644 --- a/Documentation/mmc/mmc-dev-parts.txt +++ b/Documentation/mmc/mmc-dev-parts.txt @@ -25,3 +25,16 @@ echo 0 > /sys/block/mmcblkXbootY/force_ro To re-enable read-only access: echo 1 > /sys/block/mmcblkXbootY/force_ro + +The boot partitions can also be locked read only until the next power on, +with: + +echo 1 > /sys/block/mmcblkXbootY/ro_lock_until_next_power_on + +This is a feature of the card and not of the kernel. If the card does +not support boot partition locking, the file will not exist. If the +feature has been disabled on the card, the file will be read-only. + +The boot partitions can also be locked permanently, but this feature is +not accessible through sysfs in order to avoid accidental or malicious +bricking. diff --git a/Documentation/networking/00-INDEX b/Documentation/networking/00-INDEX index bbce1215434a..9ad9ddeb384c 100644 --- a/Documentation/networking/00-INDEX +++ b/Documentation/networking/00-INDEX @@ -144,6 +144,8 @@ nfc.txt - The Linux Near Field Communication (NFS) subsystem. olympic.txt - IBM PCI Pit/Pit-Phy/Olympic Token Ring driver info. +openvswitch.txt + - Open vSwitch developer documentation. operstates.txt - Overview of network interface operational states. packet_mmap.txt diff --git a/Documentation/networking/batman-adv.txt b/Documentation/networking/batman-adv.txt index c86d03f18a5b..221ad0cdf11f 100644 --- a/Documentation/networking/batman-adv.txt +++ b/Documentation/networking/batman-adv.txt @@ -200,15 +200,16 @@ abled during run time. Following log_levels are defined: 0 - All debug output disabled 1 - Enable messages related to routing / flooding / broadcasting -2 - Enable route or tt entry added / changed / deleted -3 - Enable all messages +2 - Enable messages related to route added / changed / deleted +4 - Enable messages related to translation table operations +7 - Enable all messages The debug output can be changed at runtime using the file /sys/class/net/bat0/mesh/log_level. e.g. # echo 2 > /sys/class/net/bat0/mesh/log_level -will enable debug messages for when routes or TTs change. +will enable debug messages for when routes change. BATCTL diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt index 91df678fb7f8..080ad26690ae 100644 --- a/Documentation/networking/bonding.txt +++ b/Documentation/networking/bonding.txt @@ -196,6 +196,23 @@ or, for backwards compatibility, the option value. E.g., The parameters are as follows: +active_slave + + Specifies the new active slave for modes that support it + (active-backup, balance-alb and balance-tlb). Possible values + are the name of any currently enslaved interface, or an empty + string. If a name is given, the slave and its link must be up in order + to be selected as the new active slave. If an empty string is + specified, the current active slave is cleared, and a new active + slave is selected automatically. + + Note that this is only available through the sysfs interface. No module + parameter by this name exists. + + The normal value of this option is the name of the currently + active slave, or the empty string if there is no active slave or + the current mode does not use an active slave. + ad_select Specifies the 802.3ad aggregation selection logic to use. The diff --git a/Documentation/networking/ieee802154.txt b/Documentation/networking/ieee802154.txt index f41ea2405220..1dc1c24a7547 100644 --- a/Documentation/networking/ieee802154.txt +++ b/Documentation/networking/ieee802154.txt @@ -78,3 +78,30 @@ in software. This is currently WIP. See header include/net/mac802154.h and several drivers in drivers/ieee802154/. +6LoWPAN Linux implementation +============================ + +The IEEE 802.15.4 standard specifies an MTU of 128 bytes, yielding about 80 +octets of actual MAC payload once security is turned on, on a wireless link +with a link throughput of 250 kbps or less. The 6LoWPAN adaptation format +[RFC4944] was specified to carry IPv6 datagrams over such constrained links, +taking into account limited bandwidth, memory, or energy resources that are +expected in applications such as wireless Sensor Networks. [RFC4944] defines +a Mesh Addressing header to support sub-IP forwarding, a Fragmentation header +to support the IPv6 minimum MTU requirement [RFC2460], and stateless header +compression for IPv6 datagrams (LOWPAN_HC1 and LOWPAN_HC2) to reduce the +relatively large IPv6 and UDP headers down to (in the best case) several bytes. + +In Semptember 2011 the standard update was published - [RFC6282]. +It deprecates HC1 and HC2 compression and defines IPHC encoding format which is +used in this Linux implementation. + +All the code related to 6lowpan you may find in files: net/ieee802154/6lowpan.* + +To setup 6lowpan interface you need (busybox release > 1.17.0): +1. Add IEEE802.15.4 interface and initialize PANid; +2. Add 6lowpan interface by command like: + # ip link add link wpan0 name lowpan0 type lowpan +3. Set MAC (if needs): + # ip link set lowpan0 address de:ad:be:ef:ca:fe:ba:be +4. Bring up 'lowpan0' interface diff --git a/Documentation/networking/ifenslave.c b/Documentation/networking/ifenslave.c index 65968fbf1e49..ac5debb2f16c 100644 --- a/Documentation/networking/ifenslave.c +++ b/Documentation/networking/ifenslave.c @@ -539,12 +539,14 @@ static int if_getconfig(char *ifname) metric = 0; } else metric = ifr.ifr_metric; + printf("The result of SIOCGIFMETRIC is %d\n", metric); strcpy(ifr.ifr_name, ifname); if (ioctl(skfd, SIOCGIFMTU, &ifr) < 0) mtu = 0; else mtu = ifr.ifr_mtu; + printf("The result of SIOCGIFMTU is %d\n", mtu); strcpy(ifr.ifr_name, ifname); if (ioctl(skfd, SIOCGIFDSTADDR, &ifr) < 0) { diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index 589f2da5d545..ad3e80e17b4f 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -31,6 +31,16 @@ neigh/default/gc_thresh3 - INTEGER when using large numbers of interfaces and when communicating with large numbers of directly-connected peers. +neigh/default/unres_qlen_bytes - INTEGER + The maximum number of bytes which may be used by packets + queued for each unresolved address by other network layers. + (added in linux 3.3) + +neigh/default/unres_qlen - INTEGER + The maximum number of packets which may be queued for each + unresolved address by other network layers. + (deprecated in linux 3.3) : use unres_qlen_bytes instead. + mtu_expires - INTEGER Time, in seconds, that cached PMTU information is kept. @@ -165,6 +175,9 @@ tcp_congestion_control - STRING connections. The algorithm "reno" is always available, but additional choices may be available based on kernel configuration. Default is set as part of kernel configuration. + For passive connections, the listener congestion control choice + is inherited. + [see setsockopt(listenfd, SOL_TCP, TCP_CONGESTION, "name" ...) ] tcp_cookie_size - INTEGER Default size of TCP Cookie Transactions (TCPCT) option, that may be diff --git a/Documentation/networking/openvswitch.txt b/Documentation/networking/openvswitch.txt new file mode 100644 index 000000000000..b8a048b8df3a --- /dev/null +++ b/Documentation/networking/openvswitch.txt @@ -0,0 +1,195 @@ +Open vSwitch datapath developer documentation +============================================= + +The Open vSwitch kernel module allows flexible userspace control over +flow-level packet processing on selected network devices. It can be +used to implement a plain Ethernet switch, network device bonding, +VLAN processing, network access control, flow-based network control, +and so on. + +The kernel module implements multiple "datapaths" (analogous to +bridges), each of which can have multiple "vports" (analogous to ports +within a bridge). Each datapath also has associated with it a "flow +table" that userspace populates with "flows" that map from keys based +on packet headers and metadata to sets of actions. The most common +action forwards the packet to another vport; other actions are also +implemented. + +When a packet arrives on a vport, the kernel module processes it by +extracting its flow key and looking it up in the flow table. If there +is a matching flow, it executes the associated actions. If there is +no match, it queues the packet to userspace for processing (as part of +its processing, userspace will likely set up a flow to handle further +packets of the same type entirely in-kernel). + + +Flow key compatibility +---------------------- + +Network protocols evolve over time. New protocols become important +and existing protocols lose their prominence. For the Open vSwitch +kernel module to remain relevant, it must be possible for newer +versions to parse additional protocols as part of the flow key. It +might even be desirable, someday, to drop support for parsing +protocols that have become obsolete. Therefore, the Netlink interface +to Open vSwitch is designed to allow carefully written userspace +applications to work with any version of the flow key, past or future. + +To support this forward and backward compatibility, whenever the +kernel module passes a packet to userspace, it also passes along the +flow key that it parsed from the packet. Userspace then extracts its +own notion of a flow key from the packet and compares it against the +kernel-provided version: + + - If userspace's notion of the flow key for the packet matches the + kernel's, then nothing special is necessary. + + - If the kernel's flow key includes more fields than the userspace + version of the flow key, for example if the kernel decoded IPv6 + headers but userspace stopped at the Ethernet type (because it + does not understand IPv6), then again nothing special is + necessary. Userspace can still set up a flow in the usual way, + as long as it uses the kernel-provided flow key to do it. + + - If the userspace flow key includes more fields than the + kernel's, for example if userspace decoded an IPv6 header but + the kernel stopped at the Ethernet type, then userspace can + forward the packet manually, without setting up a flow in the + kernel. This case is bad for performance because every packet + that the kernel considers part of the flow must go to userspace, + but the forwarding behavior is correct. (If userspace can + determine that the values of the extra fields would not affect + forwarding behavior, then it could set up a flow anyway.) + +How flow keys evolve over time is important to making this work, so +the following sections go into detail. + + +Flow key format +--------------- + +A flow key is passed over a Netlink socket as a sequence of Netlink +attributes. Some attributes represent packet metadata, defined as any +information about a packet that cannot be extracted from the packet +itself, e.g. the vport on which the packet was received. Most +attributes, however, are extracted from headers within the packet, +e.g. source and destination addresses from Ethernet, IP, or TCP +headers. + +The <linux/openvswitch.h> header file defines the exact format of the +flow key attributes. For informal explanatory purposes here, we write +them as comma-separated strings, with parentheses indicating arguments +and nesting. For example, the following could represent a flow key +corresponding to a TCP packet that arrived on vport 1: + + in_port(1), eth(src=e0:91:f5:21:d0:b2, dst=00:02:e3:0f:80:a4), + eth_type(0x0800), ipv4(src=172.16.0.20, dst=172.18.0.52, proto=17, tos=0, + frag=no), tcp(src=49163, dst=80) + +Often we ellipsize arguments not important to the discussion, e.g.: + + in_port(1), eth(...), eth_type(0x0800), ipv4(...), tcp(...) + + +Basic rule for evolving flow keys +--------------------------------- + +Some care is needed to really maintain forward and backward +compatibility for applications that follow the rules listed under +"Flow key compatibility" above. + +The basic rule is obvious: + + ------------------------------------------------------------------ + New network protocol support must only supplement existing flow + key attributes. It must not change the meaning of already defined + flow key attributes. + ------------------------------------------------------------------ + +This rule does have less-obvious consequences so it is worth working +through a few examples. Suppose, for example, that the kernel module +did not already implement VLAN parsing. Instead, it just interpreted +the 802.1Q TPID (0x8100) as the Ethertype then stopped parsing the +packet. The flow key for any packet with an 802.1Q header would look +essentially like this, ignoring metadata: + + eth(...), eth_type(0x8100) + +Naively, to add VLAN support, it makes sense to add a new "vlan" flow +key attribute to contain the VLAN tag, then continue to decode the +encapsulated headers beyond the VLAN tag using the existing field +definitions. With this change, an TCP packet in VLAN 10 would have a +flow key much like this: + + eth(...), vlan(vid=10, pcp=0), eth_type(0x0800), ip(proto=6, ...), tcp(...) + +But this change would negatively affect a userspace application that +has not been updated to understand the new "vlan" flow key attribute. +The application could, following the flow compatibility rules above, +ignore the "vlan" attribute that it does not understand and therefore +assume that the flow contained IP packets. This is a bad assumption +(the flow only contains IP packets if one parses and skips over the +802.1Q header) and it could cause the application's behavior to change +across kernel versions even though it follows the compatibility rules. + +The solution is to use a set of nested attributes. This is, for +example, why 802.1Q support uses nested attributes. A TCP packet in +VLAN 10 is actually expressed as: + + eth(...), eth_type(0x8100), vlan(vid=10, pcp=0), encap(eth_type(0x0800), + ip(proto=6, ...), tcp(...))) + +Notice how the "eth_type", "ip", and "tcp" flow key attributes are +nested inside the "encap" attribute. Thus, an application that does +not understand the "vlan" key will not see either of those attributes +and therefore will not misinterpret them. (Also, the outer eth_type +is still 0x8100, not changed to 0x0800.) + +Handling malformed packets +-------------------------- + +Don't drop packets in the kernel for malformed protocol headers, bad +checksums, etc. This would prevent userspace from implementing a +simple Ethernet switch that forwards every packet. + +Instead, in such a case, include an attribute with "empty" content. +It doesn't matter if the empty content could be valid protocol values, +as long as those values are rarely seen in practice, because userspace +can always forward all packets with those values to userspace and +handle them individually. + +For example, consider a packet that contains an IP header that +indicates protocol 6 for TCP, but which is truncated just after the IP +header, so that the TCP header is missing. The flow key for this +packet would include a tcp attribute with all-zero src and dst, like +this: + + eth(...), eth_type(0x0800), ip(proto=6, ...), tcp(src=0, dst=0) + +As another example, consider a packet with an Ethernet type of 0x8100, +indicating that a VLAN TCI should follow, but which is truncated just +after the Ethernet type. The flow key for this packet would include +an all-zero-bits vlan and an empty encap attribute, like this: + + eth(...), eth_type(0x8100), vlan(0), encap() + +Unlike a TCP packet with source and destination ports 0, an +all-zero-bits VLAN TCI is not that rare, so the CFI bit (aka +VLAN_TAG_PRESENT inside the kernel) is ordinarily set in a vlan +attribute expressly to allow this situation to be distinguished. +Thus, the flow key in this second example unambiguously indicates a +missing or malformed VLAN TCI. + +Other rules +----------- + +The other rules for flow keys are much less subtle: + + - Duplicate attributes are not allowed at a given nesting level. + + - Ordering of attributes is not significant. + + - When the kernel sends a given flow key to userspace, it always + composes it the same way. This allows userspace to hash and + compare entire flow keys that it may not be able to fully + interpret. diff --git a/Documentation/networking/packet_mmap.txt b/Documentation/networking/packet_mmap.txt index 4acea6603720..1c08a4b0981f 100644 --- a/Documentation/networking/packet_mmap.txt +++ b/Documentation/networking/packet_mmap.txt @@ -155,7 +155,7 @@ As capture, each frame contains two parts: /* fill sockaddr_ll struct to prepare binding */ my_addr.sll_family = AF_PACKET; - my_addr.sll_protocol = ETH_P_ALL; + my_addr.sll_protocol = htons(ETH_P_ALL); my_addr.sll_ifindex = s_ifr.ifr_ifindex; /* bind socket to eth0 */ diff --git a/Documentation/networking/scaling.txt b/Documentation/networking/scaling.txt index a177de21d28e..579994afbe06 100644 --- a/Documentation/networking/scaling.txt +++ b/Documentation/networking/scaling.txt @@ -208,7 +208,7 @@ The counter in rps_dev_flow_table values records the length of the current CPU's backlog when a packet in this flow was last enqueued. Each backlog queue has a head counter that is incremented on dequeue. A tail counter is computed as head counter + queue length. In other words, the counter -in rps_dev_flow_table[i] records the last element in flow i that has +in rps_dev_flow[i] records the last element in flow i that has been enqueued onto the currently designated CPU for flow i (of course, entry i is actually selected by hash and multiple flows may hash to the same entry i). @@ -224,7 +224,7 @@ following is true: - The current CPU's queue head counter >= the recorded tail counter value in rps_dev_flow[i] -- The current CPU is unset (equal to NR_CPUS) +- The current CPU is unset (equal to RPS_NO_CPU) - The current CPU is offline After this check, the packet is sent to the (possibly updated) current @@ -235,7 +235,7 @@ CPU. ==== RFS Configuration -RFS is only available if the kconfig symbol CONFIG_RFS is enabled (on +RFS is only available if the kconfig symbol CONFIG_RPS is enabled (on by default for SMP). The functionality remains disabled until explicitly configured. The number of entries in the global flow table is set through: @@ -258,7 +258,7 @@ For a single queue device, the rps_flow_cnt value for the single queue would normally be configured to the same value as rps_sock_flow_entries. For a multi-queue device, the rps_flow_cnt for each queue might be configured as rps_sock_flow_entries / N, where N is the number of -queues. So for instance, if rps_flow_entries is set to 32768 and there +queues. So for instance, if rps_sock_flow_entries is set to 32768 and there are 16 configured receive queues, rps_flow_cnt for each queue might be configured as 2048. diff --git a/Documentation/networking/stmmac.txt b/Documentation/networking/stmmac.txt index 8d67980fabe8..d0aeeadd264b 100644 --- a/Documentation/networking/stmmac.txt +++ b/Documentation/networking/stmmac.txt @@ -4,14 +4,16 @@ Copyright (C) 2007-2010 STMicroelectronics Ltd Author: Giuseppe Cavallaro <peppe.cavallaro@st.com> This is the driver for the MAC 10/100/1000 on-chip Ethernet controllers -(Synopsys IP blocks); it has been fully tested on STLinux platforms. +(Synopsys IP blocks). Currently this network device driver is for all STM embedded MAC/GMAC -(i.e. 7xxx/5xxx SoCs) and it's known working on other platforms i.e. ARM SPEAr. +(i.e. 7xxx/5xxx SoCs), SPEAr (arm), Loongson1B (mips) and XLINX XC2V3000 +FF1152AMT0221 D1215994A VIRTEX FPGA board. -DWC Ether MAC 10/100/1000 Universal version 3.41a and DWC Ether MAC 10/100 -Universal version 4.0 have been used for developing the first code -implementation. +DWC Ether MAC 10/100/1000 Universal version 3.60a (and older) and DWC Ether MAC 10/100 +Universal version 4.0 have been used for developing this driver. + +This driver supports both the platform bus and PCI. Please, for more information also visit: www.stlinux.com @@ -277,5 +279,5 @@ In fact, these can generate an huge amount of debug messages. 6) TODO: o XGMAC is not supported. - o Review the timer optimisation code to use an embedded device that will be - available in new chip generations. + o Add the EEE - Energy Efficient Ethernet + o Add the PTP - precision time protocol diff --git a/Documentation/networking/team.txt b/Documentation/networking/team.txt new file mode 100644 index 000000000000..5a013686b9ea --- /dev/null +++ b/Documentation/networking/team.txt @@ -0,0 +1,2 @@ +Team devices are driven from userspace via libteam library which is here: + https://github.com/jpirko/libteam diff --git a/Documentation/pinctrl.txt b/Documentation/pinctrl.txt index b04cb7d45a16..6727b92bc2fb 100644 --- a/Documentation/pinctrl.txt +++ b/Documentation/pinctrl.txt @@ -7,12 +7,9 @@ This subsystem deals with: - Multiplexing of pins, pads, fingers (etc) see below for details -The intention is to also deal with: - -- Software-controlled biasing and driving mode specific pins, such as - pull-up/down, open drain etc, load capacitance configuration when controlled - by software, etc. - +- Configuration of pins, pads, fingers (etc), such as software-controlled + biasing and driving mode specific pins, such as pull-up/down, open drain, + load capacitance etc. Top-level interface =================== @@ -32,7 +29,7 @@ Definition of PIN: be sparse - i.e. there may be gaps in the space with numbers where no pin exists. -When a PIN CONTROLLER is instatiated, it will register a descriptor to the +When a PIN CONTROLLER is instantiated, it will register a descriptor to the pin control framework, and this descriptor contains an array of pin descriptors describing the pins handled by this specific pin controller. @@ -61,14 +58,14 @@ this in our driver: #include <linux/pinctrl/pinctrl.h> -const struct pinctrl_pin_desc __refdata foo_pins[] = { - PINCTRL_PIN(0, "A1"), - PINCTRL_PIN(1, "A2"), - PINCTRL_PIN(2, "A3"), +const struct pinctrl_pin_desc foo_pins[] = { + PINCTRL_PIN(0, "A8"), + PINCTRL_PIN(1, "B8"), + PINCTRL_PIN(2, "C8"), ... - PINCTRL_PIN(61, "H6"), - PINCTRL_PIN(62, "H7"), - PINCTRL_PIN(63, "H8"), + PINCTRL_PIN(61, "F1"), + PINCTRL_PIN(62, "G1"), + PINCTRL_PIN(63, "H1"), }; static struct pinctrl_desc foo_desc = { @@ -88,11 +85,16 @@ int __init foo_probe(void) pr_err("could not register foo pin driver\n"); } +To enable the pinctrl subsystem and the subgroups for PINMUX and PINCONF and +selected drivers, you need to select them from your machine's Kconfig entry, +since these are so tightly integrated with the machines they are used on. +See for example arch/arm/mach-u300/Kconfig for an example. + Pins usually have fancier names than this. You can find these in the dataheet for your chip. Notice that the core pinctrl.h file provides a fancy macro called PINCTRL_PIN() to create the struct entries. As you can see I enumerated -the pins from 0 in the upper left corner to 63 in the lower right corner, -this enumeration was arbitrarily chosen, in practice you need to think +the pins from 0 in the upper left corner to 63 in the lower right corner. +This enumeration was arbitrarily chosen, in practice you need to think through your numbering system so that it matches the layout of registers and such things in your driver, or the code may become complicated. You must also consider matching of offsets to the GPIO ranges that may be handled by @@ -133,8 +135,8 @@ struct foo_group { const unsigned num_pins; }; -static unsigned int spi0_pins[] = { 0, 8, 16, 24 }; -static unsigned int i2c0_pins[] = { 24, 25 }; +static const unsigned int spi0_pins[] = { 0, 8, 16, 24 }; +static const unsigned int i2c0_pins[] = { 24, 25 }; static const struct foo_group foo_groups[] = { { @@ -193,6 +195,88 @@ structure, for example specific register ranges associated with each group and so on. +Pin configuration +================= + +Pins can sometimes be software-configured in an various ways, mostly related +to their electronic properties when used as inputs or outputs. For example you +may be able to make an output pin high impedance, or "tristate" meaning it is +effectively disconnected. You may be able to connect an input pin to VDD or GND +using a certain resistor value - pull up and pull down - so that the pin has a +stable value when nothing is driving the rail it is connected to, or when it's +unconnected. + +For example, a platform may do this: + +ret = pin_config_set("foo-dev", "FOO_GPIO_PIN", PLATFORM_X_PULL_UP); + +To pull up a pin to VDD. The pin configuration driver implements callbacks for +changing pin configuration in the pin controller ops like this: + +#include <linux/pinctrl/pinctrl.h> +#include <linux/pinctrl/pinconf.h> +#include "platform_x_pindefs.h" + +static int foo_pin_config_get(struct pinctrl_dev *pctldev, + unsigned offset, + unsigned long *config) +{ + struct my_conftype conf; + + ... Find setting for pin @ offset ... + + *config = (unsigned long) conf; +} + +static int foo_pin_config_set(struct pinctrl_dev *pctldev, + unsigned offset, + unsigned long config) +{ + struct my_conftype *conf = (struct my_conftype *) config; + + switch (conf) { + case PLATFORM_X_PULL_UP: + ... + } + } +} + +static int foo_pin_config_group_get (struct pinctrl_dev *pctldev, + unsigned selector, + unsigned long *config) +{ + ... +} + +static int foo_pin_config_group_set (struct pinctrl_dev *pctldev, + unsigned selector, + unsigned long config) +{ + ... +} + +static struct pinconf_ops foo_pconf_ops = { + .pin_config_get = foo_pin_config_get, + .pin_config_set = foo_pin_config_set, + .pin_config_group_get = foo_pin_config_group_get, + .pin_config_group_set = foo_pin_config_group_set, +}; + +/* Pin config operations are handled by some pin controller */ +static struct pinctrl_desc foo_desc = { + ... + .confops = &foo_pconf_ops, +}; + +Since some controllers have special logic for handling entire groups of pins +they can exploit the special whole-group pin control function. The +pin_config_group_set() callback is allowed to return the error code -EAGAIN, +for groups it does not want to handle, or if it just wants to do some +group-level handling and then fall through to iterate over all pins, in which +case each individual pin will be treated by separate pin_config_set() calls as +well. + + Interaction with the GPIO subsystem =================================== @@ -214,19 +298,20 @@ static struct pinctrl_gpio_range gpio_range_a = { .name = "chip a", .id = 0, .base = 32, + .pin_base = 32, .npins = 16, .gc = &chip_a; }; -static struct pinctrl_gpio_range gpio_range_a = { +static struct pinctrl_gpio_range gpio_range_b = { .name = "chip b", .id = 0, .base = 48, + .pin_base = 64, .npins = 8, .gc = &chip_b; }; - { struct pinctrl_dev *pctl; ... @@ -235,42 +320,39 @@ static struct pinctrl_gpio_range gpio_range_a = { } So this complex system has one pin controller handling two different -GPIO chips. Chip a has 16 pins and chip b has 8 pins. They are mapped in -the global GPIO pin space at: +GPIO chips. "chip a" has 16 pins and "chip b" has 8 pins. The "chip a" and +"chip b" have different .pin_base, which means a start pin number of the +GPIO range. + +The GPIO range of "chip a" starts from the GPIO base of 32 and actual +pin range also starts from 32. However "chip b" has different starting +offset for the GPIO range and pin range. The GPIO range of "chip b" starts +from GPIO number 48, while the pin range of "chip b" starts from 64. + +We can convert a gpio number to actual pin number using this "pin_base". +They are mapped in the global GPIO pin space at: -chip a: [32 .. 47] -chip b: [48 .. 55] +chip a: + - GPIO range : [32 .. 47] + - pin range : [32 .. 47] +chip b: + - GPIO range : [48 .. 55] + - pin range : [64 .. 71] When GPIO-specific functions in the pin control subsystem are called, these -ranges will be used to look up the apropriate pin controller by inspecting +ranges will be used to look up the appropriate pin controller by inspecting and matching the pin to the pin ranges across all controllers. When a pin controller handling the matching range is found, GPIO-specific functions will be called on that specific pin controller. For all functionalities dealing with pin biasing, pin muxing etc, the pin controller subsystem will subtract the range's .base offset from the passed -in gpio pin number, and pass that on to the pin control driver, so the driver -will get an offset into its handled number range. Further it is also passed +in gpio number, and add the ranges's .pin_base offset to retrive a pin number. +After that, the subsystem passes it on to the pin control driver, so the driver +will get an pin number into its handled number range. Further it is also passed the range ID value, so that the pin controller knows which range it should deal with. -For example: if a user issues pinctrl_gpio_set_foo(50), the pin control -subsystem will find that the second range on this pin controller matches, -subtract the base 48 and call the -pinctrl_driver_gpio_set_foo(pinctrl, range, 2) where the latter function has -this signature: - -int pinctrl_driver_gpio_set_foo(struct pinctrl_dev *pctldev, - struct pinctrl_gpio_range *rangeid, - unsigned offset); - -Now the driver knows that we want to do some GPIO-specific operation on the -second GPIO range handled by "chip b", at offset 2 in that specific range. - -(If the GPIO subsystem is ever refactored to use a local per-GPIO controller -pin space, this mapping will need to be augmented accordingly.) - - PINMUX interfaces ================= @@ -438,7 +520,7 @@ you. Define enumerators only for the pins you can control if that makes sense. Assumptions: -We assume that the number possible function maps to pin groups is limited by +We assume that the number of possible function maps to pin groups is limited by the hardware. I.e. we assume that there is no system where any function can be mapped to any pin, like in a phone exchange. So the available pins groups for a certain function will be limited to a few choices (say up to eight or so), @@ -585,7 +667,7 @@ int foo_list_funcs(struct pinctrl_dev *pctldev, unsigned selector) const char *foo_get_fname(struct pinctrl_dev *pctldev, unsigned selector) { - return myfuncs[selector].name; + return foo_functions[selector].name; } static int foo_get_groups(struct pinctrl_dev *pctldev, unsigned selector, @@ -600,16 +682,16 @@ static int foo_get_groups(struct pinctrl_dev *pctldev, unsigned selector, int foo_enable(struct pinctrl_dev *pctldev, unsigned selector, unsigned group) { - u8 regbit = (1 << group); + u8 regbit = (1 << selector + group); writeb((readb(MUX)|regbit), MUX) return 0; } -int foo_disable(struct pinctrl_dev *pctldev, unsigned selector, +void foo_disable(struct pinctrl_dev *pctldev, unsigned selector, unsigned group) { - u8 regbit = (1 << group); + u8 regbit = (1 << selector + group); writeb((readb(MUX) & ~(regbit)), MUX) return 0; @@ -647,6 +729,17 @@ All the above functions are mandatory to implement for a pinmux driver. Pinmux interaction with the GPIO subsystem ========================================== +The public pinmux API contains two functions named pinmux_request_gpio() +and pinmux_free_gpio(). These two functions shall *ONLY* be called from +gpiolib-based drivers as part of their gpio_request() and +gpio_free() semantics. Likewise the pinmux_gpio_direction_[input|output] +shall only be called from within respective gpio_direction_[input|output] +gpiolib implementation. + +NOTE that platforms and individual drivers shall *NOT* request GPIO pins to be +muxed in. Instead, implement a proper gpiolib driver and have that driver +request proper muxing for its pins. + The function list could become long, especially if you can convert every individual pin into a GPIO pin independent of any other pins, and then try the approach to define every pin as a function. @@ -654,19 +747,24 @@ the approach to define every pin as a function. In this case, the function array would become 64 entries for each GPIO setting and then the device functions. -For this reason there is an additional function a pinmux driver can implement -to enable only GPIO on an individual pin: .gpio_request_enable(). The same -.free() function as for other functions is assumed to be usable also for -GPIO pins. +For this reason there are two functions a pinmux driver can implement +to enable only GPIO on an individual pin: .gpio_request_enable() and +.gpio_disable_free(). This function will pass in the affected GPIO range identified by the pin controller core, so you know which GPIO pins are being affected by the request operation. -Alternatively it is fully allowed to use named functions for each GPIO -pin, the pinmux_request_gpio() will attempt to obtain the function "gpioN" -where "N" is the global GPIO pin number if no special GPIO-handler is -registered. +If your driver needs to have an indication from the framework of whether the +GPIO pin shall be used for input or output you can implement the +.gpio_set_direction() function. As described this shall be called from the +gpiolib driver and the affected GPIO range, pin offset and desired direction +will be passed along to this function. + +Alternatively to using these special functions, it is fully allowed to use +named functions for each GPIO pin, the pinmux_request_gpio() will attempt to +obtain the function "gpioN" where "N" is the global GPIO pin number if no +special GPIO-handler is registered. Pinmux board/machine configuration @@ -683,19 +781,19 @@ spi on the second function mapping: #include <linux/pinctrl/machine.h> -static struct pinmux_map pmx_mapping[] = { +static const struct pinmux_map __initdata pmx_mapping[] = { { - .ctrl_dev_name = "pinctrl.0", + .ctrl_dev_name = "pinctrl-foo", .function = "spi0", .dev_name = "foo-spi.0", }, { - .ctrl_dev_name = "pinctrl.0", + .ctrl_dev_name = "pinctrl-foo", .function = "i2c0", .dev_name = "foo-i2c.0", }, { - .ctrl_dev_name = "pinctrl.0", + .ctrl_dev_name = "pinctrl-foo", .function = "mmc0", .dev_name = "foo-mmc.0", }, @@ -714,14 +812,14 @@ for example if they are not yet instantiated or cumbersome to obtain. You register this pinmux mapping to the pinmux subsystem by simply: - ret = pinmux_register_mappings(&pmx_mapping, ARRAY_SIZE(pmx_mapping)); + ret = pinmux_register_mappings(pmx_mapping, ARRAY_SIZE(pmx_mapping)); Since the above construct is pretty common there is a helper macro to make -it even more compact which assumes you want to use pinctrl.0 and position +it even more compact which assumes you want to use pinctrl-foo and position 0 for mapping, for example: -static struct pinmux_map pmx_mapping[] = { - PINMUX_MAP_PRIMARY("I2CMAP", "i2c0", "foo-i2c.0"), +static struct pinmux_map __initdata pmx_mapping[] = { + PINMUX_MAP("I2CMAP", "pinctrl-foo", "i2c0", "foo-i2c.0"), }; @@ -734,14 +832,14 @@ As it is possible to map a function to different groups of pins an optional ... { .name = "spi0-pos-A", - .ctrl_dev_name = "pinctrl.0", + .ctrl_dev_name = "pinctrl-foo", .function = "spi0", .group = "spi0_0_grp", .dev_name = "foo-spi.0", }, { .name = "spi0-pos-B", - .ctrl_dev_name = "pinctrl.0", + .ctrl_dev_name = "pinctrl-foo", .function = "spi0", .group = "spi0_1_grp", .dev_name = "foo-spi.0", @@ -760,44 +858,44 @@ case), we define a mapping like this: ... { .name "2bit" - .ctrl_dev_name = "pinctrl.0", + .ctrl_dev_name = "pinctrl-foo", .function = "mmc0", - .group = "mmc0_0_grp", + .group = "mmc0_1_grp", .dev_name = "foo-mmc.0", }, { .name "4bit" - .ctrl_dev_name = "pinctrl.0", + .ctrl_dev_name = "pinctrl-foo", .function = "mmc0", - .group = "mmc0_0_grp", + .group = "mmc0_1_grp", .dev_name = "foo-mmc.0", }, { .name "4bit" - .ctrl_dev_name = "pinctrl.0", + .ctrl_dev_name = "pinctrl-foo", .function = "mmc0", - .group = "mmc0_1_grp", + .group = "mmc0_2_grp", .dev_name = "foo-mmc.0", }, { .name "8bit" - .ctrl_dev_name = "pinctrl.0", + .ctrl_dev_name = "pinctrl-foo", .function = "mmc0", - .group = "mmc0_0_grp", + .group = "mmc0_1_grp", .dev_name = "foo-mmc.0", }, { .name "8bit" - .ctrl_dev_name = "pinctrl.0", + .ctrl_dev_name = "pinctrl-foo", .function = "mmc0", - .group = "mmc0_1_grp", + .group = "mmc0_2_grp", .dev_name = "foo-mmc.0", }, { .name "8bit" - .ctrl_dev_name = "pinctrl.0", + .ctrl_dev_name = "pinctrl-foo", .function = "mmc0", - .group = "mmc0_2_grp", + .group = "mmc0_3_grp", .dev_name = "foo-mmc.0", }, ... @@ -898,7 +996,7 @@ like this: { .name "POWERMAP" - .ctrl_dev_name = "pinctrl.0", + .ctrl_dev_name = "pinctrl-foo", .function = "power_func", .hog_on_boot = true, }, diff --git a/Documentation/power/charger-manager.txt b/Documentation/power/charger-manager.txt new file mode 100644 index 000000000000..fdcca991df30 --- /dev/null +++ b/Documentation/power/charger-manager.txt @@ -0,0 +1,163 @@ +Charger Manager + (C) 2011 MyungJoo Ham <myungjoo.ham@samsung.com>, GPL + +Charger Manager provides in-kernel battery charger management that +requires temperature monitoring during suspend-to-RAM state +and where each battery may have multiple chargers attached and the userland +wants to look at the aggregated information of the multiple chargers. + +Charger Manager is a platform_driver with power-supply-class entries. +An instance of Charger Manager (a platform-device created with Charger-Manager) +represents an independent battery with chargers. If there are multiple +batteries with their own chargers acting independently in a system, +the system may need multiple instances of Charger Manager. + +1. Introduction +=============== + +Charger Manager supports the following: + +* Support for multiple chargers (e.g., a device with USB, AC, and solar panels) + A system may have multiple chargers (or power sources) and some of + they may be activated at the same time. Each charger may have its + own power-supply-class and each power-supply-class can provide + different information about the battery status. This framework + aggregates charger-related information from multiple sources and + shows combined information as a single power-supply-class. + +* Support for in suspend-to-RAM polling (with suspend_again callback) + While the battery is being charged and the system is in suspend-to-RAM, + we may need to monitor the battery health by looking at the ambient or + battery temperature. We can accomplish this by waking up the system + periodically. However, such a method wakes up devices unncessary for + monitoring the battery health and tasks, and user processes that are + supposed to be kept suspended. That, in turn, incurs unnecessary power + consumption and slow down charging process. Or even, such peak power + consumption can stop chargers in the middle of charging + (external power input < device power consumption), which not + only affects the charging time, but the lifespan of the battery. + + Charger Manager provides a function "cm_suspend_again" that can be + used as suspend_again callback of platform_suspend_ops. If the platform + requires tasks other than cm_suspend_again, it may implement its own + suspend_again callback that calls cm_suspend_again in the middle. + Normally, the platform will need to resume and suspend some devices + that are used by Charger Manager. + +2. Global Charger-Manager Data related with suspend_again +======================================================== +In order to setup Charger Manager with suspend-again feature +(in-suspend monitoring), the user should provide charger_global_desc +with setup_charger_manager(struct charger_global_desc *). +This charger_global_desc data for in-suspend monitoring is global +as the name suggests. Thus, the user needs to provide only once even +if there are multiple batteries. If there are multiple batteries, the +multiple instances of Charger Manager share the same charger_global_desc +and it will manage in-suspend monitoring for all instances of Charger Manager. + +The user needs to provide all the two entries properly in order to activate +in-suspend monitoring: + +struct charger_global_desc { + +char *rtc_name; + : The name of rtc (e.g., "rtc0") used to wakeup the system from + suspend for Charger Manager. The alarm interrupt (AIE) of the rtc + should be able to wake up the system from suspend. Charger Manager + saves and restores the alarm value and use the previously-defined + alarm if it is going to go off earlier than Charger Manager so that + Charger Manager does not interfere with previously-defined alarms. + +bool (*rtc_only_wakeup)(void); + : This callback should let CM know whether + the wakeup-from-suspend is caused only by the alarm of "rtc" in the + same struct. If there is any other wakeup source triggered the + wakeup, it should return false. If the "rtc" is the only wakeup + reason, it should return true. +}; + +3. How to setup suspend_again +============================= +Charger Manager provides a function "extern bool cm_suspend_again(void)". +When cm_suspend_again is called, it monitors every battery. The suspend_ops +callback of the system's platform_suspend_ops can call cm_suspend_again +function to know whether Charger Manager wants to suspend again or not. +If there are no other devices or tasks that want to use suspend_again +feature, the platform_suspend_ops may directly refer to cm_suspend_again +for its suspend_again callback. + +The cm_suspend_again() returns true (meaning "I want to suspend again") +if the system was woken up by Charger Manager and the polling +(in-suspend monitoring) results in "normal". + +4. Charger-Manager Data (struct charger_desc) +============================================= +For each battery charged independently from other batteries (if a series of +batteries are charged by a single charger, they are counted as one independent +battery), an instance of Charger Manager is attached to it. + +struct charger_desc { + +char *psy_name; + : The power-supply-class name of the battery. Default is + "battery" if psy_name is NULL. Users can access the psy entries + at "/sys/class/power_supply/[psy_name]/". + +enum polling_modes polling_mode; + : CM_POLL_DISABLE: do not poll this battery. + CM_POLL_ALWAYS: always poll this battery. + CM_POLL_EXTERNAL_POWER_ONLY: poll this battery if and only if + an external power source is attached. + CM_POLL_CHARGING_ONLY: poll this battery if and only if the + battery is being charged. + +unsigned int fullbatt_uV; + : If specified with a non-zero value, Charger Manager assumes + that the battery is full (capacity = 100) if the battery is not being + charged and the battery voltage is equal to or greater than + fullbatt_uV. + +unsigned int polling_interval_ms; + : Required polling interval in ms. Charger Manager will poll + this battery every polling_interval_ms or more frequently. + +enum data_source battery_present; + CM_FUEL_GAUGE: get battery presence information from fuel gauge. + CM_CHARGER_STAT: get battery presence from chargers. + +char **psy_charger_stat; + : An array ending with NULL that has power-supply-class names of + chargers. Each power-supply-class should provide "PRESENT" (if + battery_present is "CM_CHARGER_STAT"), "ONLINE" (shows whether an + external power source is attached or not), and "STATUS" (shows whether + the battery is {"FULL" or not FULL} or {"FULL", "Charging", + "Discharging", "NotCharging"}). + +int num_charger_regulators; +struct regulator_bulk_data *charger_regulators; + : Regulators representing the chargers in the form for + regulator framework's bulk functions. + +char *psy_fuel_gauge; + : Power-supply-class name of the fuel gauge. + +int (*temperature_out_of_range)(int *mC); +bool measure_battery_temp; + : This callback returns 0 if the temperature is safe for charging, + a positive number if it is too hot to charge, and a negative number + if it is too cold to charge. With the variable mC, the callback returns + the temperature in 1/1000 of centigrade. + The source of temperature can be battery or ambient one according to + the value of measure_battery_temp. +}; + +5. Other Considerations +======================= + +At the charger/battery-related events such as battery-pulled-out, +charger-pulled-out, charger-inserted, DCIN-over/under-voltage, charger-stopped, +and others critical to chargers, the system should be configured to wake up. +At least the following should wake up the system from a suspend: +a) charger-on/off b) external-power-in/out c) battery-in/out (while charging) + +It is usually accomplished by configuring the PMIC as a wakeup source. diff --git a/Documentation/power/devices.txt b/Documentation/power/devices.txt index 3139fb505dce..20af7def23c8 100644 --- a/Documentation/power/devices.txt +++ b/Documentation/power/devices.txt @@ -126,7 +126,9 @@ The core methods to suspend and resume devices reside in struct dev_pm_ops pointed to by the ops member of struct dev_pm_domain, or by the pm member of struct bus_type, struct device_type and struct class. They are mostly of interest to the people writing infrastructure for platforms and buses, like PCI -or USB, or device type and device class drivers. +or USB, or device type and device class drivers. They also are relevant to the +writers of device drivers whose subsystems (PM domains, device types, device +classes and bus types) don't provide all power management methods. Bus drivers implement these methods as appropriate for the hardware and the drivers using it; PCI works differently from USB, and so on. Not many people @@ -268,32 +270,35 @@ various phases always run after tasks have been frozen and before they are unfrozen. Furthermore, the *_noirq phases run at a time when IRQ handlers have been disabled (except for those marked with the IRQF_NO_SUSPEND flag). -All phases use PM domain, bus, type, or class callbacks (that is, methods -defined in dev->pm_domain->ops, dev->bus->pm, dev->type->pm, or dev->class->pm). -These callbacks are regarded by the PM core as mutually exclusive. Moreover, -PM domain callbacks always take precedence over bus, type and class callbacks, -while type callbacks take precedence over bus and class callbacks, and class -callbacks take precedence over bus callbacks. To be precise, the following -rules are used to determine which callback to execute in the given phase: +All phases use PM domain, bus, type, class or driver callbacks (that is, methods +defined in dev->pm_domain->ops, dev->bus->pm, dev->type->pm, dev->class->pm or +dev->driver->pm). These callbacks are regarded by the PM core as mutually +exclusive. Moreover, PM domain callbacks always take precedence over all of the +other callbacks and, for example, type callbacks take precedence over bus, class +and driver callbacks. To be precise, the following rules are used to determine +which callback to execute in the given phase: - 1. If dev->pm_domain is present, the PM core will attempt to execute the - callback included in dev->pm_domain->ops. If that callback is not - present, no action will be carried out for the given device. + 1. If dev->pm_domain is present, the PM core will choose the callback + included in dev->pm_domain->ops for execution 2. Otherwise, if both dev->type and dev->type->pm are present, the callback - included in dev->type->pm will be executed. + included in dev->type->pm will be chosen for execution. 3. Otherwise, if both dev->class and dev->class->pm are present, the - callback included in dev->class->pm will be executed. + callback included in dev->class->pm will be chosen for execution. 4. Otherwise, if both dev->bus and dev->bus->pm are present, the callback - included in dev->bus->pm will be executed. + included in dev->bus->pm will be chosen for execution. This allows PM domains and device types to override callbacks provided by bus types or device classes if necessary. -These callbacks may in turn invoke device- or driver-specific methods stored in -dev->driver->pm, but they don't have to. +The PM domain, type, class and bus callbacks may in turn invoke device- or +driver-specific methods stored in dev->driver->pm, but they don't have to do +that. + +If the subsystem callback chosen for execution is not present, the PM core will +execute the corresponding method from dev->driver->pm instead if there is one. Entering System Suspend diff --git a/Documentation/power/freezing-of-tasks.txt b/Documentation/power/freezing-of-tasks.txt index 316c2ba187f4..6ccb68f68da6 100644 --- a/Documentation/power/freezing-of-tasks.txt +++ b/Documentation/power/freezing-of-tasks.txt @@ -21,7 +21,7 @@ freeze_processes() (defined in kernel/power/process.c) is called. It executes try_to_freeze_tasks() that sets TIF_FREEZE for all of the freezable tasks and either wakes them up, if they are kernel threads, or sends fake signals to them, if they are user space processes. A task that has TIF_FREEZE set, should react -to it by calling the function called refrigerator() (defined in +to it by calling the function called __refrigerator() (defined in kernel/freezer.c), which sets the task's PF_FROZEN flag, changes its state to TASK_UNINTERRUPTIBLE and makes it loop until PF_FROZEN is cleared for it. Then, we say that the task is 'frozen' and therefore the set of functions @@ -29,10 +29,10 @@ handling this mechanism is referred to as 'the freezer' (these functions are defined in kernel/power/process.c, kernel/freezer.c & include/linux/freezer.h). User space processes are generally frozen before kernel threads. -It is not recommended to call refrigerator() directly. Instead, it is -recommended to use the try_to_freeze() function (defined in -include/linux/freezer.h), that checks the task's TIF_FREEZE flag and makes the -task enter refrigerator() if the flag is set. +__refrigerator() must not be called directly. Instead, use the +try_to_freeze() function (defined in include/linux/freezer.h), that checks +the task's TIF_FREEZE flag and makes the task enter __refrigerator() if the +flag is set. For user space processes try_to_freeze() is called automatically from the signal-handling code, but the freezable kernel threads need to call it @@ -61,13 +61,13 @@ wait_event_freezable() and wait_event_freezable_timeout() macros. After the system memory state has been restored from a hibernation image and devices have been reinitialized, the function thaw_processes() is called in order to clear the PF_FROZEN flag for each frozen task. Then, the tasks that -have been frozen leave refrigerator() and continue running. +have been frozen leave __refrigerator() and continue running. III. Which kernel threads are freezable? Kernel threads are not freezable by default. However, a kernel thread may clear PF_NOFREEZE for itself by calling set_freezable() (the resetting of PF_NOFREEZE -directly is strongly discouraged). From this point it is regarded as freezable +directly is not allowed). From this point it is regarded as freezable and must call try_to_freeze() in a suitable place. IV. Why do we do that? @@ -176,3 +176,28 @@ tasks, since it generally exists anyway. A driver must have all firmwares it may need in RAM before suspend() is called. If keeping them is not practical, for example due to their size, they must be requested early enough using the suspend notifier API described in notifiers.txt. + +VI. Are there any precautions to be taken to prevent freezing failures? + +Yes, there are. + +First of all, grabbing the 'pm_mutex' lock to mutually exclude a piece of code +from system-wide sleep such as suspend/hibernation is not encouraged. +If possible, that piece of code must instead hook onto the suspend/hibernation +notifiers to achieve mutual exclusion. Look at the CPU-Hotplug code +(kernel/cpu.c) for an example. + +However, if that is not feasible, and grabbing 'pm_mutex' is deemed necessary, +it is strongly discouraged to directly call mutex_[un]lock(&pm_mutex) since +that could lead to freezing failures, because if the suspend/hibernate code +successfully acquired the 'pm_mutex' lock, and hence that other entity failed +to acquire the lock, then that task would get blocked in TASK_UNINTERRUPTIBLE +state. As a consequence, the freezer would not be able to freeze that task, +leading to freezing failure. + +However, the [un]lock_system_sleep() APIs are safe to use in this scenario, +since they ask the freezer to skip freezing this task, since it is anyway +"frozen enough" as it is blocked on 'pm_mutex', which will be released +only after the entire suspend/hibernation sequence is complete. +So, to summarize, use [un]lock_system_sleep() instead of directly using +mutex_[un]lock(&pm_mutex). That would prevent freezing failures. diff --git a/Documentation/power/regulator/regulator.txt b/Documentation/power/regulator/regulator.txt index 3f8b528f237e..e272d9909e39 100644 --- a/Documentation/power/regulator/regulator.txt +++ b/Documentation/power/regulator/regulator.txt @@ -12,7 +12,7 @@ Drivers can register a regulator by calling :- struct regulator_dev *regulator_register(struct regulator_desc *regulator_desc, struct device *dev, struct regulator_init_data *init_data, - void *driver_data); + void *driver_data, struct device_node *of_node); This will register the regulators capabilities and operations to the regulator core. diff --git a/Documentation/power/runtime_pm.txt b/Documentation/power/runtime_pm.txt index c2ae8bf77d46..4abe83e1045a 100644 --- a/Documentation/power/runtime_pm.txt +++ b/Documentation/power/runtime_pm.txt @@ -57,6 +57,10 @@ the following: 4. Bus type of the device, if both dev->bus and dev->bus->pm are present. +If the subsystem chosen by applying the above rules doesn't provide the relevant +callback, the PM core will invoke the corresponding driver callback stored in +dev->driver->pm directly (if present). + The PM core always checks which callback to use in the order given above, so the priority order of callbacks from high to low is: PM domain, device type, class and bus type. Moreover, the high-priority one will always take precedence over @@ -64,86 +68,88 @@ a low-priority one. The PM domain, bus type, device type and class callbacks are referred to as subsystem-level callbacks in what follows. By default, the callbacks are always invoked in process context with interrupts -enabled. However, subsystems can use the pm_runtime_irq_safe() helper function -to tell the PM core that their ->runtime_suspend(), ->runtime_resume() and -->runtime_idle() callbacks may be invoked in atomic context with interrupts -disabled for a given device. This implies that the callback routines in -question must not block or sleep, but it also means that the synchronous helper -functions listed at the end of Section 4 may be used for that device within an -interrupt handler or generally in an atomic context. - -The subsystem-level suspend callback is _entirely_ _responsible_ for handling -the suspend of the device as appropriate, which may, but need not include -executing the device driver's own ->runtime_suspend() callback (from the +enabled. However, the pm_runtime_irq_safe() helper function can be used to tell +the PM core that it is safe to run the ->runtime_suspend(), ->runtime_resume() +and ->runtime_idle() callbacks for the given device in atomic context with +interrupts disabled. This implies that the callback routines in question must +not block or sleep, but it also means that the synchronous helper functions +listed at the end of Section 4 may be used for that device within an interrupt +handler or generally in an atomic context. + +The subsystem-level suspend callback, if present, is _entirely_ _responsible_ +for handling the suspend of the device as appropriate, which may, but need not +include executing the device driver's own ->runtime_suspend() callback (from the PM core's point of view it is not necessary to implement a ->runtime_suspend() callback in a device driver as long as the subsystem-level suspend callback knows what to do to handle the device). - * Once the subsystem-level suspend callback has completed successfully - for given device, the PM core regards the device as suspended, which need - not mean that the device has been put into a low power state. It is - supposed to mean, however, that the device will not process data and will - not communicate with the CPU(s) and RAM until the subsystem-level resume - callback is executed for it. The runtime PM status of a device after - successful execution of the subsystem-level suspend callback is 'suspended'. - - * If the subsystem-level suspend callback returns -EBUSY or -EAGAIN, - the device's runtime PM status is 'active', which means that the device - _must_ be fully operational afterwards. - - * If the subsystem-level suspend callback returns an error code different - from -EBUSY or -EAGAIN, the PM core regards this as a fatal error and will - refuse to run the helper functions described in Section 4 for the device, - until the status of it is directly set either to 'active', or to 'suspended' - (the PM core provides special helper functions for this purpose). - -In particular, if the driver requires remote wake-up capability (i.e. hardware + * Once the subsystem-level suspend callback (or the driver suspend callback, + if invoked directly) has completed successfully for the given device, the PM + core regards the device as suspended, which need not mean that it has been + put into a low power state. It is supposed to mean, however, that the + device will not process data and will not communicate with the CPU(s) and + RAM until the appropriate resume callback is executed for it. The runtime + PM status of a device after successful execution of the suspend callback is + 'suspended'. + + * If the suspend callback returns -EBUSY or -EAGAIN, the device's runtime PM + status remains 'active', which means that the device _must_ be fully + operational afterwards. + + * If the suspend callback returns an error code different from -EBUSY and + -EAGAIN, the PM core regards this as a fatal error and will refuse to run + the helper functions described in Section 4 for the device until its status + is directly set to either'active', or 'suspended' (the PM core provides + special helper functions for this purpose). + +In particular, if the driver requires remote wakeup capability (i.e. hardware mechanism allowing the device to request a change of its power state, such as PCI PME) for proper functioning and device_run_wake() returns 'false' for the device, then ->runtime_suspend() should return -EBUSY. On the other hand, if -device_run_wake() returns 'true' for the device and the device is put into a low -power state during the execution of the subsystem-level suspend callback, it is -expected that remote wake-up will be enabled for the device. Generally, remote -wake-up should be enabled for all input devices put into a low power state at -run time. - -The subsystem-level resume callback is _entirely_ _responsible_ for handling the -resume of the device as appropriate, which may, but need not include executing -the device driver's own ->runtime_resume() callback (from the PM core's point of -view it is not necessary to implement a ->runtime_resume() callback in a device -driver as long as the subsystem-level resume callback knows what to do to handle -the device). - - * Once the subsystem-level resume callback has completed successfully, the PM - core regards the device as fully operational, which means that the device - _must_ be able to complete I/O operations as needed. The runtime PM status - of the device is then 'active'. - - * If the subsystem-level resume callback returns an error code, the PM core - regards this as a fatal error and will refuse to run the helper functions - described in Section 4 for the device, until its status is directly set - either to 'active' or to 'suspended' (the PM core provides special helper - functions for this purpose). - -The subsystem-level idle callback is executed by the PM core whenever the device -appears to be idle, which is indicated to the PM core by two counters, the -device's usage counter and the counter of 'active' children of the device. +device_run_wake() returns 'true' for the device and the device is put into a +low-power state during the execution of the suspend callback, it is expected +that remote wakeup will be enabled for the device. Generally, remote wakeup +should be enabled for all input devices put into low-power states at run time. + +The subsystem-level resume callback, if present, is _entirely_ _responsible_ for +handling the resume of the device as appropriate, which may, but need not +include executing the device driver's own ->runtime_resume() callback (from the +PM core's point of view it is not necessary to implement a ->runtime_resume() +callback in a device driver as long as the subsystem-level resume callback knows +what to do to handle the device). + + * Once the subsystem-level resume callback (or the driver resume callback, if + invoked directly) has completed successfully, the PM core regards the device + as fully operational, which means that the device _must_ be able to complete + I/O operations as needed. The runtime PM status of the device is then + 'active'. + + * If the resume callback returns an error code, the PM core regards this as a + fatal error and will refuse to run the helper functions described in Section + 4 for the device, until its status is directly set to either 'active', or + 'suspended' (by means of special helper functions provided by the PM core + for this purpose). + +The idle callback (a subsystem-level one, if present, or the driver one) is +executed by the PM core whenever the device appears to be idle, which is +indicated to the PM core by two counters, the device's usage counter and the +counter of 'active' children of the device. * If any of these counters is decreased using a helper function provided by the PM core and it turns out to be equal to zero, the other counter is checked. If that counter also is equal to zero, the PM core executes the - subsystem-level idle callback with the device as an argument. + idle callback with the device as its argument. -The action performed by a subsystem-level idle callback is totally dependent on -the subsystem in question, but the expected and recommended action is to check +The action performed by the idle callback is totally dependent on the subsystem +(or driver) in question, but the expected and recommended action is to check if the device can be suspended (i.e. if all of the conditions necessary for suspending the device are satisfied) and to queue up a suspend request for the device in that case. The value returned by this callback is ignored by the PM core. The helper functions provided by the PM core, described in Section 4, guarantee -that the following constraints are met with respect to the bus type's runtime -PM callbacks: +that the following constraints are met with respect to runtime PM callbacks for +one device: (1) The callbacks are mutually exclusive (e.g. it is forbidden to execute ->runtime_suspend() in parallel with ->runtime_resume() or with another diff --git a/Documentation/s390/Debugging390.txt b/Documentation/s390/Debugging390.txt index efe998becc5b..462321c1aeea 100644 --- a/Documentation/s390/Debugging390.txt +++ b/Documentation/s390/Debugging390.txt @@ -41,7 +41,6 @@ ldd Debugging modules The proc file system Starting points for debugging scripting languages etc. -Dumptool & Lcrash SysRq References Special Thanks @@ -2455,39 +2454,6 @@ jdb <filename> another fully interactive gdb style debugger. -Dumptool & Lcrash ( lkcd ) -========================== -Michael Holzheu & others here at IBM have a fairly mature port of -SGI's lcrash tool which allows one to look at kernel structures in a -running kernel. - -It also complements a tool called dumptool which dumps all the kernel's -memory pages & registers to either a tape or a disk. -This can be used by tech support or an ambitious end user do -post mortem debugging of a machine like gdb core dumps. - -Going into how to use this tool in detail will be explained -in other documentation supplied by IBM with the patches & the -lcrash homepage http://oss.sgi.com/projects/lkcd/ & the lcrash manpage. - -How they work -------------- -Lcrash is a perfectly normal program,however, it requires 2 -additional files, Kerntypes which is built using a patch to the -linux kernel sources in the linux root directory & the System.map. - -Kerntypes is an objectfile whose sole purpose in life -is to provide stabs debug info to lcrash, to do this -Kerntypes is built from kerntypes.c which just includes the most commonly -referenced header files used when debugging, lcrash can then read the -.stabs section of this file. - -Debugging a live system it uses /dev/mem -alternatively for post mortem debugging it uses the data -collected by dumptool. - - - SysRq ===== This is now supported by linux for s/390 & z/Architecture. diff --git a/Documentation/scsi/53c700.txt b/Documentation/scsi/53c700.txt index 0da681d497a2..e31aceb6df15 100644 --- a/Documentation/scsi/53c700.txt +++ b/Documentation/scsi/53c700.txt @@ -16,32 +16,13 @@ fill in to get the driver working. Compile Time Flags ================== -The driver may be either io mapped or memory mapped. This is -selectable by configuration flags: - -CONFIG_53C700_MEM_MAPPED - -define if the driver is memory mapped. - -CONFIG_53C700_IO_MAPPED - -define if the driver is to be io mapped. - -One or other of the above flags *must* be defined. - -Other flags are: +A compile time flag is: CONFIG_53C700_LE_ON_BE define if the chipset must be supported in little endian mode on a big endian architecture (used for the 700 on parisc). -CONFIG_53C700_USE_CONSISTENT - -allocate consistent memory (should only be used if your architecture -has a mixture of consistent and inconsistent memory). Fully -consistent or fully inconsistent architectures should not define this. - Using the Chip Core Driver ========================== diff --git a/Documentation/security/00-INDEX b/Documentation/security/00-INDEX index 19bc49439cac..99b85d39751c 100644 --- a/Documentation/security/00-INDEX +++ b/Documentation/security/00-INDEX @@ -1,5 +1,7 @@ 00-INDEX - this file. +LSM.txt + - description of the Linux Security Module framework. SELinux.txt - how to get started with the SELinux security enhancement. Smack.txt diff --git a/Documentation/security/LSM.txt b/Documentation/security/LSM.txt new file mode 100644 index 000000000000..c335a763a2ed --- /dev/null +++ b/Documentation/security/LSM.txt @@ -0,0 +1,34 @@ +Linux Security Module framework +------------------------------- + +The Linux Security Module (LSM) framework provides a mechanism for +various security checks to be hooked by new kernel extensions. The name +"module" is a bit of a misnomer since these extensions are not actually +loadable kernel modules. Instead, they are selectable at build-time via +CONFIG_DEFAULT_SECURITY and can be overridden at boot-time via the +"security=..." kernel command line argument, in the case where multiple +LSMs were built into a given kernel. + +The primary users of the LSM interface are Mandatory Access Control +(MAC) extensions which provide a comprehensive security policy. Examples +include SELinux, Smack, Tomoyo, and AppArmor. In addition to the larger +MAC extensions, other extensions can be built using the LSM to provide +specific changes to system operation when these tweaks are not available +in the core functionality of Linux itself. + +Without a specific LSM built into the kernel, the default LSM will be the +Linux capabilities system. Most LSMs choose to extend the capabilities +system, building their checks on top of the defined capability hooks. +For more details on capabilities, see capabilities(7) in the Linux +man-pages project. + +Based on http://kerneltrap.org/Linux/Documenting_Security_Module_Intent, +a new LSM is accepted into the kernel when its intent (a description of +what it tries to protect against and in what cases one would expect to +use it) has been appropriately documented in Documentation/security/. +This allows an LSM's code to be easily compared to its goals, and so +that end users and distros can make a more informed decision about which +LSMs suit their requirements. + +For extensive documentation on the available LSM hook interfaces, please +see include/linux/security.h. diff --git a/Documentation/security/credentials.txt b/Documentation/security/credentials.txt index fc0366cbd7ce..86257052e31a 100644 --- a/Documentation/security/credentials.txt +++ b/Documentation/security/credentials.txt @@ -221,10 +221,10 @@ The Linux kernel supports the following types of credentials: (5) LSM The Linux Security Module allows extra controls to be placed over the - operations that a task may do. Currently Linux supports two main - alternate LSM options: SELinux and Smack. + operations that a task may do. Currently Linux supports several LSM + options. - Both work by labelling the objects in a system and then applying sets of + Some work by labelling the objects in a system and then applying sets of rules (policies) that say what operations a task with one label may do to an object with another label. diff --git a/Documentation/serial/driver b/Documentation/serial/driver index 77ba0afbe4db..0a25a9191864 100644 --- a/Documentation/serial/driver +++ b/Documentation/serial/driver @@ -101,7 +101,7 @@ hardware. Returns the current state of modem control inputs. The state of the outputs should not be returned, since the core keeps track of their state. The state information should include: - - TIOCM_DCD state of DCD signal + - TIOCM_CAR state of DCD signal - TIOCM_CTS state of CTS signal - TIOCM_DSR state of DSR signal - TIOCM_RI state of RI signal diff --git a/Documentation/sound/alsa/HD-Audio-Models.txt b/Documentation/sound/alsa/HD-Audio-Models.txt index edad99abec21..c8c54544abc5 100644 --- a/Documentation/sound/alsa/HD-Audio-Models.txt +++ b/Documentation/sound/alsa/HD-Audio-Models.txt @@ -42,19 +42,7 @@ ALC260 ALC262 ====== - fujitsu Fujitsu Laptop - benq Benq ED8 - benq-t31 Benq T31 - hippo Hippo (ATI) with jack detection, Sony UX-90s - hippo_1 Hippo (Benq) with jack detection - toshiba-s06 Toshiba S06 - toshiba-rx1 Toshiba RX1 - tyan Tyan Thunder n6650W (S2915-E) - ultra Samsung Q1 Ultra Vista model - lenovo-3000 Lenovo 3000 y410 - nec NEC Versa S9100 - basic fixed pin assignment w/o SPDIF - auto auto-config reading BIOS (default) + N/A ALC267/268 ========== @@ -350,7 +338,6 @@ STAC92HD83* mic-ref Reference board with power management for ports dell-s14 Dell laptop dell-vostro-3500 Dell Vostro 3500 laptop - hp HP laptops with (inverted) mute-LED hp-dv7-4000 HP dv-7 4000 auto BIOS setup (default) diff --git a/Documentation/sound/alsa/compress_offload.txt b/Documentation/sound/alsa/compress_offload.txt new file mode 100644 index 000000000000..c83a835350f0 --- /dev/null +++ b/Documentation/sound/alsa/compress_offload.txt @@ -0,0 +1,188 @@ + compress_offload.txt + ===================== + Pierre-Louis.Bossart <pierre-louis.bossart@linux.intel.com> + Vinod Koul <vinod.koul@linux.intel.com> + +Overview + +Since its early days, the ALSA API was defined with PCM support or +constant bitrates payloads such as IEC61937 in mind. Arguments and +returned values in frames are the norm, making it a challenge to +extend the existing API to compressed data streams. + +In recent years, audio digital signal processors (DSP) were integrated +in system-on-chip designs, and DSPs are also integrated in audio +codecs. Processing compressed data on such DSPs results in a dramatic +reduction of power consumption compared to host-based +processing. Support for such hardware has not been very good in Linux, +mostly because of a lack of a generic API available in the mainline +kernel. + +Rather than requiring a compability break with an API change of the +ALSA PCM interface, a new 'Compressed Data' API is introduced to +provide a control and data-streaming interface for audio DSPs. + +The design of this API was inspired by the 2-year experience with the +Intel Moorestown SOC, with many corrections required to upstream the +API in the mainline kernel instead of the staging tree and make it +usable by others. + +Requirements + +The main requirements are: + +- separation between byte counts and time. Compressed formats may have + a header per file, per frame, or no header at all. The payload size + may vary from frame-to-frame. As a result, it is not possible to + estimate reliably the duration of audio buffers when handling + compressed data. Dedicated mechanisms are required to allow for + reliable audio-video synchronization, which requires precise + reporting of the number of samples rendered at any given time. + +- Handling of multiple formats. PCM data only requires a specification + of the sampling rate, number of channels and bits per sample. In + contrast, compressed data comes in a variety of formats. Audio DSPs + may also provide support for a limited number of audio encoders and + decoders embedded in firmware, or may support more choices through + dynamic download of libraries. + +- Focus on main formats. This API provides support for the most + popular formats used for audio and video capture and playback. It is + likely that as audio compression technology advances, new formats + will be added. + +- Handling of multiple configurations. Even for a given format like + AAC, some implementations may support AAC multichannel but HE-AAC + stereo. Likewise WMA10 level M3 may require too much memory and cpu + cycles. The new API needs to provide a generic way of listing these + formats. + +- Rendering/Grabbing only. This API does not provide any means of + hardware acceleration, where PCM samples are provided back to + user-space for additional processing. This API focuses instead on + streaming compressed data to a DSP, with the assumption that the + decoded samples are routed to a physical output or logical back-end. + + - Complexity hiding. Existing user-space multimedia frameworks all + have existing enums/structures for each compressed format. This new + API assumes the existence of a platform-specific compatibility layer + to expose, translate and make use of the capabilities of the audio + DSP, eg. Android HAL or PulseAudio sinks. By construction, regular + applications are not supposed to make use of this API. + + +Design + +The new API shares a number of concepts with with the PCM API for flow +control. Start, pause, resume, drain and stop commands have the same +semantics no matter what the content is. + +The concept of memory ring buffer divided in a set of fragments is +borrowed from the ALSA PCM API. However, only sizes in bytes can be +specified. + +Seeks/trick modes are assumed to be handled by the host. + +The notion of rewinds/forwards is not supported. Data committed to the +ring buffer cannot be invalidated, except when dropping all buffers. + +The Compressed Data API does not make any assumptions on how the data +is transmitted to the audio DSP. DMA transfers from main memory to an +embedded audio cluster or to a SPI interface for external DSPs are +possible. As in the ALSA PCM case, a core set of routines is exposed; +each driver implementer will have to write support for a set of +mandatory routines and possibly make use of optional ones. + +The main additions are + +- get_caps +This routine returns the list of audio formats supported. Querying the +codecs on a capture stream will return encoders, decoders will be +listed for playback streams. + +- get_codec_caps For each codec, this routine returns a list of +capabilities. The intent is to make sure all the capabilities +correspond to valid settings, and to minimize the risks of +configuration failures. For example, for a complex codec such as AAC, +the number of channels supported may depend on a specific profile. If +the capabilities were exposed with a single descriptor, it may happen +that a specific combination of profiles/channels/formats may not be +supported. Likewise, embedded DSPs have limited memory and cpu cycles, +it is likely that some implementations make the list of capabilities +dynamic and dependent on existing workloads. In addition to codec +settings, this routine returns the minimum buffer size handled by the +implementation. This information can be a function of the DMA buffer +sizes, the number of bytes required to synchronize, etc, and can be +used by userspace to define how much needs to be written in the ring +buffer before playback can start. + +- set_params +This routine sets the configuration chosen for a specific codec. The +most important field in the parameters is the codec type; in most +cases decoders will ignore other fields, while encoders will strictly +comply to the settings + +- get_params +This routines returns the actual settings used by the DSP. Changes to +the settings should remain the exception. + +- get_timestamp +The timestamp becomes a multiple field structure. It lists the number +of bytes transferred, the number of samples processed and the number +of samples rendered/grabbed. All these values can be used to determine +the avarage bitrate, figure out if the ring buffer needs to be +refilled or the delay due to decoding/encoding/io on the DSP. + +Note that the list of codecs/profiles/modes was derived from the +OpenMAX AL specification instead of reinventing the wheel. +Modifications include: +- Addition of FLAC and IEC formats +- Merge of encoder/decoder capabilities +- Profiles/modes listed as bitmasks to make descriptors more compact +- Addition of set_params for decoders (missing in OpenMAX AL) +- Addition of AMR/AMR-WB encoding modes (missing in OpenMAX AL) +- Addition of format information for WMA +- Addition of encoding options when required (derived from OpenMAX IL) +- Addition of rateControlSupported (missing in OpenMAX AL) + +Not supported: + +- Support for VoIP/circuit-switched calls is not the target of this + API. Support for dynamic bit-rate changes would require a tight + coupling between the DSP and the host stack, limiting power savings. + +- Packet-loss concealment is not supported. This would require an + additional interface to let the decoder synthesize data when frames + are lost during transmission. This may be added in the future. + +- Volume control/routing is not handled by this API. Devices exposing a + compressed data interface will be considered as regular ALSA devices; + volume changes and routing information will be provided with regular + ALSA kcontrols. + +- Embedded audio effects. Such effects should be enabled in the same + manner, no matter if the input was PCM or compressed. + +- multichannel IEC encoding. Unclear if this is required. + +- Encoding/decoding acceleration is not supported as mentioned + above. It is possible to route the output of a decoder to a capture + stream, or even implement transcoding capabilities. This routing + would be enabled with ALSA kcontrols. + +- Audio policy/resource management. This API does not provide any + hooks to query the utilization of the audio DSP, nor any premption + mechanisms. + +- No notion of underun/overrun. Since the bytes written are compressed + in nature and data written/read doesn't translate directly to + rendered output in time, this does not deal with underrun/overun and + maybe dealt in user-library + +Credits: +- Mark Brown and Liam Girdwood for discussions on the need for this API +- Harsha Priya for her work on intel_sst compressed API +- Rakesh Ughreja for valuable feedback +- Sing Nallasellan, Sikkandar Madar and Prasanna Samaga for + demonstrating and quantifying the benefits of audio offload on a + real platform. diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt index 1f2463671a1a..8c20fbd8b42d 100644 --- a/Documentation/sysctl/kernel.txt +++ b/Documentation/sysctl/kernel.txt @@ -49,6 +49,7 @@ show up in /proc/sys/kernel: - panic - panic_on_oops - panic_on_unrecovered_nmi +- panic_on_stackoverflow - pid_max - powersave-nap [ PPC only ] - printk @@ -393,6 +394,19 @@ Controls the kernel's behaviour when an oops or BUG is encountered. ============================================================== +panic_on_stackoverflow: + +Controls the kernel's behavior when detecting the overflows of +kernel, IRQ and exception stacks except a user stack. +This file shows up if CONFIG_DEBUG_STACKOVERFLOW is enabled. + +0: try to continue operation. + +1: panic immediately. + +============================================================== + + pid_max: PID allocation wrap value. When the kernel's next PID value @@ -401,6 +415,14 @@ PIDs of value pid_max or larger are not allocated. ============================================================== +ns_last_pid: + +The last pid allocated in the current (the one task using this sysctl +lives in) pid namespace. When selecting a pid for a next task on fork +kernel tries to allocate a number starting from this one. + +============================================================== + powersave-nap: (PPC only) If set, Linux-PPC will use the 'nap' mode of powersaving, diff --git a/Documentation/trace/events-kmem.txt b/Documentation/trace/events-kmem.txt index aa82ee4a5a87..194800410061 100644 --- a/Documentation/trace/events-kmem.txt +++ b/Documentation/trace/events-kmem.txt @@ -40,8 +40,8 @@ but the call_site can usually be used to extrapolate that information. ================== mm_page_alloc page=%p pfn=%lu order=%d migratetype=%d gfp_flags=%s mm_page_alloc_zone_locked page=%p pfn=%lu order=%u migratetype=%d cpu=%d percpu_refill=%d -mm_page_free_direct page=%p pfn=%lu order=%d -mm_pagevec_free page=%p pfn=%lu order=%d cold=%d +mm_page_free page=%p pfn=%lu order=%d +mm_page_free_batched page=%p pfn=%lu order=%d cold=%d These four events deal with page allocation and freeing. mm_page_alloc is a simple indicator of page allocator activity. Pages may be allocated from @@ -53,13 +53,13 @@ amounts of activity imply high activity on the zone->lock. Taking this lock impairs performance by disabling interrupts, dirtying cache lines between CPUs and serialising many CPUs. -When a page is freed directly by the caller, the mm_page_free_direct event +When a page is freed directly by the caller, the only mm_page_free event is triggered. Significant amounts of activity here could indicate that the callers should be batching their activities. -When pages are freed using a pagevec, the mm_pagevec_free is -triggered. Broadly speaking, pages are taken off the LRU lock in bulk and -freed in batch with a pagevec. Significant amounts of activity here could +When pages are freed in batch, the also mm_page_free_batched is triggered. +Broadly speaking, pages are taken off the LRU lock in bulk and +freed in batch with a page list. Significant amounts of activity here could indicate that the system is under memory pressure and can also indicate contention on the zone->lru_lock. diff --git a/Documentation/trace/events.txt b/Documentation/trace/events.txt index b510564aac7e..bb24c2a0e870 100644 --- a/Documentation/trace/events.txt +++ b/Documentation/trace/events.txt @@ -191,8 +191,6 @@ And for string fields they are: Currently, only exact string matches are supported. -Currently, the maximum number of predicates in a filter is 16. - 5.2 Setting filters ------------------- diff --git a/Documentation/trace/postprocess/trace-pagealloc-postprocess.pl b/Documentation/trace/postprocess/trace-pagealloc-postprocess.pl index 7df50e8cf4d9..0a120aae33ce 100644 --- a/Documentation/trace/postprocess/trace-pagealloc-postprocess.pl +++ b/Documentation/trace/postprocess/trace-pagealloc-postprocess.pl @@ -17,8 +17,8 @@ use Getopt::Long; # Tracepoint events use constant MM_PAGE_ALLOC => 1; -use constant MM_PAGE_FREE_DIRECT => 2; -use constant MM_PAGEVEC_FREE => 3; +use constant MM_PAGE_FREE => 2; +use constant MM_PAGE_FREE_BATCHED => 3; use constant MM_PAGE_PCPU_DRAIN => 4; use constant MM_PAGE_ALLOC_ZONE_LOCKED => 5; use constant MM_PAGE_ALLOC_EXTFRAG => 6; @@ -223,10 +223,10 @@ EVENT_PROCESS: # Perl Switch() sucks majorly if ($tracepoint eq "mm_page_alloc") { $perprocesspid{$process_pid}->{MM_PAGE_ALLOC}++; - } elsif ($tracepoint eq "mm_page_free_direct") { - $perprocesspid{$process_pid}->{MM_PAGE_FREE_DIRECT}++; - } elsif ($tracepoint eq "mm_pagevec_free") { - $perprocesspid{$process_pid}->{MM_PAGEVEC_FREE}++; + } elsif ($tracepoint eq "mm_page_free") { + $perprocesspid{$process_pid}->{MM_PAGE_FREE}++ + } elsif ($tracepoint eq "mm_page_free_batched") { + $perprocesspid{$process_pid}->{MM_PAGE_FREE_BATCHED}++; } elsif ($tracepoint eq "mm_page_pcpu_drain") { $perprocesspid{$process_pid}->{MM_PAGE_PCPU_DRAIN}++; $perprocesspid{$process_pid}->{STATE_PCPU_PAGES_DRAINED}++; @@ -336,8 +336,8 @@ sub dump_stats { $process_pid, $stats{$process_pid}->{MM_PAGE_ALLOC}, $stats{$process_pid}->{MM_PAGE_ALLOC_ZONE_LOCKED}, - $stats{$process_pid}->{MM_PAGE_FREE_DIRECT}, - $stats{$process_pid}->{MM_PAGEVEC_FREE}, + $stats{$process_pid}->{MM_PAGE_FREE}, + $stats{$process_pid}->{MM_PAGE_FREE_BATCHED}, $stats{$process_pid}->{MM_PAGE_PCPU_DRAIN}, $stats{$process_pid}->{HIGH_PCPU_DRAINS}, $stats{$process_pid}->{HIGH_PCPU_REFILLS}, @@ -364,8 +364,8 @@ sub aggregate_perprocesspid() { $perprocess{$process}->{MM_PAGE_ALLOC} += $perprocesspid{$process_pid}->{MM_PAGE_ALLOC}; $perprocess{$process}->{MM_PAGE_ALLOC_ZONE_LOCKED} += $perprocesspid{$process_pid}->{MM_PAGE_ALLOC_ZONE_LOCKED}; - $perprocess{$process}->{MM_PAGE_FREE_DIRECT} += $perprocesspid{$process_pid}->{MM_PAGE_FREE_DIRECT}; - $perprocess{$process}->{MM_PAGEVEC_FREE} += $perprocesspid{$process_pid}->{MM_PAGEVEC_FREE}; + $perprocess{$process}->{MM_PAGE_FREE} += $perprocesspid{$process_pid}->{MM_PAGE_FREE}; + $perprocess{$process}->{MM_PAGE_FREE_BATCHED} += $perprocesspid{$process_pid}->{MM_PAGE_FREE_BATCHED}; $perprocess{$process}->{MM_PAGE_PCPU_DRAIN} += $perprocesspid{$process_pid}->{MM_PAGE_PCPU_DRAIN}; $perprocess{$process}->{HIGH_PCPU_DRAINS} += $perprocesspid{$process_pid}->{HIGH_PCPU_DRAINS}; $perprocess{$process}->{HIGH_PCPU_REFILLS} += $perprocesspid{$process_pid}->{HIGH_PCPU_REFILLS}; diff --git a/Documentation/trace/tracepoint-analysis.txt b/Documentation/trace/tracepoint-analysis.txt index 87bee3c129ba..058cc6c9dc56 100644 --- a/Documentation/trace/tracepoint-analysis.txt +++ b/Documentation/trace/tracepoint-analysis.txt @@ -93,14 +93,14 @@ By specifying the -a switch and analysing sleep, the system-wide events for a duration of time can be examined. $ perf stat -a \ - -e kmem:mm_page_alloc -e kmem:mm_page_free_direct \ - -e kmem:mm_pagevec_free \ + -e kmem:mm_page_alloc -e kmem:mm_page_free \ + -e kmem:mm_page_free_batched \ sleep 10 Performance counter stats for 'sleep 10': 9630 kmem:mm_page_alloc - 2143 kmem:mm_page_free_direct - 7424 kmem:mm_pagevec_free + 2143 kmem:mm_page_free + 7424 kmem:mm_page_free_batched 10.002577764 seconds time elapsed @@ -119,15 +119,15 @@ basis using set_ftrace_pid. Events can be activated and tracked for the duration of a process on a local basis using PCL such as follows. - $ perf stat -e kmem:mm_page_alloc -e kmem:mm_page_free_direct \ - -e kmem:mm_pagevec_free ./hackbench 10 + $ perf stat -e kmem:mm_page_alloc -e kmem:mm_page_free \ + -e kmem:mm_page_free_batched ./hackbench 10 Time: 0.909 Performance counter stats for './hackbench 10': 17803 kmem:mm_page_alloc - 12398 kmem:mm_page_free_direct - 4827 kmem:mm_pagevec_free + 12398 kmem:mm_page_free + 4827 kmem:mm_page_free_batched 0.973913387 seconds time elapsed @@ -146,8 +146,8 @@ to know what the standard deviation is. By and large, this is left to the performance analyst to do it by hand. In the event that the discrete event occurrences are useful to the performance analyst, then perf can be used. - $ perf stat --repeat 5 -e kmem:mm_page_alloc -e kmem:mm_page_free_direct - -e kmem:mm_pagevec_free ./hackbench 10 + $ perf stat --repeat 5 -e kmem:mm_page_alloc -e kmem:mm_page_free + -e kmem:mm_page_free_batched ./hackbench 10 Time: 0.890 Time: 0.895 Time: 0.915 @@ -157,8 +157,8 @@ occurrences are useful to the performance analyst, then perf can be used. Performance counter stats for './hackbench 10' (5 runs): 16630 kmem:mm_page_alloc ( +- 3.542% ) - 11486 kmem:mm_page_free_direct ( +- 4.771% ) - 4730 kmem:mm_pagevec_free ( +- 2.325% ) + 11486 kmem:mm_page_free ( +- 4.771% ) + 4730 kmem:mm_page_free_batched ( +- 2.325% ) 0.982653002 seconds time elapsed ( +- 1.448% ) @@ -168,15 +168,15 @@ aggregation of discrete events, then a script would need to be developed. Using --repeat, it is also possible to view how events are fluctuating over time on a system-wide basis using -a and sleep. - $ perf stat -e kmem:mm_page_alloc -e kmem:mm_page_free_direct \ - -e kmem:mm_pagevec_free \ + $ perf stat -e kmem:mm_page_alloc -e kmem:mm_page_free \ + -e kmem:mm_page_free_batched \ -a --repeat 10 \ sleep 1 Performance counter stats for 'sleep 1' (10 runs): 1066 kmem:mm_page_alloc ( +- 26.148% ) - 182 kmem:mm_page_free_direct ( +- 5.464% ) - 890 kmem:mm_pagevec_free ( +- 30.079% ) + 182 kmem:mm_page_free ( +- 5.464% ) + 890 kmem:mm_page_free_batched ( +- 30.079% ) 1.002251757 seconds time elapsed ( +- 0.005% ) @@ -220,8 +220,8 @@ were generating events within the kernel. To begin this sort of analysis, the data must be recorded. At the time of writing, this required root: $ perf record -c 1 \ - -e kmem:mm_page_alloc -e kmem:mm_page_free_direct \ - -e kmem:mm_pagevec_free \ + -e kmem:mm_page_alloc -e kmem:mm_page_free \ + -e kmem:mm_page_free_batched \ ./hackbench 10 Time: 0.894 [ perf record: Captured and wrote 0.733 MB perf.data (~32010 samples) ] @@ -260,8 +260,8 @@ noticed that X was generating an insane amount of page allocations so let's look at it: $ perf record -c 1 -f \ - -e kmem:mm_page_alloc -e kmem:mm_page_free_direct \ - -e kmem:mm_pagevec_free \ + -e kmem:mm_page_alloc -e kmem:mm_page_free \ + -e kmem:mm_page_free_batched \ -p `pidof X` This was interrupted after a few seconds and diff --git a/Documentation/usb/usbmon.txt b/Documentation/usb/usbmon.txt index a4efa0462f05..5335fa8b06eb 100644 --- a/Documentation/usb/usbmon.txt +++ b/Documentation/usb/usbmon.txt @@ -47,10 +47,11 @@ This allows to filter away annoying devices that talk continuously. 2. Find which bus connects to the desired device -Run "cat /proc/bus/usb/devices", and find the T-line which corresponds to -the device. Usually you do it by looking for the vendor string. If you have -many similar devices, unplug one and compare two /proc/bus/usb/devices outputs. -The T-line will have a bus number. Example: +Run "cat /sys/kernel/debug/usb/devices", and find the T-line which corresponds +to the device. Usually you do it by looking for the vendor string. If you have +many similar devices, unplug one and compare the two +/sys/kernel/debug/usb/devices outputs. The T-line will have a bus number. +Example: T: Bus=03 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 2 Spd=12 MxCh= 0 D: Ver= 1.10 Cls=00(>ifc ) Sub=00 Prot=00 MxPS= 8 #Cfgs= 1 @@ -58,7 +59,10 @@ P: Vendor=0557 ProdID=2004 Rev= 1.00 S: Manufacturer=ATEN S: Product=UC100KM V2.00 -Bus=03 means it's bus 3. +"Bus=03" means it's bus 3. Alternatively, you can look at the output from +"lsusb" and get the bus number from the appropriate line. Example: + +Bus 003 Device 002: ID 0557:2004 ATEN UC100KM V2.00 3. Start 'cat' diff --git a/Documentation/vgaarbiter.txt b/Documentation/vgaarbiter.txt index b7d401e0eae9..014423e2824c 100644 --- a/Documentation/vgaarbiter.txt +++ b/Documentation/vgaarbiter.txt @@ -177,7 +177,7 @@ II. Credits Benjamin Herrenschmidt (IBM?) started this work when he discussed such design with the Xorg community in 2005 [1, 2]. In the end of 2007, Paulo Zanoni and -Tiago Vignatti (both of C3SL/Federal University of Paraná) proceeded his work +Tiago Vignatti (both of C3SL/Federal University of Paraná) proceeded his work enhancing the kernel code to adapt as a kernel module and also did the implementation of the user space side [3]. Now (2009) Tiago Vignatti and Dave Airlie finally put this work in shape and queued to Jesse Barnes' PCI tree. diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 7945b0bd35e2..e1d94bf4056e 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -1100,6 +1100,15 @@ emulate them efficiently. The fields in each entry are defined as follows: eax, ebx, ecx, edx: the values returned by the cpuid instruction for this function/index combination +The TSC deadline timer feature (CPUID leaf 1, ecx[24]) is always returned +as false, since the feature depends on KVM_CREATE_IRQCHIP for local APIC +support. Instead it is reported via + + ioctl(KVM_CHECK_EXTENSION, KVM_CAP_TSC_DEADLINE_TIMER) + +if that returns true and you use KVM_CREATE_IRQCHIP, or if you emulate the +feature in userspace, then you can enable the feature for KVM_SET_CPUID2. + 4.47 KVM_PPC_GET_PVINFO Capability: KVM_CAP_PPC_GET_PVINFO @@ -1151,6 +1160,13 @@ following flags are specified: /* Depends on KVM_CAP_IOMMU */ #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0) +The KVM_DEV_ASSIGN_ENABLE_IOMMU flag is a mandatory option to ensure +isolation of the device. Usages not specifying this flag are deprecated. + +Only PCI header type 0 devices with PCI BAR resources are supported by +device assignment. The user requesting this ioctl must have read/write +access to the PCI sysfs resource files associated with the device. + 4.49 KVM_DEASSIGN_PCI_DEVICE Capability: KVM_CAP_DEVICE_DEASSIGNMENT @@ -1450,6 +1466,31 @@ is supported; 2 if the processor requires all virtual machines to have an RMA, or 1 if the processor can use an RMA but doesn't require it, because it supports the Virtual RMA (VRMA) facility. +4.64 KVM_NMI + +Capability: KVM_CAP_USER_NMI +Architectures: x86 +Type: vcpu ioctl +Parameters: none +Returns: 0 on success, -1 on error + +Queues an NMI on the thread's vcpu. Note this is well defined only +when KVM_CREATE_IRQCHIP has not been called, since this is an interface +between the virtual cpu core and virtual local APIC. After KVM_CREATE_IRQCHIP +has been called, this interface is completely emulated within the kernel. + +To use this to emulate the LINT1 input with KVM_CREATE_IRQCHIP, use the +following algorithm: + + - pause the vpcu + - read the local APIC's state (KVM_GET_LAPIC) + - check whether changing LINT1 will queue an NMI (see the LVT entry for LINT1) + - if so, issue KVM_NMI + - resume the vcpu + +Some guests configure the LINT1 NMI input to cause a panic, aiding in +debugging. + 5. The kvm_run structure Application code obtains a pointer to the kvm_run structure by diff --git a/Documentation/virtual/lguest/.gitignore b/Documentation/virtual/lguest/.gitignore deleted file mode 100644 index 115587fd5f65..000000000000 --- a/Documentation/virtual/lguest/.gitignore +++ /dev/null @@ -1 +0,0 @@ -lguest diff --git a/Documentation/virtual/lguest/Makefile b/Documentation/virtual/lguest/Makefile deleted file mode 100644 index 0ac34206f7a7..000000000000 --- a/Documentation/virtual/lguest/Makefile +++ /dev/null @@ -1,8 +0,0 @@ -# This creates the demonstration utility "lguest" which runs a Linux guest. -# Missing headers? Add "-I../../../include -I../../../arch/x86/include" -CFLAGS:=-m32 -Wall -Wmissing-declarations -Wmissing-prototypes -O3 -U_FORTIFY_SOURCE - -all: lguest - -clean: - rm -f lguest diff --git a/Documentation/virtual/lguest/extract b/Documentation/virtual/lguest/extract deleted file mode 100644 index 7730bb6e4b94..000000000000 --- a/Documentation/virtual/lguest/extract +++ /dev/null @@ -1,58 +0,0 @@ -#! /bin/sh - -set -e - -PREFIX=$1 -shift - -trap 'rm -r $TMPDIR' 0 -TMPDIR=`mktemp -d` - -exec 3>/dev/null -for f; do - while IFS=" -" read -r LINE; do - case "$LINE" in - *$PREFIX:[0-9]*:\**) - NUM=`echo "$LINE" | sed "s/.*$PREFIX:\([0-9]*\).*/\1/"` - if [ -f $TMPDIR/$NUM ]; then - echo "$TMPDIR/$NUM already exits prior to $f" - exit 1 - fi - exec 3>>$TMPDIR/$NUM - echo $f | sed 's,\.\./,,g' > $TMPDIR/.$NUM - /bin/echo "$LINE" | sed -e "s/$PREFIX:[0-9]*//" -e "s/:\*/*/" >&3 - ;; - *$PREFIX:[0-9]*) - NUM=`echo "$LINE" | sed "s/.*$PREFIX:\([0-9]*\).*/\1/"` - if [ -f $TMPDIR/$NUM ]; then - echo "$TMPDIR/$NUM already exits prior to $f" - exit 1 - fi - exec 3>>$TMPDIR/$NUM - echo $f | sed 's,\.\./,,g' > $TMPDIR/.$NUM - /bin/echo "$LINE" | sed "s/$PREFIX:[0-9]*//" >&3 - ;; - *:\**) - /bin/echo "$LINE" | sed -e "s/:\*/*/" -e "s,/\*\*/,," >&3 - echo >&3 - exec 3>/dev/null - ;; - *) - /bin/echo "$LINE" >&3 - ;; - esac - done < $f - echo >&3 - exec 3>/dev/null -done - -LASTFILE="" -for f in $TMPDIR/*; do - if [ "$LASTFILE" != $(cat $TMPDIR/.$(basename $f) ) ]; then - LASTFILE=$(cat $TMPDIR/.$(basename $f) ) - echo "[ $LASTFILE ]" - fi - cat $f -done - diff --git a/Documentation/virtual/lguest/lguest.c b/Documentation/virtual/lguest/lguest.c deleted file mode 100644 index c095d79cae73..000000000000 --- a/Documentation/virtual/lguest/lguest.c +++ /dev/null @@ -1,2065 +0,0 @@ -/*P:100 - * This is the Launcher code, a simple program which lays out the "physical" - * memory for the new Guest by mapping the kernel image and the virtual - * devices, then opens /dev/lguest to tell the kernel about the Guest and - * control it. -:*/ -#define _LARGEFILE64_SOURCE -#define _GNU_SOURCE -#include <stdio.h> -#include <string.h> -#include <unistd.h> -#include <err.h> -#include <stdint.h> -#include <stdlib.h> -#include <elf.h> -#include <sys/mman.h> -#include <sys/param.h> -#include <sys/types.h> -#include <sys/stat.h> -#include <sys/wait.h> -#include <sys/eventfd.h> -#include <fcntl.h> -#include <stdbool.h> -#include <errno.h> -#include <ctype.h> -#include <sys/socket.h> -#include <sys/ioctl.h> -#include <sys/time.h> -#include <time.h> -#include <netinet/in.h> -#include <net/if.h> -#include <linux/sockios.h> -#include <linux/if_tun.h> -#include <sys/uio.h> -#include <termios.h> -#include <getopt.h> -#include <assert.h> -#include <sched.h> -#include <limits.h> -#include <stddef.h> -#include <signal.h> -#include <pwd.h> -#include <grp.h> - -#include <linux/virtio_config.h> -#include <linux/virtio_net.h> -#include <linux/virtio_blk.h> -#include <linux/virtio_console.h> -#include <linux/virtio_rng.h> -#include <linux/virtio_ring.h> -#include <asm/bootparam.h> -#include "../../../include/linux/lguest_launcher.h" -/*L:110 - * We can ignore the 43 include files we need for this program, but I do want - * to draw attention to the use of kernel-style types. - * - * As Linus said, "C is a Spartan language, and so should your naming be." I - * like these abbreviations, so we define them here. Note that u64 is always - * unsigned long long, which works on all Linux systems: this means that we can - * use %llu in printf for any u64. - */ -typedef unsigned long long u64; -typedef uint32_t u32; -typedef uint16_t u16; -typedef uint8_t u8; -/*:*/ - -#define BRIDGE_PFX "bridge:" -#ifndef SIOCBRADDIF -#define SIOCBRADDIF 0x89a2 /* add interface to bridge */ -#endif -/* We can have up to 256 pages for devices. */ -#define DEVICE_PAGES 256 -/* This will occupy 3 pages: it must be a power of 2. */ -#define VIRTQUEUE_NUM 256 - -/*L:120 - * verbose is both a global flag and a macro. The C preprocessor allows - * this, and although I wouldn't recommend it, it works quite nicely here. - */ -static bool verbose; -#define verbose(args...) \ - do { if (verbose) printf(args); } while(0) -/*:*/ - -/* The pointer to the start of guest memory. */ -static void *guest_base; -/* The maximum guest physical address allowed, and maximum possible. */ -static unsigned long guest_limit, guest_max; -/* The /dev/lguest file descriptor. */ -static int lguest_fd; - -/* a per-cpu variable indicating whose vcpu is currently running */ -static unsigned int __thread cpu_id; - -/* This is our list of devices. */ -struct device_list { - /* Counter to assign interrupt numbers. */ - unsigned int next_irq; - - /* Counter to print out convenient device numbers. */ - unsigned int device_num; - - /* The descriptor page for the devices. */ - u8 *descpage; - - /* A single linked list of devices. */ - struct device *dev; - /* And a pointer to the last device for easy append. */ - struct device *lastdev; -}; - -/* The list of Guest devices, based on command line arguments. */ -static struct device_list devices; - -/* The device structure describes a single device. */ -struct device { - /* The linked-list pointer. */ - struct device *next; - - /* The device's descriptor, as mapped into the Guest. */ - struct lguest_device_desc *desc; - - /* We can't trust desc values once Guest has booted: we use these. */ - unsigned int feature_len; - unsigned int num_vq; - - /* The name of this device, for --verbose. */ - const char *name; - - /* Any queues attached to this device */ - struct virtqueue *vq; - - /* Is it operational */ - bool running; - - /* Device-specific data. */ - void *priv; -}; - -/* The virtqueue structure describes a queue attached to a device. */ -struct virtqueue { - struct virtqueue *next; - - /* Which device owns me. */ - struct device *dev; - - /* The configuration for this queue. */ - struct lguest_vqconfig config; - - /* The actual ring of buffers. */ - struct vring vring; - - /* Last available index we saw. */ - u16 last_avail_idx; - - /* How many are used since we sent last irq? */ - unsigned int pending_used; - - /* Eventfd where Guest notifications arrive. */ - int eventfd; - - /* Function for the thread which is servicing this virtqueue. */ - void (*service)(struct virtqueue *vq); - pid_t thread; -}; - -/* Remember the arguments to the program so we can "reboot" */ -static char **main_args; - -/* The original tty settings to restore on exit. */ -static struct termios orig_term; - -/* - * We have to be careful with barriers: our devices are all run in separate - * threads and so we need to make sure that changes visible to the Guest happen - * in precise order. - */ -#define wmb() __asm__ __volatile__("" : : : "memory") -#define mb() __asm__ __volatile__("" : : : "memory") - -/* - * Convert an iovec element to the given type. - * - * This is a fairly ugly trick: we need to know the size of the type and - * alignment requirement to check the pointer is kosher. It's also nice to - * have the name of the type in case we report failure. - * - * Typing those three things all the time is cumbersome and error prone, so we - * have a macro which sets them all up and passes to the real function. - */ -#define convert(iov, type) \ - ((type *)_convert((iov), sizeof(type), __alignof__(type), #type)) - -static void *_convert(struct iovec *iov, size_t size, size_t align, - const char *name) -{ - if (iov->iov_len != size) - errx(1, "Bad iovec size %zu for %s", iov->iov_len, name); - if ((unsigned long)iov->iov_base % align != 0) - errx(1, "Bad alignment %p for %s", iov->iov_base, name); - return iov->iov_base; -} - -/* Wrapper for the last available index. Makes it easier to change. */ -#define lg_last_avail(vq) ((vq)->last_avail_idx) - -/* - * The virtio configuration space is defined to be little-endian. x86 is - * little-endian too, but it's nice to be explicit so we have these helpers. - */ -#define cpu_to_le16(v16) (v16) -#define cpu_to_le32(v32) (v32) -#define cpu_to_le64(v64) (v64) -#define le16_to_cpu(v16) (v16) -#define le32_to_cpu(v32) (v32) -#define le64_to_cpu(v64) (v64) - -/* Is this iovec empty? */ -static bool iov_empty(const struct iovec iov[], unsigned int num_iov) -{ - unsigned int i; - - for (i = 0; i < num_iov; i++) - if (iov[i].iov_len) - return false; - return true; -} - -/* Take len bytes from the front of this iovec. */ -static void iov_consume(struct iovec iov[], unsigned num_iov, unsigned len) -{ - unsigned int i; - - for (i = 0; i < num_iov; i++) { - unsigned int used; - - used = iov[i].iov_len < len ? iov[i].iov_len : len; - iov[i].iov_base += used; - iov[i].iov_len -= used; - len -= used; - } - assert(len == 0); -} - -/* The device virtqueue descriptors are followed by feature bitmasks. */ -static u8 *get_feature_bits(struct device *dev) -{ - return (u8 *)(dev->desc + 1) - + dev->num_vq * sizeof(struct lguest_vqconfig); -} - -/*L:100 - * The Launcher code itself takes us out into userspace, that scary place where - * pointers run wild and free! Unfortunately, like most userspace programs, - * it's quite boring (which is why everyone likes to hack on the kernel!). - * Perhaps if you make up an Lguest Drinking Game at this point, it will get - * you through this section. Or, maybe not. - * - * The Launcher sets up a big chunk of memory to be the Guest's "physical" - * memory and stores it in "guest_base". In other words, Guest physical == - * Launcher virtual with an offset. - * - * This can be tough to get your head around, but usually it just means that we - * use these trivial conversion functions when the Guest gives us its - * "physical" addresses: - */ -static void *from_guest_phys(unsigned long addr) -{ - return guest_base + addr; -} - -static unsigned long to_guest_phys(const void *addr) -{ - return (addr - guest_base); -} - -/*L:130 - * Loading the Kernel. - * - * We start with couple of simple helper routines. open_or_die() avoids - * error-checking code cluttering the callers: - */ -static int open_or_die(const char *name, int flags) -{ - int fd = open(name, flags); - if (fd < 0) - err(1, "Failed to open %s", name); - return fd; -} - -/* map_zeroed_pages() takes a number of pages. */ -static void *map_zeroed_pages(unsigned int num) -{ - int fd = open_or_die("/dev/zero", O_RDONLY); - void *addr; - - /* - * We use a private mapping (ie. if we write to the page, it will be - * copied). We allocate an extra two pages PROT_NONE to act as guard - * pages against read/write attempts that exceed allocated space. - */ - addr = mmap(NULL, getpagesize() * (num+2), - PROT_NONE, MAP_PRIVATE, fd, 0); - - if (addr == MAP_FAILED) - err(1, "Mmapping %u pages of /dev/zero", num); - - if (mprotect(addr + getpagesize(), getpagesize() * num, - PROT_READ|PROT_WRITE) == -1) - err(1, "mprotect rw %u pages failed", num); - - /* - * One neat mmap feature is that you can close the fd, and it - * stays mapped. - */ - close(fd); - - /* Return address after PROT_NONE page */ - return addr + getpagesize(); -} - -/* Get some more pages for a device. */ -static void *get_pages(unsigned int num) -{ - void *addr = from_guest_phys(guest_limit); - - guest_limit += num * getpagesize(); - if (guest_limit > guest_max) - errx(1, "Not enough memory for devices"); - return addr; -} - -/* - * This routine is used to load the kernel or initrd. It tries mmap, but if - * that fails (Plan 9's kernel file isn't nicely aligned on page boundaries), - * it falls back to reading the memory in. - */ -static void map_at(int fd, void *addr, unsigned long offset, unsigned long len) -{ - ssize_t r; - - /* - * We map writable even though for some segments are marked read-only. - * The kernel really wants to be writable: it patches its own - * instructions. - * - * MAP_PRIVATE means that the page won't be copied until a write is - * done to it. This allows us to share untouched memory between - * Guests. - */ - if (mmap(addr, len, PROT_READ|PROT_WRITE, - MAP_FIXED|MAP_PRIVATE, fd, offset) != MAP_FAILED) - return; - - /* pread does a seek and a read in one shot: saves a few lines. */ - r = pread(fd, addr, len, offset); - if (r != len) - err(1, "Reading offset %lu len %lu gave %zi", offset, len, r); -} - -/* - * This routine takes an open vmlinux image, which is in ELF, and maps it into - * the Guest memory. ELF = Embedded Linking Format, which is the format used - * by all modern binaries on Linux including the kernel. - * - * The ELF headers give *two* addresses: a physical address, and a virtual - * address. We use the physical address; the Guest will map itself to the - * virtual address. - * - * We return the starting address. - */ -static unsigned long map_elf(int elf_fd, const Elf32_Ehdr *ehdr) -{ - Elf32_Phdr phdr[ehdr->e_phnum]; - unsigned int i; - - /* - * Sanity checks on the main ELF header: an x86 executable with a - * reasonable number of correctly-sized program headers. - */ - if (ehdr->e_type != ET_EXEC - || ehdr->e_machine != EM_386 - || ehdr->e_phentsize != sizeof(Elf32_Phdr) - || ehdr->e_phnum < 1 || ehdr->e_phnum > 65536U/sizeof(Elf32_Phdr)) - errx(1, "Malformed elf header"); - - /* - * An ELF executable contains an ELF header and a number of "program" - * headers which indicate which parts ("segments") of the program to - * load where. - */ - - /* We read in all the program headers at once: */ - if (lseek(elf_fd, ehdr->e_phoff, SEEK_SET) < 0) - err(1, "Seeking to program headers"); - if (read(elf_fd, phdr, sizeof(phdr)) != sizeof(phdr)) - err(1, "Reading program headers"); - - /* - * Try all the headers: there are usually only three. A read-only one, - * a read-write one, and a "note" section which we don't load. - */ - for (i = 0; i < ehdr->e_phnum; i++) { - /* If this isn't a loadable segment, we ignore it */ - if (phdr[i].p_type != PT_LOAD) - continue; - - verbose("Section %i: size %i addr %p\n", - i, phdr[i].p_memsz, (void *)phdr[i].p_paddr); - - /* We map this section of the file at its physical address. */ - map_at(elf_fd, from_guest_phys(phdr[i].p_paddr), - phdr[i].p_offset, phdr[i].p_filesz); - } - - /* The entry point is given in the ELF header. */ - return ehdr->e_entry; -} - -/*L:150 - * A bzImage, unlike an ELF file, is not meant to be loaded. You're supposed - * to jump into it and it will unpack itself. We used to have to perform some - * hairy magic because the unpacking code scared me. - * - * Fortunately, Jeremy Fitzhardinge convinced me it wasn't that hard and wrote - * a small patch to jump over the tricky bits in the Guest, so now we just read - * the funky header so we know where in the file to load, and away we go! - */ -static unsigned long load_bzimage(int fd) -{ - struct boot_params boot; - int r; - /* Modern bzImages get loaded at 1M. */ - void *p = from_guest_phys(0x100000); - - /* - * Go back to the start of the file and read the header. It should be - * a Linux boot header (see Documentation/x86/boot.txt) - */ - lseek(fd, 0, SEEK_SET); - read(fd, &boot, sizeof(boot)); - - /* Inside the setup_hdr, we expect the magic "HdrS" */ - if (memcmp(&boot.hdr.header, "HdrS", 4) != 0) - errx(1, "This doesn't look like a bzImage to me"); - - /* Skip over the extra sectors of the header. */ - lseek(fd, (boot.hdr.setup_sects+1) * 512, SEEK_SET); - - /* Now read everything into memory. in nice big chunks. */ - while ((r = read(fd, p, 65536)) > 0) - p += r; - - /* Finally, code32_start tells us where to enter the kernel. */ - return boot.hdr.code32_start; -} - -/*L:140 - * Loading the kernel is easy when it's a "vmlinux", but most kernels - * come wrapped up in the self-decompressing "bzImage" format. With a little - * work, we can load those, too. - */ -static unsigned long load_kernel(int fd) -{ - Elf32_Ehdr hdr; - - /* Read in the first few bytes. */ - if (read(fd, &hdr, sizeof(hdr)) != sizeof(hdr)) - err(1, "Reading kernel"); - - /* If it's an ELF file, it starts with "\177ELF" */ - if (memcmp(hdr.e_ident, ELFMAG, SELFMAG) == 0) - return map_elf(fd, &hdr); - - /* Otherwise we assume it's a bzImage, and try to load it. */ - return load_bzimage(fd); -} - -/* - * This is a trivial little helper to align pages. Andi Kleen hated it because - * it calls getpagesize() twice: "it's dumb code." - * - * Kernel guys get really het up about optimization, even when it's not - * necessary. I leave this code as a reaction against that. - */ -static inline unsigned long page_align(unsigned long addr) -{ - /* Add upwards and truncate downwards. */ - return ((addr + getpagesize()-1) & ~(getpagesize()-1)); -} - -/*L:180 - * An "initial ram disk" is a disk image loaded into memory along with the - * kernel which the kernel can use to boot from without needing any drivers. - * Most distributions now use this as standard: the initrd contains the code to - * load the appropriate driver modules for the current machine. - * - * Importantly, James Morris works for RedHat, and Fedora uses initrds for its - * kernels. He sent me this (and tells me when I break it). - */ -static unsigned long load_initrd(const char *name, unsigned long mem) -{ - int ifd; - struct stat st; - unsigned long len; - - ifd = open_or_die(name, O_RDONLY); - /* fstat() is needed to get the file size. */ - if (fstat(ifd, &st) < 0) - err(1, "fstat() on initrd '%s'", name); - - /* - * We map the initrd at the top of memory, but mmap wants it to be - * page-aligned, so we round the size up for that. - */ - len = page_align(st.st_size); - map_at(ifd, from_guest_phys(mem - len), 0, st.st_size); - /* - * Once a file is mapped, you can close the file descriptor. It's a - * little odd, but quite useful. - */ - close(ifd); - verbose("mapped initrd %s size=%lu @ %p\n", name, len, (void*)mem-len); - - /* We return the initrd size. */ - return len; -} -/*:*/ - -/* - * Simple routine to roll all the commandline arguments together with spaces - * between them. - */ -static void concat(char *dst, char *args[]) -{ - unsigned int i, len = 0; - - for (i = 0; args[i]; i++) { - if (i) { - strcat(dst+len, " "); - len++; - } - strcpy(dst+len, args[i]); - len += strlen(args[i]); - } - /* In case it's empty. */ - dst[len] = '\0'; -} - -/*L:185 - * This is where we actually tell the kernel to initialize the Guest. We - * saw the arguments it expects when we looked at initialize() in lguest_user.c: - * the base of Guest "physical" memory, the top physical page to allow and the - * entry point for the Guest. - */ -static void tell_kernel(unsigned long start) -{ - unsigned long args[] = { LHREQ_INITIALIZE, - (unsigned long)guest_base, - guest_limit / getpagesize(), start }; - verbose("Guest: %p - %p (%#lx)\n", - guest_base, guest_base + guest_limit, guest_limit); - lguest_fd = open_or_die("/dev/lguest", O_RDWR); - if (write(lguest_fd, args, sizeof(args)) < 0) - err(1, "Writing to /dev/lguest"); -} -/*:*/ - -/*L:200 - * Device Handling. - * - * When the Guest gives us a buffer, it sends an array of addresses and sizes. - * We need to make sure it's not trying to reach into the Launcher itself, so - * we have a convenient routine which checks it and exits with an error message - * if something funny is going on: - */ -static void *_check_pointer(unsigned long addr, unsigned int size, - unsigned int line) -{ - /* - * Check if the requested address and size exceeds the allocated memory, - * or addr + size wraps around. - */ - if ((addr + size) > guest_limit || (addr + size) < addr) - errx(1, "%s:%i: Invalid address %#lx", __FILE__, line, addr); - /* - * We return a pointer for the caller's convenience, now we know it's - * safe to use. - */ - return from_guest_phys(addr); -} -/* A macro which transparently hands the line number to the real function. */ -#define check_pointer(addr,size) _check_pointer(addr, size, __LINE__) - -/* - * Each buffer in the virtqueues is actually a chain of descriptors. This - * function returns the next descriptor in the chain, or vq->vring.num if we're - * at the end. - */ -static unsigned next_desc(struct vring_desc *desc, - unsigned int i, unsigned int max) -{ - unsigned int next; - - /* If this descriptor says it doesn't chain, we're done. */ - if (!(desc[i].flags & VRING_DESC_F_NEXT)) - return max; - - /* Check they're not leading us off end of descriptors. */ - next = desc[i].next; - /* Make sure compiler knows to grab that: we don't want it changing! */ - wmb(); - - if (next >= max) - errx(1, "Desc next is %u", next); - - return next; -} - -/* - * This actually sends the interrupt for this virtqueue, if we've used a - * buffer. - */ -static void trigger_irq(struct virtqueue *vq) -{ - unsigned long buf[] = { LHREQ_IRQ, vq->config.irq }; - - /* Don't inform them if nothing used. */ - if (!vq->pending_used) - return; - vq->pending_used = 0; - - /* If they don't want an interrupt, don't send one... */ - if (vq->vring.avail->flags & VRING_AVAIL_F_NO_INTERRUPT) { - return; - } - - /* Send the Guest an interrupt tell them we used something up. */ - if (write(lguest_fd, buf, sizeof(buf)) != 0) - err(1, "Triggering irq %i", vq->config.irq); -} - -/* - * This looks in the virtqueue for the first available buffer, and converts - * it to an iovec for convenient access. Since descriptors consist of some - * number of output then some number of input descriptors, it's actually two - * iovecs, but we pack them into one and note how many of each there were. - * - * This function waits if necessary, and returns the descriptor number found. - */ -static unsigned wait_for_vq_desc(struct virtqueue *vq, - struct iovec iov[], - unsigned int *out_num, unsigned int *in_num) -{ - unsigned int i, head, max; - struct vring_desc *desc; - u16 last_avail = lg_last_avail(vq); - - /* There's nothing available? */ - while (last_avail == vq->vring.avail->idx) { - u64 event; - - /* - * Since we're about to sleep, now is a good time to tell the - * Guest about what we've used up to now. - */ - trigger_irq(vq); - - /* OK, now we need to know about added descriptors. */ - vq->vring.used->flags &= ~VRING_USED_F_NO_NOTIFY; - - /* - * They could have slipped one in as we were doing that: make - * sure it's written, then check again. - */ - mb(); - if (last_avail != vq->vring.avail->idx) { - vq->vring.used->flags |= VRING_USED_F_NO_NOTIFY; - break; - } - - /* Nothing new? Wait for eventfd to tell us they refilled. */ - if (read(vq->eventfd, &event, sizeof(event)) != sizeof(event)) - errx(1, "Event read failed?"); - - /* We don't need to be notified again. */ - vq->vring.used->flags |= VRING_USED_F_NO_NOTIFY; - } - - /* Check it isn't doing very strange things with descriptor numbers. */ - if ((u16)(vq->vring.avail->idx - last_avail) > vq->vring.num) - errx(1, "Guest moved used index from %u to %u", - last_avail, vq->vring.avail->idx); - - /* - * Grab the next descriptor number they're advertising, and increment - * the index we've seen. - */ - head = vq->vring.avail->ring[last_avail % vq->vring.num]; - lg_last_avail(vq)++; - - /* If their number is silly, that's a fatal mistake. */ - if (head >= vq->vring.num) - errx(1, "Guest says index %u is available", head); - - /* When we start there are none of either input nor output. */ - *out_num = *in_num = 0; - - max = vq->vring.num; - desc = vq->vring.desc; - i = head; - - /* - * If this is an indirect entry, then this buffer contains a descriptor - * table which we handle as if it's any normal descriptor chain. - */ - if (desc[i].flags & VRING_DESC_F_INDIRECT) { - if (desc[i].len % sizeof(struct vring_desc)) - errx(1, "Invalid size for indirect buffer table"); - - max = desc[i].len / sizeof(struct vring_desc); - desc = check_pointer(desc[i].addr, desc[i].len); - i = 0; - } - - do { - /* Grab the first descriptor, and check it's OK. */ - iov[*out_num + *in_num].iov_len = desc[i].len; - iov[*out_num + *in_num].iov_base - = check_pointer(desc[i].addr, desc[i].len); - /* If this is an input descriptor, increment that count. */ - if (desc[i].flags & VRING_DESC_F_WRITE) - (*in_num)++; - else { - /* - * If it's an output descriptor, they're all supposed - * to come before any input descriptors. - */ - if (*in_num) - errx(1, "Descriptor has out after in"); - (*out_num)++; - } - - /* If we've got too many, that implies a descriptor loop. */ - if (*out_num + *in_num > max) - errx(1, "Looped descriptor"); - } while ((i = next_desc(desc, i, max)) != max); - - return head; -} - -/* - * After we've used one of their buffers, we tell the Guest about it. Sometime - * later we'll want to send them an interrupt using trigger_irq(); note that - * wait_for_vq_desc() does that for us if it has to wait. - */ -static void add_used(struct virtqueue *vq, unsigned int head, int len) -{ - struct vring_used_elem *used; - - /* - * The virtqueue contains a ring of used buffers. Get a pointer to the - * next entry in that used ring. - */ - used = &vq->vring.used->ring[vq->vring.used->idx % vq->vring.num]; - used->id = head; - used->len = len; - /* Make sure buffer is written before we update index. */ - wmb(); - vq->vring.used->idx++; - vq->pending_used++; -} - -/* And here's the combo meal deal. Supersize me! */ -static void add_used_and_trigger(struct virtqueue *vq, unsigned head, int len) -{ - add_used(vq, head, len); - trigger_irq(vq); -} - -/* - * The Console - * - * We associate some data with the console for our exit hack. - */ -struct console_abort { - /* How many times have they hit ^C? */ - int count; - /* When did they start? */ - struct timeval start; -}; - -/* This is the routine which handles console input (ie. stdin). */ -static void console_input(struct virtqueue *vq) -{ - int len; - unsigned int head, in_num, out_num; - struct console_abort *abort = vq->dev->priv; - struct iovec iov[vq->vring.num]; - - /* Make sure there's a descriptor available. */ - head = wait_for_vq_desc(vq, iov, &out_num, &in_num); - if (out_num) - errx(1, "Output buffers in console in queue?"); - - /* Read into it. This is where we usually wait. */ - len = readv(STDIN_FILENO, iov, in_num); - if (len <= 0) { - /* Ran out of input? */ - warnx("Failed to get console input, ignoring console."); - /* - * For simplicity, dying threads kill the whole Launcher. So - * just nap here. - */ - for (;;) - pause(); - } - - /* Tell the Guest we used a buffer. */ - add_used_and_trigger(vq, head, len); - - /* - * Three ^C within one second? Exit. - * - * This is such a hack, but works surprisingly well. Each ^C has to - * be in a buffer by itself, so they can't be too fast. But we check - * that we get three within about a second, so they can't be too - * slow. - */ - if (len != 1 || ((char *)iov[0].iov_base)[0] != 3) { - abort->count = 0; - return; - } - - abort->count++; - if (abort->count == 1) - gettimeofday(&abort->start, NULL); - else if (abort->count == 3) { - struct timeval now; - gettimeofday(&now, NULL); - /* Kill all Launcher processes with SIGINT, like normal ^C */ - if (now.tv_sec <= abort->start.tv_sec+1) - kill(0, SIGINT); - abort->count = 0; - } -} - -/* This is the routine which handles console output (ie. stdout). */ -static void console_output(struct virtqueue *vq) -{ - unsigned int head, out, in; - struct iovec iov[vq->vring.num]; - - /* We usually wait in here, for the Guest to give us something. */ - head = wait_for_vq_desc(vq, iov, &out, &in); - if (in) - errx(1, "Input buffers in console output queue?"); - - /* writev can return a partial write, so we loop here. */ - while (!iov_empty(iov, out)) { - int len = writev(STDOUT_FILENO, iov, out); - if (len <= 0) { - warn("Write to stdout gave %i (%d)", len, errno); - break; - } - iov_consume(iov, out, len); - } - - /* - * We're finished with that buffer: if we're going to sleep, - * wait_for_vq_desc() will prod the Guest with an interrupt. - */ - add_used(vq, head, 0); -} - -/* - * The Network - * - * Handling output for network is also simple: we get all the output buffers - * and write them to /dev/net/tun. - */ -struct net_info { - int tunfd; -}; - -static void net_output(struct virtqueue *vq) -{ - struct net_info *net_info = vq->dev->priv; - unsigned int head, out, in; - struct iovec iov[vq->vring.num]; - - /* We usually wait in here for the Guest to give us a packet. */ - head = wait_for_vq_desc(vq, iov, &out, &in); - if (in) - errx(1, "Input buffers in net output queue?"); - /* - * Send the whole thing through to /dev/net/tun. It expects the exact - * same format: what a coincidence! - */ - if (writev(net_info->tunfd, iov, out) < 0) - warnx("Write to tun failed (%d)?", errno); - - /* - * Done with that one; wait_for_vq_desc() will send the interrupt if - * all packets are processed. - */ - add_used(vq, head, 0); -} - -/* - * Handling network input is a bit trickier, because I've tried to optimize it. - * - * First we have a helper routine which tells is if from this file descriptor - * (ie. the /dev/net/tun device) will block: - */ -static bool will_block(int fd) -{ - fd_set fdset; - struct timeval zero = { 0, 0 }; - FD_ZERO(&fdset); - FD_SET(fd, &fdset); - return select(fd+1, &fdset, NULL, NULL, &zero) != 1; -} - -/* - * This handles packets coming in from the tun device to our Guest. Like all - * service routines, it gets called again as soon as it returns, so you don't - * see a while(1) loop here. - */ -static void net_input(struct virtqueue *vq) -{ - int len; - unsigned int head, out, in; - struct iovec iov[vq->vring.num]; - struct net_info *net_info = vq->dev->priv; - - /* - * Get a descriptor to write an incoming packet into. This will also - * send an interrupt if they're out of descriptors. - */ - head = wait_for_vq_desc(vq, iov, &out, &in); - if (out) - errx(1, "Output buffers in net input queue?"); - - /* - * If it looks like we'll block reading from the tun device, send them - * an interrupt. - */ - if (vq->pending_used && will_block(net_info->tunfd)) - trigger_irq(vq); - - /* - * Read in the packet. This is where we normally wait (when there's no - * incoming network traffic). - */ - len = readv(net_info->tunfd, iov, in); - if (len <= 0) - warn("Failed to read from tun (%d).", errno); - - /* - * Mark that packet buffer as used, but don't interrupt here. We want - * to wait until we've done as much work as we can. - */ - add_used(vq, head, len); -} -/*:*/ - -/* This is the helper to create threads: run the service routine in a loop. */ -static int do_thread(void *_vq) -{ - struct virtqueue *vq = _vq; - - for (;;) - vq->service(vq); - return 0; -} - -/* - * When a child dies, we kill our entire process group with SIGTERM. This - * also has the side effect that the shell restores the console for us! - */ -static void kill_launcher(int signal) -{ - kill(0, SIGTERM); -} - -static void reset_device(struct device *dev) -{ - struct virtqueue *vq; - - verbose("Resetting device %s\n", dev->name); - - /* Clear any features they've acked. */ - memset(get_feature_bits(dev) + dev->feature_len, 0, dev->feature_len); - - /* We're going to be explicitly killing threads, so ignore them. */ - signal(SIGCHLD, SIG_IGN); - - /* Zero out the virtqueues, get rid of their threads */ - for (vq = dev->vq; vq; vq = vq->next) { - if (vq->thread != (pid_t)-1) { - kill(vq->thread, SIGTERM); - waitpid(vq->thread, NULL, 0); - vq->thread = (pid_t)-1; - } - memset(vq->vring.desc, 0, - vring_size(vq->config.num, LGUEST_VRING_ALIGN)); - lg_last_avail(vq) = 0; - } - dev->running = false; - - /* Now we care if threads die. */ - signal(SIGCHLD, (void *)kill_launcher); -} - -/*L:216 - * This actually creates the thread which services the virtqueue for a device. - */ -static void create_thread(struct virtqueue *vq) -{ - /* - * Create stack for thread. Since the stack grows upwards, we point - * the stack pointer to the end of this region. - */ - char *stack = malloc(32768); - unsigned long args[] = { LHREQ_EVENTFD, - vq->config.pfn*getpagesize(), 0 }; - - /* Create a zero-initialized eventfd. */ - vq->eventfd = eventfd(0, 0); - if (vq->eventfd < 0) - err(1, "Creating eventfd"); - args[2] = vq->eventfd; - - /* - * Attach an eventfd to this virtqueue: it will go off when the Guest - * does an LHCALL_NOTIFY for this vq. - */ - if (write(lguest_fd, &args, sizeof(args)) != 0) - err(1, "Attaching eventfd"); - - /* - * CLONE_VM: because it has to access the Guest memory, and SIGCHLD so - * we get a signal if it dies. - */ - vq->thread = clone(do_thread, stack + 32768, CLONE_VM | SIGCHLD, vq); - if (vq->thread == (pid_t)-1) - err(1, "Creating clone"); - - /* We close our local copy now the child has it. */ - close(vq->eventfd); -} - -static void start_device(struct device *dev) -{ - unsigned int i; - struct virtqueue *vq; - - verbose("Device %s OK: offered", dev->name); - for (i = 0; i < dev->feature_len; i++) - verbose(" %02x", get_feature_bits(dev)[i]); - verbose(", accepted"); - for (i = 0; i < dev->feature_len; i++) - verbose(" %02x", get_feature_bits(dev) - [dev->feature_len+i]); - - for (vq = dev->vq; vq; vq = vq->next) { - if (vq->service) - create_thread(vq); - } - dev->running = true; -} - -static void cleanup_devices(void) -{ - struct device *dev; - - for (dev = devices.dev; dev; dev = dev->next) - reset_device(dev); - - /* If we saved off the original terminal settings, restore them now. */ - if (orig_term.c_lflag & (ISIG|ICANON|ECHO)) - tcsetattr(STDIN_FILENO, TCSANOW, &orig_term); -} - -/* When the Guest tells us they updated the status field, we handle it. */ -static void update_device_status(struct device *dev) -{ - /* A zero status is a reset, otherwise it's a set of flags. */ - if (dev->desc->status == 0) - reset_device(dev); - else if (dev->desc->status & VIRTIO_CONFIG_S_FAILED) { - warnx("Device %s configuration FAILED", dev->name); - if (dev->running) - reset_device(dev); - } else { - if (dev->running) - err(1, "Device %s features finalized twice", dev->name); - start_device(dev); - } -} - -/*L:215 - * This is the generic routine we call when the Guest uses LHCALL_NOTIFY. In - * particular, it's used to notify us of device status changes during boot. - */ -static void handle_output(unsigned long addr) -{ - struct device *i; - - /* Check each device. */ - for (i = devices.dev; i; i = i->next) { - struct virtqueue *vq; - - /* - * Notifications to device descriptors mean they updated the - * device status. - */ - if (from_guest_phys(addr) == i->desc) { - update_device_status(i); - return; - } - - /* Devices should not be used before features are finalized. */ - for (vq = i->vq; vq; vq = vq->next) { - if (addr != vq->config.pfn*getpagesize()) - continue; - errx(1, "Notification on %s before setup!", i->name); - } - } - - /* - * Early console write is done using notify on a nul-terminated string - * in Guest memory. It's also great for hacking debugging messages - * into a Guest. - */ - if (addr >= guest_limit) - errx(1, "Bad NOTIFY %#lx", addr); - - write(STDOUT_FILENO, from_guest_phys(addr), - strnlen(from_guest_phys(addr), guest_limit - addr)); -} - -/*L:190 - * Device Setup - * - * All devices need a descriptor so the Guest knows it exists, and a "struct - * device" so the Launcher can keep track of it. We have common helper - * routines to allocate and manage them. - */ - -/* - * The layout of the device page is a "struct lguest_device_desc" followed by a - * number of virtqueue descriptors, then two sets of feature bits, then an - * array of configuration bytes. This routine returns the configuration - * pointer. - */ -static u8 *device_config(const struct device *dev) -{ - return (void *)(dev->desc + 1) - + dev->num_vq * sizeof(struct lguest_vqconfig) - + dev->feature_len * 2; -} - -/* - * This routine allocates a new "struct lguest_device_desc" from descriptor - * table page just above the Guest's normal memory. It returns a pointer to - * that descriptor. - */ -static struct lguest_device_desc *new_dev_desc(u16 type) -{ - struct lguest_device_desc d = { .type = type }; - void *p; - - /* Figure out where the next device config is, based on the last one. */ - if (devices.lastdev) - p = device_config(devices.lastdev) - + devices.lastdev->desc->config_len; - else - p = devices.descpage; - - /* We only have one page for all the descriptors. */ - if (p + sizeof(d) > (void *)devices.descpage + getpagesize()) - errx(1, "Too many devices"); - - /* p might not be aligned, so we memcpy in. */ - return memcpy(p, &d, sizeof(d)); -} - -/* - * Each device descriptor is followed by the description of its virtqueues. We - * specify how many descriptors the virtqueue is to have. - */ -static void add_virtqueue(struct device *dev, unsigned int num_descs, - void (*service)(struct virtqueue *)) -{ - unsigned int pages; - struct virtqueue **i, *vq = malloc(sizeof(*vq)); - void *p; - - /* First we need some memory for this virtqueue. */ - pages = (vring_size(num_descs, LGUEST_VRING_ALIGN) + getpagesize() - 1) - / getpagesize(); - p = get_pages(pages); - - /* Initialize the virtqueue */ - vq->next = NULL; - vq->last_avail_idx = 0; - vq->dev = dev; - - /* - * This is the routine the service thread will run, and its Process ID - * once it's running. - */ - vq->service = service; - vq->thread = (pid_t)-1; - - /* Initialize the configuration. */ - vq->config.num = num_descs; - vq->config.irq = devices.next_irq++; - vq->config.pfn = to_guest_phys(p) / getpagesize(); - - /* Initialize the vring. */ - vring_init(&vq->vring, num_descs, p, LGUEST_VRING_ALIGN); - - /* - * Append virtqueue to this device's descriptor. We use - * device_config() to get the end of the device's current virtqueues; - * we check that we haven't added any config or feature information - * yet, otherwise we'd be overwriting them. - */ - assert(dev->desc->config_len == 0 && dev->desc->feature_len == 0); - memcpy(device_config(dev), &vq->config, sizeof(vq->config)); - dev->num_vq++; - dev->desc->num_vq++; - - verbose("Virtqueue page %#lx\n", to_guest_phys(p)); - - /* - * Add to tail of list, so dev->vq is first vq, dev->vq->next is - * second. - */ - for (i = &dev->vq; *i; i = &(*i)->next); - *i = vq; -} - -/* - * The first half of the feature bitmask is for us to advertise features. The - * second half is for the Guest to accept features. - */ -static void add_feature(struct device *dev, unsigned bit) -{ - u8 *features = get_feature_bits(dev); - - /* We can't extend the feature bits once we've added config bytes */ - if (dev->desc->feature_len <= bit / CHAR_BIT) { - assert(dev->desc->config_len == 0); - dev->feature_len = dev->desc->feature_len = (bit/CHAR_BIT) + 1; - } - - features[bit / CHAR_BIT] |= (1 << (bit % CHAR_BIT)); -} - -/* - * This routine sets the configuration fields for an existing device's - * descriptor. It only works for the last device, but that's OK because that's - * how we use it. - */ -static void set_config(struct device *dev, unsigned len, const void *conf) -{ - /* Check we haven't overflowed our single page. */ - if (device_config(dev) + len > devices.descpage + getpagesize()) - errx(1, "Too many devices"); - - /* Copy in the config information, and store the length. */ - memcpy(device_config(dev), conf, len); - dev->desc->config_len = len; - - /* Size must fit in config_len field (8 bits)! */ - assert(dev->desc->config_len == len); -} - -/* - * This routine does all the creation and setup of a new device, including - * calling new_dev_desc() to allocate the descriptor and device memory. We - * don't actually start the service threads until later. - * - * See what I mean about userspace being boring? - */ -static struct device *new_device(const char *name, u16 type) -{ - struct device *dev = malloc(sizeof(*dev)); - - /* Now we populate the fields one at a time. */ - dev->desc = new_dev_desc(type); - dev->name = name; - dev->vq = NULL; - dev->feature_len = 0; - dev->num_vq = 0; - dev->running = false; - - /* - * Append to device list. Prepending to a single-linked list is - * easier, but the user expects the devices to be arranged on the bus - * in command-line order. The first network device on the command line - * is eth0, the first block device /dev/vda, etc. - */ - if (devices.lastdev) - devices.lastdev->next = dev; - else - devices.dev = dev; - devices.lastdev = dev; - - return dev; -} - -/* - * Our first setup routine is the console. It's a fairly simple device, but - * UNIX tty handling makes it uglier than it could be. - */ -static void setup_console(void) -{ - struct device *dev; - - /* If we can save the initial standard input settings... */ - if (tcgetattr(STDIN_FILENO, &orig_term) == 0) { - struct termios term = orig_term; - /* - * Then we turn off echo, line buffering and ^C etc: We want a - * raw input stream to the Guest. - */ - term.c_lflag &= ~(ISIG|ICANON|ECHO); - tcsetattr(STDIN_FILENO, TCSANOW, &term); - } - - dev = new_device("console", VIRTIO_ID_CONSOLE); - - /* We store the console state in dev->priv, and initialize it. */ - dev->priv = malloc(sizeof(struct console_abort)); - ((struct console_abort *)dev->priv)->count = 0; - - /* - * The console needs two virtqueues: the input then the output. When - * they put something the input queue, we make sure we're listening to - * stdin. When they put something in the output queue, we write it to - * stdout. - */ - add_virtqueue(dev, VIRTQUEUE_NUM, console_input); - add_virtqueue(dev, VIRTQUEUE_NUM, console_output); - - verbose("device %u: console\n", ++devices.device_num); -} -/*:*/ - -/*M:010 - * Inter-guest networking is an interesting area. Simplest is to have a - * --sharenet=<name> option which opens or creates a named pipe. This can be - * used to send packets to another guest in a 1:1 manner. - * - * More sophisticated is to use one of the tools developed for project like UML - * to do networking. - * - * Faster is to do virtio bonding in kernel. Doing this 1:1 would be - * completely generic ("here's my vring, attach to your vring") and would work - * for any traffic. Of course, namespace and permissions issues need to be - * dealt with. A more sophisticated "multi-channel" virtio_net.c could hide - * multiple inter-guest channels behind one interface, although it would - * require some manner of hotplugging new virtio channels. - * - * Finally, we could use a virtio network switch in the kernel, ie. vhost. -:*/ - -static u32 str2ip(const char *ipaddr) -{ - unsigned int b[4]; - - if (sscanf(ipaddr, "%u.%u.%u.%u", &b[0], &b[1], &b[2], &b[3]) != 4) - errx(1, "Failed to parse IP address '%s'", ipaddr); - return (b[0] << 24) | (b[1] << 16) | (b[2] << 8) | b[3]; -} - -static void str2mac(const char *macaddr, unsigned char mac[6]) -{ - unsigned int m[6]; - if (sscanf(macaddr, "%02x:%02x:%02x:%02x:%02x:%02x", - &m[0], &m[1], &m[2], &m[3], &m[4], &m[5]) != 6) - errx(1, "Failed to parse mac address '%s'", macaddr); - mac[0] = m[0]; - mac[1] = m[1]; - mac[2] = m[2]; - mac[3] = m[3]; - mac[4] = m[4]; - mac[5] = m[5]; -} - -/* - * This code is "adapted" from libbridge: it attaches the Host end of the - * network device to the bridge device specified by the command line. - * - * This is yet another James Morris contribution (I'm an IP-level guy, so I - * dislike bridging), and I just try not to break it. - */ -static void add_to_bridge(int fd, const char *if_name, const char *br_name) -{ - int ifidx; - struct ifreq ifr; - - if (!*br_name) - errx(1, "must specify bridge name"); - - ifidx = if_nametoindex(if_name); - if (!ifidx) - errx(1, "interface %s does not exist!", if_name); - - strncpy(ifr.ifr_name, br_name, IFNAMSIZ); - ifr.ifr_name[IFNAMSIZ-1] = '\0'; - ifr.ifr_ifindex = ifidx; - if (ioctl(fd, SIOCBRADDIF, &ifr) < 0) - err(1, "can't add %s to bridge %s", if_name, br_name); -} - -/* - * This sets up the Host end of the network device with an IP address, brings - * it up so packets will flow, the copies the MAC address into the hwaddr - * pointer. - */ -static void configure_device(int fd, const char *tapif, u32 ipaddr) -{ - struct ifreq ifr; - struct sockaddr_in sin; - - memset(&ifr, 0, sizeof(ifr)); - strcpy(ifr.ifr_name, tapif); - - /* Don't read these incantations. Just cut & paste them like I did! */ - sin.sin_family = AF_INET; - sin.sin_addr.s_addr = htonl(ipaddr); - memcpy(&ifr.ifr_addr, &sin, sizeof(sin)); - if (ioctl(fd, SIOCSIFADDR, &ifr) != 0) - err(1, "Setting %s interface address", tapif); - ifr.ifr_flags = IFF_UP; - if (ioctl(fd, SIOCSIFFLAGS, &ifr) != 0) - err(1, "Bringing interface %s up", tapif); -} - -static int get_tun_device(char tapif[IFNAMSIZ]) -{ - struct ifreq ifr; - int netfd; - - /* Start with this zeroed. Messy but sure. */ - memset(&ifr, 0, sizeof(ifr)); - - /* - * We open the /dev/net/tun device and tell it we want a tap device. A - * tap device is like a tun device, only somehow different. To tell - * the truth, I completely blundered my way through this code, but it - * works now! - */ - netfd = open_or_die("/dev/net/tun", O_RDWR); - ifr.ifr_flags = IFF_TAP | IFF_NO_PI | IFF_VNET_HDR; - strcpy(ifr.ifr_name, "tap%d"); - if (ioctl(netfd, TUNSETIFF, &ifr) != 0) - err(1, "configuring /dev/net/tun"); - - if (ioctl(netfd, TUNSETOFFLOAD, - TUN_F_CSUM|TUN_F_TSO4|TUN_F_TSO6|TUN_F_TSO_ECN) != 0) - err(1, "Could not set features for tun device"); - - /* - * We don't need checksums calculated for packets coming in this - * device: trust us! - */ - ioctl(netfd, TUNSETNOCSUM, 1); - - memcpy(tapif, ifr.ifr_name, IFNAMSIZ); - return netfd; -} - -/*L:195 - * Our network is a Host<->Guest network. This can either use bridging or - * routing, but the principle is the same: it uses the "tun" device to inject - * packets into the Host as if they came in from a normal network card. We - * just shunt packets between the Guest and the tun device. - */ -static void setup_tun_net(char *arg) -{ - struct device *dev; - struct net_info *net_info = malloc(sizeof(*net_info)); - int ipfd; - u32 ip = INADDR_ANY; - bool bridging = false; - char tapif[IFNAMSIZ], *p; - struct virtio_net_config conf; - - net_info->tunfd = get_tun_device(tapif); - - /* First we create a new network device. */ - dev = new_device("net", VIRTIO_ID_NET); - dev->priv = net_info; - - /* Network devices need a recv and a send queue, just like console. */ - add_virtqueue(dev, VIRTQUEUE_NUM, net_input); - add_virtqueue(dev, VIRTQUEUE_NUM, net_output); - - /* - * We need a socket to perform the magic network ioctls to bring up the - * tap interface, connect to the bridge etc. Any socket will do! - */ - ipfd = socket(PF_INET, SOCK_DGRAM, IPPROTO_IP); - if (ipfd < 0) - err(1, "opening IP socket"); - - /* If the command line was --tunnet=bridge:<name> do bridging. */ - if (!strncmp(BRIDGE_PFX, arg, strlen(BRIDGE_PFX))) { - arg += strlen(BRIDGE_PFX); - bridging = true; - } - - /* A mac address may follow the bridge name or IP address */ - p = strchr(arg, ':'); - if (p) { - str2mac(p+1, conf.mac); - add_feature(dev, VIRTIO_NET_F_MAC); - *p = '\0'; - } - - /* arg is now either an IP address or a bridge name */ - if (bridging) - add_to_bridge(ipfd, tapif, arg); - else - ip = str2ip(arg); - - /* Set up the tun device. */ - configure_device(ipfd, tapif, ip); - - /* Expect Guest to handle everything except UFO */ - add_feature(dev, VIRTIO_NET_F_CSUM); - add_feature(dev, VIRTIO_NET_F_GUEST_CSUM); - add_feature(dev, VIRTIO_NET_F_GUEST_TSO4); - add_feature(dev, VIRTIO_NET_F_GUEST_TSO6); - add_feature(dev, VIRTIO_NET_F_GUEST_ECN); - add_feature(dev, VIRTIO_NET_F_HOST_TSO4); - add_feature(dev, VIRTIO_NET_F_HOST_TSO6); - add_feature(dev, VIRTIO_NET_F_HOST_ECN); - /* We handle indirect ring entries */ - add_feature(dev, VIRTIO_RING_F_INDIRECT_DESC); - set_config(dev, sizeof(conf), &conf); - - /* We don't need the socket any more; setup is done. */ - close(ipfd); - - devices.device_num++; - - if (bridging) - verbose("device %u: tun %s attached to bridge: %s\n", - devices.device_num, tapif, arg); - else - verbose("device %u: tun %s: %s\n", - devices.device_num, tapif, arg); -} -/*:*/ - -/* This hangs off device->priv. */ -struct vblk_info { - /* The size of the file. */ - off64_t len; - - /* The file descriptor for the file. */ - int fd; - -}; - -/*L:210 - * The Disk - * - * The disk only has one virtqueue, so it only has one thread. It is really - * simple: the Guest asks for a block number and we read or write that position - * in the file. - * - * Before we serviced each virtqueue in a separate thread, that was unacceptably - * slow: the Guest waits until the read is finished before running anything - * else, even if it could have been doing useful work. - * - * We could have used async I/O, except it's reputed to suck so hard that - * characters actually go missing from your code when you try to use it. - */ -static void blk_request(struct virtqueue *vq) -{ - struct vblk_info *vblk = vq->dev->priv; - unsigned int head, out_num, in_num, wlen; - int ret; - u8 *in; - struct virtio_blk_outhdr *out; - struct iovec iov[vq->vring.num]; - off64_t off; - - /* - * Get the next request, where we normally wait. It triggers the - * interrupt to acknowledge previously serviced requests (if any). - */ - head = wait_for_vq_desc(vq, iov, &out_num, &in_num); - - /* - * Every block request should contain at least one output buffer - * (detailing the location on disk and the type of request) and one - * input buffer (to hold the result). - */ - if (out_num == 0 || in_num == 0) - errx(1, "Bad virtblk cmd %u out=%u in=%u", - head, out_num, in_num); - - out = convert(&iov[0], struct virtio_blk_outhdr); - in = convert(&iov[out_num+in_num-1], u8); - /* - * For historical reasons, block operations are expressed in 512 byte - * "sectors". - */ - off = out->sector * 512; - - /* - * In general the virtio block driver is allowed to try SCSI commands. - * It'd be nice if we supported eject, for example, but we don't. - */ - if (out->type & VIRTIO_BLK_T_SCSI_CMD) { - fprintf(stderr, "Scsi commands unsupported\n"); - *in = VIRTIO_BLK_S_UNSUPP; - wlen = sizeof(*in); - } else if (out->type & VIRTIO_BLK_T_OUT) { - /* - * Write - * - * Move to the right location in the block file. This can fail - * if they try to write past end. - */ - if (lseek64(vblk->fd, off, SEEK_SET) != off) - err(1, "Bad seek to sector %llu", out->sector); - - ret = writev(vblk->fd, iov+1, out_num-1); - verbose("WRITE to sector %llu: %i\n", out->sector, ret); - - /* - * Grr... Now we know how long the descriptor they sent was, we - * make sure they didn't try to write over the end of the block - * file (possibly extending it). - */ - if (ret > 0 && off + ret > vblk->len) { - /* Trim it back to the correct length */ - ftruncate64(vblk->fd, vblk->len); - /* Die, bad Guest, die. */ - errx(1, "Write past end %llu+%u", off, ret); - } - - wlen = sizeof(*in); - *in = (ret >= 0 ? VIRTIO_BLK_S_OK : VIRTIO_BLK_S_IOERR); - } else if (out->type & VIRTIO_BLK_T_FLUSH) { - /* Flush */ - ret = fdatasync(vblk->fd); - verbose("FLUSH fdatasync: %i\n", ret); - wlen = sizeof(*in); - *in = (ret >= 0 ? VIRTIO_BLK_S_OK : VIRTIO_BLK_S_IOERR); - } else { - /* - * Read - * - * Move to the right location in the block file. This can fail - * if they try to read past end. - */ - if (lseek64(vblk->fd, off, SEEK_SET) != off) - err(1, "Bad seek to sector %llu", out->sector); - - ret = readv(vblk->fd, iov+1, in_num-1); - verbose("READ from sector %llu: %i\n", out->sector, ret); - if (ret >= 0) { - wlen = sizeof(*in) + ret; - *in = VIRTIO_BLK_S_OK; - } else { - wlen = sizeof(*in); - *in = VIRTIO_BLK_S_IOERR; - } - } - - /* Finished that request. */ - add_used(vq, head, wlen); -} - -/*L:198 This actually sets up a virtual block device. */ -static void setup_block_file(const char *filename) -{ - struct device *dev; - struct vblk_info *vblk; - struct virtio_blk_config conf; - - /* Creat the device. */ - dev = new_device("block", VIRTIO_ID_BLOCK); - - /* The device has one virtqueue, where the Guest places requests. */ - add_virtqueue(dev, VIRTQUEUE_NUM, blk_request); - - /* Allocate the room for our own bookkeeping */ - vblk = dev->priv = malloc(sizeof(*vblk)); - - /* First we open the file and store the length. */ - vblk->fd = open_or_die(filename, O_RDWR|O_LARGEFILE); - vblk->len = lseek64(vblk->fd, 0, SEEK_END); - - /* We support FLUSH. */ - add_feature(dev, VIRTIO_BLK_F_FLUSH); - - /* Tell Guest how many sectors this device has. */ - conf.capacity = cpu_to_le64(vblk->len / 512); - - /* - * Tell Guest not to put in too many descriptors at once: two are used - * for the in and out elements. - */ - add_feature(dev, VIRTIO_BLK_F_SEG_MAX); - conf.seg_max = cpu_to_le32(VIRTQUEUE_NUM - 2); - - /* Don't try to put whole struct: we have 8 bit limit. */ - set_config(dev, offsetof(struct virtio_blk_config, geometry), &conf); - - verbose("device %u: virtblock %llu sectors\n", - ++devices.device_num, le64_to_cpu(conf.capacity)); -} - -/*L:211 - * Our random number generator device reads from /dev/random into the Guest's - * input buffers. The usual case is that the Guest doesn't want random numbers - * and so has no buffers although /dev/random is still readable, whereas - * console is the reverse. - * - * The same logic applies, however. - */ -struct rng_info { - int rfd; -}; - -static void rng_input(struct virtqueue *vq) -{ - int len; - unsigned int head, in_num, out_num, totlen = 0; - struct rng_info *rng_info = vq->dev->priv; - struct iovec iov[vq->vring.num]; - - /* First we need a buffer from the Guests's virtqueue. */ - head = wait_for_vq_desc(vq, iov, &out_num, &in_num); - if (out_num) - errx(1, "Output buffers in rng?"); - - /* - * Just like the console write, we loop to cover the whole iovec. - * In this case, short reads actually happen quite a bit. - */ - while (!iov_empty(iov, in_num)) { - len = readv(rng_info->rfd, iov, in_num); - if (len <= 0) - err(1, "Read from /dev/random gave %i", len); - iov_consume(iov, in_num, len); - totlen += len; - } - - /* Tell the Guest about the new input. */ - add_used(vq, head, totlen); -} - -/*L:199 - * This creates a "hardware" random number device for the Guest. - */ -static void setup_rng(void) -{ - struct device *dev; - struct rng_info *rng_info = malloc(sizeof(*rng_info)); - - /* Our device's privat info simply contains the /dev/random fd. */ - rng_info->rfd = open_or_die("/dev/random", O_RDONLY); - - /* Create the new device. */ - dev = new_device("rng", VIRTIO_ID_RNG); - dev->priv = rng_info; - - /* The device has one virtqueue, where the Guest places inbufs. */ - add_virtqueue(dev, VIRTQUEUE_NUM, rng_input); - - verbose("device %u: rng\n", devices.device_num++); -} -/* That's the end of device setup. */ - -/*L:230 Reboot is pretty easy: clean up and exec() the Launcher afresh. */ -static void __attribute__((noreturn)) restart_guest(void) -{ - unsigned int i; - - /* - * Since we don't track all open fds, we simply close everything beyond - * stderr. - */ - for (i = 3; i < FD_SETSIZE; i++) - close(i); - - /* Reset all the devices (kills all threads). */ - cleanup_devices(); - - execv(main_args[0], main_args); - err(1, "Could not exec %s", main_args[0]); -} - -/*L:220 - * Finally we reach the core of the Launcher which runs the Guest, serves - * its input and output, and finally, lays it to rest. - */ -static void __attribute__((noreturn)) run_guest(void) -{ - for (;;) { - unsigned long notify_addr; - int readval; - - /* We read from the /dev/lguest device to run the Guest. */ - readval = pread(lguest_fd, ¬ify_addr, - sizeof(notify_addr), cpu_id); - - /* One unsigned long means the Guest did HCALL_NOTIFY */ - if (readval == sizeof(notify_addr)) { - verbose("Notify on address %#lx\n", notify_addr); - handle_output(notify_addr); - /* ENOENT means the Guest died. Reading tells us why. */ - } else if (errno == ENOENT) { - char reason[1024] = { 0 }; - pread(lguest_fd, reason, sizeof(reason)-1, cpu_id); - errx(1, "%s", reason); - /* ERESTART means that we need to reboot the guest */ - } else if (errno == ERESTART) { - restart_guest(); - /* Anything else means a bug or incompatible change. */ - } else - err(1, "Running guest failed"); - } -} -/*L:240 - * This is the end of the Launcher. The good news: we are over halfway - * through! The bad news: the most fiendish part of the code still lies ahead - * of us. - * - * Are you ready? Take a deep breath and join me in the core of the Host, in - * "make Host". -:*/ - -static struct option opts[] = { - { "verbose", 0, NULL, 'v' }, - { "tunnet", 1, NULL, 't' }, - { "block", 1, NULL, 'b' }, - { "rng", 0, NULL, 'r' }, - { "initrd", 1, NULL, 'i' }, - { "username", 1, NULL, 'u' }, - { "chroot", 1, NULL, 'c' }, - { NULL }, -}; -static void usage(void) -{ - errx(1, "Usage: lguest [--verbose] " - "[--tunnet=(<ipaddr>:<macaddr>|bridge:<bridgename>:<macaddr>)\n" - "|--block=<filename>|--initrd=<filename>]...\n" - "<mem-in-mb> vmlinux [args...]"); -} - -/*L:105 The main routine is where the real work begins: */ -int main(int argc, char *argv[]) -{ - /* Memory, code startpoint and size of the (optional) initrd. */ - unsigned long mem = 0, start, initrd_size = 0; - /* Two temporaries. */ - int i, c; - /* The boot information for the Guest. */ - struct boot_params *boot; - /* If they specify an initrd file to load. */ - const char *initrd_name = NULL; - - /* Password structure for initgroups/setres[gu]id */ - struct passwd *user_details = NULL; - - /* Directory to chroot to */ - char *chroot_path = NULL; - - /* Save the args: we "reboot" by execing ourselves again. */ - main_args = argv; - - /* - * First we initialize the device list. We keep a pointer to the last - * device, and the next interrupt number to use for devices (1: - * remember that 0 is used by the timer). - */ - devices.lastdev = NULL; - devices.next_irq = 1; - - /* We're CPU 0. In fact, that's the only CPU possible right now. */ - cpu_id = 0; - - /* - * We need to know how much memory so we can set up the device - * descriptor and memory pages for the devices as we parse the command - * line. So we quickly look through the arguments to find the amount - * of memory now. - */ - for (i = 1; i < argc; i++) { - if (argv[i][0] != '-') { - mem = atoi(argv[i]) * 1024 * 1024; - /* - * We start by mapping anonymous pages over all of - * guest-physical memory range. This fills it with 0, - * and ensures that the Guest won't be killed when it - * tries to access it. - */ - guest_base = map_zeroed_pages(mem / getpagesize() - + DEVICE_PAGES); - guest_limit = mem; - guest_max = mem + DEVICE_PAGES*getpagesize(); - devices.descpage = get_pages(1); - break; - } - } - - /* The options are fairly straight-forward */ - while ((c = getopt_long(argc, argv, "v", opts, NULL)) != EOF) { - switch (c) { - case 'v': - verbose = true; - break; - case 't': - setup_tun_net(optarg); - break; - case 'b': - setup_block_file(optarg); - break; - case 'r': - setup_rng(); - break; - case 'i': - initrd_name = optarg; - break; - case 'u': - user_details = getpwnam(optarg); - if (!user_details) - err(1, "getpwnam failed, incorrect username?"); - break; - case 'c': - chroot_path = optarg; - break; - default: - warnx("Unknown argument %s", argv[optind]); - usage(); - } - } - /* - * After the other arguments we expect memory and kernel image name, - * followed by command line arguments for the kernel. - */ - if (optind + 2 > argc) - usage(); - - verbose("Guest base is at %p\n", guest_base); - - /* We always have a console device */ - setup_console(); - - /* Now we load the kernel */ - start = load_kernel(open_or_die(argv[optind+1], O_RDONLY)); - - /* Boot information is stashed at physical address 0 */ - boot = from_guest_phys(0); - - /* Map the initrd image if requested (at top of physical memory) */ - if (initrd_name) { - initrd_size = load_initrd(initrd_name, mem); - /* - * These are the location in the Linux boot header where the - * start and size of the initrd are expected to be found. - */ - boot->hdr.ramdisk_image = mem - initrd_size; - boot->hdr.ramdisk_size = initrd_size; - /* The bootloader type 0xFF means "unknown"; that's OK. */ - boot->hdr.type_of_loader = 0xFF; - } - - /* - * The Linux boot header contains an "E820" memory map: ours is a - * simple, single region. - */ - boot->e820_entries = 1; - boot->e820_map[0] = ((struct e820entry) { 0, mem, E820_RAM }); - /* - * The boot header contains a command line pointer: we put the command - * line after the boot header. - */ - boot->hdr.cmd_line_ptr = to_guest_phys(boot + 1); - /* We use a simple helper to copy the arguments separated by spaces. */ - concat((char *)(boot + 1), argv+optind+2); - - /* Set kernel alignment to 16M (CONFIG_PHYSICAL_ALIGN) */ - boot->hdr.kernel_alignment = 0x1000000; - - /* Boot protocol version: 2.07 supports the fields for lguest. */ - boot->hdr.version = 0x207; - - /* The hardware_subarch value of "1" tells the Guest it's an lguest. */ - boot->hdr.hardware_subarch = 1; - - /* Tell the entry path not to try to reload segment registers. */ - boot->hdr.loadflags |= KEEP_SEGMENTS; - - /* We tell the kernel to initialize the Guest. */ - tell_kernel(start); - - /* Ensure that we terminate if a device-servicing child dies. */ - signal(SIGCHLD, kill_launcher); - - /* If we exit via err(), this kills all the threads, restores tty. */ - atexit(cleanup_devices); - - /* If requested, chroot to a directory */ - if (chroot_path) { - if (chroot(chroot_path) != 0) - err(1, "chroot(\"%s\") failed", chroot_path); - - if (chdir("/") != 0) - err(1, "chdir(\"/\") failed"); - - verbose("chroot done\n"); - } - - /* If requested, drop privileges */ - if (user_details) { - uid_t u; - gid_t g; - - u = user_details->pw_uid; - g = user_details->pw_gid; - - if (initgroups(user_details->pw_name, g) != 0) - err(1, "initgroups failed"); - - if (setresgid(g, g, g) != 0) - err(1, "setresgid failed"); - - if (setresuid(u, u, u) != 0) - err(1, "setresuid failed"); - - verbose("Dropping privileges completed\n"); - } - - /* Finally, run the Guest. This doesn't return. */ - run_guest(); -} -/*:*/ - -/*M:999 - * Mastery is done: you now know everything I do. - * - * But surely you have seen code, features and bugs in your wanderings which - * you now yearn to attack? That is the real game, and I look forward to you - * patching and forking lguest into the Your-Name-Here-visor. - * - * Farewell, and good coding! - * Rusty Russell. - */ diff --git a/Documentation/virtual/lguest/lguest.txt b/Documentation/virtual/lguest/lguest.txt deleted file mode 100644 index bff0c554485d..000000000000 --- a/Documentation/virtual/lguest/lguest.txt +++ /dev/null @@ -1,129 +0,0 @@ - __ - (___()'`; Rusty's Remarkably Unreliable Guide to Lguest - /, /` - or, A Young Coder's Illustrated Hypervisor - \\"--\\ http://lguest.ozlabs.org - -Lguest is designed to be a minimal 32-bit x86 hypervisor for the Linux kernel, -for Linux developers and users to experiment with virtualization with the -minimum of complexity. Nonetheless, it should have sufficient features to -make it useful for specific tasks, and, of course, you are encouraged to fork -and enhance it (see drivers/lguest/README). - -Features: - -- Kernel module which runs in a normal kernel. -- Simple I/O model for communication. -- Simple program to create new guests. -- Logo contains cute puppies: http://lguest.ozlabs.org - -Developer features: - -- Fun to hack on. -- No ABI: being tied to a specific kernel anyway, you can change anything. -- Many opportunities for improvement or feature implementation. - -Running Lguest: - -- The easiest way to run lguest is to use same kernel as guest and host. - You can configure them differently, but usually it's easiest not to. - - You will need to configure your kernel with the following options: - - "General setup": - "Prompt for development and/or incomplete code/drivers" = Y - (CONFIG_EXPERIMENTAL=y) - - "Processor type and features": - "Paravirtualized guest support" = Y - "Lguest guest support" = Y - "High Memory Support" = off/4GB - "Alignment value to which kernel should be aligned" = 0x100000 - (CONFIG_PARAVIRT=y, CONFIG_LGUEST_GUEST=y, CONFIG_HIGHMEM64G=n and - CONFIG_PHYSICAL_ALIGN=0x100000) - - "Device Drivers": - "Block devices" - "Virtio block driver (EXPERIMENTAL)" = M/Y - "Network device support" - "Universal TUN/TAP device driver support" = M/Y - "Virtio network driver (EXPERIMENTAL)" = M/Y - (CONFIG_VIRTIO_BLK=m, CONFIG_VIRTIO_NET=m and CONFIG_TUN=m) - - "Virtualization" - "Linux hypervisor example code" = M/Y - (CONFIG_LGUEST=m) - -- A tool called "lguest" is available in this directory: type "make" - to build it. If you didn't build your kernel in-tree, use "make - O=<builddir>". - -- Create or find a root disk image. There are several useful ones - around, such as the xm-test tiny root image at - http://xm-test.xensource.com/ramdisks/initrd-1.1-i386.img - - For more serious work, I usually use a distribution ISO image and - install it under qemu, then make multiple copies: - - dd if=/dev/zero of=rootfile bs=1M count=2048 - qemu -cdrom image.iso -hda rootfile -net user -net nic -boot d - - Make sure that you install a getty on /dev/hvc0 if you want to log in on the - console! - -- "modprobe lg" if you built it as a module. - -- Run an lguest as root: - - Documentation/virtual/lguest/lguest 64 vmlinux --tunnet=192.168.19.1 \ - --block=rootfile root=/dev/vda - - Explanation: - 64: the amount of memory to use, in MB. - - vmlinux: the kernel image found in the top of your build directory. You - can also use a standard bzImage. - - --tunnet=192.168.19.1: configures a "tap" device for networking with this - IP address. - - --block=rootfile: a file or block device which becomes /dev/vda - inside the guest. - - root=/dev/vda: this (and anything else on the command line) are - kernel boot parameters. - -- Configuring networking. I usually have the host masquerade, using - "iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE" and "echo 1 > - /proc/sys/net/ipv4/ip_forward". In this example, I would configure - eth0 inside the guest at 192.168.19.2. - - Another method is to bridge the tap device to an external interface - using --tunnet=bridge:<bridgename>, and perhaps run dhcp on the guest - to obtain an IP address. The bridge needs to be configured first: - this option simply adds the tap interface to it. - - A simple example on my system: - - ifconfig eth0 0.0.0.0 - brctl addbr lg0 - ifconfig lg0 up - brctl addif lg0 eth0 - dhclient lg0 - - Then use --tunnet=bridge:lg0 when launching the guest. - - See: - - http://www.linuxfoundation.org/collaborate/workgroups/networking/bridge - - for general information on how to get bridging to work. - -- Random number generation. Using the --rng option will provide a - /dev/hwrng in the guest that will read from the host's /dev/random. - Use this option in conjunction with rng-tools (see ../hw_random.txt) - to provide entropy to the guest kernel's /dev/random. - -There is a helpful mailing list at http://ozlabs.org/mailman/listinfo/lguest - -Good luck! -Rusty Russell rusty@rustcorp.com.au. diff --git a/Documentation/vm/slub.txt b/Documentation/vm/slub.txt index f464f47bc60d..6752870c4970 100644 --- a/Documentation/vm/slub.txt +++ b/Documentation/vm/slub.txt @@ -117,7 +117,7 @@ can be influenced by kernel parameters: slub_min_objects=x (default 4) slub_min_order=x (default 0) -slub_max_order=x (default 1) +slub_max_order=x (default 3 (PAGE_ALLOC_COSTLY_ORDER)) slub_min_objects allows to specify how many objects must at least fit into one slab in order for the allocation order to be acceptable. @@ -131,7 +131,10 @@ slub_min_objects. slub_max_order specified the order at which slub_min_objects should no longer be checked. This is useful to avoid SLUB trying to generate super large order pages to fit slub_min_objects of a slab cache with -large object sizes into one high order page. +large object sizes into one high order page. Setting command line +parameter debug_guardpage_minorder=N (N > 0), forces setting +slub_max_order to 0, what cause minimum possible order of slabs +allocation. SLUB Debug output ----------------- diff --git a/Documentation/watchdog/00-INDEX b/Documentation/watchdog/00-INDEX index fc51128071c2..fc9082a1477a 100644 --- a/Documentation/watchdog/00-INDEX +++ b/Documentation/watchdog/00-INDEX @@ -1,5 +1,7 @@ 00-INDEX - this file. +convert_drivers_to_kernel_api.txt + - how-to for converting old watchdog drivers to the new kernel API. hpwdt.txt - information on the HP iLO2 NMI watchdog pcwd-watchdog.txt diff --git a/Documentation/watchdog/convert_drivers_to_kernel_api.txt b/Documentation/watchdog/convert_drivers_to_kernel_api.txt index ae1e90036d06..be8119bb15d2 100644 --- a/Documentation/watchdog/convert_drivers_to_kernel_api.txt +++ b/Documentation/watchdog/convert_drivers_to_kernel_api.txt @@ -163,6 +163,25 @@ Here is a simple example for a watchdog device: +}; +Handle the 'nowayout' feature +----------------------------- + +A few drivers use nowayout statically, i.e. there is no module parameter for it +and only CONFIG_WATCHDOG_NOWAYOUT determines if the feature is going to be +used. This needs to be converted by initializing the status variable of the +watchdog_device like this: + + .status = WATCHDOG_NOWAYOUT_INIT_STATUS, + +Most drivers, however, also allow runtime configuration of nowayout, usually +by adding a module parameter. The conversion for this would be something like: + + watchdog_set_nowayout(&s3c2410_wdd, nowayout); + +The module parameter itself needs to stay, everything else related to nowayout +can go, though. This will likely be some code in open(), close() or write(). + + Register the watchdog device ---------------------------- diff --git a/Documentation/watchdog/watchdog-kernel-api.txt b/Documentation/watchdog/watchdog-kernel-api.txt index 4f7c894244d2..4b93c28e35c6 100644 --- a/Documentation/watchdog/watchdog-kernel-api.txt +++ b/Documentation/watchdog/watchdog-kernel-api.txt @@ -1,6 +1,6 @@ The Linux WatchDog Timer Driver Core kernel API. =============================================== -Last reviewed: 22-Jul-2011 +Last reviewed: 29-Nov-2011 Wim Van Sebroeck <wim@iguana.be> @@ -142,6 +142,14 @@ bit-operations. The status bits that are defined are: * WDOG_NO_WAY_OUT: this bit stores the nowayout setting for the watchdog. If this bit is set then the watchdog timer will not be able to stop. + To set the WDOG_NO_WAY_OUT status bit (before registering your watchdog + timer device) you can either: + * set it statically in your watchdog_device struct with + .status = WATCHDOG_NOWAYOUT_INIT_STATUS, + (this will set the value the same as CONFIG_WATCHDOG_NOWAYOUT) or + * use the following helper function: + static inline void watchdog_set_nowayout(struct watchdog_device *wdd, int nowayout) + Note: The WatchDog Timer Driver Core supports the magic close feature and the nowayout feature. To use the magic close feature you must set the WDIOF_MAGICCLOSE bit in the options field of the watchdog's info structure. |