summaryrefslogtreecommitdiffstats
path: root/Documentation
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2013-09-05 18:07:32 -0700
committerLinus Torvalds <torvalds@linux-foundation.org>2013-09-05 18:07:32 -0700
commit2e032852245b3dcfe5461d7353e34eb6da095ccf (patch)
tree69f9fdf03b54d76bb539096e0ec96e91ea8216b1 /Documentation
parent356f9e74ffaafd11741589a9aa21d6c9d2721417 (diff)
parent141b97433d77e39ac3ac111a7b3852192035259c (diff)
downloadlinux-2e032852245b3dcfe5461d7353e34eb6da095ccf.tar.bz2
Merge branch 'for-linus' of git://git.linaro.org/people/rmk/linux-arm
Pull ARM updates from Russell King: "This set includes adding support for Neon acceleration of RAID6 XOR code from Ard Biesheuvel, cache flushing and barrier updates from Will Deacon, and a cleanup to the ARM debug code which reduces the amount of code by about 500 lines. A few other cleanups, such as constifying the machine descriptors which already shouldn't be written to, cleaning up the printing of the L2 cache size" * 'for-linus' of git://git.linaro.org/people/rmk/linux-arm: (55 commits) ARM: 7826/1: debug: support debug ll on hisilicon soc ARM: 7830/1: delay: don't bother reporting bogomips in /proc/cpuinfo ARM: 7829/1: Add ".text.unlikely" and ".text.hot" to arm unwind tables ARM: 7828/1: ARMv7-M: implement restart routine common to all v7-M machines ARM: 7827/1: highbank: fix debug uart virtual address for LPAE ARM: 7823/1: errata: workaround Cortex-A15 erratum 773022 ARM: 7806/1: allow DEBUG_UNCOMPRESS for Tegra ARM: 7793/1: debug: use generic option for ep93xx PL10x debug port ARM: debug: move SPEAr debug to generic PL01x code ARM: debug: move davinci debug to generic 8250 code ARM: debug: move keystone debug to generic 8250 code ARM: debug: remove DEBUG_ROCKCHIP_UART ARM: debug: provide generic option choices for 8250 and PL01x ports ARM: debug: move PL01X debug include into arch/arm/include/debug/ ARM: debug: provide PL01x debug uart phys/virt address configuration options ARM: debug: add support for word accesses to debug/8250.S ARM: debug: move 8250 debug include into arch/arm/include/debug/ ARM: debug: provide 8250 debug uart phys/virt address configuration options ARM: debug: provide 8250 debug uart register shift configuration option ARM: debug: provide 8250 debug uart flow control configuration option ...
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/arm/Booting42
-rw-r--r--Documentation/arm/kernel_mode_neon.txt121
-rw-r--r--Documentation/devicetree/bindings/arm/l2cc.txt4
3 files changed, 156 insertions, 11 deletions
diff --git a/Documentation/arm/Booting b/Documentation/arm/Booting
index 0c1f475fdf36..371814a36719 100644
--- a/Documentation/arm/Booting
+++ b/Documentation/arm/Booting
@@ -18,7 +18,8 @@ following:
2. Initialise one serial port.
3. Detect the machine type.
4. Setup the kernel tagged list.
-5. Call the kernel image.
+5. Load initramfs.
+6. Call the kernel image.
1. Setup and initialise RAM
@@ -120,12 +121,27 @@ tagged list.
The boot loader must pass at a minimum the size and location of the
system memory, and the root filesystem location. The dtb must be
placed in a region of memory where the kernel decompressor will not
-overwrite it. The recommended placement is in the first 16KiB of RAM
-with the caveat that it may not be located at physical address 0 since
-the kernel interprets a value of 0 in r2 to mean neither a tagged list
-nor a dtb were passed.
+overwrite it, whilst remaining within the region which will be covered
+by the kernel's low-memory mapping.
-5. Calling the kernel image
+A safe location is just above the 128MiB boundary from start of RAM.
+
+5. Load initramfs.
+------------------
+
+Existing boot loaders: OPTIONAL
+New boot loaders: OPTIONAL
+
+If an initramfs is in use then, as with the dtb, it must be placed in
+a region of memory where the kernel decompressor will not overwrite it
+while also with the region which will be covered by the kernel's
+low-memory mapping.
+
+A safe location is just above the device tree blob which itself will
+be loaded just above the 128MiB boundary from the start of RAM as
+recommended above.
+
+6. Calling the kernel image
---------------------------
Existing boot loaders: MANDATORY
@@ -136,11 +152,17 @@ is stored in flash, and is linked correctly to be run from flash,
then it is legal for the boot loader to call the zImage in flash
directly.
-The zImage may also be placed in system RAM (at any location) and
-called there. Note that the kernel uses 16K of RAM below the image
-to store page tables. The recommended placement is 32KiB into RAM.
+The zImage may also be placed in system RAM and called there. The
+kernel should be placed in the first 128MiB of RAM. It is recommended
+that it is loaded above 32MiB in order to avoid the need to relocate
+prior to decompression, which will make the boot process slightly
+faster.
+
+When booting a raw (non-zImage) kernel the constraints are tighter.
+In this case the kernel must be loaded at an offset into system equal
+to TEXT_OFFSET - PAGE_OFFSET.
-In either case, the following conditions must be met:
+In any case, the following conditions must be met:
- Quiesce all DMA capable devices so that memory does not get
corrupted by bogus network packets or disk data. This will save
diff --git a/Documentation/arm/kernel_mode_neon.txt b/Documentation/arm/kernel_mode_neon.txt
new file mode 100644
index 000000000000..525452726d31
--- /dev/null
+++ b/Documentation/arm/kernel_mode_neon.txt
@@ -0,0 +1,121 @@
+Kernel mode NEON
+================
+
+TL;DR summary
+-------------
+* Use only NEON instructions, or VFP instructions that don't rely on support
+ code
+* Isolate your NEON code in a separate compilation unit, and compile it with
+ '-mfpu=neon -mfloat-abi=softfp'
+* Put kernel_neon_begin() and kernel_neon_end() calls around the calls into your
+ NEON code
+* Don't sleep in your NEON code, and be aware that it will be executed with
+ preemption disabled
+
+
+Introduction
+------------
+It is possible to use NEON instructions (and in some cases, VFP instructions) in
+code that runs in kernel mode. However, for performance reasons, the NEON/VFP
+register file is not preserved and restored at every context switch or taken
+exception like the normal register file is, so some manual intervention is
+required. Furthermore, special care is required for code that may sleep [i.e.,
+may call schedule()], as NEON or VFP instructions will be executed in a
+non-preemptible section for reasons outlined below.
+
+
+Lazy preserve and restore
+-------------------------
+The NEON/VFP register file is managed using lazy preserve (on UP systems) and
+lazy restore (on both SMP and UP systems). This means that the register file is
+kept 'live', and is only preserved and restored when multiple tasks are
+contending for the NEON/VFP unit (or, in the SMP case, when a task migrates to
+another core). Lazy restore is implemented by disabling the NEON/VFP unit after
+every context switch, resulting in a trap when subsequently a NEON/VFP
+instruction is issued, allowing the kernel to step in and perform the restore if
+necessary.
+
+Any use of the NEON/VFP unit in kernel mode should not interfere with this, so
+it is required to do an 'eager' preserve of the NEON/VFP register file, and
+enable the NEON/VFP unit explicitly so no exceptions are generated on first
+subsequent use. This is handled by the function kernel_neon_begin(), which
+should be called before any kernel mode NEON or VFP instructions are issued.
+Likewise, the NEON/VFP unit should be disabled again after use to make sure user
+mode will hit the lazy restore trap upon next use. This is handled by the
+function kernel_neon_end().
+
+
+Interruptions in kernel mode
+----------------------------
+For reasons of performance and simplicity, it was decided that there shall be no
+preserve/restore mechanism for the kernel mode NEON/VFP register contents. This
+implies that interruptions of a kernel mode NEON section can only be allowed if
+they are guaranteed not to touch the NEON/VFP registers. For this reason, the
+following rules and restrictions apply in the kernel:
+* NEON/VFP code is not allowed in interrupt context;
+* NEON/VFP code is not allowed to sleep;
+* NEON/VFP code is executed with preemption disabled.
+
+If latency is a concern, it is possible to put back to back calls to
+kernel_neon_end() and kernel_neon_begin() in places in your code where none of
+the NEON registers are live. (Additional calls to kernel_neon_begin() should be
+reasonably cheap if no context switch occurred in the meantime)
+
+
+VFP and support code
+--------------------
+Earlier versions of VFP (prior to version 3) rely on software support for things
+like IEEE-754 compliant underflow handling etc. When the VFP unit needs such
+software assistance, it signals the kernel by raising an undefined instruction
+exception. The kernel responds by inspecting the VFP control registers and the
+current instruction and arguments, and emulates the instruction in software.
+
+Such software assistance is currently not implemented for VFP instructions
+executed in kernel mode. If such a condition is encountered, the kernel will
+fail and generate an OOPS.
+
+
+Separating NEON code from ordinary code
+---------------------------------------
+The compiler is not aware of the special significance of kernel_neon_begin() and
+kernel_neon_end(), i.e., that it is only allowed to issue NEON/VFP instructions
+between calls to these respective functions. Furthermore, GCC may generate NEON
+instructions of its own at -O3 level if -mfpu=neon is selected, and even if the
+kernel is currently compiled at -O2, future changes may result in NEON/VFP
+instructions appearing in unexpected places if no special care is taken.
+
+Therefore, the recommended and only supported way of using NEON/VFP in the
+kernel is by adhering to the following rules:
+* isolate the NEON code in a separate compilation unit and compile it with
+ '-mfpu=neon -mfloat-abi=softfp';
+* issue the calls to kernel_neon_begin(), kernel_neon_end() as well as the calls
+ into the unit containing the NEON code from a compilation unit which is *not*
+ built with the GCC flag '-mfpu=neon' set.
+
+As the kernel is compiled with '-msoft-float', the above will guarantee that
+both NEON and VFP instructions will only ever appear in designated compilation
+units at any optimization level.
+
+
+NEON assembler
+--------------
+NEON assembler is supported with no additional caveats as long as the rules
+above are followed.
+
+
+NEON code generated by GCC
+--------------------------
+The GCC option -ftree-vectorize (implied by -O3) tries to exploit implicit
+parallelism, and generates NEON code from ordinary C source code. This is fully
+supported as long as the rules above are followed.
+
+
+NEON intrinsics
+---------------
+NEON intrinsics are also supported. However, as code using NEON intrinsics
+relies on the GCC header <arm_neon.h>, (which #includes <stdint.h>), you should
+observe the following in addition to the rules above:
+* Compile the unit containing the NEON intrinsics with '-ffreestanding' so GCC
+ uses its builtin version of <stdint.h> (this is a C99 header which the kernel
+ does not supply);
+* Include <arm_neon.h> last, or at least after <linux/types.h>
diff --git a/Documentation/devicetree/bindings/arm/l2cc.txt b/Documentation/devicetree/bindings/arm/l2cc.txt
index 69ddf9fad2dc..c0c7626fd0ff 100644
--- a/Documentation/devicetree/bindings/arm/l2cc.txt
+++ b/Documentation/devicetree/bindings/arm/l2cc.txt
@@ -16,9 +16,11 @@ Required properties:
performs the same operation).
"marvell,"aurora-outer-cache: Marvell Controller designed to be
compatible with the ARM one with outer cache mode.
- "bcm,bcm11351-a2-pl310-cache": For Broadcom bcm11351 chipset where an
+ "brcm,bcm11351-a2-pl310-cache": For Broadcom bcm11351 chipset where an
offset needs to be added to the address before passing down to the L2
cache controller
+ "bcm,bcm11351-a2-pl310-cache": DEPRECATED by
+ "brcm,bcm11351-a2-pl310-cache"
- cache-unified : Specifies the cache is a unified cache.
- cache-level : Should be set to 2 for a level 2 cache.
- reg : Physical base address and size of cache controller's memory mapped