diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2013-09-05 18:07:32 -0700 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2013-09-05 18:07:32 -0700 |
commit | 2e032852245b3dcfe5461d7353e34eb6da095ccf (patch) | |
tree | 69f9fdf03b54d76bb539096e0ec96e91ea8216b1 /Documentation | |
parent | 356f9e74ffaafd11741589a9aa21d6c9d2721417 (diff) | |
parent | 141b97433d77e39ac3ac111a7b3852192035259c (diff) | |
download | linux-2e032852245b3dcfe5461d7353e34eb6da095ccf.tar.bz2 |
Merge branch 'for-linus' of git://git.linaro.org/people/rmk/linux-arm
Pull ARM updates from Russell King:
"This set includes adding support for Neon acceleration of RAID6 XOR
code from Ard Biesheuvel, cache flushing and barrier updates from Will
Deacon, and a cleanup to the ARM debug code which reduces the amount
of code by about 500 lines.
A few other cleanups, such as constifying the machine descriptors
which already shouldn't be written to, cleaning up the printing of the
L2 cache size"
* 'for-linus' of git://git.linaro.org/people/rmk/linux-arm: (55 commits)
ARM: 7826/1: debug: support debug ll on hisilicon soc
ARM: 7830/1: delay: don't bother reporting bogomips in /proc/cpuinfo
ARM: 7829/1: Add ".text.unlikely" and ".text.hot" to arm unwind tables
ARM: 7828/1: ARMv7-M: implement restart routine common to all v7-M machines
ARM: 7827/1: highbank: fix debug uart virtual address for LPAE
ARM: 7823/1: errata: workaround Cortex-A15 erratum 773022
ARM: 7806/1: allow DEBUG_UNCOMPRESS for Tegra
ARM: 7793/1: debug: use generic option for ep93xx PL10x debug port
ARM: debug: move SPEAr debug to generic PL01x code
ARM: debug: move davinci debug to generic 8250 code
ARM: debug: move keystone debug to generic 8250 code
ARM: debug: remove DEBUG_ROCKCHIP_UART
ARM: debug: provide generic option choices for 8250 and PL01x ports
ARM: debug: move PL01X debug include into arch/arm/include/debug/
ARM: debug: provide PL01x debug uart phys/virt address configuration options
ARM: debug: add support for word accesses to debug/8250.S
ARM: debug: move 8250 debug include into arch/arm/include/debug/
ARM: debug: provide 8250 debug uart phys/virt address configuration options
ARM: debug: provide 8250 debug uart register shift configuration option
ARM: debug: provide 8250 debug uart flow control configuration option
...
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/arm/Booting | 42 | ||||
-rw-r--r-- | Documentation/arm/kernel_mode_neon.txt | 121 | ||||
-rw-r--r-- | Documentation/devicetree/bindings/arm/l2cc.txt | 4 |
3 files changed, 156 insertions, 11 deletions
diff --git a/Documentation/arm/Booting b/Documentation/arm/Booting index 0c1f475fdf36..371814a36719 100644 --- a/Documentation/arm/Booting +++ b/Documentation/arm/Booting @@ -18,7 +18,8 @@ following: 2. Initialise one serial port. 3. Detect the machine type. 4. Setup the kernel tagged list. -5. Call the kernel image. +5. Load initramfs. +6. Call the kernel image. 1. Setup and initialise RAM @@ -120,12 +121,27 @@ tagged list. The boot loader must pass at a minimum the size and location of the system memory, and the root filesystem location. The dtb must be placed in a region of memory where the kernel decompressor will not -overwrite it. The recommended placement is in the first 16KiB of RAM -with the caveat that it may not be located at physical address 0 since -the kernel interprets a value of 0 in r2 to mean neither a tagged list -nor a dtb were passed. +overwrite it, whilst remaining within the region which will be covered +by the kernel's low-memory mapping. -5. Calling the kernel image +A safe location is just above the 128MiB boundary from start of RAM. + +5. Load initramfs. +------------------ + +Existing boot loaders: OPTIONAL +New boot loaders: OPTIONAL + +If an initramfs is in use then, as with the dtb, it must be placed in +a region of memory where the kernel decompressor will not overwrite it +while also with the region which will be covered by the kernel's +low-memory mapping. + +A safe location is just above the device tree blob which itself will +be loaded just above the 128MiB boundary from the start of RAM as +recommended above. + +6. Calling the kernel image --------------------------- Existing boot loaders: MANDATORY @@ -136,11 +152,17 @@ is stored in flash, and is linked correctly to be run from flash, then it is legal for the boot loader to call the zImage in flash directly. -The zImage may also be placed in system RAM (at any location) and -called there. Note that the kernel uses 16K of RAM below the image -to store page tables. The recommended placement is 32KiB into RAM. +The zImage may also be placed in system RAM and called there. The +kernel should be placed in the first 128MiB of RAM. It is recommended +that it is loaded above 32MiB in order to avoid the need to relocate +prior to decompression, which will make the boot process slightly +faster. + +When booting a raw (non-zImage) kernel the constraints are tighter. +In this case the kernel must be loaded at an offset into system equal +to TEXT_OFFSET - PAGE_OFFSET. -In either case, the following conditions must be met: +In any case, the following conditions must be met: - Quiesce all DMA capable devices so that memory does not get corrupted by bogus network packets or disk data. This will save diff --git a/Documentation/arm/kernel_mode_neon.txt b/Documentation/arm/kernel_mode_neon.txt new file mode 100644 index 000000000000..525452726d31 --- /dev/null +++ b/Documentation/arm/kernel_mode_neon.txt @@ -0,0 +1,121 @@ +Kernel mode NEON +================ + +TL;DR summary +------------- +* Use only NEON instructions, or VFP instructions that don't rely on support + code +* Isolate your NEON code in a separate compilation unit, and compile it with + '-mfpu=neon -mfloat-abi=softfp' +* Put kernel_neon_begin() and kernel_neon_end() calls around the calls into your + NEON code +* Don't sleep in your NEON code, and be aware that it will be executed with + preemption disabled + + +Introduction +------------ +It is possible to use NEON instructions (and in some cases, VFP instructions) in +code that runs in kernel mode. However, for performance reasons, the NEON/VFP +register file is not preserved and restored at every context switch or taken +exception like the normal register file is, so some manual intervention is +required. Furthermore, special care is required for code that may sleep [i.e., +may call schedule()], as NEON or VFP instructions will be executed in a +non-preemptible section for reasons outlined below. + + +Lazy preserve and restore +------------------------- +The NEON/VFP register file is managed using lazy preserve (on UP systems) and +lazy restore (on both SMP and UP systems). This means that the register file is +kept 'live', and is only preserved and restored when multiple tasks are +contending for the NEON/VFP unit (or, in the SMP case, when a task migrates to +another core). Lazy restore is implemented by disabling the NEON/VFP unit after +every context switch, resulting in a trap when subsequently a NEON/VFP +instruction is issued, allowing the kernel to step in and perform the restore if +necessary. + +Any use of the NEON/VFP unit in kernel mode should not interfere with this, so +it is required to do an 'eager' preserve of the NEON/VFP register file, and +enable the NEON/VFP unit explicitly so no exceptions are generated on first +subsequent use. This is handled by the function kernel_neon_begin(), which +should be called before any kernel mode NEON or VFP instructions are issued. +Likewise, the NEON/VFP unit should be disabled again after use to make sure user +mode will hit the lazy restore trap upon next use. This is handled by the +function kernel_neon_end(). + + +Interruptions in kernel mode +---------------------------- +For reasons of performance and simplicity, it was decided that there shall be no +preserve/restore mechanism for the kernel mode NEON/VFP register contents. This +implies that interruptions of a kernel mode NEON section can only be allowed if +they are guaranteed not to touch the NEON/VFP registers. For this reason, the +following rules and restrictions apply in the kernel: +* NEON/VFP code is not allowed in interrupt context; +* NEON/VFP code is not allowed to sleep; +* NEON/VFP code is executed with preemption disabled. + +If latency is a concern, it is possible to put back to back calls to +kernel_neon_end() and kernel_neon_begin() in places in your code where none of +the NEON registers are live. (Additional calls to kernel_neon_begin() should be +reasonably cheap if no context switch occurred in the meantime) + + +VFP and support code +-------------------- +Earlier versions of VFP (prior to version 3) rely on software support for things +like IEEE-754 compliant underflow handling etc. When the VFP unit needs such +software assistance, it signals the kernel by raising an undefined instruction +exception. The kernel responds by inspecting the VFP control registers and the +current instruction and arguments, and emulates the instruction in software. + +Such software assistance is currently not implemented for VFP instructions +executed in kernel mode. If such a condition is encountered, the kernel will +fail and generate an OOPS. + + +Separating NEON code from ordinary code +--------------------------------------- +The compiler is not aware of the special significance of kernel_neon_begin() and +kernel_neon_end(), i.e., that it is only allowed to issue NEON/VFP instructions +between calls to these respective functions. Furthermore, GCC may generate NEON +instructions of its own at -O3 level if -mfpu=neon is selected, and even if the +kernel is currently compiled at -O2, future changes may result in NEON/VFP +instructions appearing in unexpected places if no special care is taken. + +Therefore, the recommended and only supported way of using NEON/VFP in the +kernel is by adhering to the following rules: +* isolate the NEON code in a separate compilation unit and compile it with + '-mfpu=neon -mfloat-abi=softfp'; +* issue the calls to kernel_neon_begin(), kernel_neon_end() as well as the calls + into the unit containing the NEON code from a compilation unit which is *not* + built with the GCC flag '-mfpu=neon' set. + +As the kernel is compiled with '-msoft-float', the above will guarantee that +both NEON and VFP instructions will only ever appear in designated compilation +units at any optimization level. + + +NEON assembler +-------------- +NEON assembler is supported with no additional caveats as long as the rules +above are followed. + + +NEON code generated by GCC +-------------------------- +The GCC option -ftree-vectorize (implied by -O3) tries to exploit implicit +parallelism, and generates NEON code from ordinary C source code. This is fully +supported as long as the rules above are followed. + + +NEON intrinsics +--------------- +NEON intrinsics are also supported. However, as code using NEON intrinsics +relies on the GCC header <arm_neon.h>, (which #includes <stdint.h>), you should +observe the following in addition to the rules above: +* Compile the unit containing the NEON intrinsics with '-ffreestanding' so GCC + uses its builtin version of <stdint.h> (this is a C99 header which the kernel + does not supply); +* Include <arm_neon.h> last, or at least after <linux/types.h> diff --git a/Documentation/devicetree/bindings/arm/l2cc.txt b/Documentation/devicetree/bindings/arm/l2cc.txt index 69ddf9fad2dc..c0c7626fd0ff 100644 --- a/Documentation/devicetree/bindings/arm/l2cc.txt +++ b/Documentation/devicetree/bindings/arm/l2cc.txt @@ -16,9 +16,11 @@ Required properties: performs the same operation). "marvell,"aurora-outer-cache: Marvell Controller designed to be compatible with the ARM one with outer cache mode. - "bcm,bcm11351-a2-pl310-cache": For Broadcom bcm11351 chipset where an + "brcm,bcm11351-a2-pl310-cache": For Broadcom bcm11351 chipset where an offset needs to be added to the address before passing down to the L2 cache controller + "bcm,bcm11351-a2-pl310-cache": DEPRECATED by + "brcm,bcm11351-a2-pl310-cache" - cache-unified : Specifies the cache is a unified cache. - cache-level : Should be set to 2 for a level 2 cache. - reg : Physical base address and size of cache controller's memory mapped |