summaryrefslogtreecommitdiffstats
path: root/Documentation
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/admin-guide/kernel-parameters.txt2
-rw-r--r--Documentation/admin-guide/perf/arm-cmn.rst65
-rw-r--r--Documentation/admin-guide/perf/index.rst1
-rw-r--r--Documentation/arm64/cpu-feature-registers.rst2
-rw-r--r--Documentation/arm64/elf_hwcaps.rst4
-rw-r--r--Documentation/arm64/index.rst1
-rw-r--r--Documentation/arm64/memory-tagging-extension.rst305
-rw-r--r--Documentation/devicetree/bindings/edac/amazon,al-mc-edac.yaml67
-rw-r--r--Documentation/devicetree/bindings/interrupt-controller/actions,owl-sirq.yaml65
-rw-r--r--Documentation/devicetree/bindings/interrupt-controller/mstar,mst-intc.yaml64
-rw-r--r--Documentation/devicetree/bindings/interrupt-controller/snps,dw-apb-ictl.txt14
-rw-r--r--Documentation/devicetree/bindings/interrupt-controller/ti,pruss-intc.yaml158
-rw-r--r--Documentation/devicetree/bindings/perf/arm,cmn.yaml57
-rw-r--r--Documentation/devicetree/bindings/timer/renesas,cmt.yaml4
-rw-r--r--Documentation/devicetree/bindings/trivial-devices.yaml2
-rw-r--r--Documentation/virt/kvm/arm/hyp-abi.rst6
-rw-r--r--Documentation/x86/boot.rst6
-rw-r--r--Documentation/x86/cpuinfo.rst155
-rw-r--r--Documentation/x86/index.rst2
-rw-r--r--Documentation/x86/resctrl_ui.rst18
-rw-r--r--Documentation/x86/sva.rst257
21 files changed, 1245 insertions, 10 deletions
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a1068742a6df..ffe864390c5a 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -577,7 +577,7 @@
loops can be debugged more effectively on production
systems.
- clearcpuid=BITNUM [X86]
+ clearcpuid=BITNUM[,BITNUM...] [X86]
Disable CPUID feature X for the kernel. See
arch/x86/include/asm/cpufeatures.h for the valid bit
numbers. Note the Linux specific bits are not necessarily
diff --git a/Documentation/admin-guide/perf/arm-cmn.rst b/Documentation/admin-guide/perf/arm-cmn.rst
new file mode 100644
index 000000000000..0e4809346014
--- /dev/null
+++ b/Documentation/admin-guide/perf/arm-cmn.rst
@@ -0,0 +1,65 @@
+=============================
+Arm Coherent Mesh Network PMU
+=============================
+
+CMN-600 is a configurable mesh interconnect consisting of a rectangular
+grid of crosspoints (XPs), with each crosspoint supporting up to two
+device ports to which various AMBA CHI agents are attached.
+
+CMN implements a distributed PMU design as part of its debug and trace
+functionality. This consists of a local monitor (DTM) at every XP, which
+counts up to 4 event signals from the connected device nodes and/or the
+XP itself. Overflow from these local counters is accumulated in up to 8
+global counters implemented by the main controller (DTC), which provides
+overall PMU control and interrupts for global counter overflow.
+
+PMU events
+----------
+
+The PMU driver registers a single PMU device for the whole interconnect,
+see /sys/bus/event_source/devices/arm_cmn. Multi-chip systems may link
+more than one CMN together via external CCIX links - in this situation,
+each mesh counts its own events entirely independently, and additional
+PMU devices will be named arm_cmn_{1..n}.
+
+Most events are specified in a format based directly on the TRM
+definitions - "type" selects the respective node type, and "eventid" the
+event number. Some events require an additional occupancy ID, which is
+specified by "occupid".
+
+* Since RN-D nodes do not have any distinct events from RN-I nodes, they
+ are treated as the same type (0xa), and the common event templates are
+ named "rnid_*".
+
+* The cycle counter is treated as a synthetic event belonging to the DTC
+ node ("type" == 0x3, "eventid" is ignored).
+
+* XP events also encode the port and channel in the "eventid" field, to
+ match the underlying pmu_event0_id encoding for the pmu_event_sel
+ register. The event templates are named with prefixes to cover all
+ permutations.
+
+By default each event provides an aggregate count over all nodes of the
+given type. To target a specific node, "bynodeid" must be set to 1 and
+"nodeid" to the appropriate value derived from the CMN configuration
+(as defined in the "Node ID Mapping" section of the TRM).
+
+Watchpoints
+-----------
+
+The PMU can also count watchpoint events to monitor specific flit
+traffic. Watchpoints are treated as a synthetic event type, and like PMU
+events can be global or targeted with a particular XP's "nodeid" value.
+Since the watchpoint direction is otherwise implicit in the underlying
+register selection, separate events are provided for flit uploads and
+downloads.
+
+The flit match value and mask are passed in config1 and config2 ("val"
+and "mask" respectively). "wp_dev_sel", "wp_chn_sel", "wp_grp" and
+"wp_exclusive" are specified per the TRM definitions for dtm_wp_config0.
+Where a watchpoint needs to match fields from both match groups on the
+REQ or SNP channel, it can be specified as two events - one for each
+group - with the same nonzero "combine" value. The count for such a
+pair of combined events will be attributed to the primary match.
+Watchpoint events with a "combine" value of 0 are considered independent
+and will count individually.
diff --git a/Documentation/admin-guide/perf/index.rst b/Documentation/admin-guide/perf/index.rst
index 47c99f40cc16..5a8f2529a033 100644
--- a/Documentation/admin-guide/perf/index.rst
+++ b/Documentation/admin-guide/perf/index.rst
@@ -12,6 +12,7 @@ Performance monitor support
qcom_l2_pmu
qcom_l3_pmu
arm-ccn
+ arm-cmn
xgene-pmu
arm_dsu_pmu
thunderx2-pmu
diff --git a/Documentation/arm64/cpu-feature-registers.rst b/Documentation/arm64/cpu-feature-registers.rst
index f28853f80089..328e0c454fbd 100644
--- a/Documentation/arm64/cpu-feature-registers.rst
+++ b/Documentation/arm64/cpu-feature-registers.rst
@@ -175,6 +175,8 @@ infrastructure:
+------------------------------+---------+---------+
| Name | bits | visible |
+------------------------------+---------+---------+
+ | MTE | [11-8] | y |
+ +------------------------------+---------+---------+
| SSBS | [7-4] | y |
+------------------------------+---------+---------+
| BT | [3-0] | y |
diff --git a/Documentation/arm64/elf_hwcaps.rst b/Documentation/arm64/elf_hwcaps.rst
index 84a9fd2d41b4..bbd9cf54db6c 100644
--- a/Documentation/arm64/elf_hwcaps.rst
+++ b/Documentation/arm64/elf_hwcaps.rst
@@ -240,6 +240,10 @@ HWCAP2_BTI
Functionality implied by ID_AA64PFR0_EL1.BT == 0b0001.
+HWCAP2_MTE
+
+ Functionality implied by ID_AA64PFR1_EL1.MTE == 0b0010, as described
+ by Documentation/arm64/memory-tagging-extension.rst.
4. Unused AT_HWCAP bits
-----------------------
diff --git a/Documentation/arm64/index.rst b/Documentation/arm64/index.rst
index d9665d83c53a..43b0939d384e 100644
--- a/Documentation/arm64/index.rst
+++ b/Documentation/arm64/index.rst
@@ -14,6 +14,7 @@ ARM64 Architecture
hugetlbpage
legacy_instructions
memory
+ memory-tagging-extension
perf
pointer-authentication
silicon-errata
diff --git a/Documentation/arm64/memory-tagging-extension.rst b/Documentation/arm64/memory-tagging-extension.rst
new file mode 100644
index 000000000000..034d37c605e8
--- /dev/null
+++ b/Documentation/arm64/memory-tagging-extension.rst
@@ -0,0 +1,305 @@
+===============================================
+Memory Tagging Extension (MTE) in AArch64 Linux
+===============================================
+
+Authors: Vincenzo Frascino <vincenzo.frascino@arm.com>
+ Catalin Marinas <catalin.marinas@arm.com>
+
+Date: 2020-02-25
+
+This document describes the provision of the Memory Tagging Extension
+functionality in AArch64 Linux.
+
+Introduction
+============
+
+ARMv8.5 based processors introduce the Memory Tagging Extension (MTE)
+feature. MTE is built on top of the ARMv8.0 virtual address tagging TBI
+(Top Byte Ignore) feature and allows software to access a 4-bit
+allocation tag for each 16-byte granule in the physical address space.
+Such memory range must be mapped with the Normal-Tagged memory
+attribute. A logical tag is derived from bits 59-56 of the virtual
+address used for the memory access. A CPU with MTE enabled will compare
+the logical tag against the allocation tag and potentially raise an
+exception on mismatch, subject to system registers configuration.
+
+Userspace Support
+=================
+
+When ``CONFIG_ARM64_MTE`` is selected and Memory Tagging Extension is
+supported by the hardware, the kernel advertises the feature to
+userspace via ``HWCAP2_MTE``.
+
+PROT_MTE
+--------
+
+To access the allocation tags, a user process must enable the Tagged
+memory attribute on an address range using a new ``prot`` flag for
+``mmap()`` and ``mprotect()``:
+
+``PROT_MTE`` - Pages allow access to the MTE allocation tags.
+
+The allocation tag is set to 0 when such pages are first mapped in the
+user address space and preserved on copy-on-write. ``MAP_SHARED`` is
+supported and the allocation tags can be shared between processes.
+
+**Note**: ``PROT_MTE`` is only supported on ``MAP_ANONYMOUS`` and
+RAM-based file mappings (``tmpfs``, ``memfd``). Passing it to other
+types of mapping will result in ``-EINVAL`` returned by these system
+calls.
+
+**Note**: The ``PROT_MTE`` flag (and corresponding memory type) cannot
+be cleared by ``mprotect()``.
+
+**Note**: ``madvise()`` memory ranges with ``MADV_DONTNEED`` and
+``MADV_FREE`` may have the allocation tags cleared (set to 0) at any
+point after the system call.
+
+Tag Check Faults
+----------------
+
+When ``PROT_MTE`` is enabled on an address range and a mismatch between
+the logical and allocation tags occurs on access, there are three
+configurable behaviours:
+
+- *Ignore* - This is the default mode. The CPU (and kernel) ignores the
+ tag check fault.
+
+- *Synchronous* - The kernel raises a ``SIGSEGV`` synchronously, with
+ ``.si_code = SEGV_MTESERR`` and ``.si_addr = <fault-address>``. The
+ memory access is not performed. If ``SIGSEGV`` is ignored or blocked
+ by the offending thread, the containing process is terminated with a
+ ``coredump``.
+
+- *Asynchronous* - The kernel raises a ``SIGSEGV``, in the offending
+ thread, asynchronously following one or multiple tag check faults,
+ with ``.si_code = SEGV_MTEAERR`` and ``.si_addr = 0`` (the faulting
+ address is unknown).
+
+The user can select the above modes, per thread, using the
+``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call where
+``flags`` contain one of the following values in the ``PR_MTE_TCF_MASK``
+bit-field:
+
+- ``PR_MTE_TCF_NONE`` - *Ignore* tag check faults
+- ``PR_MTE_TCF_SYNC`` - *Synchronous* tag check fault mode
+- ``PR_MTE_TCF_ASYNC`` - *Asynchronous* tag check fault mode
+
+The current tag check fault mode can be read using the
+``prctl(PR_GET_TAGGED_ADDR_CTRL, 0, 0, 0, 0)`` system call.
+
+Tag checking can also be disabled for a user thread by setting the
+``PSTATE.TCO`` bit with ``MSR TCO, #1``.
+
+**Note**: Signal handlers are always invoked with ``PSTATE.TCO = 0``,
+irrespective of the interrupted context. ``PSTATE.TCO`` is restored on
+``sigreturn()``.
+
+**Note**: There are no *match-all* logical tags available for user
+applications.
+
+**Note**: Kernel accesses to the user address space (e.g. ``read()``
+system call) are not checked if the user thread tag checking mode is
+``PR_MTE_TCF_NONE`` or ``PR_MTE_TCF_ASYNC``. If the tag checking mode is
+``PR_MTE_TCF_SYNC``, the kernel makes a best effort to check its user
+address accesses, however it cannot always guarantee it.
+
+Excluding Tags in the ``IRG``, ``ADDG`` and ``SUBG`` instructions
+-----------------------------------------------------------------
+
+The architecture allows excluding certain tags to be randomly generated
+via the ``GCR_EL1.Exclude`` register bit-field. By default, Linux
+excludes all tags other than 0. A user thread can enable specific tags
+in the randomly generated set using the ``prctl(PR_SET_TAGGED_ADDR_CTRL,
+flags, 0, 0, 0)`` system call where ``flags`` contains the tags bitmap
+in the ``PR_MTE_TAG_MASK`` bit-field.
+
+**Note**: The hardware uses an exclude mask but the ``prctl()``
+interface provides an include mask. An include mask of ``0`` (exclusion
+mask ``0xffff``) results in the CPU always generating tag ``0``.
+
+Initial process state
+---------------------
+
+On ``execve()``, the new process has the following configuration:
+
+- ``PR_TAGGED_ADDR_ENABLE`` set to 0 (disabled)
+- Tag checking mode set to ``PR_MTE_TCF_NONE``
+- ``PR_MTE_TAG_MASK`` set to 0 (all tags excluded)
+- ``PSTATE.TCO`` set to 0
+- ``PROT_MTE`` not set on any of the initial memory maps
+
+On ``fork()``, the new process inherits the parent's configuration and
+memory map attributes with the exception of the ``madvise()`` ranges
+with ``MADV_WIPEONFORK`` which will have the data and tags cleared (set
+to 0).
+
+The ``ptrace()`` interface
+--------------------------
+
+``PTRACE_PEEKMTETAGS`` and ``PTRACE_POKEMTETAGS`` allow a tracer to read
+the tags from or set the tags to a tracee's address space. The
+``ptrace()`` system call is invoked as ``ptrace(request, pid, addr,
+data)`` where:
+
+- ``request`` - one of ``PTRACE_PEEKMTETAGS`` or ``PTRACE_POKEMTETAGS``.
+- ``pid`` - the tracee's PID.
+- ``addr`` - address in the tracee's address space.
+- ``data`` - pointer to a ``struct iovec`` where ``iov_base`` points to
+ a buffer of ``iov_len`` length in the tracer's address space.
+
+The tags in the tracer's ``iov_base`` buffer are represented as one
+4-bit tag per byte and correspond to a 16-byte MTE tag granule in the
+tracee's address space.
+
+**Note**: If ``addr`` is not aligned to a 16-byte granule, the kernel
+will use the corresponding aligned address.
+
+``ptrace()`` return value:
+
+- 0 - tags were copied, the tracer's ``iov_len`` was updated to the
+ number of tags transferred. This may be smaller than the requested
+ ``iov_len`` if the requested address range in the tracee's or the
+ tracer's space cannot be accessed or does not have valid tags.
+- ``-EPERM`` - the specified process cannot be traced.
+- ``-EIO`` - the tracee's address range cannot be accessed (e.g. invalid
+ address) and no tags copied. ``iov_len`` not updated.
+- ``-EFAULT`` - fault on accessing the tracer's memory (``struct iovec``
+ or ``iov_base`` buffer) and no tags copied. ``iov_len`` not updated.
+- ``-EOPNOTSUPP`` - the tracee's address does not have valid tags (never
+ mapped with the ``PROT_MTE`` flag). ``iov_len`` not updated.
+
+**Note**: There are no transient errors for the requests above, so user
+programs should not retry in case of a non-zero system call return.
+
+``PTRACE_GETREGSET`` and ``PTRACE_SETREGSET`` with ``addr ==
+``NT_ARM_TAGGED_ADDR_CTRL`` allow ``ptrace()`` access to the tagged
+address ABI control and MTE configuration of a process as per the
+``prctl()`` options described in
+Documentation/arm64/tagged-address-abi.rst and above. The corresponding
+``regset`` is 1 element of 8 bytes (``sizeof(long))``).
+
+Example of correct usage
+========================
+
+*MTE Example code*
+
+.. code-block:: c
+
+ /*
+ * To be compiled with -march=armv8.5-a+memtag
+ */
+ #include <errno.h>
+ #include <stdint.h>
+ #include <stdio.h>
+ #include <stdlib.h>
+ #include <unistd.h>
+ #include <sys/auxv.h>
+ #include <sys/mman.h>
+ #include <sys/prctl.h>
+
+ /*
+ * From arch/arm64/include/uapi/asm/hwcap.h
+ */
+ #define HWCAP2_MTE (1 << 18)
+
+ /*
+ * From arch/arm64/include/uapi/asm/mman.h
+ */
+ #define PROT_MTE 0x20
+
+ /*
+ * From include/uapi/linux/prctl.h
+ */
+ #define PR_SET_TAGGED_ADDR_CTRL 55
+ #define PR_GET_TAGGED_ADDR_CTRL 56
+ # define PR_TAGGED_ADDR_ENABLE (1UL << 0)
+ # define PR_MTE_TCF_SHIFT 1
+ # define PR_MTE_TCF_NONE (0UL << PR_MTE_TCF_SHIFT)
+ # define PR_MTE_TCF_SYNC (1UL << PR_MTE_TCF_SHIFT)
+ # define PR_MTE_TCF_ASYNC (2UL << PR_MTE_TCF_SHIFT)
+ # define PR_MTE_TCF_MASK (3UL << PR_MTE_TCF_SHIFT)
+ # define PR_MTE_TAG_SHIFT 3
+ # define PR_MTE_TAG_MASK (0xffffUL << PR_MTE_TAG_SHIFT)
+
+ /*
+ * Insert a random logical tag into the given pointer.
+ */
+ #define insert_random_tag(ptr) ({ \
+ uint64_t __val; \
+ asm("irg %0, %1" : "=r" (__val) : "r" (ptr)); \
+ __val; \
+ })
+
+ /*
+ * Set the allocation tag on the destination address.
+ */
+ #define set_tag(tagged_addr) do { \
+ asm volatile("stg %0, [%0]" : : "r" (tagged_addr) : "memory"); \
+ } while (0)
+
+ int main()
+ {
+ unsigned char *a;
+ unsigned long page_sz = sysconf(_SC_PAGESIZE);
+ unsigned long hwcap2 = getauxval(AT_HWCAP2);
+
+ /* check if MTE is present */
+ if (!(hwcap2 & HWCAP2_MTE))
+ return EXIT_FAILURE;
+
+ /*
+ * Enable the tagged address ABI, synchronous MTE tag check faults and
+ * allow all non-zero tags in the randomly generated set.
+ */
+ if (prctl(PR_SET_TAGGED_ADDR_CTRL,
+ PR_TAGGED_ADDR_ENABLE | PR_MTE_TCF_SYNC | (0xfffe << PR_MTE_TAG_SHIFT),
+ 0, 0, 0)) {
+ perror("prctl() failed");
+ return EXIT_FAILURE;
+ }
+
+ a = mmap(0, page_sz, PROT_READ | PROT_WRITE,
+ MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+ if (a == MAP_FAILED) {
+ perror("mmap() failed");
+ return EXIT_FAILURE;
+ }
+
+ /*
+ * Enable MTE on the above anonymous mmap. The flag could be passed
+ * directly to mmap() and skip this step.
+ */
+ if (mprotect(a, page_sz, PROT_READ | PROT_WRITE | PROT_MTE)) {
+ perror("mprotect() failed");
+ return EXIT_FAILURE;
+ }
+
+ /* access with the default tag (0) */
+ a[0] = 1;
+ a[1] = 2;
+
+ printf("a[0] = %hhu a[1] = %hhu\n", a[0], a[1]);
+
+ /* set the logical and allocation tags */
+ a = (unsigned char *)insert_random_tag(a);
+ set_tag(a);
+
+ printf("%p\n", a);
+
+ /* non-zero tag access */
+ a[0] = 3;
+ printf("a[0] = %hhu a[1] = %hhu\n", a[0], a[1]);
+
+ /*
+ * If MTE is enabled correctly the next instruction will generate an
+ * exception.
+ */
+ printf("Expecting SIGSEGV...\n");
+ a[16] = 0xdd;
+
+ /* this should not be printed in the PR_MTE_TCF_SYNC mode */
+ printf("...haven't got one\n");
+
+ return EXIT_FAILURE;
+ }
diff --git a/Documentation/devicetree/bindings/edac/amazon,al-mc-edac.yaml b/Documentation/devicetree/bindings/edac/amazon,al-mc-edac.yaml
new file mode 100644
index 000000000000..a25387df0865
--- /dev/null
+++ b/Documentation/devicetree/bindings/edac/amazon,al-mc-edac.yaml
@@ -0,0 +1,67 @@
+# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/edac/amazon,al-mc-edac.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Amazon's Annapurna Labs Memory Controller EDAC
+
+maintainers:
+ - Talel Shenhar <talel@amazon.com>
+ - Talel Shenhar <talelshenhar@gmail.com>
+
+description: |
+ EDAC node is defined to describe on-chip error detection and correction for
+ Amazon's Annapurna Labs Memory Controller.
+
+properties:
+
+ compatible:
+ const: amazon,al-mc-edac
+
+ reg:
+ maxItems: 1
+
+ "#address-cells":
+ const: 2
+
+ "#size-cells":
+ const: 2
+
+ interrupts:
+ minItems: 1
+ maxItems: 2
+ items:
+ - description: uncorrectable error interrupt
+ - description: correctable error interrupt
+
+ interrupt-names:
+ minItems: 1
+ maxItems: 2
+ items:
+ - const: ue
+ - const: ce
+
+required:
+ - compatible
+ - reg
+ - "#address-cells"
+ - "#size-cells"
+
+
+examples:
+ - |
+ #include <dt-bindings/interrupt-controller/irq.h>
+ soc {
+ #address-cells = <2>;
+ #size-cells = <2>;
+ edac@f0080000 {
+ #address-cells = <2>;
+ #size-cells = <2>;
+ compatible = "amazon,al-mc-edac";
+ reg = <0x0 0xf0080000 0x0 0x00010000>;
+ interrupt-parent = <&amazon_al_system_fabric>;
+ interrupt-names = "ue";
+ interrupts = <20 IRQ_TYPE_LEVEL_HIGH>;
+ };
+ };
diff --git a/Documentation/devicetree/bindings/interrupt-controller/actions,owl-sirq.yaml b/Documentation/devicetree/bindings/interrupt-controller/actions,owl-sirq.yaml
new file mode 100644
index 000000000000..5da333c644c9
--- /dev/null
+++ b/Documentation/devicetree/bindings/interrupt-controller/actions,owl-sirq.yaml
@@ -0,0 +1,65 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/interrupt-controller/actions,owl-sirq.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Actions Semi Owl SoCs SIRQ interrupt controller
+
+maintainers:
+ - Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
+ - Cristian Ciocaltea <cristian.ciocaltea@gmail.com>
+
+description: |
+ This interrupt controller is found in the Actions Semi Owl SoCs (S500, S700
+ and S900) and provides support for handling up to 3 external interrupt lines.
+
+properties:
+ compatible:
+ enum:
+ - actions,s500-sirq
+ - actions,s700-sirq
+ - actions,s900-sirq
+
+ reg:
+ maxItems: 1
+
+ interrupt-controller: true
+
+ '#interrupt-cells':
+ const: 2
+ description:
+ The first cell is the input IRQ number, between 0 and 2, while the second
+ cell is the trigger type as defined in interrupt.txt in this directory.
+
+ 'interrupts':
+ description: |
+ Contains the GIC SPI IRQs mapped to the external interrupt lines.
+ They shall be specified sequentially from output 0 to 2.
+ minItems: 3
+ maxItems: 3
+
+required:
+ - compatible
+ - reg
+ - interrupt-controller
+ - '#interrupt-cells'
+ - 'interrupts'
+
+additionalProperties: false
+
+examples:
+ - |
+ #include <dt-bindings/interrupt-controller/arm-gic.h>
+
+ sirq: interrupt-controller@b01b0200 {
+ compatible = "actions,s500-sirq";
+ reg = <0xb01b0200 0x4>;
+ interrupt-controller;
+ #interrupt-cells = <2>;
+ interrupts = <GIC_SPI 13 IRQ_TYPE_LEVEL_HIGH>, /* SIRQ0 */
+ <GIC_SPI 14 IRQ_TYPE_LEVEL_HIGH>, /* SIRQ1 */
+ <GIC_SPI 15 IRQ_TYPE_LEVEL_HIGH>; /* SIRQ2 */
+ };
+
+...
diff --git a/Documentation/devicetree/bindings/interrupt-controller/mstar,mst-intc.yaml b/Documentation/devicetree/bindings/interrupt-controller/mstar,mst-intc.yaml
new file mode 100644
index 000000000000..bbf0f26cd008
--- /dev/null
+++ b/Documentation/devicetree/bindings/interrupt-controller/mstar,mst-intc.yaml
@@ -0,0 +1,64 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/interrupt-controller/mstar,mst-intc.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: MStar Interrupt Controller
+
+maintainers:
+ - Mark-PK Tsai <mark-pk.tsai@mediatek.com>
+
+description: |+
+ MStar, SigmaStar and Mediatek TV SoCs contain multiple legacy
+ interrupt controllers that routes interrupts to the GIC.
+
+ The HW block exposes a number of interrupt controllers, each
+ can support up to 64 interrupts.
+
+properties:
+ compatible:
+ const: mstar,mst-intc
+
+ interrupt-controller: true
+
+ "#interrupt-cells":
+ const: 3
+ description: |
+ Use the same format as specified by GIC in arm,gic.yaml.
+
+ reg:
+ maxItems: 1
+
+ mstar,irqs-map-range:
+ description: |
+ The range <start, end> of parent interrupt controller's interrupt
+ lines that are hardwired to mstar interrupt controller.
+ $ref: /schemas/types.yaml#/definitions/uint32-matrix
+ items:
+ minItems: 2
+ maxItems: 2
+
+ mstar,intc-no-eoi:
+ description:
+ Mark this controller has no End Of Interrupt(EOI) implementation.
+ type: boolean
+
+required:
+ - compatible
+ - reg
+ - mstar,irqs-map-range
+
+additionalProperties: false
+
+examples:
+ - |
+ mst_intc0: interrupt-controller@1f2032d0 {
+ compatible = "mstar,mst-intc";
+ interrupt-controller;
+ #interrupt-cells = <3>;
+ interrupt-parent = <&gic>;
+ reg = <0x1f2032d0 0x30>;
+ mstar,irqs-map-range = <0 63>;
+ };
+...
diff --git a/Documentation/devicetree/bindings/interrupt-controller/snps,dw-apb-ictl.txt b/Documentation/devicetree/bindings/interrupt-controller/snps,dw-apb-ictl.txt
index 086ff08322db..2db59df9408f 100644
--- a/Documentation/devicetree/bindings/interrupt-controller/snps,dw-apb-ictl.txt
+++ b/Documentation/devicetree/bindings/interrupt-controller/snps,dw-apb-ictl.txt
@@ -2,7 +2,8 @@ Synopsys DesignWare APB interrupt controller (dw_apb_ictl)
Synopsys DesignWare provides interrupt controller IP for APB known as
dw_apb_ictl. The IP is used as secondary interrupt controller in some SoCs with
-APB bus, e.g. Marvell Armada 1500.
+APB bus, e.g. Marvell Armada 1500. It can also be used as primary interrupt
+controller in some SoCs, e.g. Hisilicon SD5203.
Required properties:
- compatible: shall be "snps,dw-apb-ictl"
@@ -10,6 +11,8 @@ Required properties:
region starting with ENABLE_LOW register
- interrupt-controller: identifies the node as an interrupt controller
- #interrupt-cells: number of cells to encode an interrupt-specifier, shall be 1
+
+Additional required property when it's used as secondary interrupt controller:
- interrupts: interrupt reference to primary interrupt controller
The interrupt sources map to the corresponding bits in the interrupt
@@ -21,6 +24,7 @@ registers, i.e.
- (optional) fast interrupts start at 64.
Example:
+ /* dw_apb_ictl is used as secondary interrupt controller */
aic: interrupt-controller@3000 {
compatible = "snps,dw-apb-ictl";
reg = <0x3000 0xc00>;
@@ -29,3 +33,11 @@ Example:
interrupt-parent = <&gic>;
interrupts = <GIC_SPI 3 IRQ_TYPE_LEVEL_HIGH>;
};
+
+ /* dw_apb_ictl is used as primary interrupt controller */
+ vic: interrupt-controller@10130000 {
+ compatible = "snps,dw-apb-ictl";
+ reg = <0x10130000 0x1000>;
+ interrupt-controller;
+ #interrupt-cells = <1>;
+ };
diff --git a/Documentation/devicetree/bindings/interrupt-controller/ti,pruss-intc.yaml b/Documentation/devicetree/bindings/interrupt-controller/ti,pruss-intc.yaml
new file mode 100644
index 000000000000..bbf79d125675
--- /dev/null
+++ b/Documentation/devicetree/bindings/interrupt-controller/ti,pruss-intc.yaml
@@ -0,0 +1,158 @@
+# SPDX-License-Identifier: (GPL-2.0-only or BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/interrupt-controller/ti,pruss-intc.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: TI PRU-ICSS Local Interrupt Controller
+
+maintainers:
+ - Suman Anna <s-anna@ti.com>
+
+description: |
+ Each PRU-ICSS has a single interrupt controller instance that is common
+ to all the PRU cores. Most interrupt controllers can route 64 input events
+ which are then mapped to 10 possible output interrupts through two levels
+ of mapping. The input events can be triggered by either the PRUs and/or
+ various other PRUSS internal and external peripherals. The first 2 output
+ interrupts (0, 1) are fed exclusively to the internal PRU cores, with the
+ remaining 8 (2 through 9) connected to external interrupt controllers
+ including the MPU and/or other PRUSS instances, DSPs or devices.
+
+ The property "ti,irqs-reserved" is used for denoting the connection
+ differences on the output interrupts 2 through 9. If this property is not
+ defined, it implies that all the PRUSS INTC output interrupts 2 through 9
+ (host_intr0 through host_intr7) are connected exclusively to the Arm interrupt
+ controller.
+
+ The K3 family of SoCs can handle 160 input events that can be mapped to 20
+ different possible output interrupts. The additional output interrupts (10
+ through 19) are connected to new sub-modules within the ICSSG instances.
+
+ This interrupt-controller node should be defined as a child node of the
+ corresponding PRUSS node. The node should be named "interrupt-controller".
+
+properties:
+ compatible:
+ enum:
+ - ti,pruss-intc
+ - ti,icssg-intc
+ description: |
+ Use "ti,pruss-intc" for OMAP-L13x/AM18x/DA850 SoCs,
+ AM335x family of SoCs,
+ AM437x family of SoCs,
+ AM57xx family of SoCs
+ 66AK2G family of SoCs
+ Use "ti,icssg-intc" for K3 AM65x & J721E family of SoCs
+
+ reg:
+ maxItems: 1
+
+ interrupts:
+ minItems: 1
+ maxItems: 8
+ description: |
+ All the interrupts generated towards the main host processor in the SoC.
+ A shared interrupt can be skipped if the desired destination and usage is
+ by a different processor/device.
+
+ interrupt-names:
+ minItems: 1
+ maxItems: 8
+ items:
+ pattern: host_intr[0-7]
+ description: |
+ Should use one of the above names for each valid host event interrupt
+ connected to Arm interrupt controller, the name should match the
+ corresponding host event interrupt number.
+
+ interrupt-controller: true
+
+ "#interrupt-cells":
+ const: 3
+ description: |
+ Client users shall use the PRU System event number (the interrupt source
+ that the client is interested in) [cell 1], PRU channel [cell 2] and PRU
+ host_event (target) [cell 3] as the value of the interrupts property in
+ their node. The system events can be mapped to some output host
+ interrupts through 2 levels of many-to-one mapping i.e. events to channel
+ mapping and channels to host interrupts so through this property entire
+ mapping is provided.
+
+ ti,irqs-reserved:
+ $ref: /schemas/types.yaml#definitions/uint8
+ description: |
+ Bitmask of host interrupts between 0 and 7 (corresponding to PRUSS INTC
+ output interrupts 2 through 9) that are not connected to the Arm interrupt
+ controller or are shared and used by other devices or processors in the
+ SoC. Define this property when any of 8 interrupts should not be handled
+ by Arm interrupt controller.
+ Eg: - AM437x and 66AK2G SoCs do not have "host_intr5" interrupt
+ connected to MPU
+ - AM65x and J721E SoCs have "host_intr5", "host_intr6" and
+ "host_intr7" interrupts connected to MPU, and other ICSSG
+ instances.
+
+required:
+ - compatible
+ - reg
+ - interrupts
+ - interrupt-names
+ - interrupt-controller
+ - "#interrupt-cells"
+
+additionalProperties: false
+
+examples:
+ - |
+ /* AM33xx PRU-ICSS */
+ pruss: pruss@0 {
+ compatible = "ti,am3356-pruss";
+ reg = <0x0 0x80000>;
+ #address-cells = <1>;
+ #size-cells = <1>;
+ ranges;
+
+ pruss_intc: interrupt-controller@20000 {
+ compatible = "ti,pruss-intc";
+ reg = <0x20000 0x2000>;
+ interrupts = <20 21 22 23 24 25 26 27>;
+ interrupt-names = "host_intr0", "host_intr1",
+ "host_intr2", "host_intr3",
+ "host_intr4", "host_intr5",
+ "host_intr6", "host_intr7";
+ interrupt-controller;
+ #interrupt-cells = <3>;
+ };
+ };
+
+ - |
+
+ /* AM4376 PRU-ICSS */
+ #include <dt-bindings/interrupt-controller/arm-gic.h>
+ pruss@0 {
+ compatible = "ti,am4376-pruss";
+ reg = <0x0 0x40000>;
+ #address-cells = <1>;
+ #size-cells = <1>;
+ ranges;
+
+ interrupt-controller@20000 {
+ compatible = "ti,pruss-intc";
+ reg = <0x20000 0x2000>;
+ interrupt-controller;
+ #interrupt-cells = <3>;
+ interrupts = <GIC_SPI 20 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 21 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 22 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 23 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 24 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 26 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 27 IRQ_TYPE_LEVEL_HIGH>;
+ interrupt-names = "host_intr0", "host_intr1",
+ "host_intr2", "host_intr3",
+ "host_intr4",
+ "host_intr6", "host_intr7";
+ ti,irqs-reserved = /bits/ 8 <0x20>; /* BIT(5) */
+ };
+ };
diff --git a/Documentation/devicetree/bindings/perf/arm,cmn.yaml b/Documentation/devicetree/bindings/perf/arm,cmn.yaml
new file mode 100644
index 000000000000..e4fcc0de25e2
--- /dev/null
+++ b/Documentation/devicetree/bindings/perf/arm,cmn.yaml
@@ -0,0 +1,57 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+# Copyright 2020 Arm Ltd.
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/perf/arm,cmn.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Arm CMN (Coherent Mesh Network) Performance Monitors
+
+maintainers:
+ - Robin Murphy <robin.murphy@arm.com>
+
+properties:
+ compatible:
+ const: arm,cmn-600
+
+ reg:
+ items:
+ - description: Physical address of the base (PERIPHBASE) and
+ size (up to 64MB) of the configuration address space.
+
+ interrupts:
+ minItems: 1
+ maxItems: 4
+ items:
+ - description: Overflow interrupt for DTC0
+ - description: Overflow interrupt for DTC1
+ - description: Overflow interrupt for DTC2
+ - description: Overflow interrupt for DTC3
+ description: One interrupt for each DTC domain implemented must
+ be specified, in order. DTC0 is always present.
+
+ arm,root-node:
+ $ref: /schemas/types.yaml#/definitions/uint32
+ description: Offset from PERIPHBASE of the configuration
+ discovery node (see TRM definition of ROOTNODEBASE).
+
+required:
+ - compatible
+ - reg
+ - interrupts
+ - arm,root-node
+
+additionalProperties: false
+
+examples:
+ - |
+ #include <dt-bindings/interrupt-controller/arm-gic.h>
+ #include <dt-bindings/interrupt-controller/irq.h>
+ pmu@50000000 {
+ compatible = "arm,cmn-600";
+ reg = <0x50000000 0x4000000>;
+ /* 4x2 mesh with one DTC, and CFG node at 0,1,1,0 */
+ interrupts = <GIC_SPI 46 IRQ_TYPE_LEVEL_HIGH>;
+ arm,root-node = <0x104000>;
+ };
+...
diff --git a/Documentation/devicetree/bindings/timer/renesas,cmt.yaml b/Documentation/devicetree/bindings/timer/renesas,cmt.yaml
index 7e4dc5623da8..428db3a21bb9 100644
--- a/Documentation/devicetree/bindings/timer/renesas,cmt.yaml
+++ b/Documentation/devicetree/bindings/timer/renesas,cmt.yaml
@@ -39,6 +39,7 @@ properties:
- items:
- enum:
- renesas,r8a73a4-cmt0 # 32-bit CMT0 on R-Mobile APE6
+ - renesas,r8a7742-cmt0 # 32-bit CMT0 on RZ/G1H
- renesas,r8a7743-cmt0 # 32-bit CMT0 on RZ/G1M
- renesas,r8a7744-cmt0 # 32-bit CMT0 on RZ/G1N
- renesas,r8a7745-cmt0 # 32-bit CMT0 on RZ/G1E
@@ -53,6 +54,7 @@ properties:
- items:
- enum:
- renesas,r8a73a4-cmt1 # 48-bit CMT1 on R-Mobile APE6
+ - renesas,r8a7742-cmt1 # 48-bit CMT1 on RZ/G1H
- renesas,r8a7743-cmt1 # 48-bit CMT1 on RZ/G1M
- renesas,r8a7744-cmt1 # 48-bit CMT1 on RZ/G1N
- renesas,r8a7745-cmt1 # 48-bit CMT1 on RZ/G1E
@@ -69,6 +71,7 @@ properties:
- renesas,r8a774a1-cmt0 # 32-bit CMT0 on RZ/G2M
- renesas,r8a774b1-cmt0 # 32-bit CMT0 on RZ/G2N
- renesas,r8a774c0-cmt0 # 32-bit CMT0 on RZ/G2E
+ - renesas,r8a774e1-cmt0 # 32-bit CMT0 on RZ/G2H
- renesas,r8a7795-cmt0 # 32-bit CMT0 on R-Car H3
- renesas,r8a7796-cmt0 # 32-bit CMT0 on R-Car M3-W
- renesas,r8a77965-cmt0 # 32-bit CMT0 on R-Car M3-N
@@ -83,6 +86,7 @@ properties:
- renesas,r8a774a1-cmt1 # 48-bit CMT on RZ/G2M
- renesas,r8a774b1-cmt1 # 48-bit CMT on RZ/G2N
- renesas,r8a774c0-cmt1 # 48-bit CMT on RZ/G2E
+ - renesas,r8a774e1-cmt1 # 48-bit CMT on RZ/G2H
- renesas,r8a7795-cmt1 # 48-bit CMT on R-Car H3
- renesas,r8a7796-cmt1 # 48-bit CMT on R-Car M3-W
- renesas,r8a77965-cmt1 # 48-bit CMT on R-Car M3-N
diff --git a/Documentation/devicetree/bindings/trivial-devices.yaml b/Documentation/devicetree/bindings/trivial-devices.yaml
index 4ace8039840a..25c4239ebbfb 100644
--- a/Documentation/devicetree/bindings/trivial-devices.yaml
+++ b/Documentation/devicetree/bindings/trivial-devices.yaml
@@ -326,6 +326,8 @@ properties:
- silabs,si7020
# Skyworks SKY81452: Six-Channel White LED Driver with Touch Panel Bias Supply
- skyworks,sky81452
+ # Socionext SynQuacer TPM MMIO module
+ - socionext,synquacer-tpm-mmio
# i2c serial eeprom (24cxx)
- st,24c256
# Ambient Light Sensor with SMBUS/Two Wire Serial Interface
diff --git a/Documentation/virt/kvm/arm/hyp-abi.rst b/Documentation/virt/kvm/arm/hyp-abi.rst
index d9eba93aa364..83cadd8186fa 100644
--- a/Documentation/virt/kvm/arm/hyp-abi.rst
+++ b/Documentation/virt/kvm/arm/hyp-abi.rst
@@ -54,9 +54,9 @@ these functions (see arch/arm{,64}/include/asm/virt.h):
x3 = x1's value when entering the next payload (arm64)
x4 = x2's value when entering the next payload (arm64)
- Mask all exceptions, disable the MMU, move the arguments into place
- (arm64 only), and jump to the restart address while at HYP/EL2. This
- hypercall is not expected to return to its caller.
+ Mask all exceptions, disable the MMU, clear I+D bits, move the arguments
+ into place (arm64 only), and jump to the restart address while at HYP/EL2.
+ This hypercall is not expected to return to its caller.
Any other value of r0/x0 triggers a hypervisor-specific handling,
which is not documented here.
diff --git a/Documentation/x86/boot.rst b/Documentation/x86/boot.rst
index 7fafc7ac00d7..abb9fc164657 100644
--- a/Documentation/x86/boot.rst
+++ b/Documentation/x86/boot.rst
@@ -1342,8 +1342,8 @@ follow::
In addition to read/modify/write the setup header of the struct
boot_params as that of 16-bit boot protocol, the boot loader should
-also fill the additional fields of the struct boot_params as that
-described in zero-page.txt.
+also fill the additional fields of the struct boot_params as
+described in chapter :doc:`zero-page`.
After setting up the struct boot_params, the boot loader can load the
32/64-bit kernel in the same way as that of 16-bit boot protocol.
@@ -1379,7 +1379,7 @@ can be calculated as follows::
In addition to read/modify/write the setup header of the struct
boot_params as that of 16-bit boot protocol, the boot loader should
also fill the additional fields of the struct boot_params as described
-in zero-page.txt.
+in chapter :doc:`zero-page`.
After setting up the struct boot_params, the boot loader can load
64-bit kernel in the same way as that of 16-bit boot protocol, but
diff --git a/Documentation/x86/cpuinfo.rst b/Documentation/x86/cpuinfo.rst
new file mode 100644
index 000000000000..5d54c39a063f
--- /dev/null
+++ b/Documentation/x86/cpuinfo.rst
@@ -0,0 +1,155 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=================
+x86 Feature Flags
+=================
+
+Introduction
+============
+
+On x86, flags appearing in /proc/cpuinfo have an X86_FEATURE definition
+in arch/x86/include/asm/cpufeatures.h. If the kernel cares about a feature
+or KVM want to expose the feature to a KVM guest, it can and should have
+an X86_FEATURE_* defined. These flags represent hardware features as
+well as software features.
+
+If users want to know if a feature is available on a given system, they
+try to find the flag in /proc/cpuinfo. If a given flag is present, it
+means that the kernel supports it and is currently making it available.
+If such flag represents a hardware feature, it also means that the
+hardware supports it.
+
+If the expected flag does not appear in /proc/cpuinfo, things are murkier.
+Users need to find out the reason why the flag is missing and find the way
+how to enable it, which is not always easy. There are several factors that
+can explain missing flags: the expected feature failed to enable, the feature
+is missing in hardware, platform firmware did not enable it, the feature is
+disabled at build or run time, an old kernel is in use, or the kernel does
+not support the feature and thus has not enabled it. In general, /proc/cpuinfo
+shows features which the kernel supports. For a full list of CPUID flags
+which the CPU supports, use tools/arch/x86/kcpuid.
+
+How are feature flags created?
+==============================
+
+a: Feature flags can be derived from the contents of CPUID leaves.
+------------------------------------------------------------------
+These feature definitions are organized mirroring the layout of CPUID
+leaves and grouped in words with offsets as mapped in enum cpuid_leafs
+in cpufeatures.h (see arch/x86/include/asm/cpufeatures.h for details).
+If a feature is defined with a X86_FEATURE_<name> definition in
+cpufeatures.h, and if it is detected at run time, the flags will be
+displayed accordingly in /proc/cpuinfo. For example, the flag "avx2"
+comes from X86_FEATURE_AVX2 in cpufeatures.h.
+
+b: Flags can be from scattered CPUID-based features.
+----------------------------------------------------
+Hardware features enumerated in sparsely populated CPUID leaves get
+software-defined values. Still, CPUID needs to be queried to determine
+if a given feature is present. This is done in init_scattered_cpuid_features().
+For instance, X86_FEATURE_CQM_LLC is defined as 11*32 + 0 and its presence is
+checked at runtime in the respective CPUID leaf [EAX=f, ECX=0] bit EDX[1].
+
+The intent of scattering CPUID leaves is to not bloat struct
+cpuinfo_x86.x86_capability[] unnecessarily. For instance, the CPUID leaf
+[EAX=7, ECX=0] has 30 features and is dense, but the CPUID leaf [EAX=7, EAX=1]
+has only one feature and would waste 31 bits of space in the x86_capability[]
+array. Since there is a struct cpuinfo_x86 for each possible CPU, the wasted
+memory is not trivial.
+
+c: Flags can be created synthetically under certain conditions for hardware features.
+-------------------------------------------------------------------------------------
+Examples of conditions include whether certain features are present in
+MSR_IA32_CORE_CAPS or specific CPU models are identified. If the needed
+conditions are met, the features are enabled by the set_cpu_cap or
+setup_force_cpu_cap macros. For example, if bit 5 is set in MSR_IA32_CORE_CAPS,
+the feature X86_FEATURE_SPLIT_LOCK_DETECT will be enabled and
+"split_lock_detect" will be displayed. The flag "ring3mwait" will be
+displayed only when running on INTEL_FAM6_XEON_PHI_[KNL|KNM] processors.
+
+d: Flags can represent purely software features.
+------------------------------------------------
+These flags do not represent hardware features. Instead, they represent a
+software feature implemented in the kernel. For example, Kernel Page Table
+Isolation is purely software feature and its feature flag X86_FEATURE_PTI is
+also defined in cpufeatures.h.
+
+Naming of Flags
+===============
+
+The script arch/x86/kernel/cpu/mkcapflags.sh processes the
+#define X86_FEATURE_<name> from cpufeatures.h and generates the
+x86_cap/bug_flags[] arrays in kernel/cpu/capflags.c. The names in the
+resulting x86_cap/bug_flags[] are used to populate /proc/cpuinfo. The naming
+of flags in the x86_cap/bug_flags[] are as follows:
+
+a: The name of the flag is from the string in X86_FEATURE_<name> by default.
+----------------------------------------------------------------------------
+By default, the flag <name> in /proc/cpuinfo is extracted from the respective
+X86_FEATURE_<name> in cpufeatures.h. For example, the flag "avx2" is from
+X86_FEATURE_AVX2.
+
+b: The naming can be overridden.
+--------------------------------
+If the comment on the line for the #define X86_FEATURE_* starts with a
+double-quote character (""), the string inside the double-quote characters
+will be the name of the flags. For example, the flag "sse4_1" comes from
+the comment "sse4_1" following the X86_FEATURE_XMM4_1 definition.
+
+There are situations in which overriding the displayed name of the flag is
+needed. For instance, /proc/cpuinfo is a userspace interface and must remain
+constant. If, for some reason, the naming of X86_FEATURE_<name> changes, one
+shall override the new naming with the name already used in /proc/cpuinfo.
+
+c: The naming override can be "", which means it will not appear in /proc/cpuinfo.
+----------------------------------------------------------------------------------
+The feature shall be omitted from /proc/cpuinfo if it does not make sense for
+the feature to be exposed to userspace. For example, X86_FEATURE_ALWAYS is
+defined in cpufeatures.h but that flag is an internal kernel feature used
+in the alternative runtime patching functionality. So, its name is overridden
+with "". Its flag will not appear in /proc/cpuinfo.
+
+Flags are missing when one or more of these happen
+==================================================
+
+a: The hardware does not enumerate support for it.
+--------------------------------------------------
+For example, when a new kernel is running on old hardware or the feature is
+not enabled by boot firmware. Even if the hardware is new, there might be a
+problem enabling the feature at run time, the flag will not be displayed.
+
+b: The kernel does not know about the flag.
+-------------------------------------------
+For example, when an old kernel is running on new hardware.
+
+c: The kernel disabled support for it at compile-time.
+------------------------------------------------------
+For example, if 5-level-paging is not enabled when building (i.e.,
+CONFIG_X86_5LEVEL is not selected) the flag "la57" will not show up [#f1]_.
+Even though the feature will still be detected via CPUID, the kernel disables
+it by clearing via setup_clear_cpu_cap(X86_FEATURE_LA57).
+
+d: The feature is disabled at boot-time.
+----------------------------------------
+A feature can be disabled either using a command-line parameter or because
+it failed to be enabled. The command-line parameter clearcpuid= can be used
+to disable features using the feature number as defined in
+/arch/x86/include/asm/cpufeatures.h. For instance, User Mode Instruction
+Protection can be disabled using clearcpuid=514. The number 514 is calculated
+from #define X86_FEATURE_UMIP (16*32 + 2).
+
+In addition, there exists a variety of custom command-line parameters that
+disable specific features. The list of parameters includes, but is not limited
+to, nofsgsbase, nosmap, and nosmep. 5-level paging can also be disabled using
+"no5lvl". SMAP and SMEP are disabled with the aforementioned parameters,
+respectively.
+
+e: The feature was known to be non-functional.
+----------------------------------------------
+The feature was known to be non-functional because a dependency was
+missing at runtime. For example, AVX flags will not show up if XSAVE feature
+is disabled since they depend on XSAVE feature. Another example would be broken
+CPUs and them missing microcode patches. Due to that, the kernel decides not to
+enable a feature.
+
+.. [#f1] 5-level paging uses linear address of 57 bits.
diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index 265d9e9a093b..740ee7f87898 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -9,6 +9,7 @@ x86-specific Documentation
:numbered:
boot
+ cpuinfo
topology
exception-tables
kernel-stacks
@@ -30,3 +31,4 @@ x86-specific Documentation
usb-legacy-support
i386/index
x86_64/index
+ sva
diff --git a/Documentation/x86/resctrl_ui.rst b/Documentation/x86/resctrl_ui.rst
index 5368cedfb530..e59b7b93a9b4 100644
--- a/Documentation/x86/resctrl_ui.rst
+++ b/Documentation/x86/resctrl_ui.rst
@@ -138,6 +138,18 @@ with respect to allocation:
non-linear. This field is purely informational
only.
+"thread_throttle_mode":
+ Indicator on Intel systems of how tasks running on threads
+ of a physical core are throttled in cases where they
+ request different memory bandwidth percentages:
+
+ "max":
+ the smallest percentage is applied
+ to all threads
+ "per-thread":
+ bandwidth percentages are directly applied to
+ the threads running on the core
+
If RDT monitoring is available there will be an "L3_MON" directory
with the following files:
@@ -364,8 +376,10 @@ to the next control step available on the hardware.
The bandwidth throttling is a core specific mechanism on some of Intel
SKUs. Using a high bandwidth and a low bandwidth setting on two threads
-sharing a core will result in both threads being throttled to use the
-low bandwidth. The fact that Memory bandwidth allocation(MBA) is a core
+sharing a core may result in both threads being throttled to use the
+low bandwidth (see "thread_throttle_mode").
+
+The fact that Memory bandwidth allocation(MBA) may be a core
specific mechanism where as memory bandwidth monitoring(MBM) is done at
the package level may lead to confusion when users try to apply control
via the MBA and then monitor the bandwidth to see if the controls are
diff --git a/Documentation/x86/sva.rst b/Documentation/x86/sva.rst
new file mode 100644
index 000000000000..076efd51ef1f
--- /dev/null
+++ b/Documentation/x86/sva.rst
@@ -0,0 +1,257 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===========================================
+Shared Virtual Addressing (SVA) with ENQCMD
+===========================================
+
+Background
+==========
+
+Shared Virtual Addressing (SVA) allows the processor and device to use the
+same virtual addresses avoiding the need for software to translate virtual
+addresses to physical addresses. SVA is what PCIe calls Shared Virtual
+Memory (SVM).
+
+In addition to the convenience of using application virtual addresses
+by the device, it also doesn't require pinning pages for DMA.
+PCIe Address Translation Services (ATS) along with Page Request Interface
+(PRI) allow devices to function much the same way as the CPU handling
+application page-faults. For more information please refer to the PCIe
+specification Chapter 10: ATS Specification.
+
+Use of SVA requires IOMMU support in the platform. IOMMU is also
+required to support the PCIe features ATS and PRI. ATS allows devices
+to cache translations for virtual addresses. The IOMMU driver uses the
+mmu_notifier() support to keep the device TLB cache and the CPU cache in
+sync. When an ATS lookup fails for a virtual address, the device should
+use the PRI in order to request the virtual address to be paged into the
+CPU page tables. The device must use ATS again in order the fetch the
+translation before use.
+
+Shared Hardware Workqueues
+==========================
+
+Unlike Single Root I/O Virtualization (SR-IOV), Scalable IOV (SIOV) permits
+the use of Shared Work Queues (SWQ) by both applications and Virtual
+Machines (VM's). This allows better hardware utilization vs. hard
+partitioning resources that could result in under utilization. In order to
+allow the hardware to distinguish the context for which work is being
+executed in the hardware by SWQ interface, SIOV uses Process Address Space
+ID (PASID), which is a 20-bit number defined by the PCIe SIG.
+
+PASID value is encoded in all transactions from the device. This allows the
+IOMMU to track I/O on a per-PASID granularity in addition to using the PCIe
+Resource Identifier (RID) which is the Bus/Device/Function.
+
+
+ENQCMD
+======
+
+ENQCMD is a new instruction on Intel platforms that atomically submits a
+work descriptor to a device. The descriptor includes the operation to be
+performed, virtual addresses of all parameters, virtual address of a completion
+record, and the PASID (process address space ID) of the current process.
+
+ENQCMD works with non-posted semantics and carries a status back if the
+command was accepted by hardware. This allows the submitter to know if the
+submission needs to be retried or other device specific mechanisms to
+implement fairness or ensure forward progress should be provided.
+
+ENQCMD is the glue that ensures applications can directly submit commands
+to the hardware and also permits hardware to be aware of application context
+to perform I/O operations via use of PASID.
+
+Process Address Space Tagging
+=============================
+
+A new thread-scoped MSR (IA32_PASID) provides the connection between
+user processes and the rest of the hardware. When an application first
+accesses an SVA-capable device, this MSR is initialized with a newly
+allocated PASID. The driver for the device calls an IOMMU-specific API
+that sets up the routing for DMA and page-requests.
+
+For example, the Intel Data Streaming Accelerator (DSA) uses
+iommu_sva_bind_device(), which will do the following:
+
+- Allocate the PASID, and program the process page-table (%cr3 register) in the
+ PASID context entries.
+- Register for mmu_notifier() to track any page-table invalidations to keep
+ the device TLB in sync. For example, when a page-table entry is invalidated,
+ the IOMMU propagates the invalidation to the device TLB. This will force any
+ future access by the device to this virtual address to participate in
+ ATS. If the IOMMU responds with proper response that a page is not
+ present, the device would request the page to be paged in via the PCIe PRI
+ protocol before performing I/O.
+
+This MSR is managed with the XSAVE feature set as "supervisor state" to
+ensure the MSR is updated during context switch.
+
+PASID Management
+================
+
+The kernel must allocate a PASID on behalf of each process which will use
+ENQCMD and program it into the new MSR to communicate the process identity to
+platform hardware. ENQCMD uses the PASID stored in this MSR to tag requests
+from this process. When a user submits a work descriptor to a device using the
+ENQCMD instruction, the PASID field in the descriptor is auto-filled with the
+value from MSR_IA32_PASID. Requests for DMA from the device are also tagged
+with the same PASID. The platform IOMMU uses the PASID in the transaction to
+perform address translation. The IOMMU APIs setup the corresponding PASID
+entry in IOMMU with the process address used by the CPU (e.g. %cr3 register in
+x86).
+
+The MSR must be configured on each logical CPU before any application
+thread can interact with a device. Threads that belong to the same
+process share the same page tables, thus the same MSR value.
+
+PASID is cleared when a process is created. The PASID allocation and MSR
+programming may occur long after a process and its threads have been created.
+One thread must call iommu_sva_bind_device() to allocate the PASID for the
+process. If a thread uses ENQCMD without the MSR first being populated, a #GP
+will be raised. The kernel will update the PASID MSR with the PASID for all
+threads in the process. A single process PASID can be used simultaneously
+with multiple devices since they all share the same address space.
+
+One thread can call iommu_sva_unbind_device() to free the allocated PASID.
+The kernel will clear the PASID MSR for all threads belonging to the process.
+
+New threads inherit the MSR value from the parent.
+
+Relationships
+=============
+
+ * Each process has many threads, but only one PASID.
+ * Devices have a limited number (~10's to 1000's) of hardware workqueues.
+ The device driver manages allocating hardware workqueues.
+ * A single mmap() maps a single hardware workqueue as a "portal" and
+ each portal maps down to a single workqueue.
+ * For each device with which a process interacts, there must be
+ one or more mmap()'d portals.
+ * Many threads within a process can share a single portal to access
+ a single device.
+ * Multiple processes can separately mmap() the same portal, in
+ which case they still share one device hardware workqueue.
+ * The single process-wide PASID is used by all threads to interact
+ with all devices. There is not, for instance, a PASID for each
+ thread or each thread<->device pair.
+
+FAQ
+===
+
+* What is SVA/SVM?
+
+Shared Virtual Addressing (SVA) permits I/O hardware and the processor to
+work in the same address space, i.e., to share it. Some call it Shared
+Virtual Memory (SVM), but Linux community wanted to avoid confusing it with
+POSIX Shared Memory and Secure Virtual Machines which were terms already in
+circulation.
+
+* What is a PASID?
+
+A Process Address Space ID (PASID) is a PCIe-defined Transaction Layer Packet
+(TLP) prefix. A PASID is a 20-bit number allocated and managed by the OS.
+PASID is included in all transactions between the platform and the device.
+
+* How are shared workqueues different?
+
+Traditionally, in order for userspace applications to interact with hardware,
+there is a separate hardware instance required per process. For example,
+consider doorbells as a mechanism of informing hardware about work to process.
+Each doorbell is required to be spaced 4k (or page-size) apart for process
+isolation. This requires hardware to provision that space and reserve it in
+MMIO. This doesn't scale as the number of threads becomes quite large. The
+hardware also manages the queue depth for Shared Work Queues (SWQ), and
+consumers don't need to track queue depth. If there is no space to accept
+a command, the device will return an error indicating retry.
+
+A user should check Deferrable Memory Write (DMWr) capability on the device
+and only submits ENQCMD when the device supports it. In the new DMWr PCIe
+terminology, devices need to support DMWr completer capability. In addition,
+it requires all switch ports to support DMWr routing and must be enabled by
+the PCIe subsystem, much like how PCIe atomic operations are managed for
+instance.
+
+SWQ allows hardware to provision just a single address in the device. When
+used with ENQCMD to submit work, the device can distinguish the process
+submitting the work since it will include the PASID assigned to that
+process. This helps the device scale to a large number of processes.
+
+* Is this the same as a user space device driver?
+
+Communicating with the device via the shared workqueue is much simpler
+than a full blown user space driver. The kernel driver does all the
+initialization of the hardware. User space only needs to worry about
+submitting work and processing completions.
+
+* Is this the same as SR-IOV?
+
+Single Root I/O Virtualization (SR-IOV) focuses on providing independent
+hardware interfaces for virtualizing hardware. Hence, it's required to be
+almost fully functional interface to software supporting the traditional
+BARs, space for interrupts via MSI-X, its own register layout.
+Virtual Functions (VFs) are assisted by the Physical Function (PF)
+driver.
+
+Scalable I/O Virtualization builds on the PASID concept to create device
+instances for virtualization. SIOV requires host software to assist in
+creating virtual devices; each virtual device is represented by a PASID
+along with the bus/device/function of the device. This allows device
+hardware to optimize device resource creation and can grow dynamically on
+demand. SR-IOV creation and management is very static in nature. Consult
+references below for more details.
+
+* Why not just create a virtual function for each app?
+
+Creating PCIe SR-IOV type Virtual Functions (VF) is expensive. VFs require
+duplicated hardware for PCI config space and interrupts such as MSI-X.
+Resources such as interrupts have to be hard partitioned between VFs at
+creation time, and cannot scale dynamically on demand. The VFs are not
+completely independent from the Physical Function (PF). Most VFs require
+some communication and assistance from the PF driver. SIOV, in contrast,
+creates a software-defined device where all the configuration and control
+aspects are mediated via the slow path. The work submission and completion
+happen without any mediation.
+
+* Does this support virtualization?
+
+ENQCMD can be used from within a guest VM. In these cases, the VMM helps
+with setting up a translation table to translate from Guest PASID to Host
+PASID. Please consult the ENQCMD instruction set reference for more
+details.
+
+* Does memory need to be pinned?
+
+When devices support SVA along with platform hardware such as IOMMU
+supporting such devices, there is no need to pin memory for DMA purposes.
+Devices that support SVA also support other PCIe features that remove the
+pinning requirement for memory.
+
+Device TLB support - Device requests the IOMMU to lookup an address before
+use via Address Translation Service (ATS) requests. If the mapping exists
+but there is no page allocated by the OS, IOMMU hardware returns that no
+mapping exists.
+
+Device requests the virtual address to be mapped via Page Request
+Interface (PRI). Once the OS has successfully completed the mapping, it
+returns the response back to the device. The device requests again for
+a translation and continues.
+
+IOMMU works with the OS in managing consistency of page-tables with the
+device. When removing pages, it interacts with the device to remove any
+device TLB entry that might have been cached before removing the mappings from
+the OS.
+
+References
+==========
+
+VT-D:
+https://01.org/blogs/ashokraj/2018/recent-enhancements-intel-virtualization-technology-directed-i/o-intel-vt-d
+
+SIOV:
+https://01.org/blogs/2019/assignable-interfaces-intel-scalable-i/o-virtualization-linux
+
+ENQCMD in ISE:
+https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf
+
+DSA spec:
+https://software.intel.com/sites/default/files/341204-intel-data-streaming-accelerator-spec.pdf