summaryrefslogtreecommitdiffstats
path: root/arch/arm64/include
AgeCommit message (Collapse)AuthorFilesLines
2023-01-23Merge tag 'efi-fixes-for-v6.2-2' of ↵Linus Torvalds2-0/+24
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI fixes from Ard Biesheuvel: "Another couple of EFI fixes, of which the first two were already in -next when I sent out the previous PR, but they caused some issues on non-EFI boots so I let them simmer for a bit longer. - ensure the EFI ResetSystem and ACPI PRM calls are recognized as users of the EFI runtime, and therefore protected against exceptions - account for the EFI runtime stack in the stacktrace code - remove Matthew Garrett's MAINTAINERS entry for efivarfs" * tag 'efi-fixes-for-v6.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: efi: Remove Matthew Garrett as efivarfs maintainer arm64: efi: Account for the EFI runtime stack in stack unwinder arm64: efi: Avoid workqueue to check whether EFI runtime is live
2023-01-16arm64: efi: Account for the EFI runtime stack in stack unwinderArd Biesheuvel1-0/+15
The EFI runtime services run from a dedicated stack now, and so the stack unwinder needs to be informed about this. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2023-01-16arm64: efi: Avoid workqueue to check whether EFI runtime is liveArd Biesheuvel1-0/+9
Comparing current_work() against efi_rts_work.work is sufficient to decide whether current is currently running EFI runtime services code at any level in its call stack. However, there are other potential users of the EFI runtime stack, such as the ACPI subsystem, which may invoke efi_call_virt_pointer() directly, and so any sync exceptions occurring in firmware during those calls are currently misidentified. So instead, let's check whether the stashed value of the thread stack pointer points into current's thread stack. This can only be the case if current was interrupted while running EFI runtime code. Note that this implies that we should clear the stashed value after switching back, to avoid false positives. Reviewed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2023-01-13Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds4-27/+43
Pull kvm fixes from Paolo Bonzini: "ARM: - Fix the PMCR_EL0 reset value after the PMU rework - Correctly handle S2 fault triggered by a S1 page table walk by not always classifying it as a write, as this breaks on R/O memslots - Document why we cannot exit with KVM_EXIT_MMIO when taking a write fault from a S1 PTW on a R/O memslot - Put the Apple M2 on the naughty list for not being able to correctly implement the vgic SEIS feature, just like the M1 before it - Reviewer updates: Alex is stepping down, replaced by Zenghui x86: - Fix various rare locking issues in Xen emulation and teach lockdep to detect them - Documentation improvements - Do not return host topology information from KVM_GET_SUPPORTED_CPUID" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86/xen: Avoid deadlock by adding kvm->arch.xen.xen_lock leaf node lock KVM: Ensure lockdep knows about kvm->lock vs. vcpu->mutex ordering rule KVM: x86/xen: Fix potential deadlock in kvm_xen_update_runstate_guest() KVM: x86/xen: Fix lockdep warning on "recursive" gpc locking Documentation: kvm: fix SRCU locking order docs KVM: x86: Do not return host topology information from KVM_GET_SUPPORTED_CPUID KVM: nSVM: clarify recalc_intercepts() wrt CR8 MAINTAINERS: Remove myself as a KVM/arm64 reviewer MAINTAINERS: Add Zenghui Yu as a KVM/arm64 reviewer KVM: arm64: vgic: Add Apple M2 cpus to the list of broken SEIS implementations KVM: arm64: Convert FSC_* over to ESR_ELx_FSC_* KVM: arm64: Document the behaviour of S1PTW faults on RO memslots KVM: arm64: Fix S1PTW handling on RO memslots KVM: arm64: PMU: Fix PMCR_EL0 reset value
2023-01-09arm64/mm: Define dummy pud_user_exec() when using 2-level page-tableWill Deacon1-0/+1
With only two levels of page-table, the generic 'pud_*' macros are implemented using dummy operations in pgtable-nopmd.h. Since commit 730a11f982e6 ("arm64/mm: add pud_user_exec() check in pud_user_accessible_page()"), pud_user_accessible_page() unconditionally calls pud_user_exec(), which is an arm64-specific helper and therefore isn't defined by pgtable-nopmd.h. This results in a build failure for configurations with only two levels of page table: arch/arm64/include/asm/pgtable.h: In function 'pud_user_accessible_page': >> arch/arm64/include/asm/pgtable.h:870:51: error: implicit declaration of function 'pud_user_exec'; did you mean 'pmd_user_exec'? [-Werror=implicit-function-declaration] 870 | return pud_leaf(pud) && (pud_user(pud) || pud_user_exec(pud)); | ^~~~~~~~~~~~~ | pmd_user_exec Fix the problem by defining pud_user_exec() as pud_user() in this case. Link: https://lore.kernel.org/r/202301080515.z6zEksU4-lkp@intel.com Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Will Deacon <will@kernel.org>
2023-01-06arm64: errata: Workaround possible Cortex-A715 [ESR|FAR]_ELx corruptionAnshuman Khandual2-0/+18
If a Cortex-A715 cpu sees a page mapping permissions change from executable to non-executable, it may corrupt the ESR_ELx and FAR_ELx registers, on the next instruction abort caused by permission fault. Only user-space does executable to non-executable permission transition via mprotect() system call which calls ptep_modify_prot_start() and ptep_modify _prot_commit() helpers, while changing the page mapping. The platform code can override these helpers via __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION. Work around the problem via doing a break-before-make TLB invalidation, for all executable user space mappings, that go through mprotect() system call. This overrides ptep_modify_prot_start() and ptep_modify_prot_commit(), via defining HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION on the platform thus giving an opportunity to intercept user space exec mappings, and do the necessary TLB invalidation. Similar interceptions are also implemented for HugeTLB. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mark Rutland <mark.rutland@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-doc@vger.kernel.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20230102061651.34745-1-anshuman.khandual@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2023-01-05arm64: cmpxchg_double*: hazard against entire exchange variableMark Rutland2-2/+2
The inline assembly for arm64's cmpxchg_double*() implementations use a +Q constraint to hazard against other accesses to the memory location being exchanged. However, the pointer passed to the constraint is a pointer to unsigned long, and thus the hazard only applies to the first 8 bytes of the location. GCC can take advantage of this, assuming that other portions of the location are unchanged, leading to a number of potential problems. This is similar to what we fixed back in commit: fee960bed5e857eb ("arm64: xchg: hazard against entire exchange variable") ... but we forgot to adjust cmpxchg_double*() similarly at the same time. The same problem applies, as demonstrated with the following test: | struct big { | u64 lo, hi; | } __aligned(128); | | unsigned long foo(struct big *b) | { | u64 hi_old, hi_new; | | hi_old = b->hi; | cmpxchg_double_local(&b->lo, &b->hi, 0x12, 0x34, 0x56, 0x78); | hi_new = b->hi; | | return hi_old ^ hi_new; | } ... which GCC 12.1.0 compiles as: | 0000000000000000 <foo>: | 0: d503233f paciasp | 4: aa0003e4 mov x4, x0 | 8: 1400000e b 40 <foo+0x40> | c: d2800240 mov x0, #0x12 // #18 | 10: d2800681 mov x1, #0x34 // #52 | 14: aa0003e5 mov x5, x0 | 18: aa0103e6 mov x6, x1 | 1c: d2800ac2 mov x2, #0x56 // #86 | 20: d2800f03 mov x3, #0x78 // #120 | 24: 48207c82 casp x0, x1, x2, x3, [x4] | 28: ca050000 eor x0, x0, x5 | 2c: ca060021 eor x1, x1, x6 | 30: aa010000 orr x0, x0, x1 | 34: d2800000 mov x0, #0x0 // #0 <--- BANG | 38: d50323bf autiasp | 3c: d65f03c0 ret | 40: d2800240 mov x0, #0x12 // #18 | 44: d2800681 mov x1, #0x34 // #52 | 48: d2800ac2 mov x2, #0x56 // #86 | 4c: d2800f03 mov x3, #0x78 // #120 | 50: f9800091 prfm pstl1strm, [x4] | 54: c87f1885 ldxp x5, x6, [x4] | 58: ca0000a5 eor x5, x5, x0 | 5c: ca0100c6 eor x6, x6, x1 | 60: aa0600a6 orr x6, x5, x6 | 64: b5000066 cbnz x6, 70 <foo+0x70> | 68: c8250c82 stxp w5, x2, x3, [x4] | 6c: 35ffff45 cbnz w5, 54 <foo+0x54> | 70: d2800000 mov x0, #0x0 // #0 <--- BANG | 74: d50323bf autiasp | 78: d65f03c0 ret Notice that at the lines with "BANG" comments, GCC has assumed that the higher 8 bytes are unchanged by the cmpxchg_double() call, and that `hi_old ^ hi_new` can be reduced to a constant zero, for both LSE and LL/SC versions of cmpxchg_double(). This patch fixes the issue by passing a pointer to __uint128_t into the +Q constraint, ensuring that the compiler hazards against the entire 16 bytes being modified. With this change, GCC 12.1.0 compiles the above test as: | 0000000000000000 <foo>: | 0: f9400407 ldr x7, [x0, #8] | 4: d503233f paciasp | 8: aa0003e4 mov x4, x0 | c: 1400000f b 48 <foo+0x48> | 10: d2800240 mov x0, #0x12 // #18 | 14: d2800681 mov x1, #0x34 // #52 | 18: aa0003e5 mov x5, x0 | 1c: aa0103e6 mov x6, x1 | 20: d2800ac2 mov x2, #0x56 // #86 | 24: d2800f03 mov x3, #0x78 // #120 | 28: 48207c82 casp x0, x1, x2, x3, [x4] | 2c: ca050000 eor x0, x0, x5 | 30: ca060021 eor x1, x1, x6 | 34: aa010000 orr x0, x0, x1 | 38: f9400480 ldr x0, [x4, #8] | 3c: d50323bf autiasp | 40: ca0000e0 eor x0, x7, x0 | 44: d65f03c0 ret | 48: d2800240 mov x0, #0x12 // #18 | 4c: d2800681 mov x1, #0x34 // #52 | 50: d2800ac2 mov x2, #0x56 // #86 | 54: d2800f03 mov x3, #0x78 // #120 | 58: f9800091 prfm pstl1strm, [x4] | 5c: c87f1885 ldxp x5, x6, [x4] | 60: ca0000a5 eor x5, x5, x0 | 64: ca0100c6 eor x6, x6, x1 | 68: aa0600a6 orr x6, x5, x6 | 6c: b5000066 cbnz x6, 78 <foo+0x78> | 70: c8250c82 stxp w5, x2, x3, [x4] | 74: 35ffff45 cbnz w5, 5c <foo+0x5c> | 78: f9400480 ldr x0, [x4, #8] | 7c: d50323bf autiasp | 80: ca0000e0 eor x0, x7, x0 | 84: d65f03c0 ret ... sampling the high 8 bytes before and after the cmpxchg, and performing an EOR, as we'd expect. For backporting, I've tested this atop linux-4.9.y with GCC 5.5.0. Note that linux-4.9.y is oldest currently supported stable release, and mandates GCC 5.1+. Unfortunately I couldn't get a GCC 5.1 binary to run on my machines due to library incompatibilities. I've also used a standalone test to check that we can use a __uint128_t pointer in a +Q constraint at least as far back as GCC 4.8.5 and LLVM 3.9.1. Fixes: 5284e1b4bc8a ("arm64: xchg: Implement cmpxchg_double") Fixes: e9a4b795652f ("arm64: cmpxchg_dbl: patch in lse instructions when supported by the CPU") Reported-by: Boqun Feng <boqun.feng@gmail.com> Link: https://lore.kernel.org/lkml/Y6DEfQXymYVgL3oJ@boqun-archlinux/ Reported-by: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/lkml/Y6GXoO4qmH9OIZ5Q@hirez.programming.kicks-ass.net/ Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: stable@vger.kernel.org Cc: Arnd Bergmann <arnd@arndb.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Steve Capper <steve.capper@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20230104151626.3262137-1-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2023-01-05arm64/uprobes: change the uprobe_opcode_t typedef to fix the sparse warningjunhua huang1-1/+1
After we fixed the uprobe inst endian in aarch_be, the sparse check report the following warning info: sparse warnings: (new ones prefixed by >>) >> kernel/events/uprobes.c:223:25: sparse: sparse: restricted __le32 degrades to integer >> kernel/events/uprobes.c:574:56: sparse: sparse: incorrect type in argument 4 (different base types) @@ expected unsigned int [addressable] [usertype] opcode @@ got restricted __le32 [usertype] @@ kernel/events/uprobes.c:574:56: sparse: expected unsigned int [addressable] [usertype] opcode kernel/events/uprobes.c:574:56: sparse: got restricted __le32 [usertype] >> kernel/events/uprobes.c:1483:32: sparse: sparse: incorrect type in initializer (different base types) @@ expected unsigned int [usertype] insn @@ got restricted __le32 [usertype] @@ kernel/events/uprobes.c:1483:32: sparse: expected unsigned int [usertype] insn kernel/events/uprobes.c:1483:32: sparse: got restricted __le32 [usertype] use the __le32 to u32 for uprobe_opcode_t, to keep the same. Fixes: 60f07e22a73d ("arm64:uprobe fix the uprobe SWBP_INSN in big-endian") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: junhua huang <huang.junhua@zte.com.cn> Link: https://lore.kernel.org/r/202212280954121197626@zte.com.cn Signed-off-by: Will Deacon <will@kernel.org>
2023-01-05Merge branch kvm-arm64/s1ptw-write-fault into kvmarm-master/fixesMarc Zyngier3-27/+39
* kvm-arm64/s1ptw-write-fault: : . : Fix S1PTW fault handling that was until then always taken : as a write. From the cover letter: : : `Recent developments on the EFI front have resulted in guests that : simply won't boot if the page tables are in a read-only memslot and : that you're a bit unlucky in the way S2 gets paged in... The core : issue is related to the fact that we treat a S1PTW as a write, which : is close enough to what needs to be done. Until to get to RO memslots. : : The first patch fixes this and is definitely a stable candidate. It : splits the faulting of page tables in two steps (RO translation fault, : followed by a writable permission fault -- should it even happen). : The second one documents the slightly odd behaviour of PTW writes to : RO memslot, which do not result in a KVM_MMIO exit. The last patch is : totally optional, only tangentially related, and randomly repainting : stuff (maybe that's contagious, who knows)." : : . KVM: arm64: Convert FSC_* over to ESR_ELx_FSC_* KVM: arm64: Document the behaviour of S1PTW faults on RO memslots KVM: arm64: Fix S1PTW handling on RO memslots Signed-off-by: Marc Zyngier <maz@kernel.org>
2023-01-05KVM: arm64: vgic: Add Apple M2 cpus to the list of broken SEIS implementationsMarc Zyngier1-0/+4
I really hoped that Apple had fixed their not-quite-a-vgic implementation when moving from M1 to M2. Alas, it seems they didn't, and running a buggy EFI version results in the vgic generating SErrors outside of the guest and taking the host down. Apply the same workaround as for M1. Yes, this is all a bit crap. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230103095022.3230946-2-maz@kernel.org
2023-01-05arm64/mm: add pud_user_exec() check in pud_user_accessible_page()Liu Shixin1-2/+2
Add check for the executable case in pud_user_accessible_page() too like what we did for pte and pmd. Fixes: 42b2547137f5 ("arm64/mm: enable ARCH_SUPPORTS_PAGE_TABLE_CHECK") Suggested-by: Will Deacon <will@kernel.org> Signed-off-by: Liu Shixin <liushixin2@huawei.com> Link: https://lore.kernel.org/r/20221122123137.429686-1-liushixin2@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
2023-01-05arm64/mm: fix incorrect file_map_count for invalid pmdLiu Shixin1-1/+1
The page table check trigger BUG_ON() unexpectedly when split hugepage: ------------[ cut here ]------------ kernel BUG at mm/page_table_check.c:119! Internal error: Oops - BUG: 00000000f2000800 [#1] SMP Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 7 PID: 210 Comm: transhuge-stres Not tainted 6.1.0-rc3+ #748 Hardware name: linux,dummy-virt (DT) pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : page_table_check_set.isra.0+0x398/0x468 lr : page_table_check_set.isra.0+0x1c0/0x468 [...] Call trace: page_table_check_set.isra.0+0x398/0x468 __page_table_check_pte_set+0x160/0x1c0 __split_huge_pmd_locked+0x900/0x1648 __split_huge_pmd+0x28c/0x3b8 unmap_page_range+0x428/0x858 unmap_single_vma+0xf4/0x1c8 zap_page_range+0x2b0/0x410 madvise_vma_behavior+0xc44/0xe78 do_madvise+0x280/0x698 __arm64_sys_madvise+0x90/0xe8 invoke_syscall.constprop.0+0xdc/0x1d8 do_el0_svc+0xf4/0x3f8 el0_svc+0x58/0x120 el0t_64_sync_handler+0xb8/0xc0 el0t_64_sync+0x19c/0x1a0 [...] On arm64, pmd_leaf() will return true even if the pmd is invalid due to pmd_present_invalid() check. So in pmdp_invalidate() the file_map_count will not only decrease once but also increase once. Then in set_pte_at(), the file_map_count increase again, and so trigger BUG_ON() unexpectedly. Add !pmd_present_invalid() check in pmd_user_accessible_page() to fix the problem. Fixes: 42b2547137f5 ("arm64/mm: enable ARCH_SUPPORTS_PAGE_TABLE_CHECK") Reported-by: Denys Vlasenko <dvlasenk@redhat.com> Signed-off-by: Liu Shixin <liushixin2@huawei.com> Acked-by: Pasha Tatashin <pasha.tatashin@soleen.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20221121073608.4183459-1-liushixin2@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
2023-01-03KVM: arm64: Convert FSC_* over to ESR_ELx_FSC_*Marc Zyngier3-25/+19
The former is an AArch32 legacy, so let's move over to the verbose (and strictly identical) version. This involves moving some of the #defines that were private to KVM into the more generic esr.h. Signed-off-by: Marc Zyngier <maz@kernel.org>
2023-01-03KVM: arm64: Fix S1PTW handling on RO memslotsMarc Zyngier1-2/+20
A recent development on the EFI front has resulted in guests having their page tables baked in the firmware binary, and mapped into the IPA space as part of a read-only memslot. Not only is this legitimate, but it also results in added security, so thumbs up. It is possible to take an S1PTW translation fault if the S1 PTs are unmapped at stage-2. However, KVM unconditionally treats S1PTW as a write to correctly handle hardware AF/DB updates to the S1 PTs. Furthermore, KVM injects an exception into the guest for S1PTW writes. In the aforementioned case this results in the guest taking an abort it won't recover from, as the S1 PTs mapping the vectors suffer from the same problem. So clearly our handling is... wrong. Instead, switch to a two-pronged approach: - On S1PTW translation fault, handle the fault as a read - On S1PTW permission fault, handle the fault as a write This is of no consequence to SW that *writes* to its PTs (the write will trigger a non-S1PTW fault), and SW that uses RO PTs will not use HW-assisted AF/DB anyway, as that'd be wrong. Only in the case described in c4ad98e4b72c ("KVM: arm64: Assume write fault on S1PTW permission fault on instruction fetch") do we end-up with two back-to-back faults (page being evicted and faulted back). I don't think this is a case worth optimising for. Fixes: c4ad98e4b72c ("KVM: arm64: Assume write fault on S1PTW permission fault on instruction fetch") Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Regression-tested-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org
2022-12-16Merge tag 'arm64-fixes' of ↵Linus Torvalds2-18/+0
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: - Fix Kconfig dependencies to re-allow the enabling of function graph tracer and shadow call stacks at the same time. - Revert the workaround for CPU erratum #2645198 since the CONFIG_ guards were incorrect and the code has therefore not seen any real exposure in -next. * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: Revert "arm64: errata: Workaround possible Cortex-A715 [ESR|FAR]_ELx corruption" ftrace: Allow WITH_ARGS flavour of graph tracer with shadow call stack
2022-12-15Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds10-38/+341
Pull kvm updates from Paolo Bonzini: "ARM64: - Enable the per-vcpu dirty-ring tracking mechanism, together with an option to keep the good old dirty log around for pages that are dirtied by something other than a vcpu. - Switch to the relaxed parallel fault handling, using RCU to delay page table reclaim and giving better performance under load. - Relax the MTE ABI, allowing a VMM to use the MAP_SHARED mapping option, which multi-process VMMs such as crosvm rely on (see merge commit 382b5b87a97d: "Fix a number of issues with MTE, such as races on the tags being initialised vs the PG_mte_tagged flag as well as the lack of support for VM_SHARED when KVM is involved. Patches from Catalin Marinas and Peter Collingbourne"). - Merge the pKVM shadow vcpu state tracking that allows the hypervisor to have its own view of a vcpu, keeping that state private. - Add support for the PMUv3p5 architecture revision, bringing support for 64bit counters on systems that support it, and fix the no-quite-compliant CHAIN-ed counter support for the machines that actually exist out there. - Fix a handful of minor issues around 52bit VA/PA support (64kB pages only) as a prefix of the oncoming support for 4kB and 16kB pages. - Pick a small set of documentation and spelling fixes, because no good merge window would be complete without those. s390: - Second batch of the lazy destroy patches - First batch of KVM changes for kernel virtual != physical address support - Removal of a unused function x86: - Allow compiling out SMM support - Cleanup and documentation of SMM state save area format - Preserve interrupt shadow in SMM state save area - Respond to generic signals during slow page faults - Fixes and optimizations for the non-executable huge page errata fix. - Reprogram all performance counters on PMU filter change - Cleanups to Hyper-V emulation and tests - Process Hyper-V TLB flushes from a nested guest (i.e. from a L2 guest running on top of a L1 Hyper-V hypervisor) - Advertise several new Intel features - x86 Xen-for-KVM: - Allow the Xen runstate information to cross a page boundary - Allow XEN_RUNSTATE_UPDATE flag behaviour to be configured - Add support for 32-bit guests in SCHEDOP_poll - Notable x86 fixes and cleanups: - One-off fixes for various emulation flows (SGX, VMXON, NRIPS=0). - Reinstate IBPB on emulated VM-Exit that was incorrectly dropped a few years back when eliminating unnecessary barriers when switching between vmcs01 and vmcs02. - Clean up vmread_error_trampoline() to make it more obvious that params must be passed on the stack, even for x86-64. - Let userspace set all supported bits in MSR_IA32_FEAT_CTL irrespective of the current guest CPUID. - Fudge around a race with TSC refinement that results in KVM incorrectly thinking a guest needs TSC scaling when running on a CPU with a constant TSC, but no hardware-enumerated TSC frequency. - Advertise (on AMD) that the SMM_CTL MSR is not supported - Remove unnecessary exports Generic: - Support for responding to signals during page faults; introduces new FOLL_INTERRUPTIBLE flag that was reviewed by mm folks Selftests: - Fix an inverted check in the access tracking perf test, and restore support for asserting that there aren't too many idle pages when running on bare metal. - Fix build errors that occur in certain setups (unsure exactly what is unique about the problematic setup) due to glibc overriding static_assert() to a variant that requires a custom message. - Introduce actual atomics for clear/set_bit() in selftests - Add support for pinning vCPUs in dirty_log_perf_test. - Rename the so called "perf_util" framework to "memstress". - Add a lightweight psuedo RNG for guest use, and use it to randomize the access pattern and write vs. read percentage in the memstress tests. - Add a common ucall implementation; code dedup and pre-work for running SEV (and beyond) guests in selftests. - Provide a common constructor and arch hook, which will eventually be used by x86 to automatically select the right hypercall (AMD vs. Intel). - A bunch of added/enabled/fixed selftests for ARM64, covering memslots, breakpoints, stage-2 faults and access tracking. - x86-specific selftest changes: - Clean up x86's page table management. - Clean up and enhance the "smaller maxphyaddr" test, and add a related test to cover generic emulation failure. - Clean up the nEPT support checks. - Add X86_PROPERTY_* framework to retrieve multi-bit CPUID values. - Fix an ordering issue in the AMX test introduced by recent conversions to use kvm_cpu_has(), and harden the code to guard against similar bugs in the future. Anything that tiggers caching of KVM's supported CPUID, kvm_cpu_has() in this case, effectively hides opt-in XSAVE features if the caching occurs before the test opts in via prctl(). Documentation: - Remove deleted ioctls from documentation - Clean up the docs for the x86 MSR filter. - Various fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (361 commits) KVM: x86: Add proper ReST tables for userspace MSR exits/flags KVM: selftests: Allocate ucall pool from MEM_REGION_DATA KVM: arm64: selftests: Align VA space allocator with TTBR0 KVM: arm64: Fix benign bug with incorrect use of VA_BITS KVM: arm64: PMU: Fix period computation for 64bit counters with 32bit overflow KVM: x86: Advertise that the SMM_CTL MSR is not supported KVM: x86: remove unnecessary exports KVM: selftests: Fix spelling mistake "probabalistic" -> "probabilistic" tools: KVM: selftests: Convert clear/set_bit() to actual atomics tools: Drop "atomic_" prefix from atomic test_and_set_bit() tools: Drop conflicting non-atomic test_and_{clear,set}_bit() helpers KVM: selftests: Use non-atomic clear/set bit helpers in KVM tests perf tools: Use dedicated non-atomic clear/set bit helpers tools: Take @bit as an "unsigned long" in {clear,set}_bit() helpers KVM: arm64: selftests: Enable single-step without a "full" ucall() KVM: x86: fix APICv/x2AVIC disabled when vm reboot by itself KVM: Remove stale comment about KVM_REQ_UNHALT KVM: Add missing arch for KVM_CREATE_DEVICE and KVM_{SET,GET}_DEVICE_ATTR KVM: Reference to kvm_userspace_memory_region in doc and comments KVM: Delete all references to removed KVM_SET_MEMORY_ALIAS ioctl ...
2022-12-15Revert "arm64: errata: Workaround possible Cortex-A715 [ESR|FAR]_ELx corruption"Will Deacon2-18/+0
This reverts commit 44ecda71fd8a70185c270f5914ac563827fe1d4c. All versions of this patch on the mailing list, including the version that ended up getting merged, have portions of code guarded by the non-existent CONFIG_ARM64_WORKAROUND_2645198 option. Although Anshuman says he tested the code with some additional debug changes [1], I'm hesitant to fix the CONFIG option and light up a bunch of code right before I (and others) disappear for the end of year holidays, during which time we won't be around to deal with any fallout. So revert the change for now. We can bring back a fixed, tested version for a later -rc when folks are thinking about things other than trees and turkeys. [1] https://lore.kernel.org/r/b6f61241-e436-5db1-1053-3b441080b8d6@arm.com Reported-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Link: https://lore.kernel.org/r/20221215094811.23188-1-lukas.bulwahn@gmail.com Signed-off-by: Will Deacon <will@kernel.org>
2022-12-13Merge tag 'mm-stable-2022-12-13' of ↵Linus Torvalds1-2/+0
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - More userfaultfs work from Peter Xu - Several convert-to-folios series from Sidhartha Kumar and Huang Ying - Some filemap cleanups from Vishal Moola - David Hildenbrand added the ability to selftest anon memory COW handling - Some cpuset simplifications from Liu Shixin - Addition of vmalloc tracing support by Uladzislau Rezki - Some pagecache folioifications and simplifications from Matthew Wilcox - A pagemap cleanup from Kefeng Wang: we have VM_ACCESS_FLAGS, so use it - Miguel Ojeda contributed some cleanups for our use of the __no_sanitize_thread__ gcc keyword. This series should have been in the non-MM tree, my bad - Naoya Horiguchi improved the interaction between memory poisoning and memory section removal for huge pages - DAMON cleanups and tuneups from SeongJae Park - Tony Luck fixed the handling of COW faults against poisoned pages - Peter Xu utilized the PTE marker code for handling swapin errors - Hugh Dickins reworked compound page mapcount handling, simplifying it and making it more efficient - Removal of the autonuma savedwrite infrastructure from Nadav Amit and David Hildenbrand - zram support for multiple compression streams from Sergey Senozhatsky - David Hildenbrand reworked the GUP code's R/O long-term pinning so that drivers no longer need to use the FOLL_FORCE workaround which didn't work very well anyway - Mel Gorman altered the page allocator so that local IRQs can remnain enabled during per-cpu page allocations - Vishal Moola removed the try_to_release_page() wrapper - Stefan Roesch added some per-BDI sysfs tunables which are used to prevent network block devices from dirtying excessive amounts of pagecache - David Hildenbrand did some cleanup and repair work on KSM COW breaking - Nhat Pham and Johannes Weiner have implemented writeback in zswap's zsmalloc backend - Brian Foster has fixed a longstanding corner-case oddity in file[map]_write_and_wait_range() - sparse-vmemmap changes for MIPS, LoongArch and NIOS2 from Feiyang Chen - Shiyang Ruan has done some work on fsdax, to make its reflink mode work better under xfstests. Better, but still not perfect - Christoph Hellwig has removed the .writepage() method from several filesystems. They only need .writepages() - Yosry Ahmed wrote a series which fixes the memcg reclaim target beancounting - David Hildenbrand has fixed some of our MM selftests for 32-bit machines - Many singleton patches, as usual * tag 'mm-stable-2022-12-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (313 commits) mm/hugetlb: set head flag before setting compound_order in __prep_compound_gigantic_folio mm: mmu_gather: allow more than one batch of delayed rmaps mm: fix typo in struct pglist_data code comment kmsan: fix memcpy tests mm: add cond_resched() in swapin_walk_pmd_entry() mm: do not show fs mm pc for VM_LOCKONFAULT pages selftests/vm: ksm_functional_tests: fixes for 32bit selftests/vm: cow: fix compile warning on 32bit selftests/vm: madv_populate: fix missing MADV_POPULATE_(READ|WRITE) definitions mm/gup_test: fix PIN_LONGTERM_TEST_READ with highmem mm,thp,rmap: fix races between updates of subpages_mapcount mm: memcg: fix swapcached stat accounting mm: add nodes= arg to memory.reclaim mm: disable top-tier fallback to reclaim on proactive reclaim selftests: cgroup: make sure reclaim target memcg is unprotected selftests: cgroup: refactor proactive reclaim code to reclaim_until() mm: memcg: fix stale protection of reclaim target memcg mm/mmap: properly unaccount memory on mas_preallocate() failure omfs: remove ->writepage jfs: remove ->writepage ...
2022-12-13Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-armLinus Torvalds1-0/+1
Pull ARM updates from Russell King: - update unwinder to cope with module PLTs - enable UBSAN on ARM - improve kernel fault message - update UEFI runtime page tables dump - avoid clang's __aeabi_uldivmod generated in NWFPE code - disable FIQs on CPU shutdown paths - update XOR register usage - a number of build updates (using .arch, thread pointer, removal of lazy evaluation in Makefile) - conversion of stacktrace code to stackwalk - findbit assembly updates - hwcap feature updates for ARMv8 CPUs - instruction dump updates for big-endian platforms - support for function error injection * tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm: (31 commits) ARM: 9279/1: support function error injection ARM: 9277/1: Make the dumped instructions are consistent with the disassembled ones ARM: 9276/1: Refactor dump_instr() ARM: 9275/1: Drop '-mthumb' from AFLAGS_ISA ARM: 9274/1: Add hwcap for Speculative Store Bypassing Safe ARM: 9273/1: Add hwcap for Speculation Barrier(SB) ARM: 9272/1: vfp: Add hwcap for FEAT_AA32I8MM ARM: 9271/1: vfp: Add hwcap for FEAT_AA32BF16 ARM: 9270/1: vfp: Add hwcap for FEAT_FHM ARM: 9269/1: vfp: Add hwcap for FEAT_DotProd ARM: 9268/1: vfp: Add hwcap FPHP and ASIMDHP for FEAT_FP16 ARM: 9267/1: Define Armv8 registers in AArch32 state ARM: findbit: add unwinder information ARM: findbit: operate by words ARM: findbit: convert to macros ARM: findbit: provide more efficient ARMv7 implementation ARM: findbit: document ARMv5 bit offset calculation ARM: 9259/1: stacktrace: Convert stacktrace to generic ARCH_STACKWALK ARM: 9258/1: stacktrace: Make stack walk callback consistent with generic code ARM: 9265/1: pass -march= only to compiler ...
2022-12-13Merge tag 'efi-next-for-v6.2' of ↵Linus Torvalds1-3/+24
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI updates from Ard Biesheuvel: "Another fairly sizable pull request, by EFI subsystem standards. Most of the work was done by me, some of it in collaboration with the distro and bootloader folks (GRUB, systemd-boot), where the main focus has been on removing pointless per-arch differences in the way EFI boots a Linux kernel. - Refactor the zboot code so that it incorporates all the EFI stub logic, rather than calling the decompressed kernel as a EFI app. - Add support for initrd= command line option to x86 mixed mode. - Allow initrd= to be used with arbitrary EFI accessible file systems instead of just the one the kernel itself was loaded from. - Move some x86-only handling and manipulation of the EFI memory map into arch/x86, as it is not used anywhere else. - More flexible handling of any random seeds provided by the boot environment (i.e., systemd-boot) so that it becomes available much earlier during the boot. - Allow improved arch-agnostic EFI support in loaders, by setting a uniform baseline of supported features, and adding a generic magic number to the DOS/PE header. This should allow loaders such as GRUB or systemd-boot to reduce the amount of arch-specific handling substantially. - (arm64) Run EFI runtime services from a dedicated stack, and use it to recover from synchronous exceptions that might occur in the firmware code. - (arm64) Ensure that we don't allocate memory outside of the 48-bit addressable physical range. - Make EFI pstore record size configurable - Add support for decoding CXL specific CPER records" * tag 'efi-next-for-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: (43 commits) arm64: efi: Recover from synchronous exceptions occurring in firmware arm64: efi: Execute runtime services from a dedicated stack arm64: efi: Limit allocations to 48-bit addressable physical region efi: Put Linux specific magic number in the DOS header efi: libstub: Always enable initrd command line loader and bump version efi: stub: use random seed from EFI variable efi: vars: prohibit reading random seed variables efi: random: combine bootloader provided RNG seed with RNG protocol output efi/cper, cxl: Decode CXL Error Log efi/cper, cxl: Decode CXL Protocol Error Section efi: libstub: fix efi_load_initrd_dev_path() kernel-doc comment efi: x86: Move EFI runtime map sysfs code to arch/x86 efi: runtime-maps: Clarify purpose and enable by default for kexec efi: pstore: Add module parameter for setting the record size efi: xen: Set EFI_PARAVIRT for Xen dom0 boot on all architectures efi: memmap: Move manipulation routines into x86 arch tree efi: memmap: Move EFI fake memmap support into x86 arch tree efi: libstub: Undeprecate the command line initrd loader efi: libstub: Add mixed mode support to command line initrd loader efi: libstub: Permit mixed mode return types other than efi_status_t ...
2022-12-12Merge tag 'random-6.2-rc1-for-linus' of ↵Linus Torvalds2-46/+11
git://git.kernel.org/pub/scm/linux/kernel/git/crng/random Pull random number generator updates from Jason Donenfeld: - Replace prandom_u32_max() and various open-coded variants of it, there is now a new family of functions that uses fast rejection sampling to choose properly uniformly random numbers within an interval: get_random_u32_below(ceil) - [0, ceil) get_random_u32_above(floor) - (floor, U32_MAX] get_random_u32_inclusive(floor, ceil) - [floor, ceil] Coccinelle was used to convert all current users of prandom_u32_max(), as well as many open-coded patterns, resulting in improvements throughout the tree. I'll have a "late" 6.1-rc1 pull for you that removes the now unused prandom_u32_max() function, just in case any other trees add a new use case of it that needs to converted. According to linux-next, there may be two trivial cases of prandom_u32_max() reintroductions that are fixable with a 's/.../.../'. So I'll have for you a final conversion patch doing that alongside the removal patch during the second week. This is a treewide change that touches many files throughout. - More consistent use of get_random_canary(). - Updates to comments, documentation, tests, headers, and simplification in configuration. - The arch_get_random*_early() abstraction was only used by arm64 and wasn't entirely useful, so this has been replaced by code that works in all relevant contexts. - The kernel will use and manage random seeds in non-volatile EFI variables, refreshing a variable with a fresh seed when the RNG is initialized. The RNG GUID namespace is then hidden from efivarfs to prevent accidental leakage. These changes are split into random.c infrastructure code used in the EFI subsystem, in this pull request, and related support inside of EFISTUB, in Ard's EFI tree. These are co-dependent for full functionality, but the order of merging doesn't matter. - Part of the infrastructure added for the EFI support is also used for an improvement to the way vsprintf initializes its siphash key, replacing an sleep loop wart. - The hardware RNG framework now always calls its correct random.c input function, add_hwgenerator_randomness(), rather than sometimes going through helpers better suited for other cases. - The add_latent_entropy() function has long been called from the fork handler, but is a no-op when the latent entropy gcc plugin isn't used, which is fine for the purposes of latent entropy. But it was missing out on the cycle counter that was also being mixed in beside the latent entropy variable. So now, if the latent entropy gcc plugin isn't enabled, add_latent_entropy() will expand to a call to add_device_randomness(NULL, 0), which adds a cycle counter, without the absent latent entropy variable. - The RNG is now reseeded from a delayed worker, rather than on demand when used. Always running from a worker allows it to make use of the CPU RNG on platforms like S390x, whose instructions are too slow to do so from interrupts. It also has the effect of adding in new inputs more frequently with more regularity, amounting to a long term transcript of random values. Plus, it helps a bit with the upcoming vDSO implementation (which isn't yet ready for 6.2). - The jitter entropy algorithm now tries to execute on many different CPUs, round-robining, in hopes of hitting even more memory latencies and other unpredictable effects. It also will mix in a cycle counter when the entropy timer fires, in addition to being mixed in from the main loop, to account more explicitly for fluctuations in that timer firing. And the state it touches is now kept within the same cache line, so that it's assured that the different execution contexts will cause latencies. * tag 'random-6.2-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: (23 commits) random: include <linux/once.h> in the right header random: align entropy_timer_state to cache line random: mix in cycle counter when jitter timer fires random: spread out jitter callback to different CPUs random: remove extraneous period and add a missing one in comments efi: random: refresh non-volatile random seed when RNG is initialized vsprintf: initialize siphash key using notifier random: add back async readiness notifier random: reseed in delayed work rather than on-demand random: always mix cycle counter in add_latent_entropy() hw_random: use add_hwgenerator_randomness() for early entropy random: modernize documentation comment on get_random_bytes() random: adjust comment to account for removed function random: remove early archrandom abstraction random: use random.trust_{bootloader,cpu} command line option only stackprotector: actually use get_random_canary() stackprotector: move get_random_canary() into stackprotector.h treewide: use get_random_u32_inclusive() when possible treewide: use get_random_u32_{above,below}() instead of manual loop treewide: use get_random_u32_below() instead of deprecated function ...
2022-12-12Merge tag 'arm64-upstream' of ↵Linus Torvalds27-278/+348
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Will Deacon: "The highlights this time are support for dynamically enabling and disabling Clang's Shadow Call Stack at boot and a long-awaited optimisation to the way in which we handle the SVE register state on system call entry to avoid taking unnecessary traps from userspace. Summary: ACPI: - Enable FPDT support for boot-time profiling - Fix CPU PMU probing to work better with PREEMPT_RT - Update SMMUv3 MSI DeviceID parsing to latest IORT spec - APMT support for probing Arm CoreSight PMU devices CPU features: - Advertise new SVE instructions (v2.1) - Advertise range prefetch instruction - Advertise CSSC ("Common Short Sequence Compression") scalar instructions, adding things like min, max, abs, popcount - Enable DIT (Data Independent Timing) when running in the kernel - More conversion of system register fields over to the generated header CPU misfeatures: - Workaround for Cortex-A715 erratum #2645198 Dynamic SCS: - Support for dynamic shadow call stacks to allow switching at runtime between Clang's SCS implementation and the CPU's pointer authentication feature when it is supported (complete with scary DWARF parser!) Tracing and debug: - Remove static ftrace in favour of, err, dynamic ftrace! - Seperate 'struct ftrace_regs' from 'struct pt_regs' in core ftrace and existing arch code - Introduce and implement FTRACE_WITH_ARGS on arm64 to replace the old FTRACE_WITH_REGS - Extend 'crashkernel=' parameter with default value and fallback to placement above 4G physical if initial (low) allocation fails SVE: - Optimisation to avoid disabling SVE unconditionally on syscall entry and just zeroing the non-shared state on return instead Exceptions: - Rework of undefined instruction handling to avoid serialisation on global lock (this includes emulation of user accesses to the ID registers) Perf and PMU: - Support for TLP filters in Hisilicon's PCIe PMU device - Support for the DDR PMU present in Amlogic Meson G12 SoCs - Support for the terribly-named "CoreSight PMU" architecture from Arm (and Nvidia's implementation of said architecture) Misc: - Tighten up our boot protocol for systems with memory above 52 bits physical - Const-ify static keys to satisty jump label asm constraints - Trivial FFA driver cleanups in preparation for v1.1 support - Export the kernel_neon_* APIs as GPL symbols - Harden our instruction generation routines against instrumentation - A bunch of robustness improvements to our arch-specific selftests - Minor cleanups and fixes all over (kbuild, kprobes, kfence, PMU, ...)" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (151 commits) arm64: kprobes: Return DBG_HOOK_ERROR if kprobes can not handle a BRK arm64: kprobes: Let arch do_page_fault() fix up page fault in user handler arm64: Prohibit instrumentation on arch_stack_walk() arm64:uprobe fix the uprobe SWBP_INSN in big-endian arm64: alternatives: add __init/__initconst to some functions/variables arm_pmu: Drop redundant armpmu->map_event() in armpmu_event_init() kselftest/arm64: Allow epoll_wait() to return more than one result kselftest/arm64: Don't drain output while spawning children kselftest/arm64: Hold fp-stress children until they're all spawned arm64/sysreg: Remove duplicate definitions from asm/sysreg.h arm64/sysreg: Convert ID_DFR1_EL1 to automatic generation arm64/sysreg: Convert ID_DFR0_EL1 to automatic generation arm64/sysreg: Convert ID_AFR0_EL1 to automatic generation arm64/sysreg: Convert ID_MMFR5_EL1 to automatic generation arm64/sysreg: Convert MVFR2_EL1 to automatic generation arm64/sysreg: Convert MVFR1_EL1 to automatic generation arm64/sysreg: Convert MVFR0_EL1 to automatic generation arm64/sysreg: Convert ID_PFR2_EL1 to automatic generation arm64/sysreg: Convert ID_PFR1_EL1 to automatic generation arm64/sysreg: Convert ID_PFR0_EL1 to automatic generation ...
2022-12-08arm64: efi: Recover from synchronous exceptions occurring in firmwareArd Biesheuvel1-0/+8
Unlike x86, which has machinery to deal with page faults that occur during the execution of EFI runtime services, arm64 has nothing like that, and a synchronous exception raised by firmware code brings down the whole system. With more EFI based systems appearing that were not built to run Linux (such as the Windows-on-ARM laptops based on Qualcomm SOCs), as well as the introduction of PRM (platform specific firmware routines that are callable just like EFI runtime services), we are more likely to run into issues of this sort, and it is much more likely that we can identify and work around such issues if they don't bring down the system entirely. Since we already use a EFI runtime services call wrapper in assembler, we can quite easily add some code that captures the execution state at the point where the call is made, allowing us to revert to this state and proceed execution if the call triggered a synchronous exception. Given that the kernel and the firmware don't share any data structures that could end up in an indeterminate state, we can happily continue running, as long as we mark the EFI runtime services as unavailable from that point on. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com>
2022-12-08arm64: efi: Execute runtime services from a dedicated stackArd Biesheuvel1-0/+3
With the introduction of PRMT in the ACPI subsystem, the EFI rts workqueue is no longer the only caller of efi_call_virt_pointer() in the kernel. This means the EFI runtime services lock is no longer sufficient to manage concurrent calls into firmware, but also that firmware calls may occur that are not marshalled via the workqueue mechanism, but originate directly from the caller context. For added robustness, and to ensure that the runtime services have 8 KiB of stack space available as per the EFI spec, introduce a spinlock protected EFI runtime stack of 8 KiB, where the spinlock also ensures serialization between the EFI rts workqueue (which itself serializes EFI runtime calls) and other callers of efi_call_virt_pointer(). While at it, use the stack pivot to avoid reloading the shadow call stack pointer from the ordinary stack, as doing so could produce a gadget to defeat it. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-12-07arm64: efi: Limit allocations to 48-bit addressable physical regionArd Biesheuvel1-0/+1
The UEFI spec does not mention or reason about the configured size of the virtual address space at all, but it does mention that all memory should be identity mapped using a page size of 4 KiB. This means that a LPA2 capable system that has any system memory outside of the 48-bit addressable physical range and follows the spec to the letter may serve page allocation requests from regions of memory that the kernel cannot access unless it was built with LPA2 support and enables it at runtime. So let's ensure that all page allocations are limited to the 48-bit range. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-12-07Merge tag 'v6.1-rc8' into efi/nextArd Biesheuvel2-10/+2
Linux 6.1-rc8
2022-12-06Merge branch 'for-next/undef-traps' into for-next/coreWill Deacon4-14/+17
* for-next/undef-traps: arm64: armv8_deprecated: fix unused-function error arm64: armv8_deprecated: rework deprected instruction handling arm64: armv8_deprecated: move aarch32 helper earlier arm64: armv8_deprecated move emulation functions arm64: armv8_deprecated: fold ops into insn_emulation arm64: rework EL0 MRS emulation arm64: factor insn read out of call_undef_hook() arm64: factor out EL1 SSBS emulation hook arm64: split EL0/EL1 UNDEF handlers arm64: allow kprobes on EL0 handlers
2022-12-06Merge branch 'for-next/trivial' into for-next/coreWill Deacon7-37/+19
* for-next/trivial: arm64: alternatives: add __init/__initconst to some functions/variables arm64/asm: Remove unused assembler DAIF save/restore macros arm64/kpti: Move DAIF masking to C code Revert "arm64/mm: Drop redundant BUG_ON(!pgtable_alloc)" arm64/mm: Drop unused restore_ttbr1 arm64: alternatives: make apply_alternatives_vdso() static arm64/mm: Drop idmap_pg_end[] declaration arm64/mm: Drop redundant BUG_ON(!pgtable_alloc) arm64: make is_ttbrX_addr() noinstr-safe arm64/signal: Document our convention for choosing magic numbers arm64: atomics: lse: remove stale dependency on JUMP_LABEL arm64: paravirt: remove conduit check in has_pv_steal_clock arm64: entry: Fix typo arm64/booting: Add missing colon to FA64 entry arm64/mm: Drop ARM64_KERNEL_USES_PMD_MAPS arm64/asm: Remove unused enable_da macro
2022-12-06Merge branch 'for-next/sysregs' into for-next/coreWill Deacon1-138/+0
* for-next/sysregs: (39 commits) arm64/sysreg: Remove duplicate definitions from asm/sysreg.h arm64/sysreg: Convert ID_DFR1_EL1 to automatic generation arm64/sysreg: Convert ID_DFR0_EL1 to automatic generation arm64/sysreg: Convert ID_AFR0_EL1 to automatic generation arm64/sysreg: Convert ID_MMFR5_EL1 to automatic generation arm64/sysreg: Convert MVFR2_EL1 to automatic generation arm64/sysreg: Convert MVFR1_EL1 to automatic generation arm64/sysreg: Convert MVFR0_EL1 to automatic generation arm64/sysreg: Convert ID_PFR2_EL1 to automatic generation arm64/sysreg: Convert ID_PFR1_EL1 to automatic generation arm64/sysreg: Convert ID_PFR0_EL1 to automatic generation arm64/sysreg: Convert ID_ISAR6_EL1 to automatic generation arm64/sysreg: Convert ID_ISAR5_EL1 to automatic generation arm64/sysreg: Convert ID_ISAR4_EL1 to automatic generation arm64/sysreg: Convert ID_ISAR3_EL1 to automatic generation arm64/sysreg: Convert ID_ISAR2_EL1 to automatic generation arm64/sysreg: Convert ID_ISAR1_EL1 to automatic generation arm64/sysreg: Convert ID_ISAR0_EL1 to automatic generation arm64/sysreg: Convert ID_MMFR4_EL1 to automatic generation arm64/sysreg: Convert ID_MMFR3_EL1 to automatic generation ...
2022-12-06Merge branch 'for-next/sve-state' into for-next/coreWill Deacon3-5/+31
* for-next/sve-state: arm64/fp: Use a struct to pass data to fpsimd_bind_state_to_cpu() arm64/sve: Leave SVE enabled on syscall if we don't context switch arm64/fpsimd: SME no longer requires SVE register state arm64/fpsimd: Load FP state based on recorded data type arm64/fpsimd: Stop using TIF_SVE to manage register saving in KVM arm64/fpsimd: Have KVM explicitly say which FP registers to save arm64/fpsimd: Track the saved FPSIMD state type separately to TIF_SVE KVM: arm64: Discard any SVE state when entering KVM guests
2022-12-06Merge branch 'for-next/stacks' into for-next/coreWill Deacon2-13/+2
* for-next/stacks: arm64: move on_thread_stack() to <asm/stacktrace.h> arm64: remove current_top_of_stack()
2022-12-06Merge branch 'for-next/mm' into for-next/coreWill Deacon3-7/+6
* for-next/mm: arm64: booting: Require placement within 48-bit addressable memory arm64: mm: kfence: only handle translation faults arm64/mm: Simplify and document pte_to_phys() for 52 bit addresses
2022-12-06Merge branch 'for-next/insn' into for-next/coreWill Deacon2-48/+110
* for-next/insn: arm64:uprobe fix the uprobe SWBP_INSN in big-endian arm64: insn: always inline hint generation arm64: insn: simplify insn group identification arm64: insn: always inline predicates arm64: insn: remove aarch64_insn_gen_prefetch()
2022-12-06Merge branch 'for-next/ftrace' into for-next/coreWill Deacon1-6/+66
* for-next/ftrace: ftrace: arm64: remove static ftrace ftrace: arm64: move from REGS to ARGS ftrace: abstract DYNAMIC_FTRACE_WITH_ARGS accesses ftrace: rename ftrace_instruction_pointer_set() -> ftrace_regs_set_instruction_pointer() ftrace: pass fregs to arch_ftrace_set_direct_caller()
2022-12-06Merge branch 'for-next/errata' into for-next/coreWill Deacon3-0/+20
* for-next/errata: arm64: errata: Workaround possible Cortex-A715 [ESR|FAR]_ELx corruption arm64: Add Cortex-715 CPU part definition
2022-12-06Merge branch 'for-next/dynamic-scs' into for-next/coreWill Deacon2-0/+57
* for-next/dynamic-scs: arm64: implement dynamic shadow call stack for Clang scs: add support for dynamic shadow call stacks arm64: unwind: add asynchronous unwind tables to kernel and modules
2022-12-06Merge branch 'for-next/cpufeature' into for-next/coreWill Deacon3-4/+14
* for-next/cpufeature: kselftest/arm64: Add SVE 2.1 to hwcap test arm64/hwcap: Add support for SVE 2.1 kselftest/arm64: Add FEAT_RPRFM to the hwcap test arm64/hwcap: Add support for FEAT_RPRFM kselftest/arm64: Add FEAT_CSSC to the hwcap selftest arm64/hwcap: Add support for FEAT_CSSC arm64: Enable data independent timing (DIT) in the kernel
2022-12-05Merge remote-tracking branch 'arm64/for-next/sysregs' into kvmarm-master/nextMarc Zyngier1-140/+0
Merge arm64's sysreg repainting branch to avoid too many ugly conflicts... Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-12-05Merge branch kvm-arm64/pmu-unchained into kvmarm-master/nextMarc Zyngier2-0/+6
* kvm-arm64/pmu-unchained: : . : PMUv3 fixes and improvements: : : - Make the CHAIN event handling strictly follow the architecture : : - Add support for PMUv3p5 (64bit counters all the way) : : - Various fixes and cleanups : . KVM: arm64: PMU: Fix period computation for 64bit counters with 32bit overflow KVM: arm64: PMU: Sanitise PMCR_EL0.LP on first vcpu run KVM: arm64: PMU: Simplify PMCR_EL0 reset handling KVM: arm64: PMU: Replace version number '0' with ID_AA64DFR0_EL1_PMUVer_NI KVM: arm64: PMU: Make kvm_pmc the main data structure KVM: arm64: PMU: Simplify vcpu computation on perf overflow notification KVM: arm64: PMU: Allow PMUv3p5 to be exposed to the guest KVM: arm64: PMU: Implement PMUv3p5 long counter support KVM: arm64: PMU: Allow ID_DFR0_EL1.PerfMon to be set from userspace KVM: arm64: PMU: Allow ID_AA64DFR0_EL1.PMUver to be set from userspace KVM: arm64: PMU: Move the ID_AA64DFR0_EL1.PMUver limit to VM creation KVM: arm64: PMU: Do not let AArch32 change the counters' top 32 bits KVM: arm64: PMU: Simplify setting a counter to a specific value KVM: arm64: PMU: Add counter_index_to_*reg() helpers KVM: arm64: PMU: Only narrow counters that are not 64bit wide KVM: arm64: PMU: Narrow the overflow checking when required KVM: arm64: PMU: Distinguish between 64bit counter and 64bit overflow KVM: arm64: PMU: Always advertise the CHAIN event KVM: arm64: PMU: Align chained counter implementation with architecture pseudocode arm64: Add ID_DFR0_EL1.PerfMon values for PMUv3p7 and IMP_DEF Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-12-05Merge branch kvm-arm64/mte-map-shared into kvmarm-master/nextMarc Zyngier2-3/+66
* kvm-arm64/mte-map-shared: : . : Update the MTE support to allow the VMM to use shared mappings : to back the memslots exposed to MTE-enabled guests. : : Patches courtesy of Catalin Marinas and Peter Collingbourne. : . : Fix a number of issues with MTE, such as races on the tags : being initialised vs the PG_mte_tagged flag as well as the : lack of support for VM_SHARED when KVM is involved. : : Patches from Catalin Marinas and Peter Collingbourne. : . Documentation: document the ABI changes for KVM_CAP_ARM_MTE KVM: arm64: permit all VM_MTE_ALLOWED mappings with MTE enabled KVM: arm64: unify the tests for VMAs in memslots when MTE is enabled arm64: mte: Lock a page for MTE tag initialisation mm: Add PG_arch_3 page flag KVM: arm64: Simplify the sanitise_mte_tags() logic arm64: mte: Fix/clarify the PG_mte_tagged semantics mm: Do not enable PG_arch_2 for all 64-bit architectures Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-12-05Merge branch kvm-arm64/pkvm-vcpu-state into kvmarm-master/nextMarc Zyngier7-6/+140
* kvm-arm64/pkvm-vcpu-state: (25 commits) : . : Large drop of pKVM patches from Will Deacon and co, adding : a private vm/vcpu state at EL2, managed independently from : the EL1 state. From the cover letter: : : "This is version six of the pKVM EL2 state series, extending the pKVM : hypervisor code so that it can dynamically instantiate and manage VM : data structures without the host being able to access them directly. : These structures consist of a hyp VM, a set of hyp vCPUs and the stage-2 : page-table for the MMU. The pages used to hold the hypervisor structures : are returned to the host when the VM is destroyed." : . KVM: arm64: Use the pKVM hyp vCPU structure in handle___kvm_vcpu_run() KVM: arm64: Don't unnecessarily map host kernel sections at EL2 KVM: arm64: Explicitly map 'kvm_vgic_global_state' at EL2 KVM: arm64: Maintain a copy of 'kvm_arm_vmid_bits' at EL2 KVM: arm64: Unmap 'kvm_arm_hyp_percpu_base' from the host KVM: arm64: Return guest memory from EL2 via dedicated teardown memcache KVM: arm64: Instantiate guest stage-2 page-tables at EL2 KVM: arm64: Consolidate stage-2 initialisation into a single function KVM: arm64: Add generic hyp_memcache helpers KVM: arm64: Provide I-cache invalidation by virtual address at EL2 KVM: arm64: Initialise hypervisor copies of host symbols unconditionally KVM: arm64: Add per-cpu fixmap infrastructure at EL2 KVM: arm64: Instantiate pKVM hypervisor VM and vCPU structures from EL1 KVM: arm64: Add infrastructure to create and track pKVM instances at EL2 KVM: arm64: Rename 'host_kvm' to 'host_mmu' KVM: arm64: Add hyp_spinlock_t static initializer KVM: arm64: Include asm/kvm_mmu.h in nvhe/mem_protect.h KVM: arm64: Add helpers to pin memory shared with the hypervisor at EL2 KVM: arm64: Prevent the donation of no-map pages KVM: arm64: Implement do_donate() helper for donating memory ... Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-12-05Merge branch kvm-arm64/parallel-faults into kvmarm-master/nextMarc Zyngier1-28/+125
* kvm-arm64/parallel-faults: : . : Parallel stage-2 fault handling, courtesy of Oliver Upton. : From the cover letter: : : "Presently KVM only takes a read lock for stage 2 faults if it believes : the fault can be fixed by relaxing permissions on a PTE (write unprotect : for dirty logging). Otherwise, stage 2 faults grab the write lock, which : predictably can pile up all the vCPUs in a sufficiently large VM. : : Like the TDP MMU for x86, this series loosens the locking around : manipulations of the stage 2 page tables to allow parallel faults. RCU : and atomics are exploited to safely build/destroy the stage 2 page : tables in light of multiple software observers." : . KVM: arm64: Reject shared table walks in the hyp code KVM: arm64: Don't acquire RCU read lock for exclusive table walks KVM: arm64: Take a pointer to walker data in kvm_dereference_pteref() KVM: arm64: Handle stage-2 faults in parallel KVM: arm64: Make table->block changes parallel-aware KVM: arm64: Make leaf->leaf PTE changes parallel-aware KVM: arm64: Make block->table PTE changes parallel-aware KVM: arm64: Split init and set for table PTE KVM: arm64: Atomically update stage 2 leaf attributes in parallel walks KVM: arm64: Protect stage-2 traversal with RCU KVM: arm64: Tear down unlinked stage-2 subtree after break-before-make KVM: arm64: Use an opaque type for pteps KVM: arm64: Add a helper to tear down unlinked stage-2 subtrees KVM: arm64: Don't pass kvm_pgtable through kvm_pgtable_walk_data KVM: arm64: Pass mm_ops through the visitor context KVM: arm64: Stash observed pte value in visitor context KVM: arm64: Combine visitor arguments into a context structure Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-12-05Merge branch kvm-arm64/dirty-ring into kvmarm-master/nextMarc Zyngier1-0/+1
* kvm-arm64/dirty-ring: : . : Add support for the "per-vcpu dirty-ring tracking with a bitmap : and sprinkles on top", courtesy of Gavin Shan. : : This branch drags the kvmarm-fixes-6.1-3 tag which was already : merged in 6.1-rc4 so that the branch is in a working state. : . KVM: Push dirty information unconditionally to backup bitmap KVM: selftests: Automate choosing dirty ring size in dirty_log_test KVM: selftests: Clear dirty ring states between two modes in dirty_log_test KVM: selftests: Use host page size to map ring buffer in dirty_log_test KVM: arm64: Enable ring-based dirty memory tracking KVM: Support dirty ring in conjunction with bitmap KVM: Move declaration of kvm_cpu_dirty_log_size() to kvm_dirty_ring.h KVM: x86: Introduce KVM_REQ_DIRTY_RING_SOFT_FULL Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-12-05Merge branch kvm-arm64/52bit-fixes into kvmarm-master/nextMarc Zyngier1-1/+5
* kvm-arm64/52bit-fixes: : . : 52bit PA fixes, courtesy of Ryan Roberts. From the cover letter: : : "I've been adding support for FEAT_LPA2 to KVM and as part of that work have been : testing various (84) configurations of HW, host and guest kernels on FVP. This : has thrown up a couple of pre-existing bugs, for which the fixes are provided." : . KVM: arm64: Fix benign bug with incorrect use of VA_BITS KVM: arm64: Fix PAR_TO_HPFAR() to work independently of PA_BITS. KVM: arm64: Fix kvm init failure when mode!=vhe and VA_BITS=52. Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-12-05arm64:uprobe fix the uprobe SWBP_INSN in big-endianjunhua huang1-1/+1
We use uprobe in aarch64_be, which we found the tracee task would exit due to SIGILL when we enable the uprobe trace. We can see the replace inst from uprobe is not correct in aarch big-endian. As in Armv8-A, instruction fetches are always treated as little-endian, we should treat the UPROBE_SWBP_INSN as little-endian。 The test case is as following。 bash-4.4# ./mqueue_test_aarchbe 1 1 2 1 10 > /dev/null & bash-4.4# cd /sys/kernel/debug/tracing/ bash-4.4# echo 'p:test /mqueue_test_aarchbe:0xc30 %x0 %x1' > uprobe_events bash-4.4# echo 1 > events/uprobes/enable bash-4.4# bash-4.4# ps PID TTY TIME CMD 140 ? 00:00:01 bash 237 ? 00:00:00 ps [1]+ Illegal instruction ./mqueue_test_aarchbe 1 1 2 1 100 > /dev/null which we debug use gdb as following: bash-4.4# gdb attach 155 (gdb) disassemble send Dump of assembler code for function send: 0x0000000000400c30 <+0>: .inst 0xa00020d4 ; undefined 0x0000000000400c34 <+4>: mov x29, sp 0x0000000000400c38 <+8>: str w0, [sp, #28] 0x0000000000400c3c <+12>: strb w1, [sp, #27] 0x0000000000400c40 <+16>: str xzr, [sp, #40] 0x0000000000400c44 <+20>: str xzr, [sp, #48] 0x0000000000400c48 <+24>: add x0, sp, #0x1b 0x0000000000400c4c <+28>: mov w3, #0x0 // #0 0x0000000000400c50 <+32>: mov x2, #0x1 // #1 0x0000000000400c54 <+36>: mov x1, x0 0x0000000000400c58 <+40>: ldr w0, [sp, #28] 0x0000000000400c5c <+44>: bl 0x405e10 <mq_send> 0x0000000000400c60 <+48>: str w0, [sp, #60] 0x0000000000400c64 <+52>: ldr w0, [sp, #60] 0x0000000000400c68 <+56>: ldp x29, x30, [sp], #64 0x0000000000400c6c <+60>: ret End of assembler dump. (gdb) info b No breakpoints or watchpoints. (gdb) c Continuing. Program received signal SIGILL, Illegal instruction. 0x0000000000400c30 in send () (gdb) x/10x 0x400c30 0x400c30 <send>: 0xd42000a0 0xfd030091 0xe01f00b9 0xe16f0039 0x400c40 <send+16>: 0xff1700f9 0xff1b00f9 0xe06f0091 0x03008052 0x400c50 <send+32>: 0x220080d2 0xe10300aa (gdb) disassemble 0x400c30 Dump of assembler code for function send: => 0x0000000000400c30 <+0>: .inst 0xa00020d4 ; undefined 0x0000000000400c34 <+4>: mov x29, sp 0x0000000000400c38 <+8>: str w0, [sp, #28] 0x0000000000400c3c <+12>: strb w1, [sp, #27] 0x0000000000400c40 <+16>: str xzr, [sp, #40] Signed-off-by: junhua huang <huang.junhua@zte.com.cn> Link: https://lore.kernel.org/r/202212021511106844809@zte.com.cn Signed-off-by: Will Deacon <will@kernel.org>
2022-12-01Merge tag 'efi-fixes-for-v6.1-4' of ↵Linus Torvalds1-8/+0
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI fix from Ard Biesheuvel: "A single revert for some code that I added during this cycle. The code is not wrong, but it should be a bit more careful about how to handle the shadow call stack pointer, so it is better to revert it for now and bring it back later in improved form. Summary: - Revert runtime service sync exception recovery on arm64" * tag 'efi-fixes-for-v6.1-4' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: arm64: efi: Revert "Recover from synchronous exceptions ..."
2022-12-01arm64/sysreg: Remove duplicate definitions from asm/sysreg.hWill Deacon1-6/+0
With the new-fangled generation of asm/sysreg-defs.h, some definitions have ended up being duplicated between the two files. Remove these duplicate definitions, and consolidate the naming for GMID_EL1_BS_WIDTH. Signed-off-by: Will Deacon <will@kernel.org>
2022-12-01arm64/sysreg: Convert ID_DFR1_EL1 to automatic generationJames Morse1-2/+0
Convert ID_DFR1_EL1 to be automatically generated as per DDI0487I.a, no functional changes. Reviewed-by: Mark Brown <broonie@kernel.org> Signed-off-by: James Morse <james.morse@arm.com> Link: https://lore.kernel.org/r/20221130171637.718182-39-james.morse@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-12-01arm64/sysreg: Convert ID_DFR0_EL1 to automatic generationJames Morse1-14/+0
Convert ID_DFR0_EL1 to be automatically generated as per DDI0487I.a, no functional changes. Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20221130171637.718182-38-james.morse@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-12-01arm64/sysreg: Convert ID_AFR0_EL1 to automatic generationJames Morse1-1/+0
Convert ID_AFR0_EL1 to be automatically generated as per DDI0487I.a, no functional changes. Reviewed-by: Mark Brown <broonie@kernel.org> Signed-off-by: James Morse <james.morse@arm.com> Link: https://lore.kernel.org/r/20221130171637.718182-37-james.morse@arm.com Signed-off-by: Will Deacon <will@kernel.org>