Age | Commit message (Collapse) | Author | Files | Lines |
|
Pull KVM fixes from Radim Krčmář:
"ARM:
- A number of issues in the vgic discovered using SMATCH
- A bit one-off calculation in out stage base address mask (32-bit
and 64-bit)
- Fixes to single-step debugging instructions that trap for other
reasons such as MMMIO aborts
- Printing unavailable hyp mode as error
- Potential spinlock deadlock in the vgic
- Avoid calling vgic vcpu free more than once
- Broken bit calculation for big endian systems
s390:
- SPDX tags
- Fence storage key accesses from problem state
- Make sure that irq_state.flags is not used in the future
x86:
- Intercept port 0x80 accesses to prevent host instability (CVE)
- Use userspace FPU context for guest FPU (mainly an optimization
that fixes a double use of kernel FPU)
- Do not leak one page per module load
- Flush APIC page address cache from MMU invalidation notifiers"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (28 commits)
KVM: x86: fix APIC page invalidation
KVM: s390: Fix skey emulation permission check
KVM: s390: mark irq_state.flags as non-usable
KVM: s390: Remove redundant license text
KVM: s390: add SPDX identifiers to the remaining files
KVM: VMX: fix page leak in hardware_setup()
KVM: VMX: remove I/O port 0x80 bypass on Intel hosts
x86,kvm: remove KVM emulator get_fpu / put_fpu
x86,kvm: move qemu/guest FPU switching out to vcpu_run
KVM: arm/arm64: Fix broken GICH_ELRSR big endian conversion
KVM: arm/arm64: kvm_arch_destroy_vm cleanups
KVM: arm/arm64: Fix spinlock acquisition in vgic_set_owner
kvm: arm: don't treat unavailable HYP mode as an error
KVM: arm/arm64: Avoid attempting to load timer vgic state without a vgic
kvm: arm64: handle single-step of hyp emulated mmio instructions
kvm: arm64: handle single-step during SError exceptions
kvm: arm64: handle single-step of userspace mmio instructions
kvm: arm64: handle single-stepping trapped instructions
KVM: arm/arm64: debug: Introduce helper for single-step
arm: KVM: Fix VTTBR_BADDR_MASK BUG_ON off-by-one
...
|
|
Pull networking fixes from David Miller:
1) CAN fixes from Martin Kelly (cancel URBs properly in all the CAN usb
drivers).
2) Revert returning -EEXIST from __dev_alloc_name() as this propagates
to userspace and broke some apps. From Johannes Berg.
3) Fix conn memory leaks and crashes in TIPC, from Jon Malloc and Cong
Wang.
4) Gianfar MAC can't do EEE so don't advertise it by default, from
Claudiu Manoil.
5) Relax strict netlink attribute validation, but emit a warning. From
David Ahern.
6) Fix regression in checksum offload of thunderx driver, from Florian
Westphal.
7) Fix UAPI bpf issues on s390, from Hendrik Brueckner.
8) New card support in iwlwifi, from Ihab Zhaika.
9) BBR congestion control bug fixes from Neal Cardwell.
10) Fix port stats in nfp driver, from Pieter Jansen van Vuuren.
11) Fix leaks in qualcomm rmnet, from Subash Abhinov Kasiviswanathan.
12) Fix DMA API handling in sh_eth driver, from Thomas Petazzoni.
13) Fix spurious netpoll warnings in bnxt_en, from Calvin Owens.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (67 commits)
net: mvpp2: fix the RSS table entry offset
tcp: evaluate packet losses upon RTT change
tcp: fix off-by-one bug in RACK
tcp: always evaluate losses in RACK upon undo
tcp: correctly test congestion state in RACK
bnxt_en: Fix sources of spurious netpoll warnings
tcp_bbr: reset long-term bandwidth sampling on loss recovery undo
tcp_bbr: reset full pipe detection on loss recovery undo
tcp_bbr: record "full bw reached" decision in new full_bw_reached bit
sfc: pass valid pointers from efx_enqueue_unwind
gianfar: Disable EEE autoneg by default
tcp: invalidate rate samples during SACK reneging
can: peak/pcie_fd: fix potential bug in restarting tx queue
can: usb_8dev: cancel urb on -EPIPE and -EPROTO
can: kvaser_usb: cancel urb on -EPIPE and -EPROTO
can: esd_usb2: cancel urb on -EPIPE and -EPROTO
can: ems_usb: cancel urb on -EPIPE and -EPROTO
can: mcba_usb: cancel urb on -EPROTO
usbnet: fix alignment for frames with no ethernet header
tcp: use current time in tcp_rcv_space_adjust()
...
|
|
enter_lazy_tlb is called when a kernel thread rides on the back of
another mm, due to a context switch or an explicit call to unuse_mm
where a call to switch_mm is elided.
In these cases, it's important to keep the saved ttbr value up to date
with the active mm, otherwise we can end up with a stale value which
points to a potentially freed page table.
This patch implements enter_lazy_tlb for arm64, so that the saved ttbr0
is kept up-to-date with the active mm for kernel threads.
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Cc: <stable@vger.kernel.org>
Fixes: 39bc88e5e38e9b21 ("arm64: Disable TTBR0_EL1 during normal kernel execution")
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Vinayak Menon <vinmenon@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
update_saved_ttbr0 mandates that mm->pgd is not swapper, since swapper
contains kernel mappings and should never be installed into ttbr0. However,
this means that callers must avoid passing the init_mm to update_saved_ttbr0
which in turn can cause the saved ttbr0 value to be out-of-date in the context
of the idle thread. For example, EFI runtime services may leave the saved ttbr0
pointing at the EFI page table, and kernel threads may end up with stale
references to freed page tables.
This patch changes update_saved_ttbr0 so that the init_mm points the saved
ttbr0 value to the empty zero page, which always exists and never contains
valid translations. EFI and switch can then call into update_saved_ttbr0
unconditionally.
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Cc: <stable@vger.kernel.org>
Fixes: 39bc88e5e38e9b21 ("arm64: Disable TTBR0_EL1 during normal kernel execution")
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Vinayak Menon <vinmenon@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm
KVM/ARM Fixes for v4.15.
Fixes:
- A number of issues in the vgic discovered using SMATCH
- A bit one-off calculation in out stage base address mask (32-bit and
64-bit)
- Fixes to single-step debugging instructions that trap for other
reasons such as MMMIO aborts
- Printing unavailable hyp mode as error
- Potential spinlock deadlock in the vgic
- Avoid calling vgic vcpu free more than once
- Broken bit calculation for big endian systems
|
|
Correct the broken uapi for the BPF_PROG_TYPE_PERF_EVENT program type
by exporting the user_pt_regs structure instead of the pt_regs structure
that is in-kernel only.
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Reviewed-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"The critical one here is a fix for fpsimd register corruption across
signals which was introduced by the SVE support code (the register
files overlap), but the others are worth having as well.
Summary:
- Fix FP register corruption when SVE is not available or in use
- Fix out-of-tree module build failure when CONFIG_ARM64_MODULE_PLTS=y
- Missing 'const' generating errors with LTO builds
- Remove unsupported events from Cortex-A73 PMU description
- Removal of stale and incorrect comments"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: context: Fix comments and remove pointless smp_wmb()
arm64: cpu_ops: Add missing 'const' qualifiers
arm64: perf: remove unsupported events for Cortex-A73
arm64: fpsimd: Fix failure to restore FPSIMD state after signals
arm64: pgd: Mark pgd_cache as __ro_after_init
arm64: ftrace: emit ftrace-mod.o contents through code
arm64: module-plts: factor out PLT generation code for ftrace
arm64: mm: cleanup stale AIVIVT references
|
|
When building the arm64 kernel with both CONFIG_ARM64_MODULE_PLTS and
CONFIG_DYNAMIC_FTRACE enabled, the ftrace-mod.o object file is built
with the kernel and contains a trampoline that is linked into each
module, so that modules can be loaded far away from the kernel and
still reach the ftrace entry point in the core kernel with an ordinary
relative branch, as is emitted by the compiler instrumentation code
dynamic ftrace relies on.
In order to be able to build out of tree modules, this object file
needs to be included into the linux-headers or linux-devel packages,
which is undesirable, as it makes arm64 a special case (although a
precedent does exist for 32-bit PPC).
Given that the trampoline essentially consists of a PLT entry, let's
not bother with a source or object file for it, and simply patch it
in whenever the trampoline is being populated, using the existing
PLT support routines.
Cc: <stable@vger.kernel.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
To allow the ftrace trampoline code to reuse the PLT entry routines,
factor it out and move it into asm/module.h.
Cc: <stable@vger.kernel.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
In response to compile breakage introduced by a series that added the
pud_write helper to x86, Stephen notes:
did you consider using the other paradigm:
In arch include files:
#define pud_write pud_write
static inline int pud_write(pud_t pud)
.....
Then in include/asm-generic/pgtable.h:
#ifndef pud_write
tatic inline int pud_write(pud_t pud)
{
....
}
#endif
If you had, then the powerpc code would have worked ... ;-) and many
of the other interfaces in include/asm-generic/pgtable.h are
protected that way ...
Given that some architecture already define pmd_write() as a macro, it's
a net reduction to drop the definition of __HAVE_ARCH_PMD_WRITE.
Link: http://lkml.kernel.org/r/151129126721.37405.13339850900081557813.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Suggested-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
After emulating instructions we may want return to user-space to handle
single-step debugging. Introduce a helper function, which, if
single-step is enabled, sets the run structure for return and returns
true.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
VTTBR_BADDR_MASK is used to sanity check the size and alignment of the
VTTBR address. It seems to currently be off by one, thereby only
allowing up to 47-bit addresses (instead of 48-bit) and also
insufficiently checking the alignment. This patch fixes it.
As an example, with 4k pages, before this patch we have:
PHYS_MASK_SHIFT = 48
VTTBR_X = 37 - 24 = 13
VTTBR_BADDR_SHIFT = 13 - 1 = 12
VTTBR_BADDR_MASK = ((1 << 35) - 1) << 12 = 0x00007ffffffff000
Which is wrong, because the mask doesn't allow bit 47 of the VTTBR
address to be set, and only requires the address to be 12-bit (4k)
aligned, while it actually needs to be 13-bit (8k) aligned because we
concatenate two 4k tables.
With this patch, the mask becomes 0x0000ffffffffe000, which is what we
want.
Fixes: 0369f6a34b9f ("arm64: KVM: EL2 register definitions")
Cc: <stable@vger.kernel.org> # 3.11.x
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
Since commit:
155433cb365ee466 ("arm64: cache: Remove support for ASID-tagged VIVT I-caches")
... the kernel no longer cares about AIVIVT I-caches, as these were
removed from the architecture.
This patch removes the stale references to such I-caches.
The comment in flush_context() is also updated to clarify when and where
the TLB invalidation occurs.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux
Pull file locking update from Jeff Layton:
"A couple of fixes for a patch that went into v4.14, and the bug report
just came in a few days ago.. It passes my (minimal) testing, and has
been in linux-next for a few days now.
I also would like to get my address changed in MAINTAINERS to clear
that hurdle"
* tag 'locks-v4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
fcntl: don't cap l_start and l_end values for F_GETLK64 in compat syscall
fcntl: don't leak fd reference when fixup_compat_flock fails
MAINTAINERS: s/jlayton@poochiereds.net/jlayton@kernel.org/
|
|
Pull KVM updates from Radim Krčmář:
"First batch of KVM changes for 4.15
Common:
- Python 3 support in kvm_stat
- Accounting of slabs to kmemcg
ARM:
- Optimized arch timer handling for KVM/ARM
- Improvements to the VGIC ITS code and introduction of an ITS reset
ioctl
- Unification of the 32-bit fault injection logic
- More exact external abort matching logic
PPC:
- Support for running hashed page table (HPT) MMU mode on a host that
is using the radix MMU mode; single threaded mode on POWER 9 is
added as a pre-requisite
- Resolution of merge conflicts with the last second 4.14 HPT fixes
- Fixes and cleanups
s390:
- Some initial preparation patches for exitless interrupts and crypto
- New capability for AIS migration
- Fixes
x86:
- Improved emulation of LAPIC timer mode changes, MCi_STATUS MSRs,
and after-reset state
- Refined dependencies for VMX features
- Fixes for nested SMI injection
- A lot of cleanups"
* tag 'kvm-4.15-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (89 commits)
KVM: s390: provide a capability for AIS state migration
KVM: s390: clear_io_irq() requests are not expected for adapter interrupts
KVM: s390: abstract conversion between isc and enum irq_types
KVM: s390: vsie: use common code functions for pinning
KVM: s390: SIE considerations for AP Queue virtualization
KVM: s390: document memory ordering for kvm_s390_vcpu_wakeup
KVM: PPC: Book3S HV: Cosmetic post-merge cleanups
KVM: arm/arm64: fix the incompatible matching for external abort
KVM: arm/arm64: Unify 32bit fault injection
KVM: arm/arm64: vgic-its: Implement KVM_DEV_ARM_ITS_CTRL_RESET
KVM: arm/arm64: Document KVM_DEV_ARM_ITS_CTRL_RESET
KVM: arm/arm64: vgic-its: Free caches when GITS_BASER Valid bit is cleared
KVM: arm/arm64: vgic-its: New helper functions to free the caches
KVM: arm/arm64: vgic-its: Remove kvm_its_unmap_device
arm/arm64: KVM: Load the timer state when enabling the timer
KVM: arm/arm64: Rework kvm_timer_should_fire
KVM: arm/arm64: Get rid of kvm_timer_flush_hwstate
KVM: arm/arm64: Avoid phys timer emulation in vcpu entry/exit
KVM: arm/arm64: Move phys_timer_emulate function
KVM: arm/arm64: Use kvm_arm_timer_set/get_reg for guest register traps
...
|
|
Convert all allocations that used a NOTRACK flag to stop using it.
Link: http://lkml.kernel.org/r/20171007030159.22241-3-alexander.levin@verizon.com
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim Hansen <devtimhansen@gmail.com>
Cc: Vegard Nossum <vegardno@ifi.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
"The big highlight is support for the Scalable Vector Extension (SVE)
which required extensive ABI work to ensure we don't break existing
applications by blowing away their signal stack with the rather large
new vector context (<= 2 kbit per vector register). There's further
work to be done optimising things like exception return, but the ABI
is solid now.
Much of the line count comes from some new PMU drivers we have, but
they're pretty self-contained and I suspect we'll have more of them in
future.
Plenty of acronym soup here:
- initial support for the Scalable Vector Extension (SVE)
- improved handling for SError interrupts (required to handle RAS
events)
- enable GCC support for 128-bit integer types
- remove kernel text addresses from backtraces and register dumps
- use of WFE to implement long delay()s
- ACPI IORT updates from Lorenzo Pieralisi
- perf PMU driver for the Statistical Profiling Extension (SPE)
- perf PMU driver for Hisilicon's system PMUs
- misc cleanups and non-critical fixes"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (97 commits)
arm64: Make ARMV8_DEPRECATED depend on SYSCTL
arm64: Implement __lshrti3 library function
arm64: support __int128 on gcc 5+
arm64/sve: Add documentation
arm64/sve: Detect SVE and activate runtime support
arm64/sve: KVM: Hide SVE from CPU features exposed to guests
arm64/sve: KVM: Treat guest SVE use as undefined instruction execution
arm64/sve: KVM: Prevent guests from using SVE
arm64/sve: Add sysctl to set the default vector length for new processes
arm64/sve: Add prctl controls for userspace vector length management
arm64/sve: ptrace and ELF coredump support
arm64/sve: Preserve SVE registers around EFI runtime service calls
arm64/sve: Preserve SVE registers around kernel-mode NEON use
arm64/sve: Probe SVE capabilities and usable vector lengths
arm64: cpufeature: Move sys_caps_initialised declarations
arm64/sve: Backend logic for setting the vector length
arm64/sve: Signal handling support
arm64/sve: Support vector length resetting for new processes
arm64/sve: Core task context handling
arm64/sve: Low-level CPU setup
...
|
|
Currently, we're capping the values too low in the F_GETLK64 case. The
fields in that structure are 64-bit values, so we shouldn't need to do
any sort of fixup there.
Make sure we check that assumption at build time in the future however
by ensuring that the sizes we're copying will fit.
With this, we no longer need COMPAT_LOFF_T_MAX either, so remove it.
Fixes: 94073ad77fff2 (fs/locks: don't mess with the address limit in compat_fcntl64)
Reported-by: Vitaly Lipatov <lav@etersoft.ru>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: David Howells <dhowells@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI updates from Rafael Wysocki:
"These update ACPICA to upstream revision 20170831, fix APEI to use the
fixmap instead of ioremap_page_range(), add an operation region driver
for TI PMIC TPS68470, add support for PCC subspace IDs to the ACPI
CPPC driver, fix a few assorted issues and clean up some code.
Specifics:
- Update the ACPICA code to upstream revision 20170831 including
* PDTT table header support (Bob Moore).
* Cleanup and extension of internal string-to-integer conversion
functions (Bob Moore).
* Support for 64-bit hardware accesses (Lv Zheng).
* ACPI PM Timer code adjustment to deal with 64-bit return values
of acpi_hw_read() (Bob Moore).
* Support for deferred table verification in acpiexec (Lv Zheng).
- Fix APEI to use the fixmap instead of ioremap_page_range() which
cannot work correctly the way the code in there attempted to use it
and drop some code that's not necessary any more after that change
(James Morse).
- Clean up the APEI support code and make it use 64-bit timestamps
(Arnd Bergmann, Dongjiu Geng, Jan Beulich).
- Add operation region driver for TI PMIC TPS68470 (Rajmohan Mani).
- Add support for PCC subspace IDs to the ACPI CPPC driver (George
Cherian).
- Fix an ACPI EC driver regression related to the handling of EC
events during the "noirq" phases of system suspend/resume (Lv
Zheng).
- Delay the initialization of the lid state in the ACPI button driver
to fix issues appearing on some systems (Hans de Goede).
- Extend the KIOX000A "device always present" quirk to cover all
affected BIOS versions (Hans de Goede).
- Clean up some code in the ACPI core and drivers (Colin Ian King,
Gustavo Silva)"
* tag 'acpi-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (24 commits)
ACPI: Mark expected switch fall-throughs
ACPI / LPSS: Remove redundant initialization of clk
ACPI / CPPC: Make CPPC ACPI driver aware of PCC subspace IDs
mailbox: PCC: Move the MAX_PCC_SUBSPACES definition to header file
ACPI / sysfs: Make function param_set_trace_method_name() static
ACPI / button: Delay acpi_lid_initialize_state() until first user space open
ACPI / EC: Fix regression related to triggering source of EC event handling
APEI / ERST: use 64-bit timestamps
ACPI / APEI: Remove arch_apei_flush_tlb_one()
arm64: mm: Remove arch_apei_flush_tlb_one()
ACPI / APEI: Remove ghes_ioremap_area
ACPI / APEI: Replace ioremap_page_range() with fixmap
ACPI / APEI: remove the unused dead-code for SEA/NMI notification type
ACPI / x86: Extend KIOX000A quirk to cover all affected BIOS versions
ACPI / APEI: adjust a local variable type in ghes_ioremap_pfn_irq()
ACPICA: Update version to 20170831
ACPICA: Update acpi_get_timer for 64-bit interface to acpi_hw_read
ACPICA: String conversions: Update to add new behaviors
ACPICA: String conversions: Cleanup/format comments. No functional changes
ACPICA: Restructure/cleanup all string-to-integer conversion functions
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"There are no real big ticket items here this time.
The most noticeable change is probably the relocation of the OPP
(Operating Performance Points) framework to its own directory under
drivers/ as it has grown big enough for that. Also Viresh is now going
to maintain it and send pull requests for it to me, so you will see
this change in the git history going forward (but still not right
now).
Another noticeable set of changes is the modifications of the PM core,
the PCI subsystem and the ACPI PM domain to allow of more integration
between system-wide suspend/resume and runtime PM. For now it's just a
way to avoid resuming devices from runtime suspend unnecessarily
during system suspend (if the driver sets a flag to indicate its
readiness for that) and in the works is an analogous mechanism to
allow devices to stay suspended after system resume.
In addition to that, we have some changes related to supporting
frequency-invariant CPU utilization metrics in the scheduler and in
the schedutil cpufreq governor on ARM and changes to add support for
device performance states to the generic power domains (genpd)
framework.
The rest is mostly fixes and cleanups of various sorts.
Specifics:
- Relocate the OPP (Operating Performance Points) framework to its
own directory under drivers/ and add support for power domain
performance states to it (Viresh Kumar).
- Modify the PM core, the PCI bus type and the ACPI PM domain to
support power management driver flags allowing device drivers to
specify their capabilities and preferences regarding the handling
of devices with enabled runtime PM during system suspend/resume and
clean up that code somewhat (Rafael Wysocki, Ulf Hansson).
- Add frequency-invariant accounting support to the task scheduler on
ARM and ARM64 (Dietmar Eggemann).
- Fix PM QoS device resume latency framework to prevent "no
restriction" requests from overriding requests with specific
requirements and drop the confusing PM_QOS_FLAG_REMOTE_WAKEUP
device PM QoS flag (Rafael Wysocki).
- Drop legacy class suspend/resume operations from the PM core and
drop legacy bus type suspend and resume callbacks from ARM/locomo
(Rafael Wysocki).
- Add min/max frequency support to devfreq and clean it up somewhat
(Chanwoo Choi).
- Rework wakeup support in the generic power domains (genpd)
framework and update some of its users accordingly (Geert
Uytterhoeven).
- Convert timers in the PM core to use timer_setup() (Kees Cook).
- Add support for exposing the SLP_S0 (Low Power S0 Idle) residency
counter based on the LPIT ACPI table on Intel platforms (Srinivas
Pandruvada).
- Add per-CPU PM QoS resume latency support to the ladder cpuidle
governor (Ramesh Thomas).
- Fix a deadlock between the wakeup notify handler and the notifier
removal in the ACPI core (Ville Syrjälä).
- Fix a cpufreq schedutil governor issue causing it to use stale
cached frequency values sometimes (Viresh Kumar).
- Fix an issue in the system suspend core support code causing wakeup
events detection to fail in some cases (Rajat Jain).
- Fix the generic power domains (genpd) framework to prevent the PM
core from using the direct-complete optimization with it as that is
guaranteed to fail (Ulf Hansson).
- Fix a minor issue in the cpuidle core and clean it up a bit (Gaurav
Jindal, Nicholas Piggin).
- Fix and clean up the intel_idle and ARM cpuidle drivers (Jason
Baron, Len Brown, Leo Yan).
- Fix a couple of minor issues in the OPP framework and clean it up
(Arvind Yadav, Fabio Estevam, Sudeep Holla, Tobias Jordan).
- Fix and clean up some cpufreq drivers and fix a minor issue in the
cpufreq statistics code (Arvind Yadav, Bhumika Goyal, Fabio
Estevam, Gautham Shenoy, Gustavo Silva, Marek Szyprowski, Masahiro
Yamada, Robert Jarzmik, Zumeng Chen).
- Fix minor issues in the system suspend and hibernation core, in
power management documentation and in the AVS (Adaptive Voltage
Scaling) framework (Helge Deller, Himanshu Jha, Joe Perches, Rafael
Wysocki).
- Fix some issues in the cpupower utility and document that Shuah
Khan is going to maintain it going forward (Prarit Bhargava, Shuah
Khan)"
* tag 'pm-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (88 commits)
tools/power/cpupower: add libcpupower.so.0.0.1 to .gitignore
tools/power/cpupower: Add 64 bit library detection
intel_idle: Graceful probe failure when MWAIT is disabled
cpufreq: schedutil: Reset cached_raw_freq when not in sync with next_freq
freezer: Fix typo in freezable_schedule_timeout() comment
PM / s2idle: Clear the events_check_enabled flag
cpufreq: stats: Handle the case when trans_table goes beyond PAGE_SIZE
cpufreq: arm_big_little: make cpufreq_arm_bL_ops structures const
cpufreq: arm_big_little: make function arguments and structure pointer const
cpuidle: Avoid assignment in if () argument
cpuidle: Clean up cpuidle_enable_device() error handling a bit
ACPI / PM: Fix acpi_pm_notifier_lock vs flush_workqueue() deadlock
PM / Domains: Fix genpd to deal with drivers returning 1 from ->prepare()
cpuidle: ladder: Add per CPU PM QoS resume latency support
PM / QoS: Fix device resume latency framework
PM / domains: Rework governor code to be more consistent
PM / Domains: Remove gpd_dev_ops.active_wakeup() callback
soc: rockchip: power-domain: Use GENPD_FLAG_ACTIVE_WAKEUP
soc: mediatek: Use GENPD_FLAG_ACTIVE_WAKEUP
ARM: shmobile: pm-rmobile: Use GENPD_FLAG_ACTIVE_WAKEUP
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq core updates from Thomas Gleixner:
"A rather large update for the interrupt core code and the irq chip drivers:
- Add a new bitmap matrix allocator and supporting changes, which is
used to replace the x86 vector allocator which comes with separate
pull request. This allows to replace the convoluted nested loop
allocation function in x86 with a facility which supports the
recently added property of managed interrupts proper and allows to
switch to a best effort vector reservation scheme, which addresses
problems with vector exhaustion.
- A large update to the ARM GIC-V3-ITS driver adding support for
range selectors.
- New interrupt controllers:
- Meson and Meson8 GPIO
- BCM7271 L2
- Socionext EXIU
If you expected that this will stop at some point, I have to
disappoint you. There are new ones posted already. Sigh!
- STM32 interrupt controller support for new platforms.
- A pile of fixes, cleanups and updates to the MIPS GIC driver
- The usual small fixes, cleanups and updates all over the place.
Most visible one is to move the irq chip drivers Kconfig switches
into a separate Kconfig menu"
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (70 commits)
genirq: Fix type of shifting literal 1 in __setup_irq()
irqdomain: Drop pointless NULL check in virq_debug_show_one
genirq/proc: Return proper error code when irq_set_affinity() fails
irq/work: Use llist_for_each_entry_safe
irqchip: mips-gic: Print warning if inherited GIC base is used
irqchip/mips-gic: Add pr_fmt and reword pr_* messages
irqchip/stm32: Move the wakeup on interrupt mask
irqchip/stm32: Fix initial values
irqchip/stm32: Add stm32h7 support
dt-bindings/interrupt-controllers: Add compatible string for stm32h7
irqchip/stm32: Add multi-bank management
irqchip/stm32: Select GENERIC_IRQ_CHIP
irqchip/exiu: Add support for Socionext Synquacer EXIU controller
dt-bindings: Add description of Socionext EXIU interrupt controller
irqchip/gic-v3-its: Fix VPE activate callback return value
irqchip: mips-gic: Make IPI bitmaps static
irqchip: mips-gic: Share register writes in gic_set_type()
irqchip: mips-gic: Remove gic_vpes variable
irqchip: mips-gic: Use num_possible_cpus() to reserve IPIs
irqchip: mips-gic: Configure EIC when CPUs come online
...
|
|
* acpi-pmic:
ACPI / PMIC: Add TI PMIC TPS68470 operation region driver
* acpi-apei:
APEI / ERST: use 64-bit timestamps
ACPI / APEI: Remove arch_apei_flush_tlb_one()
arm64: mm: Remove arch_apei_flush_tlb_one()
ACPI / APEI: Remove ghes_ioremap_area
ACPI / APEI: Replace ioremap_page_range() with fixmap
ACPI / APEI: remove the unused dead-code for SEA/NMI notification type
ACPI / APEI: adjust a local variable type in ghes_ioremap_pfn_irq()
* acpi-x86:
ACPI / x86: Extend KIOX000A quirk to cover all affected BIOS versions
|
|
* pm-cpufreq: (22 commits)
cpufreq: stats: Handle the case when trans_table goes beyond PAGE_SIZE
cpufreq: arm_big_little: make cpufreq_arm_bL_ops structures const
cpufreq: arm_big_little: make function arguments and structure pointer const
cpufreq: pxa: convert to clock API
cpufreq: speedstep-lib: mark expected switch fall-through
cpufreq: ti-cpufreq: add missing of_node_put()
cpufreq: dt: Remove support for Exynos4212 SoCs
cpufreq: imx6q: Move speed grading check to cpufreq driver
cpufreq: ti-cpufreq: kfree opp_data when failure
cpufreq: SPEAr: pr_err() strings should end with newlines
cpufreq: powernow-k8: pr_err() strings should end with newlines
cpufreq: dt-platdev: drop socionext,uniphier-ld6b from whitelist
arm64: wire cpu-invariant accounting support up to the task scheduler
arm64: wire frequency-invariant accounting support up to the task scheduler
arm: wire cpu-invariant accounting support up to the task scheduler
arm: wire frequency-invariant accounting support up to the task scheduler
drivers base/arch_topology: allow inlining cpu-invariant accounting support
drivers base/arch_topology: provide frequency-invariant accounting support
cpufreq: dt: invoke frequency-invariance setter function
cpufreq: arm_big_little: invoke frequency-invariance setter function
...
|
|
Nothing calls arch_apei_flush_tlb_one() anymore, instead relying on
__set_fixmap() to do the invalidation. Remove it.
Move the IPI-considered-harmful comment to __set_fixmap().
Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Tested-by: Tyler Baicar <tbaicar@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: All applicable <stable@vger.kernel.org>
|
|
Replace ghes_io{re,un}map_pfn_{nmi,irq}()s use of ioremap_page_range()
with __set_fixmap() as ioremap_page_range() may sleep to allocate a new
level of page-table, even if its passed an existing final-address to
use in the mapping.
The GHES driver can only be enabled for architectures that select
HAVE_ACPI_APEI: Add fixmap entries to both x86 and arm64.
clear_fixmap() does the TLB invalidation in __set_fixmap() for arm64
and __set_pte_vaddr() for x86. In each case its the same as the
respective arch_apei_flush_tlb_one().
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Tested-by: Tyler Baicar <tbaicar@codeaurora.org>
Tested-by: Toshi Kani <toshi.kani@hpe.com>
[ For the arm64 bits: ]
Acked-by: Will Deacon <will.deacon@arm.com>
[ For the x86 bits: ]
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: All applicable <stable@vger.kernel.org>
|
|
Conflicts:
include/linux/compiler-clang.h
include/linux/compiler-gcc.h
include/linux/compiler-intel.h
include/uapi/linux/stddef.h
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
kvm_vcpu_dabt_isextabt() tries to match a full fault syndrome, but
calls kvm_vcpu_trap_get_fault_type() that only returns the fault class,
thus reducing the scope of the check. This doesn't cause any observable
bug yet as we end-up matching a closely related syndrome for which we
return the same value.
Using kvm_vcpu_trap_get_fault() instead fixes it for good.
Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
Both arm and arm64 implementations are capable of injecting
faults, and yet have completely divergent implementations,
leading to different bugs and reduced maintainability.
Let's elect the arm64 version as the canonical one
and move it into aarch32.c, which is common to both
architectures.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
As we are about to be lazy with saving and restoring the timer
registers, we prepare by moving all possible timer configuration logic
out of the hyp code. All virtual timer registers can be programmed from
EL1 and since the arch timer is always a level triggered interrupt we
can safely do this with interrupts disabled in the host kernel on the
way to the guest without taking vtimer interrupts in the host kernel
(yet).
The downside is that the cntvoff register can only be programmed from
hyp mode, so we jump into hyp mode and back to program it. This is also
safe, because the host kernel doesn't use the virtual timer in the KVM
code. It may add a little performance performance penalty, but only
until following commits where we move this operation to vcpu load/put.
Signed-off-by: Christoffer Dall <cdall@linaro.org>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
Using the physical counter allows KVM to retain the offset between the
virtual and physical counter as long as it is actively running a VCPU.
As soon as a VCPU is released, another thread is scheduled or we start
running userspace applications, we reset the offset to 0, so that
userspace accessing the virtual timer can still read the virtual counter
and get the same view of time as the kernel.
This opens up potential improvements for KVM performance, but we have to
make a few adjustments to preserve system consistency.
Currently get_cycles() is hardwired to arch_counter_get_cntvct() on
arm64, but as we move to using the physical timer for the in-kernel
time-keeping on systems that boot in EL2, we should use the same counter
for get_cycles() as for other in-kernel timekeeping operations.
Similarly, implementations of arch_timer_set_next_event_phys() is
modified to use the counter specific to the timer being programmed.
VHE kernels or kernels continuing to use the virtual timer are
unaffected.
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <cdall@linaro.org>
|
|
As we are about to use the physical counter on arm64 systems that have
KVM support, implement arch_counter_get_cntpct() and the associated
errata workaround functionality for stable timer reads.
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
This patch enables detection of hardware SVE support via the
cpufeatures framework, and reports its presence to the kernel and
userspace via the new ARM64_SVE cpucap and HWCAP_SVE hwcap
respectively.
Userspace can also detect SVE using ID_AA64PFR0_EL1, using the
cpufeatures MRS emulation.
When running on hardware that supports SVE, this enables runtime
kernel support for SVE, and allows user tasks to execute SVE
instructions and make of the of the SVE-specific user/kernel
interface extensions implemented by this series.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Until KVM has full SVE support, guests must not be allowed to
execute SVE instructions.
This patch enables the necessary traps, and also ensures that the
traps are disabled again on exit from the guest so that the host
can still use SVE if it wants to.
On guest exit, high bits of the SVE Zn registers may have been
clobbered as a side-effect the execution of FPSIMD instructions in
the guest. The existing KVM host FPSIMD restore code is not
sufficient to restore these bits, so this patch explicitly marks
the CPU as not containing cached vector state for any task, thus
forcing a reload on the next return to userspace. This is an
interim measure, in advance of adding full SVE awareness to KVM.
This marking of cached vector state in the CPU as invalid is done
using __this_cpu_write(fpsimd_last_state, NULL) in fpsimd.c. Due
to the repeated use of this rather obscure operation, it makes
sense to factor it out as a separate helper with a clearer name.
This patch factors it out as fpsimd_flush_cpu_state(), and ports
all callers to use it.
As a side effect of this refactoring, a this_cpu_write() in
fpsimd_cpu_pm_notifier() is changed to __this_cpu_write(). This
should be fine, since cpu_pm_enter() is supposed to be called only
with interrupts disabled.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
This patch adds two arm64-specific prctls, to permit userspace to
control its vector length:
* PR_SVE_SET_VL: set the thread's SVE vector length and vector
length inheritance mode.
* PR_SVE_GET_VL: get the same information.
Although these prctls resemble instruction set features in the SVE
architecture, they provide additional control: the vector length
inheritance mode is Linux-specific and nothing to do with the
architecture, and the architecture does not permit EL0 to set its
own vector length directly. Both can be used in portable tools
without requiring the use of SVE instructions.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alex Bennée <alex.bennee@linaro.org>
[will: Fixed up prctl constants to avoid clash with PDEATHSIG]
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
This patch defines and implements a new regset NT_ARM_SVE, which
describes a thread's SVE register state. This allows a debugger to
manipulate the SVE state, as well as being included in ELF
coredumps for post-mortem debugging.
Because the regset size and layout are dependent on the thread's
current vector length, it is not possible to define a C struct to
describe the regset contents as is done for existing regsets.
Instead, and for the same reasons, NT_ARM_SVE is based on the
freeform variable-layout approach used for the SVE signal frame.
Additionally, to reduce debug overhead when debugging threads that
might or might not have live SVE register state, NT_ARM_SVE may be
presented in one of two different formats: the old struct
user_fpsimd_state format is embedded for describing the state of a
thread with no live SVE state, whereas a new variable-layout
structure is embedded for describing live SVE state. This avoids a
debugger needing to poll NT_PRFPREG in addition to NT_ARM_SVE, and
allows existing userspace code to handle the non-SVE case without
too much modification.
For this to work, NT_ARM_SVE is defined with a fixed-format header
of type struct user_sve_header, which the recipient can use to
figure out the content, size and layout of the reset of the regset.
Accessor macros are defined to allow the vector-length-dependent
parts of the regset to be manipulated.
Signed-off-by: Alan Hayward <alan.hayward@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alex Bennée <alex.bennee@linaro.org>
Cc: Okamoto Takayuki <tokamoto@jp.fujitsu.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
This patch uses the cpufeatures framework to determine common SVE
capabilities and vector lengths, and configures the runtime SVE
support code appropriately.
ZCR_ELx is not really a feature register, but it is convenient to
use it as a template for recording the maximum vector length
supported by a CPU, using the LEN field. This field is similar to
a feature field in that it is a contiguous bitfield for which we
want to determine the minimum system-wide value. This patch adds
ZCR as a pseudo-register in cpuinfo/cpufeatures, with appropriate
custom code to populate it. Finding the minimum supported value of
the LEN field is left to the cpufeatures framework in the usual
way.
The meaning of ID_AA64ZFR0_EL1 is not architecturally defined yet,
so for now we just require it to be zero.
Note that much of this code is dormant and SVE still won't be used
yet, since system_supports_sve() remains hardwired to false.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
This patch implements the core logic for changing a task's vector
length on request from userspace. This will be used by the ptrace
and prctl frontends that are implemented in later patches.
The SVE architecture permits, but does not require, implementations
to support vector lengths that are not a power of two. To handle
this, logic is added to check a requested vector length against a
possibly sparse bitmap of available vector lengths at runtime, so
that the best supported value can be chosen.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
This patch implements support for saving and restoring the SVE
registers around signals.
A fixed-size header struct sve_context is always included in the
signal frame encoding the thread's vector length at the time of
signal delivery, optionally followed by a variable-layout structure
encoding the SVE registers.
Because of the need to preserve backwards compatibility, the FPSIMD
view of the SVE registers is always dumped as a struct
fpsimd_context in the usual way, in addition to any sve_context.
The SVE vector registers are dumped in full, including bits 127:0
of each register which alias the corresponding FPSIMD vector
registers in the hardware. To avoid any ambiguity about which
alias to restore during sigreturn, the kernel always restores bits
127:0 of each SVE vector register from the fpsimd_context in the
signal frame (which must be present): userspace needs to take this
into account if it wants to modify the SVE vector register contents
on return from a signal.
FPSR and FPCR, which are used by both FPSIMD and SVE, are not
included in sve_context because they are always present in
fpsimd_context anyway.
For signal delivery, a new helper
fpsimd_signal_preserve_current_state() is added to update _both_
the FPSIMD and SVE views in the task struct, to make it easier to
populate this information into the signal frame. Because of the
redundancy between the two views of the state, only one is updated
otherwise.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Cc: Alex Bennée <alex.bennee@linaro.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
It's desirable to be able to reset the vector length to some sane
default for new processes, since the new binary and its libraries
may or may not be SVE-aware.
This patch tracks the desired post-exec vector length (if any) in a
new thread member sve_vl_onexec, and adds a new thread flag
TIF_SVE_VL_INHERIT to control whether to inherit or reset the
vector length. Currently these are inactive. Subsequent patches
will provide the capability to configure them.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
This patch adds the core support for switching and managing the SVE
architectural state of user tasks.
Calls to the existing FPSIMD low-level save/restore functions are
factored out as new functions task_fpsimd_{save,load}(), since SVE
now dynamically may or may not need to be handled at these points
depending on the kernel configuration, hardware features discovered
at boot, and the runtime state of the task. To make these
decisions as fast as possible, const cpucaps are used where
feasible, via the system_supports_sve() helper.
The SVE registers are only tracked for threads that have explicitly
used SVE, indicated by the new thread flag TIF_SVE. Otherwise, the
FPSIMD view of the architectural state is stored in
thread.fpsimd_state as usual.
When in use, the SVE registers are not stored directly in
thread_struct due to their potentially large and variable size.
Because the task_struct slab allocator must be configured very
early during kernel boot, it is also tricky to configure it
correctly to match the maximum vector length provided by the
hardware, since this depends on examining secondary CPUs as well as
the primary. Instead, a pointer sve_state in thread_struct points
to a dynamically allocated buffer containing the SVE register data,
and code is added to allocate and free this buffer at appropriate
times.
TIF_SVE is set when taking an SVE access trap from userspace, if
suitable hardware support has been detected. This enables SVE for
the thread: a subsequent return to userspace will disable the trap
accordingly. If such a trap is taken without sufficient system-
wide hardware support, SIGILL is sent to the thread instead as if
an undefined instruction had been executed: this may happen if
userspace tries to use SVE in a system where not all CPUs support
it for example.
The kernel will clear TIF_SVE and disable SVE for the thread
whenever an explicit syscall is made by userspace. For backwards
compatibility reasons and conformance with the spirit of the base
AArch64 procedure call standard, the subset of the SVE register
state that aliases the FPSIMD registers is still preserved across a
syscall even if this happens. The remainder of the SVE register
state logically becomes zero at syscall entry, though the actual
zeroing work is currently deferred until the thread next tries to
use SVE, causing another trap to the kernel. This implementation
is suboptimal: in the future, the fastpath case may be optimised
to zero the registers in-place and leave SVE enabled for the task,
where beneficial.
TIF_SVE is also cleared in the following slowpath cases, which are
taken as reasonable hints that the task may no longer use SVE:
* exec
* fork and clone
Code is added to sync data between thread.fpsimd_state and
thread.sve_state whenever enabling/disabling SVE, in a manner
consistent with the SVE architectural programmer's model.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Alex Bennée <alex.bennee@linaro.org>
[will: added #include to fix allnoconfig build]
[will: use enable_daif in do_sve_acc]
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
This patch adds CONFIG_ARM64_SVE to control building of SVE support
into the kernel, and adds a stub predicate system_supports_sve() to
control conditional compilation and runtime SVE support.
system_supports_sve() just returns false for now: it will be
replaced with a non-trivial implementation in a later patch, once
SVE support is complete enough to be enabled safely.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Manipulating the SVE architectural state, including the vector and
predicate registers, first-fault register and the vector length,
requires the use of dedicated instructions added by SVE.
This patch adds suitable assembly functions for saving and
restoring the SVE registers and querying the vector length.
Setting of the vector length is done as part of register restore.
Since people building kernels may not all get an SVE-enabled
toolchain for a while, this patch uses macros that generate
explicit opcodes in place of assembler mnemonics.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
The SVE architecture adds some system registers, ID register fields
and a dedicated ESR exception class.
This patch adds the appropriate definitions that will be needed by
the kernel.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Currently, a guest kernel sees the true CPU feature registers
(ID_*_EL1) when it reads them using MRS instructions. This means
that the guest may observe features that are present in the
hardware but the host doesn't understand or doesn't provide support
for. A guest may legimitately try to use such a feature as per the
architecture, but use of the feature may trap instead of working
normally, triggering undef injection into the guest.
This is not a problem for the host, but the guest may go wrong when
running on newer hardware than the host knows about.
This patch hides from guest VMs any AArch64-specific CPU features
that the host doesn't support, by exposing to the guest the
sanitised versions of the registers computed by the cpufeatures
framework, instead of the true hardware registers. To achieve
this, HCR_EL2.TID3 is now set for AArch64 guests, and emulation
code is added to KVM to report the sanitised versions of the
affected registers in response to MRS and register reads from
userspace.
The affected registers are removed from invariant_sys_regs[] (since
the invariant_sys_regs handling is no longer quite correct for
them) and added to sys_reg_desgs[], with appropriate access(),
get_user() and set_user() methods. No runtime vcpu storage is
allocated for the registers: instead, they are read on demand from
the cpufeatures framework. This may need modification in the
future if there is a need for userspace to customise the features
visible to the guest.
Attempts by userspace to write the registers are handled similarly
to the current invariant_sys_regs handling: writes are permitted,
but only if they don't attempt to change the value. This is
sufficient to support VM snapshot/restore from userspace.
Because of the additional registers, restoring a VM on an older
kernel may not work unless userspace knows how to handle the extra
VM registers exposed to the KVM user ABI by this patch.
Under the principle of least damage, this patch makes no attempt to
handle any of the other registers currently in
invariant_sys_regs[], or to emulate registers for AArch32: however,
these could be handled in a similar way in future, as necessary.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Following our 'dai' order, irqs should be processed with debug and
serror exceptions unmasked.
Add a helper to unmask these two, (and fiq for good measure).
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
el0_sync also unmasks exceptions on a case-by-case basis, debug exceptions
are enabled, unless this was a debug exception. Irqs are unmasked for
some exception types but not for others.
el0_dbg should run with everything masked to prevent us taking a debug
exception from do_debug_exception. For the other cases we can unmask
everything. This changes the behaviour of fpsimd_{acc,exc} and el0_inv
which previously ran with irqs masked.
This patch removed the last user of enable_dbg_and_irq, remove it.
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
el1_sync unmasks exceptions on a case-by-case basis, debug exceptions
are unmasked, unless this was a debug exception. IRQs are unmasked
for instruction and data aborts only if the interupted context had
irqs unmasked.
Following our 'dai' order, el1_dbg should run with everything masked.
For the other cases we can inherit whatever we interrupted.
Add a macro inherit_daif to set daif based on the interrupted pstate.
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
enable_step_tsk is the only user of disable_dbg, which doesn't respect
our 'dai' order for exception masking. enable_step_tsk may enable
single-step, so previously needed to mask debug exceptions to prevent us
from single-stepping kernel_exit. enable_step_tsk is called at the end
of the ret_to_user loop, which has already masked all exceptions so this
is no longer needed.
Remove disable_dbg, add a comment that enable_step_tsk's caller should
have masked debug.
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Remove the local_{async,fiq}_{en,dis}able macros as they don't respect
our newly defined order and are only used to set the flags for process
context when we bring CPUs online.
Add a helper to do this. The IRQ flag varies as we want it masked on
the boot CPU until we are ready to handle interrupts.
The boot CPU unmasks SError during early boot once it can print an error
message. If we can print an error message about SError, we can do the
same for FIQ. Debug exceptions are already enabled by __cpu_setup(),
which has also configured MDSCR_EL1 to disable MDE and KDE.
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Currently SError is always masked in the kernel. To support RAS exceptions
using SError on hardware with the v8.2 RAS Extensions we need to unmask
SError as much as possible.
Let's define an order for masking and unmasking exceptions. 'dai' is
memorable and effectively what we have today.
Disabling debug exceptions should cause all other exceptions to be masked.
Masking SError should mask irq, but not disable debug exceptions.
Masking irqs has no side effects for other flags. Keeping to this order
makes it easier for entry.S to know which exceptions should be unmasked.
FIQ is never expected, but we mask it when we mask debug exceptions, and
unmask it at all other times.
Given masking debug exceptions masks everything, we don't need macros
to save/restore that bit independently. Remove them and switch the last
caller over to use the daif calls.
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|