Age | Commit message (Collapse) | Author | Files | Lines |
|
Meteor Lake is Intel's successor to Raptor lake. PPERF and SMI_COUNT MSRs
are also supported.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Link: https://lore.kernel.org/r/20230104201349.1451191-7-kan.liang@linux.intel.com
|
|
Meteor Lake is Intel's successor to Raptor lake. From the perspective of
Intel cstate residency counters, there is nothing changed compared with
Raptor lake.
Share adl_cstates with Raptor lake.
Update the comments for Meteor Lake.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Link: https://lore.kernel.org/r/20230104201349.1451191-6-kan.liang@linux.intel.com
|
|
Passing the host topology to the guest is almost certainly wrong
and will confuse the scheduler. In addition, several fields of
these CPUID leaves vary on each processor; it is simply impossible to
return the right values from KVM_GET_SUPPORTED_CPUID in such a way that
they can be passed to KVM_SET_CPUID2.
The values that will most likely prevent confusion are all zeroes.
Userspace will have to override it anyway if it wishes to present a
specific topology to the guest.
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The mysterious comment "We only want the cr8 intercept bits of L1"
dates back to basically the introduction of nested SVM, back when
the handling of "less typical" hypervisors was very haphazard.
With the development of kvm-unit-tests for interrupt handling,
the same code grew another vmcb_clr_intercept for the interrupt
window (VINTR) vmexit, this time with a comment that is at least
decent.
It turns out however that the same comment applies to the CR8 write
intercept, which is also a "recheck if an interrupt should be
injected" intercept. The CR8 read intercept instead has not
been used by KVM for 14 years (commit 649d68643ebf, "KVM: SVM:
sync TPR value to V_TPR field in the VMCB"), so do not bother
clearing it and let one comment describe both CR8 write and VINTR
handling.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The function p2m_index is defined in the p2m.c file, but not called
elsewhere, so remove this unused function.
arch/x86/xen/p2m.c:137:24: warning: unused function 'p2m_index'.
Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=3557
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20230105090141.36248-1-jiapeng.chong@linux.alibaba.com
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Three fixes for various bogosity in our linker script, revealed
by the recent commit which changed discard behaviour with some
toolchains.
* tag 'powerpc-6.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/vmlinux.lds: Don't discard .comment
powerpc/vmlinux.lds: Don't discard .rela* for relocatable builds
powerpc/vmlinux.lds: Define RUNTIME_DISCARD_EXIT
|
|
The 32-bit memory resource is needed for non-prefetchable memory
allocations on the PCIe bus, however with some cards (such as the
SM768) the system fails to allocate memory from this.
Checking the allocation against the datasheet, it looks like there
has been a mis-calcualation of the resource for the first memory
region (0x0060090000..0x0070ffffff) which in the data-sheet for
the fu740 (v1p2) is from 0x0060000000..0x007fffffff. Changing
this to allocate from 0x0060090000..0x007fffffff fixes the probing
issues.
Fixes: ae80d5148085 ("riscv: dts: Add PCIe support for the SiFive FU740-C000 SoC")
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Greentime Hu <greentime.hu@sifive.com>
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Cc: stable@vger.kernel.org
Tested-by: Ron Economos <re@w6rz.net> # from IRC
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull devicetree fixes from Rob Herring:
- Fix DT memory scanning for some MIPS boards when memory is not
specified in DT
- Redo CONFIG_CMDLINE* handling for missing /chosen node. The first
attempt broke PS3 (and possibly other PPC platforms).
- Fix constraints in QCom Soundwire schema
* tag 'devicetree-fixes-for-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
of: fdt: Honor CONFIG_CMDLINE* even without /chosen node, take 2
Revert "of: fdt: Honor CONFIG_CMDLINE* even without /chosen node"
dt-bindings: soundwire: qcom,soundwire: correct sizes related to number of ports
of/fdt: run soc memory setup when early_init_dt_scan_memory fails
|
|
bad_vaddr, bad_uaddr and error_code fields are set but never read by the
xtensa arch-specific code. Drop them. Also drop the commented out info
field.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
|
|
Pull arm TIF_NOTIFY_SIGNAL fixup from Jens Axboe:
"Hui Tang reported a performance regressions with _TIF_WORK_MASK in
newer kernels, which he tracked to a change that went into 5.11. After
this change, we'll call do_work_pending() more often than we need to,
because we're now testing bits 0..15 rather than just 0..7.
Shuffle the bits around to avoid this"
* tag 'tif-notify-signal-2023-01-06' of git://git.kernel.dk/linux:
ARM: renumber bits related to _TIF_WORK_MASK
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- use the correct mask for c.jr/c.jalr when decoding instructions
- build fix for get_user() to avoid a sparse warning
* tag 'riscv-for-linus-6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: uaccess: fix type of 0 variable on error in get_user()
riscv, kprobes: Stricter c.jr/c.jalr decoding
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fix from Ingo Molnar:
"Intel RAPL updates for new model IDs"
* tag 'perf-urgent-2023-01-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/rapl: Add support for Intel Emerald Rapids
perf/x86/rapl: Add support for Intel Meteor Lake
perf/x86/rapl: Treat Tigerlake like Icelake
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
"This fixes a CFI crash in arm64/sm4 as well as a regression in the
caam driver"
* tag 'v6.2-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: arm64/sm4 - fix possible crash with CFI enabled
crypto: caam - fix CAAM io mem access in blob_gen
|
|
If a Cortex-A715 cpu sees a page mapping permissions change from executable
to non-executable, it may corrupt the ESR_ELx and FAR_ELx registers, on the
next instruction abort caused by permission fault.
Only user-space does executable to non-executable permission transition via
mprotect() system call which calls ptep_modify_prot_start() and ptep_modify
_prot_commit() helpers, while changing the page mapping. The platform code
can override these helpers via __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION.
Work around the problem via doing a break-before-make TLB invalidation, for
all executable user space mappings, that go through mprotect() system call.
This overrides ptep_modify_prot_start() and ptep_modify_prot_commit(), via
defining HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION on the platform thus giving
an opportunity to intercept user space exec mappings, and do the necessary
TLB invalidation. Similar interceptions are also implemented for HugeTLB.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20230102061651.34745-1-anshuman.khandual@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Nathan Chancellor reports that the s390 vmlinux fails to link with
GNU ld < 2.36 since commit 99cb0d917ffa ("arch: fix broken BuildID
for arm64 and riscv").
It happens for defconfig, or more specifically for CONFIG_EXPOLINE=y.
$ s390x-linux-gnu-ld --version | head -n1
GNU ld (GNU Binutils for Debian) 2.35.2
$ make -s ARCH=s390 CROSS_COMPILE=s390x-linux-gnu- allnoconfig
$ ./scripts/config -e CONFIG_EXPOLINE
$ make -s ARCH=s390 CROSS_COMPILE=s390x-linux-gnu- olddefconfig
$ make -s ARCH=s390 CROSS_COMPILE=s390x-linux-gnu-
`.exit.text' referenced in section `.s390_return_reg' of drivers/base/dd.o: defined in discarded section `.exit.text' of drivers/base/dd.o
make[1]: *** [scripts/Makefile.vmlinux:34: vmlinux] Error 1
make: *** [Makefile:1252: vmlinux] Error 2
arch/s390/kernel/vmlinux.lds.S wants to keep EXIT_TEXT:
.exit.text : {
EXIT_TEXT
}
But, at the same time, EXIT_TEXT is thrown away by DISCARD because
s390 does not define RUNTIME_DISCARD_EXIT.
I still do not understand why the latter wins after 99cb0d917ffa,
but defining RUNTIME_DISCARD_EXIT seems correct because the comment
line in arch/s390/kernel/vmlinux.lds.S says:
/*
* .exit.text is discarded at runtime, not link time,
* to deal with references from __bug_table
*/
Nathan also found that binutils commit 21401fc7bf67 ("Duplicate output
sections in scripts") cured this issue, so we cannot reproduce it with
binutils 2.36+, but it is better to not rely on it.
Fixes: 99cb0d917ffa ("arch: fix broken BuildID for arm64 and riscv")
Link: https://lore.kernel.org/all/Y7Jal56f6UBh1abE@dev-arch.thelio-3990X/
Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Link: https://lore.kernel.org/r/20230105031306.1455409-1-masahiroy@kernel.org
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Symbols _edata and _end in the linker script are the
only unaligned expicitly on page boundary. Although
_end is aligned implicitly by BSS_SECTION macro that
is still inconsistent and could lead to a bug if a tool
or function would assume that _edata is as aligned as
others.
For example, vmem_map_init() function does not align
symbols _etext, _einittext etc. Should these symbols
be unaligned as well, the size of ranges to update
were short on one page.
Instead of fixing every occurrence of this kind in the
code and external tools just force the alignment on
these two symbols.
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Using DEBUG_H without a prefix is very generic and inconsistent with
other header guards in arch/s390/include/asm. In fact it collides with
the same name in the ath9k wireless driver though that depends on !S390
via disabled wireless support. Let's just use a consistent header guard
name and prevent possible future trouble.
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
This reverts commit 703e84d6615a4a95fb504c8f2e4c9426b86f3930.
USB device enumeration was not working on Odroid HC4 as both USB2 PHYs
need to be enabled. This is inherited from the GLX USB design [1].
[1]: https://lore.kernel.org/all/20170814224542.18257-1-martin.blumenstingl@googlemail.com/T/
Signed-off-by: Pierre-Olivier Mercier <nemunaire@nemunai.re>
Acked-by: Neil Armstrong <neil.armstrong@linaro.org>
Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Link: https://lore.kernel.org/r/20230105120206.28964-1-nemunaire@nemunai.re
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bpf, wifi, and netfilter.
Current release - regressions:
- bpf: fix nullness propagation for reg to reg comparisons, avoid
null-deref
- inet: control sockets should not use current thread task_frag
- bpf: always use maximal size for copy_array()
- eth: bnxt_en: don't link netdev to a devlink port for VFs
Current release - new code bugs:
- rxrpc: fix a couple of potential use-after-frees
- netfilter: conntrack: fix IPv6 exthdr error check
- wifi: iwlwifi: fw: skip PPAG for JF, avoid FW crashes
- eth: dsa: qca8k: various fixes for the in-band register access
- eth: nfp: fix schedule in atomic context when sync mc address
- eth: renesas: rswitch: fix getting mac address from device tree
- mobile: ipa: use proper endpoint mask for suspend
Previous releases - regressions:
- tcp: add TIME_WAIT sockets in bhash2, fix regression caught by
Jiri / python tests
- net: tc: don't intepret cls results when asked to drop, fix
oob-access
- vrf: determine the dst using the original ifindex for multicast
- eth: bnxt_en:
- fix XDP RX path if BPF adjusted packet length
- fix HDS (header placement) and jumbo thresholds for RX packets
- eth: ice: xsk: do not use xdp_return_frame() on tx_buf->raw_buf,
avoid memory corruptions
Previous releases - always broken:
- ulp: prevent ULP without clone op from entering the LISTEN status
- veth: fix race with AF_XDP exposing old or uninitialized
descriptors
- bpf:
- pull before calling skb_postpull_rcsum() (fix checksum support
and avoid a WARN())
- fix panic due to wrong pageattr of im->image (when livepatch and
kretfunc coexist)
- keep a reference to the mm, in case the task is dead
- mptcp: fix deadlock in fastopen error path
- netfilter:
- nf_tables: perform type checking for existing sets
- nf_tables: honor set timeout and garbage collection updates
- ipset: fix hash:net,port,net hang with /0 subnet
- ipset: avoid hung task warning when adding/deleting entries
- selftests: net:
- fix cmsg_so_mark.sh test hang on non-x86 systems
- fix the arp_ndisc_evict_nocarrier test for IPv6
- usb: rndis_host: secure rndis_query check against int overflow
- eth: r8169: fix dmar pte write access during suspend/resume with
WOL
- eth: lan966x: fix configuration of the PCS
- eth: sparx5: fix reading of the MAC address
- eth: qed: allow sleep in qed_mcp_trace_dump()
- eth: hns3:
- fix interrupts re-initialization after VF FLR
- fix handling of promisc when MAC addr table gets full
- refine the handling for VF heartbeat
- eth: mlx5:
- properly handle ingress QinQ-tagged packets on VST
- fix io_eq_size and event_eq_size params validation on big endian
- fix RoCE setting at HCA level if not supported at all
- don't turn CQE compression on by default for IPoIB
- eth: ena:
- fix toeplitz initial hash key value
- account for the number of XDP-processed bytes in interface stats
- fix rx_copybreak value update
Misc:
- ethtool: harden phy stat handling against buggy drivers
- docs: netdev: convert maintainer's doc from FAQ to a normal
document"
* tag 'net-6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (112 commits)
caif: fix memory leak in cfctrl_linkup_request()
inet: control sockets should not use current thread task_frag
net/ulp: prevent ULP without clone op from entering the LISTEN status
qed: allow sleep in qed_mcp_trace_dump()
MAINTAINERS: Update maintainers for ptp_vmw driver
usb: rndis_host: Secure rndis_query check against int overflow
net: dpaa: Fix dtsec check for PCS availability
octeontx2-pf: Fix lmtst ID used in aura free
drivers/net/bonding/bond_3ad: return when there's no aggregator
netfilter: ipset: Rework long task execution when adding/deleting entries
netfilter: ipset: fix hash:net,port,net hang with /0 subnet
net: sparx5: Fix reading of the MAC address
vxlan: Fix memory leaks in error path
net: sched: htb: fix htb_classify() kernel-doc
net: sched: cbq: dont intepret cls results when asked to drop
net: sched: atm: dont intepret cls results when asked to drop
dt-bindings: net: marvell,orion-mdio: Fix examples
dt-bindings: net: sun8i-emac: Add phy-supply property
net: ipa: use proper endpoint mask for suspend
selftests: net: return non-zero for failures reported in arp_ndisc_evict_nocarrier
...
|
|
If the get_user(x, ptr) has x as a pointer, then the setting
of (x) = 0 is going to produce the following sparse warning,
so fix this by forcing the type of 'x' when access_ok() fails.
fs/aio.c:2073:21: warning: Using plain integer as NULL pointer
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Link: https://lore.kernel.org/r/20221229170545.718264-1-ben-linux@fluff.org
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
In the compressed instruction extension, c.jr, c.jalr, c.mv, and c.add
is encoded the following way (each instruction is 16b):
---+-+-----------+-----------+--
100 0 rs1[4:0]!=0 00000 10 : c.jr
100 1 rs1[4:0]!=0 00000 10 : c.jalr
100 0 rd[4:0]!=0 rs2[4:0]!=0 10 : c.mv
100 1 rd[4:0]!=0 rs2[4:0]!=0 10 : c.add
The following logic is used to decode c.jr and c.jalr:
insn & 0xf007 == 0x8002 => instruction is an c.jr
insn & 0xf007 == 0x9002 => instruction is an c.jalr
When 0xf007 is used to mask the instruction, c.mv can be incorrectly
decoded as c.jr, and c.add as c.jalr.
Correct the decoding by changing the mask from 0xf007 to 0xf07f.
Fixes: c22b0bcb1dd0 ("riscv: Add kprobes supported")
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20230102160748.1307289-1-bjorn@kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Currently we only allocate space for SVE signal frames on systems that
support SVE, meaning that SME only systems do not allocate a signal frame
for streaming mode SVE state. Change the check so space is allocated if
either feature is supported.
Fixes: 85ed24dad290 ("arm64/sme: Implement streaming SVE signal handling")
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221223-arm64-fix-sme-only-v1-3-938d663f69e5@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Currently we reject an attempt to restore a SVE signal frame on a system
with SME but not SVE supported. This means that it is not possible to
disable streaming mode via signal return as this is configured via the
flags in the SVE signal context. Instead accept the signal frame, we will
require it to have a vector length of 0 specified and no payload since the
task will have no SVE vector length configured.
Fixes: 85ed24dad290 ("arm64/sme: Implement streaming SVE signal handling")
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221223-arm64-fix-sme-only-v1-2-938d663f69e5@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
When refactoring fpsimd_load() to support keeping SVE enabled over syscalls
support for systems with SME but not SVE was broken. The code that selects
between loading regular FPSIMD and SVE states was guarded by using
system_supports_sve() but is also needed to handle the streaming SVE state
in SME only systems where that check will be false. Fix this by also
checking for system_supports_sme().
Fixes: a0136be443d5 ("arm64/fpsimd: Load FP state based on recorded data type")
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221223-arm64-fix-sme-only-v1-1-938d663f69e5@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The inline assembly for arm64's cmpxchg_double*() implementations use a
+Q constraint to hazard against other accesses to the memory location
being exchanged. However, the pointer passed to the constraint is a
pointer to unsigned long, and thus the hazard only applies to the first
8 bytes of the location.
GCC can take advantage of this, assuming that other portions of the
location are unchanged, leading to a number of potential problems.
This is similar to what we fixed back in commit:
fee960bed5e857eb ("arm64: xchg: hazard against entire exchange variable")
... but we forgot to adjust cmpxchg_double*() similarly at the same
time.
The same problem applies, as demonstrated with the following test:
| struct big {
| u64 lo, hi;
| } __aligned(128);
|
| unsigned long foo(struct big *b)
| {
| u64 hi_old, hi_new;
|
| hi_old = b->hi;
| cmpxchg_double_local(&b->lo, &b->hi, 0x12, 0x34, 0x56, 0x78);
| hi_new = b->hi;
|
| return hi_old ^ hi_new;
| }
... which GCC 12.1.0 compiles as:
| 0000000000000000 <foo>:
| 0: d503233f paciasp
| 4: aa0003e4 mov x4, x0
| 8: 1400000e b 40 <foo+0x40>
| c: d2800240 mov x0, #0x12 // #18
| 10: d2800681 mov x1, #0x34 // #52
| 14: aa0003e5 mov x5, x0
| 18: aa0103e6 mov x6, x1
| 1c: d2800ac2 mov x2, #0x56 // #86
| 20: d2800f03 mov x3, #0x78 // #120
| 24: 48207c82 casp x0, x1, x2, x3, [x4]
| 28: ca050000 eor x0, x0, x5
| 2c: ca060021 eor x1, x1, x6
| 30: aa010000 orr x0, x0, x1
| 34: d2800000 mov x0, #0x0 // #0 <--- BANG
| 38: d50323bf autiasp
| 3c: d65f03c0 ret
| 40: d2800240 mov x0, #0x12 // #18
| 44: d2800681 mov x1, #0x34 // #52
| 48: d2800ac2 mov x2, #0x56 // #86
| 4c: d2800f03 mov x3, #0x78 // #120
| 50: f9800091 prfm pstl1strm, [x4]
| 54: c87f1885 ldxp x5, x6, [x4]
| 58: ca0000a5 eor x5, x5, x0
| 5c: ca0100c6 eor x6, x6, x1
| 60: aa0600a6 orr x6, x5, x6
| 64: b5000066 cbnz x6, 70 <foo+0x70>
| 68: c8250c82 stxp w5, x2, x3, [x4]
| 6c: 35ffff45 cbnz w5, 54 <foo+0x54>
| 70: d2800000 mov x0, #0x0 // #0 <--- BANG
| 74: d50323bf autiasp
| 78: d65f03c0 ret
Notice that at the lines with "BANG" comments, GCC has assumed that the
higher 8 bytes are unchanged by the cmpxchg_double() call, and that
`hi_old ^ hi_new` can be reduced to a constant zero, for both LSE and
LL/SC versions of cmpxchg_double().
This patch fixes the issue by passing a pointer to __uint128_t into the
+Q constraint, ensuring that the compiler hazards against the entire 16
bytes being modified.
With this change, GCC 12.1.0 compiles the above test as:
| 0000000000000000 <foo>:
| 0: f9400407 ldr x7, [x0, #8]
| 4: d503233f paciasp
| 8: aa0003e4 mov x4, x0
| c: 1400000f b 48 <foo+0x48>
| 10: d2800240 mov x0, #0x12 // #18
| 14: d2800681 mov x1, #0x34 // #52
| 18: aa0003e5 mov x5, x0
| 1c: aa0103e6 mov x6, x1
| 20: d2800ac2 mov x2, #0x56 // #86
| 24: d2800f03 mov x3, #0x78 // #120
| 28: 48207c82 casp x0, x1, x2, x3, [x4]
| 2c: ca050000 eor x0, x0, x5
| 30: ca060021 eor x1, x1, x6
| 34: aa010000 orr x0, x0, x1
| 38: f9400480 ldr x0, [x4, #8]
| 3c: d50323bf autiasp
| 40: ca0000e0 eor x0, x7, x0
| 44: d65f03c0 ret
| 48: d2800240 mov x0, #0x12 // #18
| 4c: d2800681 mov x1, #0x34 // #52
| 50: d2800ac2 mov x2, #0x56 // #86
| 54: d2800f03 mov x3, #0x78 // #120
| 58: f9800091 prfm pstl1strm, [x4]
| 5c: c87f1885 ldxp x5, x6, [x4]
| 60: ca0000a5 eor x5, x5, x0
| 64: ca0100c6 eor x6, x6, x1
| 68: aa0600a6 orr x6, x5, x6
| 6c: b5000066 cbnz x6, 78 <foo+0x78>
| 70: c8250c82 stxp w5, x2, x3, [x4]
| 74: 35ffff45 cbnz w5, 5c <foo+0x5c>
| 78: f9400480 ldr x0, [x4, #8]
| 7c: d50323bf autiasp
| 80: ca0000e0 eor x0, x7, x0
| 84: d65f03c0 ret
... sampling the high 8 bytes before and after the cmpxchg, and
performing an EOR, as we'd expect.
For backporting, I've tested this atop linux-4.9.y with GCC 5.5.0. Note
that linux-4.9.y is oldest currently supported stable release, and
mandates GCC 5.1+. Unfortunately I couldn't get a GCC 5.1 binary to run
on my machines due to library incompatibilities.
I've also used a standalone test to check that we can use a __uint128_t
pointer in a +Q constraint at least as far back as GCC 4.8.5 and LLVM
3.9.1.
Fixes: 5284e1b4bc8a ("arm64: xchg: Implement cmpxchg_double")
Fixes: e9a4b795652f ("arm64: cmpxchg_dbl: patch in lse instructions when supported by the CPU")
Reported-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lore.kernel.org/lkml/Y6DEfQXymYVgL3oJ@boqun-archlinux/
Reported-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/lkml/Y6GXoO4qmH9OIZ5Q@hirez.programming.kicks-ass.net/
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: stable@vger.kernel.org
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230104151626.3262137-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
After we fixed the uprobe inst endian in aarch_be, the sparse check report
the following warning info:
sparse warnings: (new ones prefixed by >>)
>> kernel/events/uprobes.c:223:25: sparse: sparse: restricted __le32 degrades to integer
>> kernel/events/uprobes.c:574:56: sparse: sparse: incorrect type in argument 4 (different base types)
@@ expected unsigned int [addressable] [usertype] opcode @@ got restricted __le32 [usertype] @@
kernel/events/uprobes.c:574:56: sparse: expected unsigned int [addressable] [usertype] opcode
kernel/events/uprobes.c:574:56: sparse: got restricted __le32 [usertype]
>> kernel/events/uprobes.c:1483:32: sparse: sparse: incorrect type in initializer (different base types)
@@ expected unsigned int [usertype] insn @@ got restricted __le32 [usertype] @@
kernel/events/uprobes.c:1483:32: sparse: expected unsigned int [usertype] insn
kernel/events/uprobes.c:1483:32: sparse: got restricted __le32 [usertype]
use the __le32 to u32 for uprobe_opcode_t, to keep the same.
Fixes: 60f07e22a73d ("arm64:uprobe fix the uprobe SWBP_INSN in big-endian")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: junhua huang <huang.junhua@zte.com.cn>
Link: https://lore.kernel.org/r/202212280954121197626@zte.com.cn
Signed-off-by: Will Deacon <will@kernel.org>
|
|
* kvm-arm64/s1ptw-write-fault:
: .
: Fix S1PTW fault handling that was until then always taken
: as a write. From the cover letter:
:
: `Recent developments on the EFI front have resulted in guests that
: simply won't boot if the page tables are in a read-only memslot and
: that you're a bit unlucky in the way S2 gets paged in... The core
: issue is related to the fact that we treat a S1PTW as a write, which
: is close enough to what needs to be done. Until to get to RO memslots.
:
: The first patch fixes this and is definitely a stable candidate. It
: splits the faulting of page tables in two steps (RO translation fault,
: followed by a writable permission fault -- should it even happen).
: The second one documents the slightly odd behaviour of PTW writes to
: RO memslot, which do not result in a KVM_MMIO exit. The last patch is
: totally optional, only tangentially related, and randomly repainting
: stuff (maybe that's contagious, who knows)."
:
: .
KVM: arm64: Convert FSC_* over to ESR_ELx_FSC_*
KVM: arm64: Document the behaviour of S1PTW faults on RO memslots
KVM: arm64: Fix S1PTW handling on RO memslots
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
* kvm-arm64/pmu-fixes-6.2:
: .
: Fix for an incredibly stupid bug in the PMU rework that went into
: 6.2. Brown paper bag time.
: .
KVM: arm64: PMU: Fix PMCR_EL0 reset value
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
I really hoped that Apple had fixed their not-quite-a-vgic implementation
when moving from M1 to M2. Alas, it seems they didn't, and running
a buggy EFI version results in the vgic generating SErrors outside
of the guest and taking the host down.
Apply the same workaround as for M1. Yes, this is all a bit crap.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230103095022.3230946-2-maz@kernel.org
|
|
The MTE coredump code in arch/arm64/kernel/elfcore.c iterates over the
vma list without the mmap_lock held. This can race with another process
or userfaultfd concurrently modifying the vma list. Change the
for_each_mte_vma macro and its callers to instead use the vma snapshot
taken by dump_vma_snapshot() and stored in the cprm object.
Fixes: 6dd8b1a0b6cb ("arm64: mte: Dump the MTE tags in the core file")
Cc: <stable@vger.kernel.org> # 5.18.x
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Seth Jenkins <sethjenkins@google.com>
Suggested-by: Seth Jenkins <sethjenkins@google.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20221222181251.1345752-4-catalin.marinas@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
A subsequent fix for arm64 will use this parameter to parse the vma
information from the snapshot created by dump_vma_snapshot() rather than
traversing the vma list without the mmap_lock.
Fixes: 6dd8b1a0b6cb ("arm64: mte: Dump the MTE tags in the core file")
Cc: <stable@vger.kernel.org> # 5.18.x
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Seth Jenkins <sethjenkins@google.com>
Suggested-by: Seth Jenkins <sethjenkins@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221222181251.1345752-3-catalin.marinas@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Commit 16decce22efa ("arm64: mte: Fix the stack frame size warning in
mte_dump_tag_range()") moved the temporary tag storage array from the
stack to slab but it also introduced an error in double freeing this
object. Remove the in-loop freeing.
Fixes: 16decce22efa ("arm64: mte: Fix the stack frame size warning in mte_dump_tag_range()")
Cc: <stable@vger.kernel.org> # 5.18.x
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Seth Jenkins <sethjenkins@google.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20221222181251.1345752-2-catalin.marinas@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
We currently guard REGSET_{SSVE, ZA} using ARM64_SVE for no good reason.
Both enumerations would be pointless without ARM64_SME and create two empty
entries in aarch64_regsets[] which would then become part of a process's
native regset view (they should be ignored though).
Switch to use ARM64_SME instead.
Fixes: e12310a0d30f ("arm64/sme: Implement ptrace support for streaming mode SVE registers")
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221214135943.379-1-yuzenghui@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Add check for the executable case in pud_user_accessible_page() too
like what we did for pte and pmd.
Fixes: 42b2547137f5 ("arm64/mm: enable ARCH_SUPPORTS_PAGE_TABLE_CHECK")
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Link: https://lore.kernel.org/r/20221122123137.429686-1-liushixin2@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The page table check trigger BUG_ON() unexpectedly when split hugepage:
------------[ cut here ]------------
kernel BUG at mm/page_table_check.c:119!
Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
Dumping ftrace buffer:
(ftrace buffer empty)
Modules linked in:
CPU: 7 PID: 210 Comm: transhuge-stres Not tainted 6.1.0-rc3+ #748
Hardware name: linux,dummy-virt (DT)
pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : page_table_check_set.isra.0+0x398/0x468
lr : page_table_check_set.isra.0+0x1c0/0x468
[...]
Call trace:
page_table_check_set.isra.0+0x398/0x468
__page_table_check_pte_set+0x160/0x1c0
__split_huge_pmd_locked+0x900/0x1648
__split_huge_pmd+0x28c/0x3b8
unmap_page_range+0x428/0x858
unmap_single_vma+0xf4/0x1c8
zap_page_range+0x2b0/0x410
madvise_vma_behavior+0xc44/0xe78
do_madvise+0x280/0x698
__arm64_sys_madvise+0x90/0xe8
invoke_syscall.constprop.0+0xdc/0x1d8
do_el0_svc+0xf4/0x3f8
el0_svc+0x58/0x120
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x19c/0x1a0
[...]
On arm64, pmd_leaf() will return true even if the pmd is invalid due to
pmd_present_invalid() check. So in pmdp_invalidate() the file_map_count
will not only decrease once but also increase once. Then in set_pte_at(),
the file_map_count increase again, and so trigger BUG_ON() unexpectedly.
Add !pmd_present_invalid() check in pmd_user_accessible_page() to fix the
problem.
Fixes: 42b2547137f5 ("arm64/mm: enable ARCH_SUPPORTS_PAGE_TABLE_CHECK")
Reported-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Acked-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20221121073608.4183459-1-liushixin2@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Although the powerpc linker script mentions .comment in the DISCARD
section, that has never actually caused it to be discarded, because the
earlier ELF_DETAILS macro (previously STABS_DEBUG) explicitly includes
.comment.
However commit 99cb0d917ffa ("arch: fix broken BuildID for arm64 and
riscv") introduced an earlier use of DISCARD as part of the RO_DATA
macro. With binutils < 2.36 that causes the DISCARD directives later in
the script to be applied earlier, causing .comment to actually be
discarded.
It's confusing to explicitly include and discard .comment, and even more
so if the behaviour depends on the toolchain version. So don't discard
.comment in order to maintain the existing behaviour in all cases.
Fixes: 83a092cf95f2 ("powerpc: Link warning for orphan sections")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230105132349.384666-3-mpe@ellerman.id.au
|
|
Relocatable kernels must not discard relocations, they need to be
processed at runtime. As such they are included for CONFIG_RELOCATABLE
builds in the powerpc linker script (line 340).
However they are also unconditionally discarded later in the
script (line 414). Previously that worked because the earlier inclusion
superseded the discard.
However commit 99cb0d917ffa ("arch: fix broken BuildID for arm64 and
riscv") introduced an earlier use of DISCARD as part of the RO_DATA
macro (line 137). With binutils < 2.36 that causes the DISCARD
directives later in the script to be applied earlier, causing .rela* to
actually be discarded at link time, leading to build warnings and a
kernel that doesn't boot:
ld: warning: discarding dynamic section .rela.init.rodata
Fix it by conditionally discarding .rela* only when CONFIG_RELOCATABLE
is disabled.
Fixes: 99cb0d917ffa ("arch: fix broken BuildID for arm64 and riscv")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230105132349.384666-2-mpe@ellerman.id.au
|
|
The powerpc linker script explicitly includes .exit.text, because
otherwise the link fails due to references from __bug_table and
__ex_table. The code is freed (discarded) at runtime along with
.init.text and data.
That has worked in the past despite powerpc not defining
RUNTIME_DISCARD_EXIT because DISCARDS appears late in the powerpc linker
script (line 410), and the explicit inclusion of .exit.text
earlier (line 280) supersedes the discard.
However commit 99cb0d917ffa ("arch: fix broken BuildID for arm64 and
riscv") introduced an earlier use of DISCARD as part of the RO_DATA
macro (line 136). With binutils < 2.36 that causes the DISCARD
directives later in the script to be applied earlier [1], causing
.exit.text to actually be discarded at link time, leading to build
errors:
'.exit.text' referenced in section '__bug_table' of crypto/algboss.o: defined in
discarded section '.exit.text' of crypto/algboss.o
'.exit.text' referenced in section '__ex_table' of drivers/nvdimm/core.o: defined in
discarded section '.exit.text' of drivers/nvdimm/core.o
Fix it by defining RUNTIME_DISCARD_EXIT, which causes the generic
DISCARDS macro to not include .exit.text at all.
1: https://lore.kernel.org/lkml/87fscp2v7k.fsf@igel.home/
Fixes: 99cb0d917ffa ("arch: fix broken BuildID for arm64 and riscv")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230105132349.384666-1-mpe@ellerman.id.au
|
|
Emerald Rapids RAPL support is the same as previous Sapphire Rapids.
Add Emerald Rapids model for RAPL.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230104145831.25498-2-rui.zhang@intel.com
|
|
Meteor Lake RAPL support is the same as previous Sky Lake.
Add Meteor Lake model for RAPL.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230104145831.25498-1-rui.zhang@intel.com
|
|
We want to ensure that the mask related to calling do_work_pending()
is within the first 16 bits. Move bits unrelated to that outside of
that range, to avoid spuriously calling do_work_pending() when we don't
need to.
Cc: stable@vger.kernel.org
Fixes: 32d59773da38 ("arm: add support for TIF_NOTIFY_SIGNAL")
Reported-and-tested-by: Hui Tang <tanghui20@huawei.com>
Suggested-by: Russell King (Oracle) <linux@armlinux.org.uk>
Link: https://lore.kernel.org/lkml/7ecb8f3c-2aeb-a905-0d4a-aa768b9649b5@huawei.com/
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We missed the window between the TIF flag update and the next reschedule.
Signed-off-by: Rodrigo Branco <bsdaemon@google.com>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: <stable@vger.kernel.org>
|
|
If memory has been found early_init_dt_scan_memory now returns 1. If
it hasn't found any memory it will return 0, allowing other memory
setup mechanisms to carry on.
Previously early_init_dt_scan_memory always returned 0 without
distinguishing between any kind of memory setup being done or not. Any
code path after the early_init_dt_scan memory call in the ramips
plat_mem_setup code wouldn't be executed anymore. Making
early_init_dt_scan_memory the only way to initialize the memory.
Some boards, including my mt7621 based Cudy X6 board, depend on memory
initialization being done via the soc_info.mem_detect function
pointer. Those wouldn't be able to obtain memory and panic the kernel
during early bootup with the message "early_init_dt_alloc_memory_arch:
Failed to allocate 12416 bytes align=0x40".
Fixes: 1f012283e936 ("of/fdt: Rework early_init_dt_scan_memory() to call directly")
Cc: stable@vger.kernel.org
Signed-off-by: Andreas Rammhold <andreas@rammhold.de>
Link: https://lore.kernel.org/r/20221223112748.2935235-1-andreas@rammhold.de
Signed-off-by: Rob Herring <robh@kernel.org>
|
|
Since Tigerlake seems to have inherited its cstates and other RAPL power
caps from Icelake, assume it also follows Icelake for its RAPL events.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Link: https://lore.kernel.org/r/20221228113454.1199118-1-rodrigo.vivi@intel.com
|
|
from MMIO trace type
Both <linux/mmiotrace.h> and <asm/insn-eval.h> define various MMIO_ enum constants,
whose namespace overlaps.
Rename the <asm/insn-eval.h> ones to have a INSN_ prefix, so that the headers can be
used from the same source file.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230101162910.710293-2-Jason@zx2c4.com
|
|
Fix a warning: "found `movsd'; assuming `movsl' was meant"
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: linux-kernel@vger.kernel.org
|
|
The former is an AArch32 legacy, so let's move over to the
verbose (and strictly identical) version.
This involves moving some of the #defines that were private
to KVM into the more generic esr.h.
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
A recent development on the EFI front has resulted in guests having
their page tables baked in the firmware binary, and mapped into the
IPA space as part of a read-only memslot. Not only is this legitimate,
but it also results in added security, so thumbs up.
It is possible to take an S1PTW translation fault if the S1 PTs are
unmapped at stage-2. However, KVM unconditionally treats S1PTW as a
write to correctly handle hardware AF/DB updates to the S1 PTs.
Furthermore, KVM injects an exception into the guest for S1PTW writes.
In the aforementioned case this results in the guest taking an abort
it won't recover from, as the S1 PTs mapping the vectors suffer from
the same problem.
So clearly our handling is... wrong.
Instead, switch to a two-pronged approach:
- On S1PTW translation fault, handle the fault as a read
- On S1PTW permission fault, handle the fault as a write
This is of no consequence to SW that *writes* to its PTs (the write
will trigger a non-S1PTW fault), and SW that uses RO PTs will not
use HW-assisted AF/DB anyway, as that'd be wrong.
Only in the case described in c4ad98e4b72c ("KVM: arm64: Assume write
fault on S1PTW permission fault on instruction fetch") do we end-up
with two back-to-back faults (page being evicted and faulted back).
I don't think this is a case worth optimising for.
Fixes: c4ad98e4b72c ("KVM: arm64: Assume write fault on S1PTW permission fault on instruction fetch")
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Regression-tested-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
|
|
After
b3e34a47f989 ("x86/kexec: fix memory leak of elf header buffer"),
freeing image->elf_headers in the error path of crash_load_segments()
is not needed because kimage_file_post_load_cleanup() will take
care of that later. And not clearing it could result in a double-free.
Drop the superfluous vfree() call at the error path of
crash_load_segments().
Fixes: b3e34a47f989 ("x86/kexec: fix memory leak of elf header buffer")
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Baoquan He <bhe@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/20221122115122.13937-1-tiwai@suse.de
|
|
The GW7901 has USB2 routed to a USB VBUS supply with over-current
protection via an active-low pin. Define the OC pin polarity properly.
Fixes: 2b1649a83afc ("arm64: dts: imx: Add i.mx8mm Gateworks gw7901 dts support")
Signed-off-by: Tim Harvey <tharvey@gateworks.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
|