summaryrefslogtreecommitdiffstats
path: root/arch/x86
AgeCommit message (Collapse)AuthorFilesLines
2020-05-28perf/x86/rapl: Add AMD Fam17h RAPL supportStephane Eranian2-0/+21
This patch enables AMD Fam17h RAPL support for the Package level metric. The support is as per AMD Fam17h Model31h (Zen2) and model 00-ffh (Zen1) PPR. The same output is available via the energy-pkg pseudo event: $ perf stat -a -I 1000 --per-socket -e power/energy-pkg/ Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200527224659.206129-6-eranian@google.com
2020-05-28perf/x86/rapl: Make perf_probe_msr() more robust and flexibleStephane Eranian1-0/+13
This patch modifies perf_probe_msr() by allowing passing of struct perf_msr array where some entries are not populated, i.e., they have either an msr address of 0 or no attribute_group pointer. This helps with certain call paths, e.g., RAPL. In case the grp is NULL, the default sysfs visibility rule applies which is to make the group visible. Without the patch, you would get a kernel crash with a NULL group. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200527224659.206129-5-eranian@google.com
2020-05-28perf/x86/rapl: Flip logic on default events visibilityStephane Eranian1-0/+11
This patch modifies the default visibility of the attribute_group for each RAPL event. By default if the grp.is_visible field is NULL, sysfs considers that it must display the attribute group. If the field is not NULL (callback function), then the return value of the callback determines the visibility (0 = not visible). The RAPL attribute groups had the field set to NULL, meaning that unless they failed the probing from perf_msr_probe(), they would be visible. We want to avoid having to specify attribute groups that are not supported by the HW in the rapl_msrs[] array, they don't have an MSR address to begin with. Therefore, we intialize the visible field of all RAPL attribute groups to a callback that returns 0. If the RAPL msr goes through probing and succeeds the is_visible field will be set back to NULL (visible). If the probing fails the field is set to a callback that return 0 (not visible). Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200527224659.206129-4-eranian@google.com
2020-05-28perf/x86/rapl: Refactor to share the RAPL code between Intel and AMD CPUsStephane Eranian1-6/+23
This patch modifies the rapl_model struct to include architecture specific knowledge in this previously Intel specific structure, and in particular it adds the MSR for POWER_UNIT and the rapl_msrs array. No functional changes. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200527224659.206129-3-eranian@google.com
2020-05-28perf/x86/rapl: Move RAPL support to common x86 codeStephane Eranian4-8/+10
To prepare for support of both Intel and AMD RAPL. As per the AMD PPR, Fam17h support Package RAPL counters to monitor power usage. The RAPL counter operates as with Intel RAPL, and as such it is beneficial to share the code. No change in functionality. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200527224659.206129-2-eranian@google.com
2020-05-28Merge tag 'v5.7-rc7' into perf/core, to pick up fixesIngo Molnar40-322/+523
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-05-27copy_xstate_to_kernel(): don't leave parts of destination uninitializedAl Viro1-38/+48
copy the corresponding pieces of init_fpstate into the gaps instead. Cc: stable@kernel.org Tested-by: Alexander Potapenko <glider@google.com> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-05-27x86: Hide the archdata.iommu field behind generic IOMMU_APIKrzysztof Kozlowski1-1/+1
There is a generic, kernel wide configuration symbol for enabling the IOMMU specific bits: CONFIG_IOMMU_API. Implementations (including INTEL_IOMMU and AMD_IOMMU driver) select it so use it here as well. This makes the conditional archdata.iommu field consistent with other platforms and also fixes any compile test builds of other IOMMU drivers, when INTEL_IOMMU or AMD_IOMMU are not selected). For the case when INTEL_IOMMU/AMD_IOMMU and COMPILE_TEST are not selected, this should create functionally equivalent code/choice. With COMPILE_TEST this field could appear if other IOMMU drivers are chosen but neither INTEL_IOMMU nor AMD_IOMMU are not. Reported-by: kbuild test robot <lkp@intel.com> Fixes: e93a1695d7fb ("iommu: Enable compile testing for some of drivers") Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Acked-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20200518120855.27822-2-krzk@kernel.org Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-05-27x86/apb_timer: Drop unused declaration and macroJohan Hovold1-3/+0
Drop an extern declaration that has never been used and a no longer needed macro. Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20200513100944.9171-2-johan@kernel.org
2020-05-27x86/apb_timer: Drop unused TSC calibrationJohan Hovold2-55/+0
Drop the APB-timer TSC calibration, which hasn't been used since the removal of Moorestown support by commit 1a8359e411eb ("x86/mid: Remove Intel Moorestown"). Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20200513100944.9171-1-johan@kernel.org
2020-05-26x86/io_apic: Remove unused function mp_init_irq_at_boot()YueHaibing1-13/+0
There are no callers in-tree anymore since ef9e56d894ea ("x86/ioapic: Remove obsolete post hotplug update") so remove it. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200508140808.49428-1-yuehaibing@huawei.com
2020-05-26x86/syscalls: Revert "x86/syscalls: Make __X32_SYSCALL_BIT be unsigned long"Andy Lutomirski1-2/+9
Revert 45e29d119e99 ("x86/syscalls: Make __X32_SYSCALL_BIT be unsigned long") and add a comment to discourage someone else from making the same mistake again. It turns out that some user code fails to compile if __X32_SYSCALL_BIT is unsigned long. See, for example [1] below. [ bp: Massage and do the same thing in the respective tools/ header. ] Fixes: 45e29d119e99 ("x86/syscalls: Make __X32_SYSCALL_BIT be unsigned long") Reported-by: Thorsten Glaser <t.glaser@tarent.de> Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: stable@kernel.org Link: [1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=954294 Link: https://lkml.kernel.org/r/92e55442b744a5951fdc9cfee10badd0a5f7f828.1588983892.git.luto@kernel.org
2020-05-25Merge tag 'efi-changes-for-v5.8' of ↵Ingo Molnar3-21/+29
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi into efi/core More EFI changes for v5.8: - Rename pr_efi/pr_efi_err to efi_info/efi_err, and use them consistently - Simplify and unify initrd loading - Parse the builtin command line on x86 (if provided) - Implement printk() support, including support for wide character strings - Some fixes for issues introduced by the first batch of v5.8 changes - Fix a missing prototypes warning - Simplify GDT handling in early mixed mode thunking code - Some other minor fixes and cleanups Conflicts: drivers/firmware/efi/libstub/efistub.h Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-05-25Merge tag 'v5.7-rc7' into efi/core, to refresh the branch and pick up fixesIngo Molnar43-360/+582
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-05-24Merge tag 'efi-urgent-2020-05-24' of ↵Linus Torvalds1-8/+8
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI fixes from Thomas Gleixner: "A set of EFI fixes: - Don't return a garbage screen info when EFI framebuffer is not available - Make the early EFI console work properly with wider fonts instead of drawing garbage - Prevent a memory buffer leak in allocate_e820() - Print the firmware error record properly so it can be decoded by users - Fix a symbol clash in the host tool build which only happens with newer compilers. - Add a missing check for the event log version of TPM which caused boot failures on several Dell systems due to an attempt to decode SHA-1 format with the crypto agile algorithm" * tag 'efi-urgent-2020-05-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tpm: check event log version before reading final events efi: Pull up arch-specific prototype efi_systab_show_arch() x86/boot: Mark global variables as static efi: cper: Add support for printing Firmware Error Record Reference efi/libstub/x86: Avoid EFI map buffer alloc in allocate_e820() efi/earlycon: Fix early printk for wider fonts efi/libstub: Avoid returning uninitialized data from setup_graphics()
2020-05-24Merge tag 'x86-urgent-2020-05-24' of ↵Linus Torvalds2-2/+9
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "Two fixes for x86: - Unbreak stack dumps for inactive tasks by interpreting the special first frame left by __switch_to_asm() correctly. The recent change not to skip the first frame so ORC and frame unwinder behave in the same way caused all entries to be unreliable, i.e. prepended with '?'. - Use cpumask_available() instead of an implicit NULL check of a cpumask_var_t in mmio trace to prevent a Clang build warning" * tag 'x86-urgent-2020-05-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/unwind/orc: Fix unwind_get_return_address_ptr() for inactive tasks x86/mmiotrace: Use cpumask_available() for cpumask_var_t variables
2020-05-24efi/x86: Drop the special GDT for the EFI thunkArvind Sankar1-16/+3
Instead of using efi_gdt64 to switch back to 64-bit mode and then switching to the real boot-time GDT, just switch to the boot-time GDT directly. The two GDT's are identical other than efi_gdt64 not including the 32-bit code segment. Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Link: https://lore.kernel.org/r/20200523221513.1642948-1-nivedita@alum.mit.edu Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-05-23x86: bitops: fix build regressionNick Desaulniers1-6/+6
This is easily reproducible via CC=clang + CONFIG_STAGING=y + CONFIG_VT6656=m. It turns out that if your config tickles __builtin_constant_p via differences in choices to inline or not, these statements produce invalid assembly: $ cat foo.c long a(long b, long c) { asm("orb %1, %0" : "+q"(c): "r"(b)); return c; } $ gcc foo.c foo.c: Assembler messages: foo.c:2: Error: `%rax' not allowed with `orb' Use the `%b` "x86 Operand Modifier" to instead force register allocation to select a lower-8-bit GPR operand. The "q" constraint only has meaning on -m32 otherwise is treated as "r". Not all GPRs have low-8-bit aliases for -m32. Fixes: 1651e700664b4 ("x86: Fix bitops.h warning with a moved cast") Reported-by: kernelci.org bot <bot@kernelci.org> Suggested-by: Andy Shevchenko <andriy.shevchenko@intel.com> Suggested-by: Brian Gerst <brgerst@gmail.com> Suggested-by: H. Peter Anvin <hpa@zytor.com> Suggested-by: Ilie Halip <ilie.halip@gmail.com> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Tested-by: Nathan Chancellor <natechancellor@gmail.com> [build, clang-11] Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-By: Brian Gerst <brgerst@gmail.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Marco Elver <elver@google.com> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Daniel Axtens <dja@axtens.net> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Link: http://lkml.kernel.org/r/20200508183230.229464-1-ndesaulniers@google.com Link: https://github.com/ClangBuiltLinux/linux/issues/961 Link: https://lore.kernel.org/lkml/20200504193524.GA221287@google.com/ Link: https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#x86Operandmodifiers Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-23x86/apic/uv: Remove code for unused distributed GRU modeSteve Wahl1-58/+1
Distributed GRU mode appeared in only one generation of UV hardware, and no version of the BIOS has shipped with this feature enabled, and we have no plans to ever change that. The gru.s3.mode check has always been and will continue to be false. So remove this dead code. Signed-off-by: Steve Wahl <steve.wahl@hpe.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Dimitri Sivanich <sivanich@hpe.com> Link: https://lkml.kernel.org/r/20200513221123.GJ3240@raspberrypi
2020-05-23x86/mm: Stop printing BRK addressesArvind Sankar1-2/+0
This currently leaks kernel physical addresses into userspace. Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Dave Hansen <dave.hansen@intel.com> Link: https://lkml.kernel.org/r/20200229231120.1147527-1-nivedita@alum.mit.edu
2020-05-22Merge tag 'efi-fixes-for-v5.7-rc6' of ↵Borislav Petkov1-8/+8
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi into efi/urgent Pull EFI fixes from Ard Biesheuvel: "- fix EFI framebuffer earlycon for wide fonts - avoid filling screen_info with garbage if the EFI framebuffer is not available - fix a potential host tool build error due to a symbol clash on x86 - work around a EFI firmware bug regarding the binary format of the TPM final events table - fix a missing memory free by reworking the E820 table sizing routine to not do the allocation in the first place - add CPER parsing for firmware errors"
2020-05-22x86/unwind/orc: Fix unwind_get_return_address_ptr() for inactive tasksJosh Poimboeuf1-0/+7
Normally, show_trace_log_lvl() scans the stack, looking for text addresses to print. In parallel, it unwinds the stack with unwind_next_frame(). If the stack address matches the pointer returned by unwind_get_return_address_ptr() for the current frame, the text address is printed normally without a question mark. Otherwise it's considered a breadcrumb (potentially from a previous call path) and it's printed with a question mark to indicate that the address is unreliable and typically can be ignored. Since the following commit: f1d9a2abff66 ("x86/unwind/orc: Don't skip the first frame for inactive tasks") ... for inactive tasks, show_trace_log_lvl() prints *only* unreliable addresses (prepended with '?'). That happens because, for the first frame of an inactive task, unwind_get_return_address_ptr() returns the wrong return address pointer: one word *below* the task stack pointer. show_trace_log_lvl() starts scanning at the stack pointer itself, so it never finds the first 'reliable' address, causing only guesses to being printed. The first frame of an inactive task isn't a normal stack frame. It's actually just an instance of 'struct inactive_task_frame' which is left behind by __switch_to_asm(). Now that this inactive frame is actually exposed to callers, fix unwind_get_return_address_ptr() to interpret it properly. Fixes: f1d9a2abff66 ("x86/unwind/orc: Don't skip the first frame for inactive tasks") Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200522135435.vbxs7umku5pyrdbk@treble
2020-05-22x86/boot: Discard .discard.unreachable for arch/x86/boot/compressed/vmlinuxFangrui Song1-0/+2
With commit ce5e3f909fc0 ("efi/printf: Add 64-bit and 8-bit integer support") arch/x86/boot/compressed/vmlinux may have an undesired .discard.unreachable section coming from drivers/firmware/efi/libstub/vsprintf.stub.o. That section gets generated from unreachable() annotations when CONFIG_STACK_VALIDATION is enabled. .discard.unreachable contains an R_X86_64_PC32 relocation which will be warned about by LLD: a non-SHF_ALLOC section (.discard.unreachable) is not part of the memory image, thus conceptually the distance between a non-SHF_ALLOC and a SHF_ALLOC is not a constant which can be resolved at link time: % ld.lld -m elf_x86_64 -T arch/x86/boot/compressed/vmlinux.lds ... -o arch/x86/boot/compressed/vmlinux ld.lld: warning: vsprintf.c:(.discard.unreachable+0x0): has non-ABS relocation R_X86_64_PC32 against symbol '' Reuse the DISCARDS macro which includes .discard.* to drop .discard.unreachable. [ bp: Massage and complete the commit message. ] Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Fangrui Song <maskray@google.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Arvind Sankar <nivedita@alum.mit.edu> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Link: https://lkml.kernel.org/r/20200520182010.242489-1-maskray@google.com
2020-05-20efi/libstub: Add definitions for console input and eventsArvind Sankar2-1/+11
Add the required typedefs etc for using con_in's simple text input protocol, and for using the boottime event services. Also add the prototype for the "stall" boot service. Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Link: https://lore.kernel.org/r/20200518190716.751506-19-nivedita@alum.mit.edu Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-05-19Merge tag 'hyperv-fixes-signed' of ↵Linus Torvalds1-2/+17
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv fix from Wei Liu: "One patch from Vitaly to fix reenlightenment notifications" * tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: x86/hyperv: Properly suspend/resume reenlightenment notifications
2020-05-19perf/x86: Replace zero-length array with flexible-arrayGustavo A. R. Silva2-2/+2
The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] sizeof(flexible-array-member) triggers a warning because flexible array members have incomplete type[1]. There are some instances of code in which the sizeof operator is being incorrectly/erroneously applied to zero-length arrays and the result is zero. Such instances may be hiding some bugs. So, this work (flexible-array member conversions) will also help to get completely rid of those sorts of issues. This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200511200911.GA13149@embeddedor
2020-05-19perf/x86/intel: Add more available bits for OFFCORE_RESPONSE of Intel TremontKan Liang1-2/+2
The mask in the extra_regs for Intel Tremont need to be extended to allow more defined bits. "Outstanding Requests" (bit 63) is only available on MSR_OFFCORE_RSP0; Fixes: 6daeb8737f8a ("perf/x86/intel: Add Tremont core PMU support") Reported-by: Stephane Eranian <eranian@google.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20200501125442.7030-1-kan.liang@linux.intel.com
2020-05-19perf/x86/rapl: Add Ice Lake RAPL supportKan Liang1-0/+2
Enable RAPL support for Intel Ice Lake X and Ice Lake D. For RAPL support, it is identical to Sky Lake X. Reported-by: Stephane Eranian <eranian@google.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1588857258-38213-1-git-send-email-kan.liang@linux.intel.com
2020-05-19x86/mmiotrace: Use cpumask_available() for cpumask_var_t variablesNathan Chancellor1-2/+2
When building with Clang + -Wtautological-compare and CONFIG_CPUMASK_OFFSTACK unset: arch/x86/mm/mmio-mod.c:375:6: warning: comparison of array 'downed_cpus' equal to a null pointer is always false [-Wtautological-pointer-compare] if (downed_cpus == NULL && ^~~~~~~~~~~ ~~~~ arch/x86/mm/mmio-mod.c:405:6: warning: comparison of array 'downed_cpus' equal to a null pointer is always false [-Wtautological-pointer-compare] if (downed_cpus == NULL || cpumask_weight(downed_cpus) == 0) ^~~~~~~~~~~ ~~~~ 2 warnings generated. Commit f7e30f01a9e2 ("cpumask: Add helper cpumask_available()") added cpumask_available() to fix warnings of this nature. Use that here so that clang does not warn regardless of CONFIG_CPUMASK_OFFSTACK's value. Reported-by: Sedat Dilek <sedat.dilek@gmail.com> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: https://github.com/ClangBuiltLinux/linux/issues/982 Link: https://lkml.kernel.org/r/20200408205323.44490-1-natechancellor@gmail.com
2020-05-19x86/audit: Fix a -Wmissing-prototypes warning for ia32_classify_syscall()Benjamin Thiel3-1/+9
Lift the prototype of ia32_classify_syscall() into its own header. Signed-off-by: Benjamin Thiel <b.thiel@posteo.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200516123816.2680-1-b.thiel@posteo.de
2020-05-19x86: Replace ist_enter() with nmi_enter()Peter Zijlstra5-65/+24
A few exceptions (like #DB and #BP) can happen at any location in the code, this then means that tracers should treat events from these exceptions as NMI-like. The interrupted context could be holding locks with interrupts disabled for instance. Similarly, #MC is an actual NMI-like exception. All of them use ist_enter() which only concerns itself with RCU, but does not do any of the other setup that NMIs need. This means things like: printk() raw_spin_lock_irq(&logbuf_lock); <#DB/#BP/#MC> printk() raw_spin_lock_irq(&logbuf_lock); are entirely possible (well, not really since printk tries hard to play nice, but the concept stands). So replace ist_enter() with nmi_enter(). Also observe that any nmi_enter() caller must be both notrace and NOKPROBE, or in the noinstr text section. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Link: https://lkml.kernel.org/r/20200505134101.525508608@linutronix.de
2020-05-19x86/mce: Send #MC singal from task workPeter Zijlstra1-25/+31
Convert #MC over to using task_work_add(); it will run the same code slightly later, on the return to user path of the same exception. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Link: https://lkml.kernel.org/r/20200505134100.957390899@linutronix.de
2020-05-19x86/entry: Get rid of ist_begin/end_non_atomic()Thomas Gleixner3-41/+4
This is completely overengineered and definitely not an interface which should be made available to anything else than this particular MCE case. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200505134059.462640294@linutronix.de
2020-05-19x86/boot: Correct relocation destination on old linkersArvind Sankar2-2/+4
For the 32-bit kernel, as described in 6d92bc9d483a ("x86/build: Build compressed x86 kernels as PIE"), pre-2.26 binutils generates R_386_32 relocations in PIE mode. Since the startup code does not perform relocation, any reloc entry with R_386_32 will remain as 0 in the executing code. Commit 974f221c84b0 ("x86/boot: Move compressed kernel to the end of the decompression buffer") added a new symbol _end but did not mark it hidden, which doesn't give the correct offset on older linkers. This causes the compressed kernel to be copied beyond the end of the decompression buffer, rather than flush against it. This region of memory may be reserved or already allocated for other purposes by the bootloader. Mark _end as hidden to fix. This changes the relocation from R_386_32 to R_386_RELATIVE even on the pre-2.26 binutils. For 64-bit, this is not strictly necessary, as the 64-bit kernel is only built as PIE if the linker supports -z noreloc-overflow, which implies binutils-2.27+, but for consistency, mark _end as hidden here too. The below illustrates the before/after impact of the patch using binutils-2.25 and gcc-4.6.4 (locally compiled from source) and QEMU. Disassembly before patch: 48: 8b 86 60 02 00 00 mov 0x260(%esi),%eax 4e: 2d 00 00 00 00 sub $0x0,%eax 4f: R_386_32 _end Disassembly after patch: 48: 8b 86 60 02 00 00 mov 0x260(%esi),%eax 4e: 2d 00 f0 76 00 sub $0x76f000,%eax 4f: R_386_RELATIVE *ABS* Dump from extract_kernel before patch: early console in extract_kernel input_data: 0x0207c098 <--- this is at output + init_size input_len: 0x0074fef1 output: 0x01000000 output_len: 0x00fa63d0 kernel_total_size: 0x0107c000 needed_size: 0x0107c000 Dump from extract_kernel after patch: early console in extract_kernel input_data: 0x0190d098 <--- this is at output + init_size - _end input_len: 0x0074fef1 output: 0x01000000 output_len: 0x00fa63d0 kernel_total_size: 0x0107c000 needed_size: 0x0107c000 Fixes: 974f221c84b0 ("x86/boot: Move compressed kernel to the end of the decompression buffer") Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200207214926.3564079-1-nivedita@alum.mit.edu
2020-05-18x86/cpu: Use RDRAND and RDSEED mnemonics in archrandom.hUros Bizjak1-18/+8
Current minimum required version of binutils is 2.23, which supports RDRAND and RDSEED instruction mnemonics. Replace the byte-wise specification of RDRAND and RDSEED with these proper mnemonics. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: H. Peter Anvin (Intel) <hpa@zytor.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200508105817.207887-1-ubizjak@gmail.com
2020-05-18Merge tag 'v5.7-rc6' into objtool/core, to pick up fixes and resolve ↵Ingo Molnar37-304/+483
semantic conflict Resolve structural conflict between: 59566b0b622e: ("x86/ftrace: Have ftrace trampolines turn read-only at the end of system boot up") which introduced a new reference to 'ftrace_epilogue', and: 0298739b7983: ("x86,ftrace: Fix ftrace_regs_caller() unwind") Which renamed it to 'ftrace_caller_end'. Rename the new usage site in the merge commit. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-05-17Merge tag 'objtool-urgent-2020-05-17' of ↵Linus Torvalds1-7/+9
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 stack unwinding fix from Thomas Gleixner: "A single bugfix for the ORC unwinder to ensure that the error flag which tells the unwinding code whether a stack trace can be trusted or not is always set correctly. This was messed up by a couple of changes in the recent past" * tag 'objtool-urgent-2020-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/unwind/orc: Fix error handling in __unwind_start()
2020-05-17Merge tag 'x86_urgent_for_v5.7-rc7' of ↵Linus Torvalds3-1/+15
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fix from Borislav Petkov: "A single fix for early boot crashes of kernels built with gcc10 and stack protector enabled" * tag 'x86_urgent_for_v5.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Fix early boot crash on gcc-10, third try
2020-05-16Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds6-85/+97
Pull kvm fixes from Paolo Bonzini: "A new testcase for guest debugging (gdbstub) that exposed a bunch of bugs, mostly for AMD processors. And a few other x86 fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: Fix off-by-one error in kvm_vcpu_ioctl_x86_setup_mce KVM: x86: Fix pkru save/restore when guest CR4.PKE=0, move it to x86.c KVM: SVM: Disable AVIC before setting V_IRQ KVM: Introduce kvm_make_all_cpus_request_except() KVM: VMX: pass correct DR6 for GD userspace exit KVM: x86, SVM: isolate vcpu->arch.dr6 from vmcb->save.dr6 KVM: SVM: keep DR6 synchronized with vcpu->arch.dr6 KVM: nSVM: trap #DB and #BP to userspace if guest debugging is on KVM: selftests: Add KVM_SET_GUEST_DEBUG test KVM: X86: Fix single-step with KVM_SET_GUEST_DEBUG KVM: X86: Set RTM for DB_VECTOR too for KVM_EXIT_DEBUG KVM: x86: fix DR6 delivery for various cases of #DB injection KVM: X86: Declare KVM_CAP_SET_GUEST_DEBUG properly
2020-05-16x86/fpu/xstate: Restore supervisor states for signal returnYu-cheng Yu1-5/+39
The signal return fast path directly restores user states from the user buffer. Once that succeeds, restore supervisor states (but only when they are not yet restored). For the slow path, save supervisor states to preserve them across context switches, and restore after the user states are restored. The previous version has the overhead of an XSAVES in both the fast and the slow paths. It is addressed as the following: - In the fast path, only do an XRSTORS. - In the slow path, do a supervisor-state-only XSAVES, and relocate the buffer contents. Some thoughts in the implementation: - In the slow path, can any supervisor state become stale between save/restore? Answer: set_thread_flag(TIF_NEED_FPU_LOAD) protects the xstate buffer. - In the slow path, can any code reference a stale supervisor state register between save/restore? Answer: In the current lazy-restore scheme, any reference to xstate registers needs fpregs_lock()/fpregs_unlock() and __fpregs_load_activate(). - Are there other options? One other option is eagerly restoring all supervisor states. Currently, CET user-mode states and ENQCMD's PASID do not need to be eagerly restored. The upcoming CET kernel-mode states (24 bytes) need to be eagerly restored. To me, eagerly restoring all supervisor states adds more overhead then benefit at this point. Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lkml.kernel.org/r/20200512145444.15483-11-yu-cheng.yu@intel.com
2020-05-16x86/fpu/xstate: Preserve supervisor states for the slow path in ↵Yu-cheng Yu1-25/+28
__fpu__restore_sig() The signal return code is responsible for taking an XSAVE buffer present in user memory and loading it into the hardware registers. This operation only affects user XSAVE state and never affects supervisor state. The fast path through this code simply points XRSTOR directly at the user buffer. However, since user memory is not guaranteed to be always mapped, this XRSTOR can fail. If it fails, the signal return code falls back to a slow path which can tolerate page faults. That slow path copies the xfeatures one by one out of the user buffer into the task's fpu state area. However, by being in a context where it can handle page faults, the code can also schedule. The lazy-fpu-load code would think it has an up-to-date fpstate and would fail to save the supervisor state when scheduling the task out. When scheduling back in, it would likely restore stale supervisor state. To fix that, preserve supervisor state before the slow path. Modify copy_user_to_fpregs_zeroing() so that if it fails, fpregs are not zeroed, and there is no need for fpregs_deactivate() and supervisor states are preserved. Move set_thread_flag(TIF_NEED_FPU_LOAD) to the slow path. Without doing this, the fast path also needs supervisor states to be saved first. Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200512145444.15483-10-yu-cheng.yu@intel.com
2020-05-16x86/fpu: Introduce copy_supervisor_to_kernel()Yu-cheng Yu2-0/+85
The XSAVES instruction takes a mask and saves only the features specified in that mask. The kernel normally specifies that all features be saved. XSAVES also unconditionally uses the "compacted format" which means that all specified features are saved next to each other in memory. If a feature is removed from the mask, all the features after it will "move up" into earlier locations in the buffer. Introduce copy_supervisor_to_kernel(), which saves only supervisor states and then moves those states into the standard location where they are normally found. Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200512145444.15483-9-yu-cheng.yu@intel.com
2020-05-16x86/nmi: Remove edac.h include leftoverBorislav Petkov1-4/+0
... which db47d5f85646 ("x86/nmi, EDAC: Get rid of DRAM error reporting thru PCI SERR NMI") forgot to remove. No functional changes. Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200515182246.3553-1-bp@alien8.de
2020-05-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds1-0/+1
Pull networking fixes from David Miller: 1) Fix sk_psock reference count leak on receive, from Xiyu Yang. 2) CONFIG_HNS should be invisible, from Geert Uytterhoeven. 3) Don't allow locking route MTUs in ipv6, RFCs actually forbid this, from Maciej Żenczykowski. 4) ipv4 route redirect backoff wasn't actually enforced, from Paolo Abeni. 5) Fix netprio cgroup v2 leak, from Zefan Li. 6) Fix infinite loop on rmmod in conntrack, from Florian Westphal. 7) Fix tcp SO_RCVLOWAT hangs, from Eric Dumazet. 8) Various bpf probe handling fixes, from Daniel Borkmann. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (68 commits) selftests: mptcp: pm: rm the right tmp file dpaa2-eth: properly handle buffer size restrictions bpf: Restrict bpf_trace_printk()'s %s usage and add %pks, %pus specifier bpf: Add bpf_probe_read_{user, kernel}_str() to do_refine_retval_range bpf: Restrict bpf_probe_read{, str}() only to archs where they work MAINTAINERS: Mark networking drivers as Maintained. ipmr: Add lockdep expression to ipmr_for_each_table macro ipmr: Fix RCU list debugging warning drivers: net: hamradio: Fix suspicious RCU usage warning in bpqether.c net: phy: broadcom: fix BCM54XX_SHD_SCR3_TRDDAPD value for BCM54810 tcp: fix error recovery in tcp_zerocopy_receive() MAINTAINERS: Add Jakub to networking drivers. MAINTAINERS: another add of Karsten Graul for S390 networking drivers: ipa: fix typos for ipa_smp2p structure doc pppoe: only process PADT targeted at local interfaces selftests/bpf: Enforce returning 0 for fentry/fexit programs bpf: Enforce returning 0 for fentry/fexit progs net: stmmac: fix num_por initialization security: Fix the default value of secid_to_secctx hook libbpf: Fix register naming in PT_REGS s390 macros ...
2020-05-15KVM: x86: Fix off-by-one error in kvm_vcpu_ioctl_x86_setup_mceJim Mattson1-1/+1
Bank_num is a one-based count of banks, not a zero-based index. It overflows the allocated space only when strictly greater than KVM_MAX_MCE_BANKS. Fixes: a9e38c3e01ad ("KVM: x86: Catch potential overrun in MCE setup") Signed-off-by: Jue Wang <juew@google.com> Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Peter Shier <pshier@google.com> Message-Id: <20200511225616.19557-1-jmattson@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-15bpf: Restrict bpf_probe_read{, str}() only to archs where they workDaniel Borkmann1-0/+1
Given the legacy bpf_probe_read{,str}() BPF helpers are broken on archs with overlapping address ranges, we should really take the next step to disable them from BPF use there. To generally fix the situation, we've recently added new helper variants bpf_probe_read_{user,kernel}() and bpf_probe_read_{user,kernel}_str(). For details on them, see 6ae08ae3dea2 ("bpf: Add probe_read_{user, kernel} and probe_read_{user,kernel}_str helpers"). Given bpf_probe_read{,str}() have been around for ~5 years by now, there are plenty of users at least on x86 still relying on them today, so we cannot remove them entirely w/o breaking the BPF tracing ecosystem. However, their use should be restricted to archs with non-overlapping address ranges where they are working in their current form. Therefore, move this behind a CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE and have x86, arm64, arm select it (other archs supporting it can follow-up on it as well). For the remaining archs, they can workaround easily by relying on the feature probe from bpftool which spills out defines that can be used out of BPF C code to implement the drop-in replacement for old/new kernels via: bpftool feature probe macro Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Brendan Gregg <brendan.d.gregg@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/bpf/20200515101118.6508-2-daniel@iogearbox.net
2020-05-15x86: Fix early boot crash on gcc-10, third tryBorislav Petkov3-1/+15
... or the odyssey of trying to disable the stack protector for the function which generates the stack canary value. The whole story started with Sergei reporting a boot crash with a kernel built with gcc-10: Kernel panic — not syncing: stack-protector: Kernel stack is corrupted in: start_secondary CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.6.0-rc5—00235—gfffb08b37df9 #139 Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./H77M—D3H, BIOS F12 11/14/2013 Call Trace: dump_stack panic ? start_secondary __stack_chk_fail start_secondary secondary_startup_64 -—-[ end Kernel panic — not syncing: stack—protector: Kernel stack is corrupted in: start_secondary This happens because gcc-10 tail-call optimizes the last function call in start_secondary() - cpu_startup_entry() - and thus emits a stack canary check which fails because the canary value changes after the boot_init_stack_canary() call. To fix that, the initial attempt was to mark the one function which generates the stack canary with: __attribute__((optimize("-fno-stack-protector"))) ... start_secondary(void *unused) however, using the optimize attribute doesn't work cumulatively as the attribute does not add to but rather replaces previously supplied optimization options - roughly all -fxxx options. The key one among them being -fno-omit-frame-pointer and thus leading to not present frame pointer - frame pointer which the kernel needs. The next attempt to prevent compilers from tail-call optimizing the last function call cpu_startup_entry(), shy of carving out start_secondary() into a separate compilation unit and building it with -fno-stack-protector, was to add an empty asm(""). This current solution was short and sweet, and reportedly, is supported by both compilers but we didn't get very far this time: future (LTO?) optimization passes could potentially eliminate this, which leads us to the third attempt: having an actual memory barrier there which the compiler cannot ignore or move around etc. That should hold for a long time, but hey we said that about the other two solutions too so... Reported-by: Sergei Trofimovich <slyfox@gentoo.org> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Kalle Valo <kvalo@codeaurora.org> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200314164451.346497-1-slyfox@gentoo.org
2020-05-15x86/unwind/orc: Fix error handling in __unwind_start()Josh Poimboeuf1-7/+9
The unwind_state 'error' field is used to inform the reliable unwinding code that the stack trace can't be trusted. Set this field for all errors in __unwind_start(). Also, move the zeroing out of the unwind_state struct to before the ORC table initialization check, to prevent the caller from reading uninitialized data if the ORC table is corrupted. Fixes: af085d9084b4 ("stacktrace/x86: add function for detecting reliable stack traces") Fixes: d3a09104018c ("x86/unwinder/orc: Dont bail on stack overflow") Fixes: 98d0c8ebf77e ("x86/unwind/orc: Prevent unwinding before ORC initialization") Reported-by: Pavel Machek <pavel@denx.de> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/d6ac7215a84ca92b895fdd2e1aa546729417e6e6.1589487277.git.jpoimboe@redhat.com
2020-05-14Merge tag 'trace-v5.7-rc5' of ↵Linus Torvalds3-1/+37
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull more tracing fixes from Steven Rostedt: "Various tracing fixes: - Fix a crash when having function tracing and function stack tracing on the command line. The ftrace trampolines are created as executable and read only. But the stack tracer tries to modify them with text_poke() which expects all kernel text to still be writable at boot. Keep the trampolines writable at boot, and convert them to read-only with the rest of the kernel. - A selftest was triggering in the ring buffer iterator code, that is no longer valid with the update of keeping the ring buffer writable while a iterator is reading. Just bail after three failed attempts to get an event and remove the warning and disabling of the ring buffer. - While modifying the ring buffer code, decided to remove all the unnecessary BUG() calls" * tag 'trace-v5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ring-buffer: Remove all BUG() calls ring-buffer: Don't deactivate the ring buffer on failed iterator reads x86/ftrace: Have ftrace trampolines turn read-only at the end of system boot up
2020-05-14x86/fpu/xstate: Update copy_kernel_to_xregs_err() for supervisor statesYu-cheng Yu1-1/+4
The function copy_kernel_to_xregs_err() uses XRSTOR which can work with standard or compacted format without supervisor xstates. However, when supervisor xstates are present, XRSTORS must be used. Fix it by using XRSTORS when supervisor state handling is enabled. I also considered if there were additional cases where XRSTOR might be mistakenly called instead of XRSTORS. There are only three XRSTOR sites in the kernel: 1. copy_kernel_to_xregs_booting(), already switches between XRSTOR and XRSTORS based on X86_FEATURE_XSAVES. 2. copy_user_to_xregs(), which *needs* XRSTOR because it is copying from userspace and must never copy supervisor state with XRSTORS. 3. copy_kernel_to_xregs_err() mistakenly used XRSTOR only. Fix it. [ bp: Massage commit message. ] Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lkml.kernel.org/r/20200512145444.15483-8-yu-cheng.yu@intel.com