summaryrefslogtreecommitdiffstats
path: root/tools/objtool
AgeCommit message (Collapse)AuthorFilesLines
2023-01-09objtool: Tolerate STT_NOTYPE symbols at end of sectionNicholas Piggin1-0/+9
Hand-written asm often contains non-function symbols in executable sections. _end symbols for finding the size of instruction blocks for runtime processing is one such usage. optprobe_template_end is one example that causes the warning: objtool: optprobe_template_end(): can't find starting instruction This is because the symbol happens to be at the end of the file (and therefore end of a section in the object file). So ignore end-of-section STT_NOTYPE symbols instead of bailing out because an instruction can't be found. While we're here, add a more descriptive warning for STT_FUNC symbols found at the end of a section. [ This also solves a PowerPC regression reported by Sathvika Vasireddy. ] Reported-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Reported-by: Sathvika Vasireddy <sv@linux.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Sathvika Vasireddy <sv@linux.ibm.com> Link: https://lore.kernel.org/r/20221220101323.3119939-1-npiggin@gmail.com
2022-12-19Merge tag 'powerpc-6.2-1' of ↵Linus Torvalds19-56/+269
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Add powerpc qspinlock implementation optimised for large system scalability and paravirt. See the merge message for more details - Enable objtool to be built on powerpc to generate mcount locations - Use a temporary mm for code patching with the Radix MMU, so the writable mapping is restricted to the patching CPU - Add an option to build the 64-bit big-endian kernel with the ELFv2 ABI - Sanitise user registers on interrupt entry on 64-bit Book3S - Many other small features and fixes Thanks to Aboorva Devarajan, Angel Iglesias, Benjamin Gray, Bjorn Helgaas, Bo Liu, Chen Lifu, Christoph Hellwig, Christophe JAILLET, Christophe Leroy, Christopher M. Riedl, Colin Ian King, Deming Wang, Disha Goel, Dmitry Torokhov, Finn Thain, Geert Uytterhoeven, Gustavo A. R. Silva, Haowen Bai, Joel Stanley, Jordan Niethe, Julia Lawall, Kajol Jain, Laurent Dufour, Li zeming, Miaoqian Lin, Michael Jeanson, Nathan Lynch, Naveen N. Rao, Nayna Jain, Nicholas Miehlbradt, Nicholas Piggin, Pali Rohár, Randy Dunlap, Rohan McLure, Russell Currey, Sathvika Vasireddy, Shaomin Deng, Stephen Kitt, Stephen Rothwell, Thomas Weißschuh, Tiezhu Yang, Uwe Kleine-König, Xie Shaowen, Xiu Jianfeng, XueBing Chen, Yang Yingliang, Zhang Jiaming, ruanjinjie, Jessica Yu, and Wolfram Sang. * tag 'powerpc-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (181 commits) powerpc/code-patching: Fix oops with DEBUG_VM enabled powerpc/qspinlock: Fix 32-bit build powerpc/prom: Fix 32-bit build powerpc/rtas: mandate RTAS syscall filtering powerpc/rtas: define pr_fmt and convert printk call sites powerpc/rtas: clean up includes powerpc/rtas: clean up rtas_error_log_max initialization powerpc/pseries/eeh: use correct API for error log size powerpc/rtas: avoid scheduling in rtas_os_term() powerpc/rtas: avoid device tree lookups in rtas_os_term() powerpc/rtasd: use correct OF API for event scan rate powerpc/rtas: document rtas_call() powerpc/pseries: unregister VPA when hot unplugging a CPU powerpc/pseries: reset the RCU watchdogs after a LPM powerpc: Take in account addition CPU node when building kexec FDT powerpc: export the CPU node count powerpc/cpuidle: Set CPUIDLE_FLAG_POLLING for snooze state powerpc/dts/fsl: Fix pca954x i2c-mux node names cxl: Remove unnecessary cxl_pci_window_alignment() selftests/powerpc: Fix resource leaks ...
2022-12-14Merge tag 'x86_core_for_v6.2' of ↵Linus Torvalds10-182/+566
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 core updates from Borislav Petkov: - Add the call depth tracking mitigation for Retbleed which has been long in the making. It is a lighterweight software-only fix for Skylake-based cores where enabling IBRS is a big hammer and causes a significant performance impact. What it basically does is, it aligns all kernel functions to 16 bytes boundary and adds a 16-byte padding before the function, objtool collects all functions' locations and when the mitigation gets applied, it patches a call accounting thunk which is used to track the call depth of the stack at any time. When that call depth reaches a magical, microarchitecture-specific value for the Return Stack Buffer, the code stuffs that RSB and avoids its underflow which could otherwise lead to the Intel variant of Retbleed. This software-only solution brings a lot of the lost performance back, as benchmarks suggest: https://lore.kernel.org/all/20220915111039.092790446@infradead.org/ That page above also contains a lot more detailed explanation of the whole mechanism - Implement a new control flow integrity scheme called FineIBT which is based on the software kCFI implementation and uses hardware IBT support where present to annotate and track indirect branches using a hash to validate them - Other misc fixes and cleanups * tag 'x86_core_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (80 commits) x86/paravirt: Use common macro for creating simple asm paravirt functions x86/paravirt: Remove clobber bitmask from .parainstructions x86/debug: Include percpu.h in debugreg.h to get DECLARE_PER_CPU() et al x86/cpufeatures: Move X86_FEATURE_CALL_DEPTH from bit 18 to bit 19 of word 11, to leave space for WIP X86_FEATURE_SGX_EDECCSSA bit x86/Kconfig: Enable kernel IBT by default x86,pm: Force out-of-line memcpy() objtool: Fix weak hole vs prefix symbol objtool: Optimize elf_dirty_reloc_sym() x86/cfi: Add boot time hash randomization x86/cfi: Boot time selection of CFI scheme x86/ibt: Implement FineIBT objtool: Add --cfi to generate the .cfi_sites section x86: Add prefix symbols for function padding objtool: Add option to generate prefix symbols objtool: Avoid O(bloody terrible) behaviour -- an ode to libelf objtool: Slice up elf_create_section_symbol() kallsyms: Revert "Take callthunks into account" x86: Unconfuse CONFIG_ and X86_FEATURE_ namespaces x86/retpoline: Fix crash printing warning x86/paravirt: Fix a !PARAVIRT build warning ...
2022-11-23objtool/powerpc: Implement arch_pc_relative_reloc()Michael Ellerman2-0/+11
Provide an implementation for arch_pc_relative_reloc(). It is needed to pass the build once 61c6065ef7ec ("objtool: Allow !PC relative relocations") is merged. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2022-11-18objtool/powerpc: Add --mcount specific implementationSathvika Vasireddy2-0/+18
This patch enables objtool --mcount on powerpc, and adds implementation specific to powerpc. Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Sathvika Vasireddy <sv@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221114175754.1131267-17-sv@linux.ibm.com
2022-11-18objtool/powerpc: Enable objtool to be built on ppcSathvika Vasireddy6-0/+146
This patch adds [stub] implementations for required functions, inorder to enable objtool build on powerpc. [Christophe Leroy: powerpc: Add missing asm/asm.h for objtool, Use local variables for type and imm in arch_decode_instruction(), Adapt len for prefixed instructions.] Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Sathvika Vasireddy <sv@linux.ibm.com> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221114175754.1131267-16-sv@linux.ibm.com
2022-11-18objtool: Add arch specific function arch_ftrace_match()Sathvika Vasireddy3-1/+8
Add architecture specific function to look for relocation records pointing to architecture specific symbols. Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu> Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Sathvika Vasireddy <sv@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221114175754.1131267-15-sv@linux.ibm.com
2022-11-18objtool: Use macros to define arch specific reloc typesSathvika Vasireddy2-1/+3
Make relocation types architecture specific. Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Sathvika Vasireddy <sv@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221114175754.1131267-14-sv@linux.ibm.com
2022-11-18objtool: Read special sections with alts only when specific options are selectedSathvika Vasireddy1-3/+5
Call add_special_section_alts() only when stackval or orc or uaccess or noinstr options are passed to objtool. Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Sathvika Vasireddy <sv@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221114175754.1131267-13-sv@linux.ibm.com
2022-11-18objtool: Add --mnop as an option to --mcountSathvika Vasireddy3-9/+25
Some architectures (powerpc) may not support ftrace locations being nop'ed out at build time. Introduce CONFIG_HAVE_OBJTOOL_NOP_MCOUNT for objtool, as a means for architectures to enable nop'ing of ftrace locations. Add --mnop as an option to objtool --mcount, to indicate support for the same. Also, make sure that --mnop can be passed as an option to objtool only when --mcount is passed. Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Sathvika Vasireddy <sv@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221114175754.1131267-12-sv@linux.ibm.com
2022-11-18objtool: Use target file class size instead of a compiled constantChristophe Leroy3-10/+24
In order to allow using objtool on cross-built kernels, determine size of long from elf data instead of using sizeof(long) at build time. For the time being this covers only mcount. [Sathvika Vasireddy: Rename variable "size" to "addrsize" and function "elf_class_size()" to "elf_class_addrsize()", and modify create_mcount_loc_sections() function to follow reverse christmas tree format to order local variable declarations.] Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Sathvika Vasireddy <sv@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221114175754.1131267-11-sv@linux.ibm.com
2022-11-18objtool: Use target file endianness instead of a compiled constantChristophe Leroy6-31/+30
Some architectures like powerpc support both endianness, it's therefore not possible to fix the endianness via arch/endianness.h because there is no easy way to get the target endianness at build time. Use the endianness recorded in the file objtool is working on. Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221114175754.1131267-10-sv@linux.ibm.com
2022-11-18objtool: Fix SEGFAULTChristophe Leroy1-1/+1
find_insn() will return NULL in case of failure. Check insn in order to avoid a kernel Oops for NULL pointer dereference. Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221114175754.1131267-9-sv@linux.ibm.com
2022-11-05objtool: Fix weak hole vs prefix symbolPeter Zijlstra1-1/+21
Boris (and the robot) reported that objtool grew a new complaint about unreachable instructions. Upon inspection it was immediately clear the __weak zombie instructions struck again. For the unweary, the linker will simply remove the symbol for overriden __weak symbols but leave the instructions in place, creating unreachable instructions -- and objtool likes to report these. Commit 4adb23686795 ("objtool: Ignore extra-symbol code") was supposed to have dealt with that, but the new commit 9f2899fe36a6 ("objtool: Add option to generate prefix symbols") subtly broke that logic by created unvisited symbols. Fixes: 9f2899fe36a6 ("objtool: Add option to generate prefix symbols") Reported-by: Borislav Petkov <bp@alien8.de> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2022-11-05objtool: Optimize elf_dirty_reloc_sym()Peter Zijlstra2-17/+12
When moving a symbol in the symtab its index changes and any reloc referring that symtol-table-index will need to be rewritten too. In order to facilitate this, objtool simply marks the whole reloc section 'changed' which will cause the whole section to be re-generated. However, finding the relocs that use any given symbol is implemented rather crudely -- a fully iteration of all sections and their relocs. Given that some builds have over 20k sections (kallsyms etc..) iterating all that for *each* symbol moved takes a bit of time. Instead have each symbol keep a list of relocs that reference it. This *vastly* improves build times for certain configs. Reported-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/Y2LlRA7x+8UsE1xf@hirez.programming.kicks-ass.net
2022-11-01objtool: Add --cfi to generate the .cfi_sites sectionPeter Zijlstra3-0/+71
Add the location of all __cfi_##name symbols (as generated by kCFI) to a section such that we might re-write things at kernel boot. Notably; boot time re-hashing and FineIBT are the intended use of this. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20221027092842.568039454@infradead.org
2022-11-01objtool: Add option to generate prefix symbolsPeter Zijlstra5-1/+67
When code is compiled with: -fpatchable-function-entry=${PADDING_BYTES},${PADDING_BYTES} functions will have PADDING_BYTES of NOP in front of them. Unwinders and other things that symbolize code locations will typically attribute these bytes to the preceding function. Given that these bytes nominally belong to the following symbol this mis-attribution is confusing. Inspired by the fact that CFI_CLANG emits __cfi_##name symbols to claim these bytes, allow objtool to emit __pfx_##name symbols to do the same. Therefore add the objtool --prefix=N argument, to conditionally place a __pfx_##name symbol at N bytes ahead of symbol 'name' when: all these preceding bytes are NOP and name-N is an instruction boundary. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Yujie Liu <yujie.liu@intel.com> Link: https://lkml.kernel.org/r/20221028194453.526899822@infradead.org
2022-11-01objtool: Avoid O(bloody terrible) behaviour -- an ode to libelfPeter Zijlstra2-7/+84
Due to how gelf_update_sym*() requires an Elf_Data pointer, and how libelf keeps Elf_Data in a linked list per section, elf_update_symbol() ends up having to iterate this list on each update to find the correct Elf_Data for the index'ed symbol. By allocating one Elf_Data per new symbol, the list grows per new symbol, giving an effective O(n^2) insertion time. This is obviously bloody terrible. Therefore over-allocate the Elf_Data when an extention is needed. Except it turns out libelf disregards Elf_Scn::sh_size in favour of the sum of Elf_Data::d_size. IOW it will happily write out all the unused space and fill it with: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND entries (aka zeros). Which obviously violates the STB_LOCAL placement rule, and is a general pain in the backside for not being the desired behaviour. Manually fix-up the Elf_Data size to avoid this problem before calling elf_update(). This significantly improves performance when adding a significant number of symbols. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Yujie Liu <yujie.liu@intel.com> Link: https://lkml.kernel.org/r/20221028194453.461658986@infradead.org
2022-11-01objtool: Slice up elf_create_section_symbol()Peter Zijlstra1-21/+35
In order to facilitate creation of more symbol types, slice up elf_create_section_symbol() to extract a generic helper that deals with adding ELF symbols. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Yujie Liu <yujie.liu@intel.com> Link: https://lkml.kernel.org/r/20221028194453.396634875@infradead.org
2022-10-18objtool, kcsan: Add volatile read/write instrumentation to whitelistMarco Elver1-0/+10
Adds KCSAN's volatile instrumentation to objtool's uaccess whitelist. Recent kernel change have shown that this was missing from the uaccess whitelist (since the first upstreamed version of KCSAN): mm/gup.o: warning: objtool: fault_in_readable+0x101: call to __tsan_volatile_write1() with UACCESS enabled Fixes: 75d75b7a4d54 ("kcsan: Support distinguishing volatile accesses") Signed-off-by: Marco Elver <elver@google.com> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-10-17objtool: Allow STT_NOTYPE -> STT_FUNC+0 sibling-callsPeter Zijlstra1-27/+47
Teach objtool about STT_NOTYPE -> STT_FUNC+0 sibling calls. Doing do allows slightly simpler .S files. There is a slight complication in that we specifically do not want to allow sibling calls from symbol holes (previously covered by STT_WEAK symbols) -- such things exist where a weak function has a .cold subfunction for example. Additionally, STT_NOTYPE tail-calls are allowed to happen with a modified stack frame, they don't need to obey the normal rules after all. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2022-10-17objtool: Rework instruction -> symbol mappingPeter Zijlstra2-51/+66
Currently insn->func contains a instruction -> symbol link for STT_FUNC symbols. A NULL value is assumed to mean STT_NOTYPE. However, there are also instructions not covered by any symbol at all. This can happen due to __weak symbols for example. Since the current scheme cannot differentiate between no symbol and STT_NOTYPE symbol, change things around. Make insn->sym point to any symbol type such that !insn->sym means no symbol and add a helper insn_func() that check the sym->type to retain the old functionality. This then prepares the way to add code that depends on the distinction between STT_NOTYPE and no symbol at all. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2022-10-17objtool: Allow symbol range comparisons for IBT/ENDBRPeter Zijlstra1-0/+28
A semi common pattern is where code checks if a code address is within a specific range. All text addresses require either ENDBR or ANNOTATE_ENDBR, however the ANNOTATE_NOENDBR past the range is unnatural. Instead, suppress this warning when this is exactly at the end of a symbol that itself starts with either ENDBR/ANNOTATE_ENDBR. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220915111146.434642471@infradead.org
2022-10-17objtool: Fix find_{symbol,func}_containing()Peter Zijlstra2-54/+42
The current find_{symbol,func}_containing() functions are broken in the face of overlapping symbols, exactly the case that is needed for a new ibt/endbr supression. Import interval_tree_generic.h into the tools tree and convert the symbol tree to an interval tree to support proper range stabs. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220915111146.330203761@infradead.org
2022-10-17objtool: Add --hacks=skylakePeter Zijlstra3-5/+13
Make the call/func sections selectable via the --hacks option. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220915111146.120821440@infradead.org
2022-10-17objtool: Add .call_sites sectionPeter Zijlstra3-0/+53
In preparation for call depth tracking provide a section which collects all direct calls. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220915111146.016511961@infradead.org
2022-10-17objtool: Track init sectionPeter Zijlstra2-8/+11
For future usage of .init.text exclusion track the init section in the instruction decoder and use the result in retpoline validation. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220915111145.910334431@infradead.org
2022-10-17objtool: Allow !PC relative relocationsPeter Zijlstra3-1/+27
Objtool doesn't currently much like per-cpu usage in alternatives: arch/x86/entry/entry_64.o: warning: objtool: .altinstr_replacement+0xf: unsupported relocation in alternatives section f: 65 c7 04 25 00 00 00 00 00 00 00 80 movl $0x80000000,%gs:0x0 13: R_X86_64_32S __x86_call_depth Since the R_X86_64_32S relocation is location invariant (it's computation doesn't include P - the address of the location itself), it can be trivially allowed. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220915111145.806607235@infradead.org
2022-10-10Merge tag 'mm-stable-2022-10-08' of ↵Linus Torvalds1-0/+20
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in linux-next for a couple of months without, to my knowledge, any negative reports (or any positive ones, come to that). - Also the Maple Tree from Liam Howlett. An overlapping range-based tree for vmas. It it apparently slightly more efficient in its own right, but is mainly targeted at enabling work to reduce mmap_lock contention. Liam has identified a number of other tree users in the kernel which could be beneficially onverted to mapletrees. Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat at [1]. This has yet to be addressed due to Liam's unfortunately timed vacation. He is now back and we'll get this fixed up. - Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer. It uses clang-generated instrumentation to detect used-unintialized bugs down to the single bit level. KMSAN keeps finding bugs. New ones, as well as the legacy ones. - Yang Shi adds a userspace mechanism (madvise) to induce a collapse of memory into THPs. - Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to support file/shmem-backed pages. - userfaultfd updates from Axel Rasmussen - zsmalloc cleanups from Alexey Romanov - cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and memory-failure - Huang Ying adds enhancements to NUMA balancing memory tiering mode's page promotion, with a new way of detecting hot pages. - memcg updates from Shakeel Butt: charging optimizations and reduced memory consumption. - memcg cleanups from Kairui Song. - memcg fixes and cleanups from Johannes Weiner. - Vishal Moola provides more folio conversions - Zhang Yi removed ll_rw_block() :( - migration enhancements from Peter Xu - migration error-path bugfixes from Huang Ying - Aneesh Kumar added ability for a device driver to alter the memory tiering promotion paths. For optimizations by PMEM drivers, DRM drivers, etc. - vma merging improvements from Jakub Matěn. - NUMA hinting cleanups from David Hildenbrand. - xu xin added aditional userspace visibility into KSM merging activity. - THP & KSM code consolidation from Qi Zheng. - more folio work from Matthew Wilcox. - KASAN updates from Andrey Konovalov. - DAMON cleanups from Kaixu Xia. - DAMON work from SeongJae Park: fixes, cleanups. - hugetlb sysfs cleanups from Muchun Song. - Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core. Link: https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com [1] * tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (555 commits) hugetlb: allocate vma lock for all sharable vmas hugetlb: take hugetlb vma_lock when clearing vma_lock->vma pointer hugetlb: fix vma lock handling during split vma and range unmapping mglru: mm/vmscan.c: fix imprecise comments mm/mglru: don't sync disk for each aging cycle mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol mm: memcontrol: use do_memsw_account() in a few more places mm: memcontrol: deprecate swapaccounting=0 mode mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled mm/secretmem: remove reduntant return value mm/hugetlb: add available_huge_pages() func mm: remove unused inline functions from include/linux/mm_inline.h selftests/vm: add selftest for MADV_COLLAPSE of uffd-minor memory selftests/vm: add file/shmem MADV_COLLAPSE selftest for cleared pmd selftests/vm: add thp collapse shmem testing selftests/vm: add thp collapse file and tmpfs testing selftests/vm: modularize thp collapse memory operations selftests/vm: dedup THP helpers mm/khugepaged: add tracepoint to hpage_collapse_scan_file() mm/madvise: add file and shmem support to MADV_COLLAPSE ...
2022-10-10Merge tag 'objtool-core-2022-10-07' of ↵Linus Torvalds2-4/+7
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool updates from Ingo Molnar: - Remove the "ANNOTATE_NOENDBR on ENDBR" warning: it's not really useful and only found a non-bug false positive so far. - Properly decode LOOP/LOOPE/LOOPNE, which were missing from the x86 decoder. Because these instructions are rather ineffective, they never showed up in compiler output, but they are simple enough to support, so add them for completeness. - A bit more cross-arch preparatory work. * tag 'objtool-core-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool,x86: Teach decode about LOOP* instructions objtool: Remove "ANNOTATE_NOENDBR on ENDBR" warning objtool: Use arch_jump_destination() in read_intra_function_calls()
2022-10-04Merge tag 'net-next-6.1' of ↵Linus Torvalds1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core: - Introduce and use a single page frag cache for allocating small skb heads, clawing back the 10-20% performance regression in UDP flood test from previous fixes. - Run packets which already went thru HW coalescing thru SW GRO. This significantly improves TCP segment coalescing and simplifies deployments as different workloads benefit from HW or SW GRO. - Shrink the size of the base zero-copy send structure. - Move TCP init under a new slow / sleepable version of DO_ONCE(). BPF: - Add BPF-specific, any-context-safe memory allocator. - Add helpers/kfuncs for PKCS#7 signature verification from BPF programs. - Define a new map type and related helpers for user space -> kernel communication over a ring buffer (BPF_MAP_TYPE_USER_RINGBUF). - Allow targeting BPF iterators to loop through resources of one task/thread. - Add ability to call selected destructive functions. Expose crash_kexec() to allow BPF to trigger a kernel dump. Use CAP_SYS_BOOT check on the loading process to judge permissions. - Enable BPF to collect custom hierarchical cgroup stats efficiently by integrating with the rstat framework. - Support struct arguments for trampoline based programs. Only structs with size <= 16B and x86 are supported. - Invoke cgroup/connect{4,6} programs for unprivileged ICMP ping sockets (instead of just TCP and UDP sockets). - Add a helper for accessing CLOCK_TAI for time sensitive network related programs. - Support accessing network tunnel metadata's flags. - Make TCP SYN ACK RTO tunable by BPF programs with TCP Fast Open. - Add support for writing to Netfilter's nf_conn:mark. Protocols: - WiFi: more Extremely High Throughput (EHT) and Multi-Link Operation (MLO) work (802.11be, WiFi 7). - vsock: improve support for SO_RCVLOWAT. - SMC: support SO_REUSEPORT. - Netlink: define and document how to use netlink in a "modern" way. Support reporting missing attributes via extended ACK. - IPSec: support collect metadata mode for xfrm interfaces. - TCPv6: send consistent autoflowlabel in SYN_RECV state and RST packets. - TCP: introduce optional per-netns connection hash table to allow better isolation between namespaces (opt-in, at the cost of memory and cache pressure). - MPTCP: support TCP_FASTOPEN_CONNECT. - Add NEXT-C-SID support in Segment Routing (SRv6) End behavior. - Adjust IP_UNICAST_IF sockopt behavior for connected UDP sockets. - Open vSwitch: - Allow specifying ifindex of new interfaces. - Allow conntrack and metering in non-initial user namespace. - TLS: support the Korean ARIA-GCM crypto algorithm. - Remove DECnet support. Driver API: - Allow selecting the conduit interface used by each port in DSA switches, at runtime. - Ethernet Power Sourcing Equipment and Power Device support. - Add tc-taprio support for queueMaxSDU parameter, i.e. setting per traffic class max frame size for time-based packet schedules. - Support PHY rate matching - adapting between differing host-side and link-side speeds. - Introduce QUSGMII PHY mode and 1000BASE-KX interface mode. - Validate OF (device tree) nodes for DSA shared ports; make phylink-related properties mandatory on DSA and CPU ports. Enforcing more uniformity should allow transitioning to phylink. - Require that flash component name used during update matches one of the components for which version is reported by info_get(). - Remove "weight" argument from driver-facing NAPI API as much as possible. It's one of those magic knobs which seemed like a good idea at the time but is too indirect to use in practice. - Support offload of TLS connections with 256 bit keys. New hardware / drivers: - Ethernet: - Microchip KSZ9896 6-port Gigabit Ethernet Switch - Renesas Ethernet AVB (EtherAVB-IF) Gen4 SoCs - Analog Devices ADIN1110 and ADIN2111 industrial single pair Ethernet (10BASE-T1L) MAC+PHY. - Rockchip RV1126 Gigabit Ethernet (a version of stmmac IP). - Ethernet SFPs / modules: - RollBall / Hilink / Turris 10G copper SFPs - HALNy GPON module - WiFi: - CYW43439 SDIO chipset (brcmfmac) - CYW89459 PCIe chipset (brcmfmac) - BCM4378 on Apple platforms (brcmfmac) Drivers: - CAN: - gs_usb: HW timestamp support - Ethernet PHYs: - lan8814: cable diagnostics - Ethernet NICs: - Intel (100G): - implement control of FCS/CRC stripping - port splitting via devlink - L2TPv3 filtering offload - nVidia/Mellanox: - tunnel offload for sub-functions - MACSec offload, w/ Extended packet number and replay window offload - significantly restructure, and optimize the AF_XDP support, align the behavior with other vendors - Huawei: - configuring DSCP map for traffic class selection - querying standard FEC statistics - querying SerDes lane number via ethtool - Marvell/Cavium: - egress priority flow control - MACSec offload - AMD/SolarFlare: - PTP over IPv6 and raw Ethernet - small / embedded: - ax88772: convert to phylink (to support SFP cages) - altera: tse: convert to phylink - ftgmac100: support fixed link - enetc: standard Ethtool counters - macb: ZynqMP SGMII dynamic configuration support - tsnep: support multi-queue and use page pool - lan743x: Rx IP & TCP checksum offload - igc: add xdp frags support to ndo_xdp_xmit - Ethernet high-speed switches: - Marvell (prestera): - support SPAN port features (traffic mirroring) - nexthop object offloading - Microchip (sparx5): - multicast forwarding offload - QoS queuing offload (tc-mqprio, tc-tbf, tc-ets) - Ethernet embedded switches: - Marvell (mv88e6xxx): - support RGMII cmode - NXP (felix): - standardized ethtool counters - Microchip (lan966x): - QoS queuing offload (tc-mqprio, tc-tbf, tc-cbs, tc-ets) - traffic policing and mirroring - link aggregation / bonding offload - QUSGMII PHY mode support - Qualcomm 802.11ax WiFi (ath11k): - cold boot calibration support on WCN6750 - support to connect to a non-transmit MBSSID AP profile - enable remain-on-channel support on WCN6750 - Wake-on-WLAN support for WCN6750 - support to provide transmit power from firmware via nl80211 - support to get power save duration for each client - spectral scan support for 160 MHz - MediaTek WiFi (mt76): - WiFi-to-Ethernet bridging offload for MT7986 chips - RealTek WiFi (rtw89): - P2P support" * tag 'net-next-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1864 commits) eth: pse: add missing static inlines once: rename _SLOW to _SLEEPABLE net: pse-pd: add regulator based PSE driver dt-bindings: net: pse-dt: add bindings for regulator based PoDL PSE controller ethtool: add interface to interact with Ethernet Power Equipment net: mdiobus: search for PSE nodes by parsing PHY nodes. net: mdiobus: fwnode_mdiobus_register_phy() rework error handling net: add framework to support Ethernet PSE and PDs devices dt-bindings: net: phy: add PoDL PSE property net: marvell: prestera: Propagate nh state from hw to kernel net: marvell: prestera: Add neighbour cache accounting net: marvell: prestera: add stub handler neighbour events net: marvell: prestera: Add heplers to interact with fib_notifier_info net: marvell: prestera: Add length macros for prestera_ip_addr net: marvell: prestera: add delayed wq and flush wq on deinit net: marvell: prestera: Add strict cleanup of fib arbiter net: marvell: prestera: Add cleanup of allocated fib_nodes net: marvell: prestera: Add router nexthops ABI eth: octeon: fix build after netif_napi_add() changes net/mlx5: E-Switch, Return EBUSY if can't get mode lock ...
2022-10-04Merge tag 'x86_cpu_for_v6.1_rc1' of ↵Linus Torvalds1-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpu updates from Borislav Petkov: - Print the CPU number at segfault time. The number printed is not always accurate (preemption is enabled at that time) but the print string contains "likely" and after a lot of back'n'forth on this, this was the consensus that was reached. See thread at [1]. - After a *lot* of testing and polishing, finally the clear_user() improvements to inline REP; STOSB by default Link: https://lore.kernel.org/r/5d62c1d0-7425-d5bb-ecb5-1dc3b4d7d245@intel.com [1] * tag 'x86_cpu_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Print likely CPU at segfault time x86/clear_user: Make it faster
2022-10-03objtool: kmsan: list KMSAN API functions as uaccess-safeAlexander Potapenko1-0/+20
KMSAN inserts API function calls in a lot of places (function entries and exits, local variables, memory accesses), so they may get called from the uaccess regions as well. KMSAN API functions are used to update the metadata (shadow/origin pages) for kernel memory accesses. The metadata pages for kernel pointers are also located in the kernel memory, so touching them is not a problem. For userspace pointers, no metadata is allocated. If an API function is supposed to read or modify the metadata, it does so for kernel pointers and ignores userspace pointers. If an API function is supposed to return a pair of metadata pointers for the instrumentation to use (like all __msan_metadata_ptr_for_TYPE_SIZE() functions do), it returns the allocated metadata for kernel pointers and special dummy buffers residing in the kernel memory for userspace pointers. As a result, none of KMSAN API functions perform userspace accesses, but since they might be called from UACCESS regions they use user_access_save/restore(). Link: https://lkml.kernel.org/r/20220915150417.722975-32-glider@google.com Signed-off-by: Alexander Potapenko <glider@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Eric Biggers <ebiggers@google.com> Cc: Eric Biggers <ebiggers@kernel.org> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kees Cook <keescook@chromium.org> Cc: Marco Elver <elver@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vegard Nossum <vegard.nossum@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-26objtool: Disable CFI warningsSami Tolvanen1-1/+6
The __cfi_ preambles contain a mov instruction that embeds the KCFI type identifier in the following format: ; type preamble __cfi_function: mov <id>, %eax function: ... While the preamble symbols are STT_FUNC and contain valid instructions, they are never executed and always fall through. Skip the warning for them. .kcfi_traps sections point to CFI traps in text sections. Also skip the warning about them referencing !ENDBR instructions. Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Kees Cook <keescook@chromium.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220908215504.3686827-18-samitolvanen@google.com
2022-09-26objtool: Preserve special st_shndx indexes in elf_update_symbolSami Tolvanen1-1/+6
elf_update_symbol fails to preserve the special st_shndx values between [SHN_LORESERVE, SHN_HIRESERVE], which results in it converting SHN_ABS entries into SHN_UNDEF, for example. Explicitly check for the special indexes and ensure these symbols are not marked undefined. Fixes: ead165fa1042 ("objtool: Fix symbol creation") Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220908215504.3686827-17-samitolvanen@google.com
2022-09-16ftrace: Add HAVE_DYNAMIC_FTRACE_NO_PATCHABLEPeter Zijlstra (Intel)1-1/+2
x86 will shortly start using -fpatchable-function-entry for purposes other than ftrace, make sure the __patchable_function_entry section isn't merged in the mcount_loc section. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220903131154.420467-2-jolsa@kernel.org
2022-09-15objtool,x86: Teach decode about LOOP* instructionsPeter Zijlstra1-0/+6
When 'discussing' control flow Masami mentioned the LOOP* instructions and I realized objtool doesn't decode them properly. As it turns out, these instructions are somewhat inefficient and as such unlikely to be emitted by the compiler (a few vmlinux.o checks can't find a single one) so this isn't critical, but still, best to decode them properly. Reported-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/Yxhd4EMKyoFoH9y4@hirez.programming.kicks-ass.net
2022-08-28Merge tag 'x86-urgent-2022-08-28' of ↵Linus Torvalds1-16/+18
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 fixes from Ingo Molnar: - Fix PAT on Xen, which caused i915 driver failures - Fix compat INT 80 entry crash on Xen PV guests - Fix 'MMIO Stale Data' mitigation status reporting on older Intel CPUs - Fix RSB stuffing regressions - Fix ORC unwinding on ftrace trampolines - Add Intel Raptor Lake CPU model number - Fix (work around) a SEV-SNP bootloader bug providing bogus values in boot_params->cc_blob_address, by ignoring the value on !SEV-SNP bootups. - Fix SEV-SNP early boot failure - Fix the objtool list of noreturn functions and annotate snp_abort(), which bug confused objtool on gcc-12. - Fix the documentation for retbleed * tag 'x86-urgent-2022-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Documentation/ABI: Mention retbleed vulnerability info file for sysfs x86/sev: Mark snp_abort() noreturn x86/sev: Don't use cc_platform_has() for early SEV-SNP calls x86/boot: Don't propagate uninitialized boot_params->cc_blob_address x86/cpu: Add new Raptor Lake CPU model number x86/unwind/orc: Unwind ftrace trampolines with correct ORC entry x86/nospec: Fix i386 RSB stuffing x86/nospec: Unwreck the RSB stuffing x86/bugs: Add "unknown" reporting for MMIO Stale Data x86/entry: Fix entry_INT80_compat for Xen PV guests x86/PAT: Have pat_enabled() properly reflect state when running on Xen
2022-08-25x86/sev: Mark snp_abort() noreturnBorislav Petkov1-16/+18
Mark both the function prototype and definition as noreturn in order to prevent the compiler from doing transformations which confuse objtool like so: vmlinux.o: warning: objtool: sme_enable+0x71: unreachable instruction This triggers with gcc-12. Add it and sev_es_terminate() to the objtool noreturn tracking array too. Sort it while at it. Suggested-by: Michael Matz <matz@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220824152420.20547-1-bp@alien8.de
2022-08-19objtool: Remove "ANNOTATE_NOENDBR on ENDBR" warningJosh Poimboeuf1-3/+0
This warning isn't very useful: why would you put ANNOTATE_NOENDBR on ENDBR, and if you did, what's the harm? And thus far it's only found one non-bug, where the '__end_entry_SYSENTER_compat' label happens to land on the ENDBR from entry_SYSCALL_compat: vmlinux.o: warning: objtool: entry_SYSCALL_compat+0x0: ANNOTATE_NOENDBR on ENDBR .. which is fine. Just remove the warning. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/142341a5dafdfc788e4c95b9e226a6eefc9b626e.1660839773.git.jpoimboe@kernel.org
2022-08-19objtool: Use arch_jump_destination() in read_intra_function_calls()Chen Zhongjin1-1/+1
Use arch_jump_destiation() instead of the open-coded 'offset + len + immediate' that is x86 specific. Avoids future trouble with other architectures. Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220818014553.220261-1-chenzhongjin@huawei.com
2022-08-19x86/ibt, objtool: Add IBT_NOSEAL()Josh Poimboeuf1-1/+2
Add a macro which prevents a function from getting sealed if there are no compile-time references to it. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Message-Id: <20220818213927.e44fmxkoq4yj6ybn@treble> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-08-18x86/clear_user: Make it fasterBorislav Petkov1-0/+3
Based on a patch by Mark Hemment <markhemm@googlemail.com> and incorporating very sane suggestions from Linus. The point here is to have the default case with FSRM - which is supposed to be the majority of x86 hw out there - if not now then soon - be directly inlined into the instruction stream so that no function call overhead is taking place. Drop the early clobbers from the @size and @addr operands as those are not needed anymore since we have single instruction alternatives. The benchmarks I ran would show very small improvements and a PF benchmark would even show weird things like slowdowns with higher core counts. So for a ~6m running the git test suite, the function gets called under 700K times, all from padzero(): <...>-2536 [006] ..... 261.208801: padzero: to: 0x55b0663ed214, size: 3564, cycles: 21900 <...>-2536 [006] ..... 261.208819: padzero: to: 0x7f061adca078, size: 3976, cycles: 17160 <...>-2537 [008] ..... 261.211027: padzero: to: 0x5572d019e240, size: 3520, cycles: 23850 <...>-2537 [008] ..... 261.211049: padzero: to: 0x7f1288dc9078, size: 3976, cycles: 15900 ... which is around 1%-ish of the total time and which is consistent with the benchmark numbers. So Mel gave me the idea to simply measure how fast the function becomes. I.e.: start = rdtsc_ordered(); ret = __clear_user(to, n); end = rdtsc_ordered(); Computing the mean average of all the samples collected during the test suite run then shows some improvement: clear_user_original: Amean: 9219.71 (Sum: 6340154910, samples: 687674) fsrm: Amean: 8030.63 (Sum: 5522277720, samples: 687652) That's on Zen3. The situation looks a lot more confusing on Intel: Icelake: clear_user_original: Amean: 19679.4 (Sum: 13652560764, samples: 693750) Amean: 19743.7 (Sum: 13693470604, samples: 693562) (I ran it twice just to be sure.) ERMS: Amean: 20374.3 (Sum: 13910601024, samples: 682752) Amean: 20453.7 (Sum: 14186223606, samples: 693576) FSRM: Amean: 20458.2 (Sum: 13918381386, sample s: 680331) The original microbenchmark which people were complaining about: for i in $(seq 1 10); do dd if=/dev/zero of=/dev/null bs=1M status=progress count=65536; done 2>&1 | grep copied 32207011840 bytes (32 GB, 30 GiB) copied, 1 s, 32.2 GB/s 68719476736 bytes (69 GB, 64 GiB) copied, 1.93069 s, 35.6 GB/s 37597741056 bytes (38 GB, 35 GiB) copied, 1 s, 37.6 GB/s 68719476736 bytes (69 GB, 64 GiB) copied, 1.78017 s, 38.6 GB/s 62020124672 bytes (62 GB, 58 GiB) copied, 2 s, 31.0 GB/s 68719476736 bytes (69 GB, 64 GiB) copied, 2.13716 s, 32.2 GB/s 60010004480 bytes (60 GB, 56 GiB) copied, 1 s, 60.0 GB/s 68719476736 bytes (69 GB, 64 GiB) copied, 1.14129 s, 60.2 GB/s 53212086272 bytes (53 GB, 50 GiB) copied, 1 s, 53.2 GB/s 68719476736 bytes (69 GB, 64 GiB) copied, 1.28398 s, 53.5 GB/s 55698259968 bytes (56 GB, 52 GiB) copied, 1 s, 55.7 GB/s 68719476736 bytes (69 GB, 64 GiB) copied, 1.22507 s, 56.1 GB/s 55306092544 bytes (55 GB, 52 GiB) copied, 1 s, 55.3 GB/s 68719476736 bytes (69 GB, 64 GiB) copied, 1.23647 s, 55.6 GB/s 54387539968 bytes (54 GB, 51 GiB) copied, 1 s, 54.4 GB/s 68719476736 bytes (69 GB, 64 GiB) copied, 1.25693 s, 54.7 GB/s 50566529024 bytes (51 GB, 47 GiB) copied, 1 s, 50.6 GB/s 68719476736 bytes (69 GB, 64 GiB) copied, 1.35096 s, 50.9 GB/s 58308165632 bytes (58 GB, 54 GiB) copied, 1 s, 58.3 GB/s 68719476736 bytes (69 GB, 64 GiB) copied, 1.17394 s, 58.5 GB/s Now the same thing with smaller buffers: for i in $(seq 1 10); do dd if=/dev/zero of=/dev/null bs=1M status=progress count=8192; done 2>&1 | grep copied 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 0.28485 s, 30.2 GB/s 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 0.276112 s, 31.1 GB/s 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 0.29136 s, 29.5 GB/s 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 0.283803 s, 30.3 GB/s 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 0.306503 s, 28.0 GB/s 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 0.349169 s, 24.6 GB/s 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 0.276912 s, 31.0 GB/s 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 0.265356 s, 32.4 GB/s 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 0.28464 s, 30.2 GB/s 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 0.242998 s, 35.3 GB/s is also not conclusive because it all depends on the buffer sizes, their alignments and when the microcode detects that cachelines can be aggregated properly and copied in bigger sizes. Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/CAHk-=wh=Mu_EYhtOmPn6AxoQZyEh-4fo2Zx3G7rBv1g7vwoKiw@mail.gmail.com
2022-08-02Merge tag 'docs-6.0' of git://git.lwn.net/linuxLinus Torvalds1-1/+1
Pull documentation updates from Jonathan Corbet: "This was a moderately busy cycle for documentation, but nothing all that earth-shaking: - More Chinese translations, and an update to the Italian translations. The Japanese, Korean, and traditional Chinese translations are more-or-less unmaintained at this point, instead. - Some build-system performance improvements. - The removal of the archaic submitting-drivers.rst document, with the movement of what useful material that remained into other docs. - Improvements to sphinx-pre-install to, hopefully, give more useful suggestions. - A number of build-warning fixes Plus the usual collection of typo fixes, updates, and more" * tag 'docs-6.0' of git://git.lwn.net/linux: (92 commits) docs: efi-stub: Fix paths for x86 / arm stubs Docs/zh_CN: Update the translation of sched-stats to 5.19-rc8 Docs/zh_CN: Update the translation of pci to 5.19-rc8 Docs/zh_CN: Update the translation of pci-iov-howto to 5.19-rc8 Docs/zh_CN: Update the translation of usage to 5.19-rc8 Docs/zh_CN: Update the translation of testing-overview to 5.19-rc8 Docs/zh_CN: Update the translation of sparse to 5.19-rc8 Docs/zh_CN: Update the translation of kasan to 5.19-rc8 Docs/zh_CN: Update the translation of iio_configfs to 5.19-rc8 doc:it_IT: align Italian documentation docs: Remove spurious tag from admin-guide/mm/overcommit-accounting.rst Documentation: process: Update email client instructions for Thunderbird docs: ABI: correct QEMU fw_cfg spec path doc/zh_CN: remove submitting-driver reference from docs docs: zh_TW: align to submitting-drivers removal docs: zh_CN: align to submitting-drivers removal docs: ko_KR: howto: remove reference to removed submitting-drivers docs: ja_JP: howto: remove reference to removed submitting-drivers docs: it_IT: align to submitting-drivers removal docs: process: remove outdated submitting-drivers.rst ...
2022-07-11Merge tag 'x86_bugs_retbleed' of ↵Linus Torvalds9-22/+356
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 retbleed fixes from Borislav Petkov: "Just when you thought that all the speculation bugs were addressed and solved and the nightmare is complete, here's the next one: speculating after RET instructions and leaking privileged information using the now pretty much classical covert channels. It is called RETBleed and the mitigation effort and controlling functionality has been modelled similar to what already existing mitigations provide" * tag 'x86_bugs_retbleed' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits) x86/speculation: Disable RRSBA behavior x86/kexec: Disable RET on kexec x86/bugs: Do not enable IBPB-on-entry when IBPB is not supported x86/entry: Move PUSH_AND_CLEAR_REGS() back into error_entry x86/bugs: Add Cannon lake to RETBleed affected CPU list x86/retbleed: Add fine grained Kconfig knobs x86/cpu/amd: Enumerate BTC_NO x86/common: Stamp out the stepping madness KVM: VMX: Prevent RSB underflow before vmenter x86/speculation: Fill RSB on vmexit for IBRS KVM: VMX: Fix IBRS handling after vmexit KVM: VMX: Prevent guest RSB poisoning attacks with eIBRS KVM: VMX: Convert launched argument to flags KVM: VMX: Flatten __vmx_vcpu_run() objtool: Re-add UNWIND_HINT_{SAVE_RESTORE} x86/speculation: Remove x86_spec_ctrl_mask x86/speculation: Use cached host SPEC_CTRL value for guest entry/exit x86/speculation: Fix SPEC_CTRL write on SMT state change x86/speculation: Fix firmware entry SPEC_CTRL handling x86/speculation: Fix RSB filling with CONFIG_RETPOLINE=n ...
2022-07-07objtool: update objtool.txt referencesMauro Carvalho Chehab1-1/+1
Changeset a8e35fece49b ("objtool: Update documentation") renamed: tools/objtool/Documentation/stack-validation.txt to: tools/objtool/Documentation/objtool.txt. Update the cross-references accordingly. Fixes: a8e35fece49b ("objtool: Update documentation") Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org> Link: https://lore.kernel.org/r/ec285ece6348a5be191aebe45f78d06b3319056b.1656234456.git.mchehab@kernel.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-07-07x86/ibt, objtool: Don't discard text references from tracepoint sectionPeter Zijlstra1-2/+1
On Tue, Jun 28, 2022 at 04:28:58PM +0800, Pengfei Xu wrote: > # ./ftracetest > === Ftrace unit tests === > [1] Basic trace file check [PASS] > [2] Basic test for tracers [PASS] > [3] Basic trace clock test [PASS] > [4] Basic event tracing check [PASS] > [5] Change the ringbuffer size [PASS] > [6] Snapshot and tracing setting [PASS] > [7] trace_pipe and trace_marker [PASS] > [8] Test ftrace direct functions against tracers [UNRESOLVED] > [9] Test ftrace direct functions against kprobes [UNRESOLVED] > [10] Generic dynamic event - add/remove eprobe events [FAIL] > [11] Generic dynamic event - add/remove kprobe events > > It 100% reproduced in step 11 and then missing ENDBR BUG generated: > " > [ 9332.752836] mmiotrace: enabled CPU7. > [ 9332.788612] mmiotrace: disabled. > [ 9337.103426] traps: Missing ENDBR: syscall_regfunc+0x0/0xb0 It turns out that while syscall_regfunc() does have an ENDBR when generated, it gets sealed by objtool's .ibt_endbr_seal list. Since the only text references to this function: $ git grep syscall_regfunc include/linux/tracepoint.h:extern int syscall_regfunc(void); include/trace/events/syscalls.h: syscall_regfunc, syscall_unregfunc include/trace/events/syscalls.h: syscall_regfunc, syscall_unregfunc kernel/tracepoint.c:int syscall_regfunc(void) appear in the __tracepoint section which is excluded by objtool. Fixes: 3c6f9f77e618 ("objtool: Rework ibt and extricate from stack validation") Reported-by: Pengfei Xu <pengfei.xu@intel.com Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/Yrrepdaow4F5kqG0@hirez.programming.kicks-ass.net
2022-06-29x86/retbleed: Add fine grained Kconfig knobsPeter Zijlstra3-2/+15
Do fine-grained Kconfig for all the various retbleed parts. NOTE: if your compiler doesn't support return thunks this will silently 'upgrade' your mitigation to IBPB, you might not like this. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27objtool: Re-add UNWIND_HINT_{SAVE_RESTORE}Josh Poimboeuf2-9/+50
Commit c536ed2fffd5 ("objtool: Remove SAVE/RESTORE hints") removed the save/restore unwind hints because they were no longer needed. Now they're going to be needed again so re-add them. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27objtool: Add entry UNRET validationPeter Zijlstra4-11/+184
Since entry asm is tricky, add a validation pass that ensures the retbleed mitigation has been done before the first actual RET instruction. Entry points are those that either have UNWIND_HINT_ENTRY, which acts as UNWIND_HINT_EMPTY but marks the instruction as an entry point, or those that have UWIND_HINT_IRET_REGS at +0. This is basically a variant of validate_branch() that is intra-function and it will simply follow all branches from marked entry points and ensures that all paths lead to ANNOTATE_UNRET_END. If a path hits RET or an indirection the path is a fail and will be reported. There are 3 ANNOTATE_UNRET_END instances: - UNTRAIN_RET itself - exception from-kernel; this path doesn't need UNTRAIN_RET - all early exceptions; these also don't need UNTRAIN_RET Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de>