Age | Commit message (Collapse) | Author | Files | Lines |
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 pti updates from Thomas Gleixner:
"This contains:
- a PTI bugfix to avoid setting reserved CR3 bits when PCID is
disabled. This seems to cause issues on a virtual machine at least
and is incorrect according to the AMD manual.
- a PTI bugfix which disables the perf BTS facility if PTI is
enabled. The BTS AUX buffer is not globally visible and causes the
CPU to fault when the mapping disappears on switching CR3 to user
space. A full fix which restores BTS on PTI is non trivial and will
be worked on.
- PTI bugfixes for EFI and trusted boot which make sure that the user
space visible page table entries have the NX bit cleared
- removal of dead code in the PTI pagetable setup functions
- add PTI documentation
- add a selftest for vsyscall to verify that the kernel actually
implements what it advertises.
- a sysfs interface to expose vulnerability and mitigation
information so there is a coherent way for users to retrieve the
status.
- the initial spectre_v2 mitigations, aka retpoline:
+ The necessary ASM thunk and compiler support
+ The ASM variants of retpoline and the conversion of affected ASM
code
+ Make LFENCE serializing on AMD so it can be used as speculation
trap
+ The RSB fill after vmexit
- initial objtool support for retpoline
As I said in the status mail this is the most of the set of patches
which should go into 4.15 except two straight forward patches still on
hold:
- the retpoline add on of LFENCE which waits for ACKs
- the RSB fill after context switch
Both should be ready to go early next week and with that we'll have
covered the major holes of spectre_v2 and go back to normality"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (28 commits)
x86,perf: Disable intel_bts when PTI
security/Kconfig: Correct the Documentation reference for PTI
x86/pti: Fix !PCID and sanitize defines
selftests/x86: Add test_vsyscall
x86/retpoline: Fill return stack buffer on vmexit
x86/retpoline/irq32: Convert assembler indirect jumps
x86/retpoline/checksum32: Convert assembler indirect jumps
x86/retpoline/xen: Convert Xen hypercall indirect jumps
x86/retpoline/hyperv: Convert assembler indirect jumps
x86/retpoline/ftrace: Convert ftrace assembler indirect jumps
x86/retpoline/entry: Convert entry assembler indirect jumps
x86/retpoline/crypto: Convert crypto assembler indirect jumps
x86/spectre: Add boot time option to select Spectre v2 mitigation
x86/retpoline: Add initial retpoline support
objtool: Allow alternatives to be ignored
objtool: Detect jumps to retpoline thunks
x86/pti: Make unpoison of pgd for trusted boot work for real
x86/alternatives: Fix optimize_nops() checking
sysfs/cpu: Fix typos in vulnerability documentation
x86/cpu/AMD: Use LFENCE_RDTSC in preference to MFENCE_RDTSC
...
|
|
Enable the use of -mindirect-branch=thunk-extern in newer GCC, and provide
the corresponding thunks. Provide assembler macros for invoking the thunks
in the same way that GCC does, from native and inline assembler.
This adds X86_FEATURE_RETPOLINE and sets it by default on all CPUs. In
some circumstances, IBRS microcode features may be used instead, and the
retpoline can be disabled.
On AMD CPUs if lfence is serialising, the retpoline can be dramatically
simplified to a simple "lfence; jmp *\reg". A future patch, after it has
been verified that lfence really is serialising in all circumstances, can
enable this by setting the X86_FEATURE_RETPOLINE_AMD feature bit in addition
to X86_FEATURE_RETPOLINE.
Do not align the retpoline in the altinstr section, because there is no
guarantee that it stays aligned when it's copied over the oldinstr during
alternative patching.
[ Andi Kleen: Rename the macros, add CONFIG_RETPOLINE option, export thunks]
[ tglx: Put actual function CALL/JMP in front of the macros, convert to
symbolic labels ]
[ dwmw2: Convert back to numeric labels, merge objtool fixes ]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: thomas.lendacky@amd.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kees Cook <keescook@google.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: Paul Turner <pjt@google.com>
Link: https://lkml.kernel.org/r/1515707194-20531-4-git-send-email-dwmw@amazon.co.uk
|
|
ARCH_HAS_REFCOUNT is no longer marked as broken ('if BROKEN'), so remove
the stale comment regarding it being broken.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20171229195303.17781-1-ebiggers3@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Implement the CPU vulnerabilty show functions for meltdown, spectre_v1 and
spectre_v2.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linuxfoundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lkml.kernel.org/r/20180107214913.177414879@linutronix.de
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 PTI preparatory patches from Thomas Gleixner:
"Todays Advent calendar window contains twentyfour easy to digest
patches. The original plan was to have twenty three matching the date,
but a late fixup made that moot.
- Move the cpu_entry_area mapping out of the fixmap into a separate
address space. That's necessary because the fixmap becomes too big
with NRCPUS=8192 and this caused already subtle and hard to
diagnose failures.
The top most patch is fresh from today and cures a brain slip of
that tall grumpy german greybeard, who ignored the intricacies of
32bit wraparounds.
- Limit the number of CPUs on 32bit to 64. That's insane big already,
but at least it's small enough to prevent address space issues with
the cpu_entry_area map, which have been observed and debugged with
the fixmap code
- A few TLB flush fixes in various places plus documentation which of
the TLB functions should be used for what.
- Rename the SYSENTER stack to CPU_ENTRY_AREA stack as it is used for
more than sysenter now and keeping the name makes backtraces
confusing.
- Prevent LDT inheritance on exec() by moving it to arch_dup_mmap(),
which is only invoked on fork().
- Make vysycall more robust.
- A few fixes and cleanups of the debug_pagetables code. Check
PAGE_PRESENT instead of checking the PTE for 0 and a cleanup of the
C89 initialization of the address hint array which already was out
of sync with the index enums.
- Move the ESPFIX init to a different place to prepare for PTI.
- Several code moves with no functional change to make PTI
integration simpler and header files less convoluted.
- Documentation fixes and clarifications"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
x86/cpu_entry_area: Prevent wraparound in setup_cpu_entry_area_ptes() on 32bit
init: Invoke init_espfix_bsp() from mm_init()
x86/cpu_entry_area: Move it out of the fixmap
x86/cpu_entry_area: Move it to a separate unit
x86/mm: Create asm/invpcid.h
x86/mm: Put MMU to hardware ASID translation in one place
x86/mm: Remove hard-coded ASID limit checks
x86/mm: Move the CR3 construction functions to tlbflush.h
x86/mm: Add comments to clarify which TLB-flush functions are supposed to flush what
x86/mm: Remove superfluous barriers
x86/mm: Use __flush_tlb_one() for kernel memory
x86/microcode: Dont abuse the TLB-flush interface
x86/uv: Use the right TLB-flush API
x86/entry: Rename SYSENTER_stack to CPU_ENTRY_AREA_entry_stack
x86/doc: Remove obvious weirdnesses from the x86 MM layout documentation
x86/mm/64: Improve the memory map documentation
x86/ldt: Prevent LDT inheritance on exec
x86/ldt: Rework locking
arch, mm: Allow arch_dup_mmap() to fail
x86/vsyscall/64: Warn and fail vsyscall emulation in NATIVE mode
...
|
|
The recent cpu_entry_area changes fail to compile on 32-bit when BIGSMP=y
and NR_CPUS=512, because the fixmap area becomes too big.
Limit the number of CPUs with BIGSMP to 64, which is already way to big for
32-bit, but it's at least a working limitation.
We performed a quick survey of 32-bit-only machines that might be affected
by this change negatively, but found none.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
[ Note, this is a Git cherry-pick of the following commit:
d17a1d97dc20: ("x86/mm/kasan: don't use vmemmap_populate() to initialize shadow")
... for easier x86 PTI code testing and back-porting. ]
The KASAN shadow is currently mapped using vmemmap_populate() since that
provides a semi-convenient way to map pages into init_top_pgt. However,
since that no longer zeroes the mapped pages, it is not suitable for
KASAN, which requires zeroed shadow memory.
Add kasan_populate_shadow() interface and use it instead of
vmemmap_populate(). Besides, this allows us to take advantage of
gigantic pages and use them to populate the shadow, which should save us
some memory wasted on page tables and reduce TLB pressure.
Link: http://lkml.kernel.org/r/20171103185147.2688-2-pasha.tatashin@oracle.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Steven Sistare <steven.sistare@oracle.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Bob Picco <bob.picco@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 fixes from Ingo Molnar:
- topology enumeration fixes
- KASAN fix
- two entry fixes (not yet the big series related to KASLR)
- remove obsolete code
- instruction decoder fix
- better /dev/mem sanity checks, hopefully working better this time
- pkeys fixes
- two ACPI fixes
- 5-level paging related fixes
- UMIP fixes that should make application visible faults more debuggable
- boot fix for weird virtualization environment
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
x86/decoder: Add new TEST instruction pattern
x86/PCI: Remove unused HyperTransport interrupt support
x86/umip: Fix insn_get_code_seg_params()'s return value
x86/boot/KASLR: Remove unused variable
x86/entry/64: Add missing irqflags tracing to native_load_gs_index()
x86/mm/kasan: Don't use vmemmap_populate() to initialize shadow
x86/entry/64: Fix entry_SYSCALL_64_after_hwframe() IRQ tracing
x86/pkeys/selftests: Fix protection keys write() warning
x86/pkeys/selftests: Rename 'si_pkey' to 'siginfo_pkey'
x86/mpx/selftests: Fix up weird arrays
x86/pkeys: Update documentation about availability
x86/umip: Print a warning into the syslog if UMIP-protected instructions are used
x86/smpboot: Fix __max_logical_packages estimate
x86/topology: Avoid wasting 128k for package id array
perf/x86/intel/uncore: Cache logical pkg id in uncore driver
x86/acpi: Reduce code duplication in mp_override_legacy_irq()
x86/acpi: Handle SCI interrupts above legacy space gracefully
x86/boot: Fix boot failure when SMP MP-table is based at 0
x86/mm: Limit mmap() of /dev/mem to valid physical addresses
x86/selftests: Add test for mapping placement for 5-level paging
...
|
|
[ Note, this commit is a cherry-picked version of:
d17a1d97dc20: ("x86/mm/kasan: don't use vmemmap_populate() to initialize shadow")
... for easier x86 entry code testing and back-porting. ]
The KASAN shadow is currently mapped using vmemmap_populate() since that
provides a semi-convenient way to map pages into init_top_pgt. However,
since that no longer zeroes the mapped pages, it is not suitable for
KASAN, which requires zeroed shadow memory.
Add kasan_populate_shadow() interface and use it instead of
vmemmap_populate(). Besides, this allows us to take advantage of
gigantic pages and use them to populate the shadow, which should save us
some memory wasted on page tables and reduce TLB pressure.
Link: http://lkml.kernel.org/r/20171103185147.2688-2-pasha.tatashin@oracle.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Steven Sistare <steven.sistare@oracle.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Bob Picco <bob.picco@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The kasan shadow is currently mapped using vmemmap_populate() since that
provides a semi-convenient way to map pages into init_top_pgt. However,
since that no longer zeroes the mapped pages, it is not suitable for
kasan, which requires zeroed shadow memory.
Add kasan_populate_shadow() interface and use it instead of
vmemmap_populate(). Besides, this allows us to take advantage of
gigantic pages and use them to populate the shadow, which should save us
some memory wasted on page tables and reduce TLB pressure.
Link: http://lkml.kernel.org/r/20171103185147.2688-2-pasha.tatashin@oracle.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Steven Sistare <steven.sistare@oracle.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Bob Picco <bob.picco@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Fix up makefiles, remove references, and git rm kmemcheck.
Link: http://lkml.kernel.org/r/20171007030159.22241-4-alexander.levin@verizon.com
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Vegard Nossum <vegardno@ifi.uio.no>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Tim Hansen <devtimhansen@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
UMIP does cause any performance penalty to the vast majority of x86 code
that does not use the legacy instructions affected by UMIP.
Also describe UMIP more accurately and explain the behavior that can be
expected by the (few) applications that use the affected instructions.
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: ricardo.neri@intel.com
Link: http://lkml.kernel.org/r/1510640985-18412-2-git-send-email-ricardo.neri-calderon@linux.intel.com
[ Spelling fixes, rewrote the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 APIC updates from Thomas Gleixner:
"This update provides a major overhaul of the APIC initialization and
vector allocation code:
- Unification of the APIC and interrupt mode setup which was
scattered all over the place and was hard to follow. This also
distangles the timer setup from the APIC initialization which
brings a clear separation of functionality.
Great detective work from Dou Lyiang!
- Refactoring of the x86 vector allocation mechanism. The existing
code was based on nested loops and rather convoluted APIC callbacks
which had a horrible worst case behaviour and tried to serve all
different use cases in one go. This led to quite odd hacks when
supporting the new managed interupt facility for multiqueue devices
and made it more or less impossible to deal with the vector space
exhaustion which was a major roadblock for server hibernation.
Aside of that the code dealing with cpu hotplug and the system
vectors was disconnected from the actual vector management and
allocation code, which made it hard to follow and maintain.
Utilizing the new bitmap matrix allocator core mechanism, the new
allocator and management code consolidates the handling of system
vectors, legacy vectors, cpu hotplug mechanisms and the actual
allocation which needs to be aware of system and legacy vectors and
hotplug constraints into a single consistent entity.
This has one visible change: The support for multi CPU targets of
interrupts, which is only available on a certain subset of
CPUs/APIC variants has been removed in favour of single interrupt
targets. A proper analysis of the multi CPU target feature revealed
that there is no real advantage as the vast majority of interrupts
end up on the CPU with the lowest APIC id in the set of target CPUs
anyway. That change was agreed on by the relevant folks and allowed
to simplify the implementation significantly and to replace rather
fragile constructs like the vector cleanup IPI with straight
forward and solid code.
Furthermore this allowed to cleanly separate the allocation details
for legacy, normal and managed interrupts:
* Legacy interrupts are not longer wasting 16 vectors
unconditionally
* Managed interrupts have now a guaranteed vector reservation, but
the actual vector assignment happens when the interrupt is
requested. It's guaranteed not to fail.
* Normal interrupts no longer allocate vectors unconditionally
when the interrupt is set up (IO/APIC init or MSI(X) enable).
The mechanism has been switched to a best effort reservation
mode. The actual allocation happens when the interrupt is
requested. Contrary to managed interrupts the request can fail
due to vector space exhaustion, but drivers must handle a fail
of request_irq() anyway. When the interrupt is freed, the vector
is handed back as well.
This solves a long standing problem with large unconditional
vector allocations for a certain class of enterprise devices
which prevented server hibernation due to vector space
exhaustion when the unused allocated vectors had to be migrated
to CPU0 while unplugging all non boot CPUs.
The code has been equipped with trace points and detailed debugfs
information to aid analysis of the vector space"
* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
x86/vector/msi: Select CONFIG_GENERIC_IRQ_RESERVATION_MODE
PCI/MSI: Set MSI_FLAG_MUST_REACTIVATE in core code
genirq: Add config option for reservation mode
x86/vector: Use correct per cpu variable in free_moved_vector()
x86/apic/vector: Ignore set_affinity call for inactive interrupts
x86/apic: Fix spelling mistake: "symmectic" -> "symmetric"
x86/apic: Use dead_cpu instead of current CPU when cleaning up
ACPI/init: Invoke early ACPI initialization earlier
x86/vector: Respect affinity mask in irq descriptor
x86/irq: Simplify hotplug vector accounting
x86/vector: Switch IOAPIC to global reservation mode
x86/vector/msi: Switch to global reservation mode
x86/vector: Handle managed interrupts proper
x86/io_apic: Reevaluate vector configuration on activate()
iommu/amd: Reevaluate vector configuration on activate()
iommu/vt-d: Reevaluate vector configuration on activate()
x86/apic/msi: Force reactivation of interrupts at startup time
x86/vector: Untangle internal state from irq_cfg
x86/vector: Compile SMP only code conditionally
x86/apic: Remove unused callbacks
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 core updates from Ingo Molnar:
"Note that in this cycle most of the x86 topics interacted at a level
that caused them to be merged into tip:x86/asm - but this should be a
temporary phenomenon, hopefully we'll back to the usual patterns in
the next merge window.
The main changes in this cycle were:
Hardware enablement:
- Add support for the Intel UMIP (User Mode Instruction Prevention)
CPU feature. This is a security feature that disables certain
instructions such as SGDT, SLDT, SIDT, SMSW and STR. (Ricardo Neri)
[ Note that this is disabled by default for now, there are some
smaller enhancements in the pipeline that I'll follow up with in
the next 1-2 days, which allows this to be enabled by default.]
- Add support for the AMD SEV (Secure Encrypted Virtualization) CPU
feature, on top of SME (Secure Memory Encryption) support that was
added in v4.14. (Tom Lendacky, Brijesh Singh)
- Enable new SSE/AVX/AVX512 CPU features: AVX512_VBMI2, GFNI, VAES,
VPCLMULQDQ, AVX512_VNNI, AVX512_BITALG. (Gayatri Kammela)
Other changes:
- A big series of entry code simplifications and enhancements (Andy
Lutomirski)
- Make the ORC unwinder default on x86 and various objtool
enhancements. (Josh Poimboeuf)
- 5-level paging enhancements (Kirill A. Shutemov)
- Micro-optimize the entry code a bit (Borislav Petkov)
- Improve the handling of interdependent CPU features in the early
FPU init code (Andi Kleen)
- Build system enhancements (Changbin Du, Masahiro Yamada)
- ... plus misc enhancements, fixes and cleanups"
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (118 commits)
x86/build: Make the boot image generation less verbose
selftests/x86: Add tests for the STR and SLDT instructions
selftests/x86: Add tests for User-Mode Instruction Prevention
x86/traps: Fix up general protection faults caused by UMIP
x86/umip: Enable User-Mode Instruction Prevention at runtime
x86/umip: Force a page fault when unable to copy emulated result to user
x86/umip: Add emulation code for UMIP instructions
x86/cpufeature: Add User-Mode Instruction Prevention definitions
x86/insn-eval: Add support to resolve 16-bit address encodings
x86/insn-eval: Handle 32-bit address encodings in virtual-8086 mode
x86/insn-eval: Add wrapper function for 32 and 64-bit addresses
x86/insn-eval: Add support to resolve 32-bit address encodings
x86/insn-eval: Compute linear address in several utility functions
resource: Fix resource_size.cocci warnings
X86/KVM: Clear encryption attribute when SEV is active
X86/KVM: Decrypt shared per-cpu variables when SEV is active
percpu: Introduce DEFINE_PER_CPU_DECRYPTED
x86: Add support for changing memory encryption attribute in early boot
x86/io: Unroll string I/O when SEV is active
x86/boot: Add early boot support when running with SEV active
...
|
|
User-Mode Instruction Prevention (UMIP) is enabled by setting/clearing a
bit in %cr4.
It makes sense to enable UMIP at some point while booting, before user
spaces come up. Like SMAP and SMEP, is not critical to have it enabled
very early during boot. This is because UMIP is relevant only when there is
a user space to be protected from. Given these similarities, UMIP can be
enabled along with SMAP and SMEP.
At the moment, UMIP is disabled by default at build time. It can be enabled
at build time by selecting CONFIG_X86_INTEL_UMIP. If enabled at build time,
it can be disabled at run time by adding clearcpuid=514 to the kernel
parameters.
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chen Yucong <slaoub@gmail.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: ricardo.neri@intel.com
Link: http://lkml.kernel.org/r/1509935277-22138-10-git-send-email-ricardo.neri-calderon@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Conflicts:
arch/x86/kernel/cpu/Makefile
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Conflicts:
arch/x86/include/asm/x2apic.h
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Conflicts:
include/linux/compiler-clang.h
include/linux/compiler-gcc.h
include/linux/compiler-intel.h
include/uapi/linux/stddef.h
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Concentrate x86 MM and asm related changes into a single super-topic,
in preparation for larger changes.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
We are going to support boot-time switching between 4- and 5-level
paging. For KASAN it means we cannot have different KASAN_SHADOW_OFFSET
for different paging modes: the constant is passed to gcc to generate
code and cannot be changed at runtime.
This patch changes KASAN code to use 0xdffffc0000000000 as shadow offset
for both 4- and 5-level paging.
For 5-level paging it means that shadow memory region is not aligned to
PGD boundary anymore and we have to handle unaligned parts of the region
properly.
In addition, we have to exclude paravirt code from KASAN instrumentation
as we now use set_pgd() before KASAN is fully ready.
[kirill.shutemov@linux.intel.com: clenaup, changelog message]
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@suse.de>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170929140821.37654-4-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Select CONFIG_GENERIC_IRQ_RESERVATION_MODE so PCI/MSI domains get the
MSI_FLAG_MUST_REACTIVATE flag set in pci_msi_create_irq_domain().
Remove the explicit setters of this flag in the apic/msi code as they are
not longer required.
Fixes: 4900be83602b ("x86/vector/msi: Switch to global reservation mode")
Reported-and-tested-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Josh Poulson <jopoulso@microsoft.com>
Cc: Mihai Costache <v-micos@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: linux-pci@vger.kernel.org
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Simon Xiao <sixiao@microsoft.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Cc: Jork Loeser <Jork.Loeser@microsoft.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: devel@linuxdriverproject.org
Cc: KY Srinivasan <kys@microsoft.com>
Link: https://lkml.kernel.org/r/20171017075600.527569354@linutronix.de
|
|
Rename the unwinder config options from:
CONFIG_ORC_UNWINDER
CONFIG_FRAME_POINTER_UNWINDER
CONFIG_GUESS_UNWINDER
to:
CONFIG_UNWINDER_ORC
CONFIG_UNWINDER_FRAME_POINTER
CONFIG_UNWINDER_GUESS
... in order to give them a more logical config namespace.
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/73972fc7e2762e91912c6b9584582703d6f1b8cc.1507924831.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
With the section inlining bug fixed for the x86 refcount protection,
we can turn the config back on.
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Elena <elena.reshetova@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch <linux-arch@vger.kernel.org>
Link: http://lkml.kernel.org/r/1504382986-49301-3-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Initialize the matrix allocator and add the proper accounting points to the
code.
No functional change, just preparation.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Yu Chen <yu.c.chen@intel.com>
Acked-by: Juergen Gross <jgross@suse.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rui Zhang <rui.zhang@intel.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Len Brown <lenb@kernel.org>
Link: https://lkml.kernel.org/r/20170913213155.108410660@linutronix.de
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm from Dan Williams:
"A rework of media error handling in the BTT driver and other updates.
It has appeared in a few -next releases and collected some late-
breaking build-error and warning fixups as a result.
Summary:
- Media error handling support in the Block Translation Table (BTT)
driver is reworked to address sleeping-while-atomic locking and
memory-allocation-context conflicts.
- The dax_device lookup overhead for xfs and ext4 is moved out of the
iomap hot-path to a mount-time lookup.
- A new 'ecc_unit_size' sysfs attribute is added to advertise the
read-modify-write boundary property of a persistent memory range.
- Preparatory fix-ups for arm and powerpc pmem support are included
along with other miscellaneous fixes"
* tag 'libnvdimm-for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (26 commits)
libnvdimm, btt: fix format string warnings
libnvdimm, btt: clean up warning and error messages
ext4: fix null pointer dereference on sbi
libnvdimm, nfit: move the check on nd_reserved2 to the endpoint
dax: fix FS_DAX=n BLOCK=y compilation
libnvdimm: fix integer overflow static analysis warning
libnvdimm, nd_blk: remove mmio_flush_range()
libnvdimm, btt: rework error clearing
libnvdimm: fix potential deadlock while clearing errors
libnvdimm, btt: cache sector_size in arena_info
libnvdimm, btt: ensure that flags were also unchanged during a map_read
libnvdimm, btt: refactor map entry operations with macros
libnvdimm, btt: fix a missed NVDIMM_IO_ATOMIC case in the write path
libnvdimm, nfit: export an 'ecc_unit_size' sysfs attribute
ext4: perform dax_device lookup at mount
ext2: perform dax_device lookup at mount
xfs: perform dax_device lookup at mount
dax: introduce a fs_dax_get_by_bdev() helper
libnvdimm, btt: check memory allocation failure
libnvdimm, label: fix index block size calculation
...
|
|
There are new users of memory hotplug emerging. Some of them require
different subset of arch_add_memory. There are some which only require
allocation of struct pages without mapping those pages to the kernel
address space. We currently have __add_pages for that purpose. But this
is rather lowlevel and not very suitable for the code outside of the
memory hotplug. E.g. x86_64 wants to update max_pfn which should be done
by the caller. Introduce add_pages() which should care about those
details if they are needed. Each architecture should define its
implementation and select CONFIG_ARCH_HAS_ADD_PAGES. All others use the
currently existing __add_pages.
Link: http://lkml.kernel.org/r/20170817000548.32038-7-jglisse@redhat.com
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Evgeny Baskakov <ebaskakov@nvidia.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mark Hairgrove <mhairgrove@nvidia.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Sherry Cheung <SCheung@nvidia.com>
Cc: Subhash Gutti <sgutti@nvidia.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Bob Liu <liubo95@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Introduce CONFIG_ARCH_ENABLE_THP_MIGRATION to limit thp migration
functionality to x86_64, which should be safer at the first step.
Link: http://lkml.kernel.org/r/20170717193955.20207-5-zi.yan@sent.com
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
Reviewed-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Patch series "mm,fork,security: introduce MADV_WIPEONFORK", v4.
If a child process accesses memory that was MADV_WIPEONFORK, it will get
zeroes. The address ranges are still valid, they are just empty.
If a child process accesses memory that was MADV_DONTFORK, it will get a
segmentation fault, since those address ranges are no longer valid in
the child after fork.
Since MADV_DONTFORK also seems to be used to allow very large programs
to fork in systems with strict memory overcommit restrictions, changing
the semantics of MADV_DONTFORK might break existing programs.
The use case is libraries that store or cache information, and want to
know that they need to regenerate it in the child process after fork.
Examples of this would be:
- systemd/pulseaudio API checks (fail after fork) (replacing a getpid
check, which is too slow without a PID cache)
- PKCS#11 API reinitialization check (mandated by specification)
- glibc's upcoming PRNG (reseed after fork)
- OpenSSL PRNG (reseed after fork)
The security benefits of a forking server having a re-inialized PRNG in
every child process are pretty obvious. However, due to libraries
having all kinds of internal state, and programs getting compiled with
many different versions of each library, it is unreasonable to expect
calling programs to re-initialize everything manually after fork.
A further complication is the proliferation of clone flags, programs
bypassing glibc's functions to call clone directly, and programs calling
unshare, causing the glibc pthread_atfork hook to not get called.
It would be better to have the kernel take care of this automatically.
The patchset also adds MADV_KEEPONFORK, to undo the effects of a prior
MADV_WIPEONFORK.
This is similar to the OpenBSD minherit syscall with MAP_INHERIT_ZERO:
https://man.openbsd.org/minherit.2
This patch (of 2):
MPX only seems to be available on 64 bit CPUs, starting with Skylake and
Goldmont. Move VM_MPX into the 64 bit only portion of vma->vm_flags, in
order to free up a VMA flag.
Link: http://lkml.kernel.org/r/20170811212829.29186-2-riel@redhat.com
Signed-off-by: Rik van Riel <riel@redhat.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Will Drewry <wad@chromium.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Colm MacCártaigh <colm@allcosts.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cache quality monitoring update from Thomas Gleixner:
"This update provides a complete rewrite of the Cache Quality
Monitoring (CQM) facility.
The existing CQM support was duct taped into perf with a lot of issues
and the attempts to fix those turned out to be incomplete and
horrible.
After lengthy discussions it was decided to integrate the CQM support
into the Resource Director Technology (RDT) facility, which is the
obvious choise as in hardware CQM is part of RDT. This allowed to add
Memory Bandwidth Monitoring support on top.
As a result the mechanisms for allocating cache/memory bandwidth and
the corresponding monitoring mechanisms are integrated into a single
management facility with a consistent user interface"
* 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
x86/intel_rdt: Turn off most RDT features on Skylake
x86/intel_rdt: Add command line options for resource director technology
x86/intel_rdt: Move special case code for Haswell to a quirk function
x86/intel_rdt: Remove redundant ternary operator on return
x86/intel_rdt/cqm: Improve limbo list processing
x86/intel_rdt/mbm: Fix MBM overflow handler during CPU hotplug
x86/intel_rdt: Modify the intel_pqr_state for better performance
x86/intel_rdt/cqm: Clear the default RMID during hotcpu
x86/intel_rdt: Show bitmask of shareable resource with other executing units
x86/intel_rdt/mbm: Handle counter overflow
x86/intel_rdt/mbm: Add mbm counter initialization
x86/intel_rdt/mbm: Basic counting of MBM events (total and local)
x86/intel_rdt/cqm: Add CPU hotplug support
x86/intel_rdt/cqm: Add sched_in support
x86/intel_rdt: Introduce rdt_enable_key for scheduling
x86/intel_rdt/cqm: Add mount,umount support
x86/intel_rdt/cqm: Add rmdir support
x86/intel_rdt: Separate the ctrl bits from rmdir
x86/intel_rdt/cqm: Add mon_data
x86/intel_rdt: Prepare for RDT monitor data support
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm changes from Ingo Molnar:
"PCID support, 5-level paging support, Secure Memory Encryption support
The main changes in this cycle are support for three new, complex
hardware features of x86 CPUs:
- Add 5-level paging support, which is a new hardware feature on
upcoming Intel CPUs allowing up to 128 PB of virtual address space
and 4 PB of physical RAM space - a 512-fold increase over the old
limits. (Supercomputers of the future forecasting hurricanes on an
ever warming planet can certainly make good use of more RAM.)
Many of the necessary changes went upstream in previous cycles,
v4.14 is the first kernel that can enable 5-level paging.
This feature is activated via CONFIG_X86_5LEVEL=y - disabled by
default.
(By Kirill A. Shutemov)
- Add 'encrypted memory' support, which is a new hardware feature on
upcoming AMD CPUs ('Secure Memory Encryption', SME) allowing system
RAM to be encrypted and decrypted (mostly) transparently by the
CPU, with a little help from the kernel to transition to/from
encrypted RAM. Such RAM should be more secure against various
attacks like RAM access via the memory bus and should make the
radio signature of memory bus traffic harder to intercept (and
decrypt) as well.
This feature is activated via CONFIG_AMD_MEM_ENCRYPT=y - disabled
by default.
(By Tom Lendacky)
- Enable PCID optimized TLB flushing on newer Intel CPUs: PCID is a
hardware feature that attaches an address space tag to TLB entries
and thus allows to skip TLB flushing in many cases, even if we
switch mm's.
(By Andy Lutomirski)
All three of these features were in the works for a long time, and
it's coincidence of the three independent development paths that they
are all enabled in v4.14 at once"
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (65 commits)
x86/mm: Enable RCU based page table freeing (CONFIG_HAVE_RCU_TABLE_FREE=y)
x86/mm: Use pr_cont() in dump_pagetable()
x86/mm: Fix SME encryption stack ptr handling
kvm/x86: Avoid clearing the C-bit in rsvd_bits()
x86/CPU: Align CR3 defines
x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages
acpi, x86/mm: Remove encryption mask from ACPI page protection type
x86/mm, kexec: Fix memory corruption with SME on successive kexecs
x86/mm/pkeys: Fix typo in Documentation/x86/protection-keys.txt
x86/mm/dump_pagetables: Speed up page tables dump for CONFIG_KASAN=y
x86/mm: Implement PCID based optimization: try to preserve old TLB entries using PCID
x86: Enable 5-level paging support via CONFIG_X86_5LEVEL=y
x86/mm: Allow userspace have mappings above 47-bit
x86/mm: Prepare to expose larger address space to userspace
x86/mpx: Do not allow MPX if we have mappings above 47-bit
x86/mm: Rename tasksize_32bit/64bit to task_size_32bit/64bit()
x86/xen: Redefine XEN_ELFNOTE_INIT_P2M using PUD_SIZE * PTRS_PER_PUD
x86/mm/dump_pagetables: Fix printout of p4d level
x86/mm/dump_pagetables: Generalize address normalization
x86/boot: Fix memremap() related build failure
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
- Add 'cross-release' support to lockdep, which allows APIs like
completions, where it's not the 'owner' who releases the lock, to be
tracked. It's all activated automatically under
CONFIG_PROVE_LOCKING=y.
- Clean up (restructure) the x86 atomics op implementation to be more
readable, in preparation of KASAN annotations. (Dmitry Vyukov)
- Fix static keys (Paolo Bonzini)
- Add killable versions of down_read() et al (Kirill Tkhai)
- Rework and fix jump_label locking (Marc Zyngier, Paolo Bonzini)
- Rework (and fix) tlb_flush_pending() barriers (Peter Zijlstra)
- Remove smp_mb__before_spinlock() and convert its usages, introduce
smp_mb__after_spinlock() (Peter Zijlstra)
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (56 commits)
locking/lockdep/selftests: Fix mixed read-write ABBA tests
sched/completion: Avoid unnecessary stack allocation for COMPLETION_INITIALIZER_ONSTACK()
acpi/nfit: Fix COMPLETION_INITIALIZER_ONSTACK() abuse
locking/pvqspinlock: Relax cmpxchg's to improve performance on some architectures
smp: Avoid using two cache lines for struct call_single_data
locking/lockdep: Untangle xhlock history save/restore from task independence
locking/refcounts, x86/asm: Disable CONFIG_ARCH_HAS_REFCOUNT for the time being
futex: Remove duplicated code and fix undefined behaviour
Documentation/locking/atomic: Finish the document...
locking/lockdep: Fix workqueue crossrelease annotation
workqueue/lockdep: 'Fix' flush_work() annotation
locking/lockdep/selftests: Add mixed read-write ABBA tests
mm, locking/barriers: Clarify tlb_flush_pending() barriers
locking/lockdep: Make CONFIG_LOCKDEP_CROSSRELEASE and CONFIG_LOCKDEP_COMPLETIONS truly non-interactive
locking/lockdep: Explicitly initialize wq_barrier::done::map
locking/lockdep: Rename CONFIG_LOCKDEP_COMPLETE to CONFIG_LOCKDEP_COMPLETIONS
locking/lockdep: Reword title of LOCKDEP_CROSSRELEASE config
locking/lockdep: Make CONFIG_LOCKDEP_CROSSRELEASE part of CONFIG_PROVE_LOCKING
locking/refcounts, x86/asm: Implement fast refcount overflow protection
locking/lockdep: Fix the rollback and overwrite detection logic in crossrelease
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm updates from Ingo Molnar:
- Introduce the ORC unwinder, which can be enabled via
CONFIG_ORC_UNWINDER=y.
The ORC unwinder is a lightweight, Linux kernel specific debuginfo
implementation, which aims to be DWARF done right for unwinding.
Objtool is used to generate the ORC unwinder tables during build, so
the data format is flexible and kernel internal: there's no
dependency on debuginfo created by an external toolchain.
The ORC unwinder is almost two orders of magnitude faster than the
(out of tree) DWARF unwinder - which is important for perf call graph
profiling. It is also significantly simpler and is coded defensively:
there has not been a single ORC related kernel crash so far, even
with early versions. (knock on wood!)
But the main advantage is that enabling the ORC unwinder allows
CONFIG_FRAME_POINTERS to be turned off - which speeds up the kernel
measurably:
With frame pointers disabled, GCC does not have to add frame pointer
instrumentation code to every function in the kernel. The kernel's
.text size decreases by about 3.2%, resulting in better cache
utilization and fewer instructions executed, resulting in a broad
kernel-wide speedup. Average speedup of system calls should be
roughly in the 1-3% range - measurements by Mel Gorman [1] have shown
a speedup of 5-10% for some function execution intense workloads.
The main cost of the unwinder is that the unwinder data has to be
stored in RAM: the memory cost is 2-4MB of RAM, depending on kernel
config - which is a modest cost on modern x86 systems.
Given how young the ORC unwinder code is it's not enabled by default
- but given the performance advantages the plan is to eventually make
it the default unwinder on x86.
See Documentation/x86/orc-unwinder.txt for more details.
- Remove lguest support: its intended role was that of a temporary
proof of concept for virtualization, plus its removal will enable the
reduction (removal) of the paravirt API as well, so Rusty agreed to
its removal. (Juergen Gross)
- Clean up and fix FSGS related functionality (Andy Lutomirski)
- Clean up IO access APIs (Andy Shevchenko)
- Enhance the symbol namespace (Jiri Slaby)
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (47 commits)
objtool: Handle GCC stack pointer adjustment bug
x86/entry/64: Use ENTRY() instead of ALIGN+GLOBAL for stub32_clone()
x86/fpu/math-emu: Add ENDPROC to functions
x86/boot/64: Extract efi_pe_entry() from startup_64()
x86/boot/32: Extract efi_pe_entry() from startup_32()
x86/lguest: Remove lguest support
x86/paravirt/xen: Remove xen_patch()
objtool: Fix objtool fallthrough detection with function padding
x86/xen/64: Fix the reported SS and CS in SYSCALL
objtool: Track DRAP separately from callee-saved registers
objtool: Fix validate_branch() return codes
x86: Clarify/fix no-op barriers for text_poke_bp()
x86/switch_to/64: Rewrite FS/GS switching yet again to fix AMD CPUs
selftests/x86/fsgsbase: Test selectors 1, 2, and 3
x86/fsgsbase/64: Report FSBASE and GSBASE correctly in core dumps
x86/fsgsbase/64: Fully initialize FS and GS state in start_thread_common
x86/asm: Fix UNWIND_HINT_REGS macro for older binutils
x86/asm/32: Fix regs_get_register() on segment registers
x86/xen/64: Rearrange the SYSCALL entries
x86/asm/32: Remove a bunch of '& 0xffff' from pt_regs segment reads
...
|
|
|
|
mmio_flush_range() suffers from a lack of clearly-defined semantics,
and is somewhat ambiguous to port to other architectures where the
scope of the writeback implied by "flush" and ordering might matter,
but MMIO would tend to imply non-cacheable anyway. Per the rationale
in 67a3e8fe9015 ("nd_blk: change aperture mapping from WC to WB"), the
only existing use is actually to invalidate clean cache lines for
ARCH_MEMREMAP_PMEM type mappings *without* writeback. Since the recent
cleanup of the pmem API, that also now happens to be the exact purpose
of arch_invalidate_pmem(), which would be a far more well-defined tool
for the job.
Rather than risk potentially inconsistent implementations of
mmio_flush_range() for the sake of one callsite, streamline things by
removing it entirely and instead move the ARCH_MEMREMAP_PMEM related
definitions up to the libnvdimm level, so they can be shared by NFIT
as well. This allows NFIT to be enabled for arm64.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
There's a subtle bug in how some of the paravirt guest code handles
page table freeing on x86:
On x86 software page table walkers depend on the fact that remote TLB flush
does an IPI: walk is performed lockless but with interrupts disabled and in
case the page table is freed the freeing CPU will get blocked as remote TLB
flush is required. On other architectures which don't require an IPI to do
remote TLB flush we have an RCU-based mechanism (see
include/asm-generic/tlb.h for more details).
In virtualized environments we may want to override the ->flush_tlb_others
callback in pv_mmu_ops and use a hypercall asking the hypervisor to do a
remote TLB flush for us. This breaks the assumption about IPIs. Xen PV has
been doing this for years and the upcoming remote TLB flush for Hyper-V will
do it too.
This is not safe, as software page table walkers may step on an already
freed page.
Fix the bug by enabling the RCU-based page table freeing mechanism,
CONFIG_HAVE_RCU_TABLE_FREE=y.
Testing with kernbench and mmap/munmap microbenchmarks, and neither showed
any noticeable performance impact.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Juergen Gross <jgross@suse.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Jork Loeser <Jork.Loeser@microsoft.com>
Cc: KY Srinivasan <kys@microsoft.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/20170828082251.5562-1-vkuznets@redhat.com
[ Rewrote/fixed/clarified the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Mike Galbraith bisected a boot crash back to the following commit:
7a46ec0e2f48 ("locking/refcounts, x86/asm: Implement fast refcount overflow protection")
The crash/hang pattern is:
> Symptom is a few splats as below, with box finally hanging. Network
> comes up, but neither ssh nor console login is possible.
>
> ------------[ cut here ]------------
> WARNING: CPU: 4 PID: 0 at net/netlink/af_netlink.c:374 netlink_sock_destruct+0x82/0xa0
> ...
> __sk_destruct()
> rcu_process_callbacks()
> __do_softirq()
> irq_exit()
> smp_apic_timer_interrupt()
> apic_timer_interrupt()
We are at -rc7 already, and the code has grown some dependencies, so
instead of a plain revert disable the config temporarily, in the hope
of getting real fixes.
Reported-by: Mike Galbraith <efault@gmx.de>
Tested-by: Mike Galbraith <efault@gmx.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/tip-7a46ec0e2f4850407de5e1d19a44edee6efa58ec@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Conflicts:
arch/x86/kernel/head64.c
arch/x86/mm/mmap.c
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Lguest seems to be rather unused these days. It has seen only patches
ensuring it still builds the last two years and its official state is
"Odd Fixes".
Remove it in order to be able to clean up the paravirt code.
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: boris.ostrovsky@oracle.com
Cc: lguest@lists.ozlabs.org
Cc: rusty@rustcorp.com.au
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/20170816173157.8633-3-jgross@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull watchdog fix from Thomas Gleixner:
"A fix for the hardlockup watchdog to prevent false positives with
extreme Turbo-Modes which make the perf/NMI watchdog fire faster than
the hrtimer which is used to verify.
Slightly larger than the minimal fix, which just would increase the
hrtimer frequency, but comes with extra overhead of more watchdog
timer interrupts and thread wakeups for all users.
With this change we restrict the overhead to the extreme Turbo-Mode
systems"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
kernel/watchdog: Prevent false positives with turbo modes
|
|
Commit 05a4a9527931 ("kernel/watchdog: split up config options") lost
the perf-based hardlockup detector's dependency on PERF_EVENTS, which
can result in broken builds with some powerpc configurations.
Restore the dependency. Add it in for x86 too, despite x86 always
selecting PERF_EVENTS it seems reasonable to make the dependency
explicit.
Link: http://lkml.kernel.org/r/20170810114452.6673-1-npiggin@gmail.com
Fixes: 05a4a9527931 ("kernel/watchdog: split up config options")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The hardlockup detector on x86 uses a performance counter based on unhalted
CPU cycles and a periodic hrtimer. The hrtimer period is about 2/5 of the
performance counter period, so the hrtimer should fire 2-3 times before the
performance counter NMI fires. The NMI code checks whether the hrtimer
fired since the last invocation. If not, it assumess a hard lockup.
The calculation of those periods is based on the nominal CPU
frequency. Turbo modes increase the CPU clock frequency and therefore
shorten the period of the perf/NMI watchdog. With extreme Turbo-modes (3x
nominal frequency) the perf/NMI period is shorter than the hrtimer period
which leads to false positives.
A simple fix would be to shorten the hrtimer period, but that comes with
the side effect of more frequent hrtimer and softlockup thread wakeups,
which is not desired.
Implement a low pass filter, which checks the perf/NMI period against
kernel time. If the perf/NMI fires before 4/5 of the watchdog period has
elapsed then the event is ignored and postponed to the next perf/NMI.
That solves the problem and avoids the overhead of shorter hrtimer periods
and more frequent softlockup thread wakeups.
Fixes: 58687acba592 ("lockup_detector: Combine nmi_watchdog and softlockup detector")
Reported-and-tested-by: Kan Liang <Kan.liang@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: dzickus@redhat.com
Cc: prarit@redhat.com
Cc: ak@linux.intel.com
Cc: babu.moger@oracle.com
Cc: peterz@infradead.org
Cc: eranian@google.com
Cc: acme@redhat.com
Cc: stable@vger.kernel.org
Cc: atomlin@redhat.com
Cc: akpm@linux-foundation.org
Cc: torvalds@linux-foundation.org
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1708150931310.1886@nanos
|
|
This implements refcount_t overflow protection on x86 without a noticeable
performance impact, though without the fuller checking of REFCOUNT_FULL.
This is done by duplicating the existing atomic_t refcount implementation
but with normally a single instruction added to detect if the refcount
has gone negative (e.g. wrapped past INT_MAX or below zero). When detected,
the handler saturates the refcount_t to INT_MIN / 2. With this overflow
protection, the erroneous reference release that would follow a wrap back
to zero is blocked from happening, avoiding the class of refcount-overflow
use-after-free vulnerabilities entirely.
Only the overflow case of refcounting can be perfectly protected, since
it can be detected and stopped before the reference is freed and left to
be abused by an attacker. There isn't a way to block early decrements,
and while REFCOUNT_FULL stops increment-from-zero cases (which would
be the state _after_ an early decrement and stops potential double-free
conditions), this fast implementation does not, since it would require
the more expensive cmpxchg loops. Since the overflow case is much more
common (e.g. missing a "put" during an error path), this protection
provides real-world protection. For example, the two public refcount
overflow use-after-free exploits published in 2016 would have been
rendered unexploitable:
http://perception-point.io/2016/01/14/analysis-and-exploitation-of-a-linux-kernel-vulnerability-cve-2016-0728/
http://cyseclabs.com/page?n=02012016
This implementation does, however, notice an unchecked decrement to zero
(i.e. caller used refcount_dec() instead of refcount_dec_and_test() and it
resulted in a zero). Decrements under zero are noticed (since they will
have resulted in a negative value), though this only indicates that a
use-after-free may have already happened. Such notifications are likely
avoidable by an attacker that has already exploited a use-after-free
vulnerability, but it's better to have them reported than allow such
conditions to remain universally silent.
On first overflow detection, the refcount value is reset to INT_MIN / 2
(which serves as a saturation value) and a report and stack trace are
produced. When operations detect only negative value results (such as
changing an already saturated value), saturation still happens but no
notification is performed (since the value was already saturated).
On the matter of races, since the entire range beyond INT_MAX but before
0 is negative, every operation at INT_MIN / 2 will trap, leaving no
overflow-only race condition.
As for performance, this implementation adds a single "js" instruction
to the regular execution flow of a copy of the standard atomic_t refcount
operations. (The non-"and_test" refcount_dec() function, which is uncommon
in regular refcount design patterns, has an additional "jz" instruction
to detect reaching exactly zero.) Since this is a forward jump, it is by
default the non-predicted path, which will be reinforced by dynamic branch
prediction. The result is this protection having virtually no measurable
change in performance over standard atomic_t operations. The error path,
located in .text.unlikely, saves the refcount location and then uses UD0
to fire a refcount exception handler, which resets the refcount, handles
reporting, and returns to regular execution. This keeps the changes to
.text size minimal, avoiding return jumps and open-coded calls to the
error reporting routine.
Example assembly comparison:
refcount_inc() before:
.text:
ffffffff81546149: f0 ff 45 f4 lock incl -0xc(%rbp)
refcount_inc() after:
.text:
ffffffff81546149: f0 ff 45 f4 lock incl -0xc(%rbp)
ffffffff8154614d: 0f 88 80 d5 17 00 js ffffffff816c36d3
...
.text.unlikely:
ffffffff816c36d3: 48 8d 4d f4 lea -0xc(%rbp),%rcx
ffffffff816c36d7: 0f ff (bad)
These are the cycle counts comparing a loop of refcount_inc() from 1
to INT_MAX and back down to 0 (via refcount_dec_and_test()), between
unprotected refcount_t (atomic_t), fully protected REFCOUNT_FULL
(refcount_t-full), and this overflow-protected refcount (refcount_t-fast):
2147483646 refcount_inc()s and 2147483647 refcount_dec_and_test()s:
cycles protections
atomic_t 82249267387 none
refcount_t-fast 82211446892 overflow, untested dec-to-zero
refcount_t-full 144814735193 overflow, untested dec-to-zero, inc-from-zero
This code is a modified version of the x86 PAX_REFCOUNT atomic_t
overflow defense from the last public patch of PaX/grsecurity, based
on my understanding of the code. Changes or omissions from the original
code are mine and don't reflect the original grsecurity/PaX code. Thanks
to PaX Team for various suggestions for improvement for repurposing this
code to be a refcount-only protection.
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Hans Liljestrand <ishkamiel@gmail.com>
Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
Cc: Jann Horn <jannh@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Serge E. Hallyn <serge@hallyn.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: arozansk@redhat.com
Cc: axboe@kernel.dk
Cc: kernel-hardening@lists.openwall.com
Cc: linux-arch <linux-arch@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170815161924.GA133115@beast
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
We currently have a CONFIG_RDT_A which is for RDT(Resource directory
technology) allocation based resctrl filesystem interface. As a
preparation to add support for RDT monitoring as well into the same
resctrl filesystem, change the config option to be CONFIG_RDT which
would include both RDT allocation and monitoring code.
No functional change.
Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: ravi.v.shankar@intel.com
Cc: tony.luck@intel.com
Cc: fenghua.yu@intel.com
Cc: peterz@infradead.org
Cc: eranian@google.com
Cc: vikas.shivappa@intel.com
Cc: ak@linux.intel.com
Cc: davidcc@google.com
Cc: reinette.chatre@intel.com
Link: http://lkml.kernel.org/r/1501017287-28083-4-git-send-email-vikas.shivappa@linux.intel.com
|
|
There are three mutually exclusive unwinders. Make that more obvious by
combining them into a multiple-choice selection:
CONFIG_FRAME_POINTER_UNWINDER
CONFIG_ORC_UNWINDER
CONFIG_GUESS_UNWINDER (if CONFIG_EXPERT=y)
Frame pointers are still the default (for now).
The old CONFIG_FRAME_POINTER option is still used in some
arch-independent places, so keep it around, but make it
invisible to the user on x86 - it's now selected by
CONFIG_FRAME_POINTER_UNWINDER=y.
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: live-patching@vger.kernel.org
Link: http://lkml.kernel.org/r/20170725135424.zukjmgpz3plf5pmt@treble
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Add the new ORC unwinder which is enabled by CONFIG_ORC_UNWINDER=y.
It plugs into the existing x86 unwinder framework.
It relies on objtool to generate the needed .orc_unwind and
.orc_unwind_ip sections.
For more details on why ORC is used instead of DWARF, see
Documentation/x86/orc-unwinder.txt - but the short version is
that it's a simplified, fundamentally more robust debugninfo
data structure, which also allows up to two orders of magnitude
faster lookups than the DWARF unwinder - which matters to
profiling workloads like perf.
Thanks to Andy Lutomirski for the performance improvement ideas:
splitting the ORC unwind table into two parallel arrays and creating a
fast lookup table to search a subset of the unwind table.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: live-patching@vger.kernel.org
Link: http://lkml.kernel.org/r/0a6cbfb40f8da99b7a45a1a8302dc6aef16ec812.1500938583.git.jpoimboe@redhat.com
[ Extended the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Most of things are in place and we can enable support for 5-level paging.
The patch makes XEN_PV and XEN_PVH dependent on !X86_5LEVEL. Both are
not ready to work with 5-level paging.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170716225954.74185-9-kirill.shutemov@linux.intel.com
[ Minor readability edits. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Add early_memremap() support to be able to specify encrypted and
decrypted mappings with and without write-protection. The use of
write-protection is necessary when encrypting data "in place". The
write-protect attribute is considered cacheable for loads, but not
stores. This implies that the hardware will never give the core a
dirty line with this memtype.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Toshimitsu Kani <toshi.kani@hpe.com>
Cc: kasan-dev@googlegroups.com
Cc: kvm@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-efi@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/479b5832c30fae3efa7932e48f81794e86397229.1500319216.git.thomas.lendacky@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Add support for Secure Memory Encryption (SME). This initial support
provides a Kconfig entry to build the SME support into the kernel and
defines the memory encryption mask that will be used in subsequent
patches to mark pages as encrypted.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Toshimitsu Kani <toshi.kani@hpe.com>
Cc: kasan-dev@googlegroups.com
Cc: kvm@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-efi@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/a6c34d16caaed3bc3e2d6f0987554275bd291554.1500319216.git.thomas.lendacky@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|