Age | Commit message | Author | Files | Lines |
|
This callback is only required because the partition table init comes
before process table allocation on powernv (aka bare metal aka native).
Change the order to allocate the process table first, and remove the
callback.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190902152931.17840-2-npiggin@gmail.com
|
|
None of these routines were ever used anywhere in the kernel tree
since they were added to the kernel.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
pgtable_t is now identical for all subarches, so move it to the
top-level asm/mmu.h.
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Now that slice_mask_for_size() is in mmu.h, the mm_ctx_slice_mask_xxx()
are not needed anymore, so drop them. Note that the 8xx ones were
not used anyway.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Move slice_mask_for_size() into subarch mmu.h
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
[mpe: Retain the BUG_ON()s, rather than converting to VM_BUG_ON()]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Allocate subpage protection related variables only if we use the feature.
This reduces the hash-related mm context struct by around 4K.
Before the patch: sizeof(struct hash_mm_context) = 8288
After the patch:  sizeof(struct hash_mm_context) = 4160
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Currently, our mm_context_t on book3s64 includes all hash-specific
context details, like the slice mask and subpage protection details. We
can skip allocating these with radix translation, which saves about
8K per mm_context.
With the patch applied:
sizeof(mm_context_t) = 136
sizeof(struct hash_mm_context) = 8288
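A minimal sketch of the idea, with hypothetical type and function names (the real layout lives in the book3s64 headers):

  #include <stdbool.h>
  #include <stdlib.h>

  /* Stand-in for the hash-only state (slice masks, subpage protection, ...). */
  struct hash_mm_context {
      unsigned char slice_psize[2048];            /* placeholder contents */
  };

  /* The common context keeps only a pointer, so radix pays for a pointer,
   * not for ~8K of hash state. */
  struct mm_context {
      unsigned long id;
      struct hash_mm_context *hash_context;       /* NULL with radix translation */
  };

  static int init_context(struct mm_context *ctx, bool radix)
  {
      ctx->hash_context = NULL;
      if (!radix) {
          /* kzalloc() in the kernel; calloc() keeps the sketch standalone */
          ctx->hash_context = calloc(1, sizeof(*ctx->hash_context));
          if (!ctx->hash_context)
              return -1;
      }
      return 0;
  }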
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
We want to switch to allocating them at runtime, only when hash translation
is enabled. Add helpers so that both book3s and nohash can be adapted to
the upcoming change easily.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Book3s64 always has PPC_MM_SLICES enabled, so remove the unnecessary #ifdef.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
The current value of MAX_PHYSMEM_BITS cannot work with 32-bit configs.
We used to leave MAX_PHYSMEM_BITS undefined without SPARSEMEM, and 32-bit
configs never expected a value to be set for it; dependent code such as
zsmalloc derived the right values based on other fields. Instead of
finding a value that works with different configs, use the new values
only for book3s_64. For 64-bit booke, use the definition of
MAX_PHYSMEM_BITS from commit a7df61a0e2b6 ("[PATCH] ppc64: Increase sparsemem defaults").
That change was done in 2005 and will hopefully work with book3e 64.
Fixes: 8bc086899816 ("powerpc/mm: Only define MAX_PHYSMEM_BITS in SPARSEMEM configurations")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
This patch moves pgtable_t into platform headers.
It gets rid of the CONFIG_PPC_64K_PAGES case for PPC64
as nohash/64 doesn't support CONFIG_PPC_64K_PAGES.
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
We will be adding get_kernel_context later. Update the function name to
indicate that this handles context allocation for user space addresses.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
The 4K config uses one full page at level 4 of the pagetable. Add support for
single fragment allocation in the pagetable fragment code and use that for the
4K config. This makes both 4K and 64K use the same code path. Later we will
switch pmd to use the page table fragment code. This is done only for 64-bit
platforms, which are the ones using page table fragment support.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
For addresses above 512TB we allocate additional mmu contexts. To make
it all easy, addresses above 512TB are handled with IR/DR=1 and with
stack frame setup.
The mmu_context_t is also updated to track the new extended_ids. To
support up to 4PB we need a total of 8 contexts.
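The arithmetic behind the figure of 8 contexts, as a hedged standalone illustration (constants spelled out here, not taken from the patch):

  /* Each context id covers one 512TB chunk of effective address space. */
  #define ONE_CONTEXT_SPAN  (512ULL << 40)        /* 512TB = 2^49 */
  #define MAX_EA_SUPPORTED  (4ULL << 50)          /* 4PB   = 2^52 */
  #define NR_CONTEXTS       (MAX_EA_SUPPORTED / ONE_CONTEXT_SPAN)  /* == 8 */

  /* Hypothetical helper mapping an effective address to its context slot. */
  static inline unsigned int ea_to_context_index(unsigned long long ea)
  {
      return (unsigned int)(ea / ONE_CONTEXT_SPAN);   /* 0..7 */
  }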
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Minor formatting tweaks and comment wording, switch BUG to WARN
in get_ea_context().]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Merge our fixes branch from the 4.16 cycle.
There were a number of important fixes merged, in particular some Power9
workarounds that we want in next for testing purposes. There's also been
some conflicting changes in the CPU features code which are best merged
and tested before going upstream.
|
|
Currently, when using coprocessors (which use the Nest MMU), we
simply increment the active_cpu count to force all TLB invalidations
to become broadcast.
Unfortunately, due to an erratum in POWER9, we will need to know
more specifically that coprocessors are in use.
This maintains a separate copros counter in the MMU context for
that purpose.
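A rough sketch of that bookkeeping, with hypothetical field and function names:

  #include <stdbool.h>

  /* Track coprocessor (Nest MMU) users separately from CPU users, so the
   * flush code can tell the two apart instead of faking an extra CPU. */
  struct mm_usage_counts {
      int active_cpus;      /* CPUs that have run this mm */
      int copros;           /* coprocessor contexts attached to this mm */
  };

  static inline bool need_broadcast_tlb_invalidate(const struct mm_usage_counts *c)
  {
      /* Anything beyond the local CPU, or any coprocessor, needs broadcast. */
      return c->active_cpus > 1 || c->copros > 0;
  }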
NB. The commit mentioned in the fixes tag below is not at fault for
the bug we're fixing in this commit and the next, but this fix applies
on top of the infrastructure it introduced.
Fixes: 03b8abedf4f4 ("cxl: Enable global TLBIs for cxl contexts")
Cc: stable@vger.kernel.org # v4.15+
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Tested-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Calculating the slice mask can become a significant overhead for
get_unmapped_area. This patch adds a struct slice_mask for
each page size in the mm_context, and keeps these in sync with
the slices psize arrays and slb_addr_limit.
On Book3S/64 this adds 288 bytes to the mm_context_t for the
slice mask caches.
On POWER8, this increases vfork+exec+exit performance by 9.9%
and reduces time to mmap+munmap a 64kB page by 28%.
Reduces time to mmap+munmap by about 10% on 8xx.
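A simplified sketch of the caching scheme (slice counts and names here are illustrative, not the kernel's):

  #include <string.h>

  #define SLICE_NUM_LOW   16
  #define SLICE_NUM_HIGH  512
  #define NR_PSIZES       4       /* e.g. 4K, 64K, 16M, 16G */

  struct slice_mask {
      unsigned long low_slices;                        /* 1 bit per low slice  */
      unsigned long high_slices[SLICE_NUM_HIGH / 64];  /* 1 bit per high slice */
  };

  struct ctx_slices {
      unsigned char psize[SLICE_NUM_LOW + SLICE_NUM_HIGH]; /* psize per slice */
      struct slice_mask mask_cache[NR_PSIZES];   /* rebuilt when psize[] changes */
  };

  /* Rebuild the per-psize masks; get_unmapped_area() then just reads
   * mask_cache[] instead of recomputing a mask on every call. */
  static void recalc_slice_mask_cache(struct ctx_slices *c)
  {
      unsigned int i;

      memset(c->mask_cache, 0, sizeof(c->mask_cache));
      for (i = 0; i < SLICE_NUM_LOW + SLICE_NUM_HIGH; i++) {
          struct slice_mask *m = &c->mask_cache[c->psize[i] % NR_PSIZES];

          if (i < SLICE_NUM_LOW)
              m->low_slices |= 1UL << i;
          else
              m->high_slices[(i - SLICE_NUM_LOW) / 64] |=
                  1UL << ((i - SLICE_NUM_LOW) % 64);
      }
  }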
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
While the implementation of the "slices" address space allows
a significant number of high slices, it limits the number of
low slices to 16 due to the use of a single u64 low_slices_psize
element in struct mm_context_t.
On the 8xx, the minimum slice size is the size of the area
covered by a single PMD entry, ie 4M in 4K pages mode and 64M in
16K pages mode. This means we could have at least 64 slices.
To overcome this limitation, this patch switches the
handling of low_slices_psize to a char array, as done already for
high_slices_psize.
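The representation change, sketched with illustrative constants (4 bits of page-size index per slice, two slices per byte, as high_slices_psize already does):

  #define NR_LOW_SLICES  64     /* e.g. the 8xx case; the old u64 capped this at 16 */

  static unsigned char low_slices_psize[NR_LOW_SLICES / 2];

  static unsigned int get_low_slice_psize(unsigned int slice)
  {
      unsigned char v = low_slices_psize[slice / 2];

      return (slice & 1) ? (v >> 4) : (v & 0xf);
  }

  static void set_low_slice_psize(unsigned int slice, unsigned int psize)
  {
      unsigned char *p = &low_slices_psize[slice / 2];

      if (slice & 1)
          *p = (unsigned char)((*p & 0x0f) | (psize << 4));
      else
          *p = (unsigned char)((*p & 0xf0) | (psize & 0xf));
  }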
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
This patch provides the implementation of the execute-only pkey.
The architecture-independent layer expects the arch-dependent
layer to support the ability to create and enable a special
key which has execute-only permission.
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
A total of 32 keys are available on POWER7 and above. However,
pkeys 0 and 1 are reserved, so effectively we have 30 pkeys.
On 4K kernels, we do not have 5 bits in the PTE to
represent all the keys; we only have 3 bits. Two of those
keys are again reserved (pkey 0 and pkey 1), so effectively we
have 6 pkeys.
This patch keeps track of reserved keys, allocated keys
and keys that are currently free.
It also adds skeletal functions and macros that the
architecture-independent code expects to be available.
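The key counts quoted above, worked through as a standalone illustration:

  #include <stdio.h>

  int main(void)
  {
      int reserved = 2;            /* pkey 0 and pkey 1 */
      int keys_64k = 1 << 5;       /* 5 PTE bits -> 32 keys */
      int keys_4k  = 1 << 3;       /* 3 PTE bits ->  8 keys */

      printf("64K kernels: %d usable pkeys\n", keys_64k - reserved);   /* 30 */
      printf(" 4K kernels: %d usable pkeys\n", keys_4k - reserved);    /*  6 */
      return 0;
  }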
Reviewed-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
"A bit of a small release, I suspect in part due to me travelling for
KS. But my backlog of patches to review is smaller than usual, so I
think in part folks just didn't send as much this cycle.
Non-highlights:
- Five fixes for the >128T address space handling, both to fix bugs
in our implementation and to bring the semantics exactly into line
with x86.
Highlights:
- Support for a new OPAL call on bare metal machines which gives us a
true NMI (ie. is not masked by MSR[EE]=0) for debugging etc.
- Support for Power9 DD2 in the CXL driver.
- Improvements to machine check handling so that uncorrectable errors
can be reported into the generic memory_failure() machinery.
- Some fixes and improvements for VPHN, which is used under PowerVM
to notify the Linux partition of topology changes.
- Plumbing to enable TM (transactional memory) without suspend on
some Power9 processors (PPC_FEATURE2_HTM_NO_SUSPEND).
 - Support for emulating vector loads from cache-inhibited memory, on
some Power9 revisions.
- Disable the fast-endian switch "syscall" by default (behind a
CONFIG), we believe it has never had any users.
- A major rework of the API drivers use when initiating and waiting
for long running operations performed by OPAL firmware, and changes
to the powernv_flash driver to use the new API.
- Several fixes for the handling of FP/VMX/VSX while processes are
using transactional memory.
- Optimisations of TLB range flushes when using the radix MMU on
Power9.
- Improvements to the VAS facility used to access coprocessors on
Power9, and related improvements to the way the NX crypto driver
handles requests.
- Implementation of PMEM_API and UACCESS_FLUSHCACHE for 64-bit.
Thanks to: Alexey Kardashevskiy, Alistair Popple, Allen Pais, Andrew
Donnellan, Aneesh Kumar K.V, Arnd Bergmann, Balbir Singh, Benjamin
Herrenschmidt, Breno Leitao, Christophe Leroy, Christophe Lombard,
Cyril Bur, Frederic Barrat, Gautham R. Shenoy, Geert Uytterhoeven,
Guilherme G. Piccoli, Gustavo Romero, Haren Myneni, Joel Stanley,
Kamalesh Babulal, Kautuk Consul, Markus Elfring, Masami Hiramatsu,
Michael Bringmann, Michael Neuling, Michal Suchanek, Naveen N. Rao,
Nicholas Piggin, Oliver O'Halloran, Paul Mackerras, Pedro Miraglia
Franco de Carvalho, Philippe Bergheaud, Sandipan Das, Seth Forshee,
Shriya, Stephen Rothwell, Stewart Smith, Sukadev Bhattiprolu, Tyrel
Datwyler, Vaibhav Jain, Vaidyanathan Srinivasan, and William A.
Kennington III"
* tag 'powerpc-4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (151 commits)
powerpc/64s: Fix Power9 DD2.0 workarounds by adding DD2.1 feature
powerpc/64s: Fix masking of SRR1 bits on instruction fault
powerpc/64s: mm_context.addr_limit is only used on hash
powerpc/64s/radix: Fix 128TB-512TB virtual address boundary case allocation
powerpc/64s/hash: Allow MAP_FIXED allocations to cross 128TB boundary
powerpc/64s/hash: Fix fork() with 512TB process address space
powerpc/64s/hash: Fix 128TB-512TB virtual address boundary case allocation
powerpc/64s/hash: Fix 512T hint detection to use >= 128T
powerpc: Fix DABR match on hash based systems
powerpc/signal: Properly handle return value from uprobe_deny_signal()
powerpc/fadump: use kstrtoint to handle sysfs store
powerpc/lib: Implement UACCESS_FLUSHCACHE API
powerpc/lib: Implement PMEM API
powerpc/powernv/npu: Don't explicitly flush nmmu tlb
powerpc/powernv/npu: Use flush_all_mm() instead of flush_tlb_mm()
powerpc/powernv/idle: Round up latency and residency values
powerpc/kprobes: refactor kprobe_lookup_name for safer string operations
powerpc/kprobes: Blacklist emulate_update_regs() from kprobes
powerpc/kprobes: Do not disable interrupts for optprobes and kprobes_on_ftrace
powerpc/kprobes: Disable preemption before invoking probe handler for optprobes
...
|
|
Radix keeps no meaningful state in addr_limit, so remove it from radix
code and rename it to slb_addr_limit to make it clear it applies to hash
only.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boilerplate text.
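For a C source file that previously carried no license text, the change is a single new first line; uapi headers get the block-comment form with the syscall note, as described below (illustrative examples):

  // SPDX-License-Identifier: GPL-2.0
  /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */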
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information in it,
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side-by-side results from the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
The criteria used to select files for SPDX license identifier tagging were:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there were new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In the initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Instead of comparing the whole CPU mask every time, let's
keep a counter of how many bits are set in the mask. Thus
testing for a local mm only requires testing if that counter
is 1 and the current CPU bit is set in the mask.
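A hedged sketch of the test this enables, using stand-in types (the real code uses the kernel's cpumask API):

  #include <stdbool.h>

  #define NR_CPUS        64
  #define BITS_PER_LONG  64

  struct mm_cpus {
      unsigned long cpu_bitmap[NR_CPUS / BITS_PER_LONG];
      int active_cpus;    /* bumped the first time a CPU uses this mm */
  };

  /* "Local" means exactly one CPU has ever used this mm, and it is this one:
   * a counter compare plus one bit test, instead of scanning the whole mask. */
  static inline bool mm_is_thread_local(const struct mm_cpus *mm, int cpu)
  {
      return mm->active_cpus == 1 &&
             ((mm->cpu_bitmap[cpu / BITS_PER_LONG] >> (cpu % BITS_PER_LONG)) & 1);
  }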
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
There's a non-trivial dependency between some commits we want to put in
next and the KVM prefetch work around that went into fixes. So merge
fixes into next.
|
|
We have a whole pile of unused code to maintain the ACOP register,
allocate coprocessor PIDs and handle ACOP faults. This mechanism
was used for the HFI adapter on POWER7 which is dead and gone and
whose driver never went upstream. It was used on some A2 core based
stuff that also never saw the light of day.
Take out all that code.
There is still some POWER8 coprocessor code that uses icswx but it's
kernel only and thus doesn't use any of that infrastructure.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
There's a somewhat architectural issue with Radix MMU and KVM.
When coming out of a guest with AIL (Alternate Interrupt Location, ie,
MMU enabled), we start executing hypervisor code with the PID register
still containing whatever the guest has been using.
The problem is that the CPU can (and will) then start prefetching or
speculatively load from whatever host context has that same PID (if
any), thus bringing translations for that context into the TLB, which
Linux doesn't know about.
This can cause stale translations and subsequent crashes.
Fixing this in a way that is neither racy nor a huge performance
impact is difficult. We could just make the host invalidations always
use broadcast forms but that would hurt single threaded programs for
example.
We chose to fix it instead by partitioning the PID space between guest
and host. This is possible because today Linux only uses 19 out of the
20 bits of PID space, so existing guests will work if we make the host
use the top half of the 20-bit space.
We additionally add support for a property to indicate to Linux the
size of the PID register which will be useful if we eventually have
processors with a larger PID space available.
There is still an issue with malicious guests purposefully setting the
PID register to a value in the host's PID range. Hopefully future HW
can prevent that, but in the meantime, we handle it with a pair of
kludges:
- On the way out of a guest, before we clear the current VCPU in the
PACA, we check the PID and if it's outside of the permitted range
we flush the TLB for that PID.
- When context switching, if the mm is "new" on that CPU (the
corresponding bit was set for the first time in the mm cpumask), we
check if any sibling thread is in KVM (has a non-NULL VCPU pointer
in the PACA). If that is the case, we also flush the PID for that
CPU (core).
This second part is needed to handle the case where a process is
migrated (or starts a new pthread) on a sibling thread of the CPU
coming out of KVM, as there's a window where stale translations can
exist before we detect it and flush them out.
A future optimization could be added by keeping track of whether the
PID has ever been used and avoid doing that for completely fresh PIDs.
We could similarly mark PIDs that have been the subject of a global
invalidation as "fresh". But for now this will do.
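A simplified sketch of the split and the exit-path check (bit widths taken from the text above; names and exact layout are hypothetical):

  #include <stdbool.h>

  #define PID_BITS       20                  /* size of the PID register */
  #define PID_SPACE      (1u << PID_BITS)
  #define HOST_PID_BASE  (PID_SPACE / 2)     /* host allocates from the top half */

  /* Guests keep using the bottom half, so host and guest PIDs never collide. */
  static inline unsigned int host_pid(unsigned int n)
  {
      return HOST_PID_BASE + n;              /* with n < PID_SPACE / 2 */
  }

  /* On the way out of a guest: a guest PID that strays into the host's range
   * is outside the permitted range and its TLB entries must be flushed. */
  static inline bool guest_pid_needs_flush(unsigned int pid)
  {
      return pid >= HOST_PID_BASE;
  }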
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[mpe: Rework the asm to build with CONFIG_PPC_RADIX_MMU=n, drop
unneeded include of kvm_book3s_asm.h]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Nvlink2 supports address translation services (ATS) allowing devices
to request address translations from an MMU known as the nest MMU,
which is set up to walk the CPU page tables.
To access this functionality certain firmware calls are required to
set up and manage hardware context tables in the nvlink processing unit
(NPU). The NPU also manages forwarding of TLB invalidates (known as
address translation shootdowns/ATSDs) to attached devices.
This patch exports several methods to allow device drivers to register
a process id (PASID/PID) in the hardware tables and to receive
notification of when a device should stop issuing address translation
requests (ATRs). It also adds a fault handler to allow device drivers
to demand fault pages in.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
[mpe: Fix up comment formatting, use flush_tlb_mm()]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
In the follow-up patch, we will increase the slice array size to handle
the 512TB range, but will limit the max address to 128TB. Avoid doing
unnecessary computation and avoid doing slice-mask-related operations
above the address limit.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Complete the split of the radix vs hash mm context initialisation.
This is mostly code movement, with the exception that we now limit the
context allocation to PRTB_ENTRIES - 1 on radix.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Pull more KVM updates from Radim Krčmář:
"Second batch of KVM changes for the 4.11 merge window:
PPC:
- correct assumption about ASDR on POWER9
- fix MMIO emulation on POWER9
x86:
- add a simple test for ioperm
- cleanup TSS (going through KVM tree as the whole undertaking was
caused by VMX's use of TSS)
- fix nVMX interrupt delivery
- fix some performance counters in the guest
... and two cleanup patches"
* tag 'kvm-4.11-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: nVMX: Fix pending events injection
x86/kvm/vmx: remove unused variable in segment_base()
selftests/x86: Add a basic selftest for ioperm
x86/asm: Tidy up TSS limit code
kvm: convert kvm.users_count from atomic_t to refcount_t
KVM: x86: never specify a sample period for virtualized in_tx_cp counters
KVM: PPC: Book3S HV: Don't use ASDR for real-mode HPT faults on POWER9
KVM: PPC: Book3S HV: Fix software walk of guest process page tables
|
|
This fixes some bugs in the code that walks the guest's page tables.
These bugs cause MMIO emulation to fail whenever the guest is in
virtual mode (MMU on), leading to the guest hanging if it tries to
access a virtio device.
The first bug was that when reading the guest's process table, we were
using the whole of arch->process_table, not just the field that contains
the process table base address. The second bug was that the mask used
when reading the process table entry to get the radix tree base address,
RPDB_MASK, had the wrong value.
Fixes: 9e04ba69beec ("KVM: PPC: Book3S HV: Add basic infrastructure for radix guests")
Fixes: e99833448c5f ("powerpc/mm/radix: Add partition table format & callback")
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
|
Fix typos and add the following to the scripts/spelling.txt:
partiton||partition
Link: http://lkml.kernel.org/r/1481573103-11329-7-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This adds definitions for bits in the DSISR register which are used
by POWER9 for various translation-related exception conditions, and
for some more bits in the partition table entry that will be needed
by KVM.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
To use radix as a guest, we first need to tell the hypervisor via
the ibm,client-architecture call that we support POWER9 and
architecture v3.00, and that we can do either radix or hash and
that we would like to choose later using an hcall (the
H_REGISTER_PROC_TBL hcall).
Then we need to check whether the hypervisor agreed to us using
radix. We need to do this very early on in the kernel boot process
before any of the MMU initialization is done. If the hypervisor
doesn't agree, we can't use radix and therefore clear the radix
MMU feature bit.
Later, when we have set up our process table, which points to the
radix tree for each process, we need to install that using the
H_REGISTER_PROC_TBL hcall.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
We want to initialise register_process_table() before ppc_md is set up,
so that it can be called as part of MMU init (at least on Radix ATM).
That no longer works because probe_machine() requires that ppc_md be
empty before it's called, and we now do probe_machine() much later.
So make register_process_table a global for now. It will probably move
into a mmu_radix_ops struct at some point in the future.
This was broken by me when applying commit 7025776ed1eb "powerpc/mm:
Move hash table ops to a separate structure" due to conflicts with other
patches.
Fixes: 7025776ed1eb ("powerpc/mm: Move hash table ops to a separate structure")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
This switches early feature checks to use the non-static-key variant of
the function. In later patches we will be switching cpu_has_feature()
and mmu_has_feature() to use static keys and we can use them only after
static key/jump label is initialized. Any feature check before jump
label init should be done using this new helper.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Currently we have radix_enabled() defined three times: twice in asm/book3s/64/mmu.h
and then a fallback in asm/mmu.h.
Consolidate them in asm/mmu.h. While we're at it convert them to be
static inlines, and change the fallback case to returning a bool, like
mmu_has_feature().
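A hedged sketch of the consolidated shape (the stub and the feature-bit value are placeholders so the snippet stands alone):

  #include <stdbool.h>

  #define MMU_FTR_RADIX  0x1UL                         /* placeholder value */

  static unsigned long cur_mmu_features = MMU_FTR_RADIX;   /* stand-in state */

  static inline bool mmu_has_feature(unsigned long ftr)
  {
      return (cur_mmu_features & ftr) != 0;
  }

  #ifdef CONFIG_PPC_RADIX_MMU
  static inline bool radix_enabled(void)
  {
      return mmu_has_feature(MMU_FTR_RADIX);
  }
  #else
  /* Fallback: a static inline returning a bool, like mmu_has_feature(). */
  static inline bool radix_enabled(void)
  {
      return false;
  }
  #endif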
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
MMU feature bits are defined such that we use the lower half to
represent MMU family features. Remove the strict half-and-half split and
move Radix to an MMU family feature. Radix introduces a new MMU
model and, strictly speaking, it is a new MMU family. This also frees
up bits which can be used for individual features later.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Like we just did for hash, split the device tree scanning parts out and
call them from mmu_early_init_devtree().
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Currently MMU initialisation (early_init_mmu()) consists of a mixture of
scanning the device tree, setting MMU feature bits, and then also doing
actual initialisation of MMU data structures.
We'd like to decouple the setting of the MMU features from the actual
setup. So split out the device tree scanning, and associated code, and
call it from mmu_init_early_devtree().
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Empty for now, but we'll add to it in the next patch.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Currently we depend on mmu_has_feature() to evaluate to zero based on
the MMU_FTRS_POSSIBLE mask. In a later patch, we want to update
radix_enabled() to patch the conditional operation to a jump
instruction at runtime. This implies we cannot depend on the
MMU_FTRS_POSSIBLE mask. Instead, define radix_enabled() to return 0 if
RADIX_MMU is not enabled.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
We are going to add asm changes in the follow-up patches. Add the
feature bit now so that we can get it all to build.
mpe: When CONFIG_PPC_RADIX_MMU=n we omit MMU_FTR_RADIX from the
MMU_FTRS_POSSIBLE mask. This allows the compiler to work out that those
checks will always be false and so the code can be elided completely.
Note we do *not* define MMU_FTR_RADIX to 0 in the RADIX_MMU=n case,
because that doesn't work with the ASM_FTR patching. In particular an
IF_SET section will result in a mask and value of zero, which is always
true, meaning the section *won't* be patched, which is the opposite of
what we want.
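A compile-time view of why omitting the bit from MMU_FTRS_POSSIBLE is enough on the C side (bit values and the second feature are placeholders):

  #define MMU_FTR_HPTE       0x00000001UL   /* placeholder */
  #define MMU_FTR_RADIX      0x00000040UL   /* placeholder */

  #ifdef CONFIG_PPC_RADIX_MMU
  #define MMU_FTRS_POSSIBLE  (MMU_FTR_HPTE | MMU_FTR_RADIX)
  #else
  #define MMU_FTRS_POSSIBLE  (MMU_FTR_HPTE)   /* radix bit deliberately absent */
  #endif

  static unsigned long cur_mmu_features;      /* filled in from the device tree */

  static inline int mmu_has_feature(unsigned long feature)
  {
      /* With RADIX_MMU=n, (MMU_FTRS_POSSIBLE & MMU_FTR_RADIX) is a
       * compile-time zero, so every radix-only branch becomes dead code
       * the compiler can drop. The asm feature sections need the bit to
       * stay non-zero, hence no "#define MMU_FTR_RADIX 0" fallback. */
      return (MMU_FTRS_POSSIBLE & cur_mmu_features & feature) != 0;
  }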
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
This adds routines for early radix setup. We use the device tree
property "ibm,processor-radix-AP-encodings" to find the supported page
sizes. If we don't find it, we consider 64K and 4K as the supported
page sizes.
We map the vmemmap using a 2M page size if we can. The linear mapping is
done such that we use the largest suitable page size for each range. For
example, 3.5G of memory is mapped with 1G mappings up to the 3G boundary and
2M mappings for the rest.
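A sketch of the page-size selection implied by that example (helper name and policy simplified):

  #define SZ_4K  (4UL << 10)
  #define SZ_2M  (2UL << 20)
  #define SZ_1G  (1UL << 30)

  /* Use the largest page size that is aligned at 'addr' and fits in the
   * remaining range; e.g. 3.5G of memory starting at 0 gets three 1G
   * mappings for 0..3G and 2M mappings for the final 0.5G. */
  static unsigned long mapping_size(unsigned long addr, unsigned long end)
  {
      if (!(addr & (SZ_1G - 1)) && end - addr >= SZ_1G)
          return SZ_1G;
      if (!(addr & (SZ_2M - 1)) && end - addr >= SZ_2M)
          return SZ_2M;
      return SZ_4K;
  }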
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
In this patch we add the radix Kconfig and conditional check.
radix_enabled() is written to always return 0 here. Once we have all
needed radix changes added, we will update this to an mmu_feature check.
We need to add this early so that we can get it all building at this
early stage.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Add structs and #defines related to the radix MMU partition table
format. We also add a ppc_md callback for updating a partition table
entry.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Start moving code that is generic between radix and hash from the
book3s64 hash-specific header to the book3s64-specific headers.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|