summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2014-10-08cxl: Add documentation for userspace APIsIan Munsie6-3/+527
This documentation gives an overview of the hardware architecture, userspace APIs via /dev/cxl/afuM.N and the syfs files. It also adds a MAINTAINERS file entry for cxl. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-10-08cxl: Add driver to Kbuild and MakefilesIan Munsie2-0/+19
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-10-08cxl: Add userspace header fileIan Munsie3-0/+90
This adds a header file for use by userspace programs wanting to interact with the kernel cxl driver. It defines structs and magic numbers required for userspace to interact with devices in /dev/cxl/afuM.N. Further documentation on this interface is added in a subsequent patch in Documentation/powerpc/cxl.txt. It also adds this new userspace header file to Kbuild so it's exported when doing "make headers_installs". Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-10-08cxl: Driver code for powernv PCIe based cards for userspace accessIan Munsie10-0/+4453
This is the core of the cxl driver. It adds support for using cxl cards in the powernv environment only (ie POWER8 bare metal). It allows access to cxl accelerators by userspace using the /dev/cxl/afuM.N char devices. The kernel driver has no knowledge of the function implemented by the accelerator. It provides services to userspace via the /dev/cxl/afuM.N devices. When a program opens this device and runs the start work IOCTL, the accelerator will have coherent access to that processes memory using the same virtual addresses. That process may mmap the device to access any MMIO space the accelerator provides. Also, reads on the device will allow interrupts to be received. These services are further documented in a later patch in Documentation/powerpc/cxl.txt. Documentation of the cxl hardware architecture and userspace API is provided in subsequent patches. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-10-08cxl: Add base builtin supportIan Munsie5-0/+97
This adds the base cxl support that cannot be built as a module. Specifically it adds the cxl callbacks that are called from the core powerpc mm code which must always exist irrespective of if the cxl module is loaded or not. This is similar to how cell works with CONFIG_SPU_BASE. This adds a cxl_slbia() call (similar to spu_flush_all_slbs()) which checks if the cxl module is loaded and in use, returning immediately if it is not. If it is in use it calls into the cxl SLB invalidation code. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-10-08powerpc/mm: Add hooks for cxlIan Munsie2-1/+7
This adds hooks into the core powerpc mm code for cxl. The core powerpc code sometimes uses local tlbie. Unfortunately this won't work with the current cxl driver as it relies on snooping tlbie broadcasts. The cxl hardware can have TLB entries invalidated via MMIO but this is not currently supported by the driver. In future we can make local tlbie smarter so that it invalidates cxl contexts via MMIO when it needs to but for now we have this workaround. This workaround checks for any active cxl contexts and if so, disables local tlbie. This also adds a hook for when SLBs are invalidated. This ensures any corresponding SLBs in cxl are also invalidated at the same time. This is required for segment demotion. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-10-08powerpc/opal: Add PHB to cxl mode callIan Munsie2-0/+3
This adds the OPAL call to change a PHB into cxl mode. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-10-08powerpc/mm: Add new hash_page_mm()Ian Munsie2-7/+18
This adds a new function hash_page_mm() based on the existing hash_page(). This version allows any struct mm to be passed in, rather than assuming current. This is useful for servicing co-processor faults which are not in the context of the current running process. We need to be careful here as the current hash_page() assumes current in a few places. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-10-08powerpc/powerpc: Add new PCIe functions for allocating cxl interruptsIan Munsie2-0/+185
This adds a number of functions for allocating IRQs under powernv PCIe for cxl. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-10-08cxl: Add new header for call backs and structsIan Munsie1-0/+48
This new header adds callbacks and structs needed by the rest of the kernel to hook into the cxl infrastructure. This adds the cxl_ctx_in_use() function for use in the mm code to see if any cxl contexts are currently in use. This is used by the tlbie() to determine if it can do local TLB invalidations or not. This also adds get/put calls for the cxl driver module to refcount the active cxl contexts. cxl_ctx_get/put/in_use are static inlined here as they are called in tlbie which we want to be fast (mpe's suggestion). Empty functions are provided when CONFIG_CXL_BASE is not enabled. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-10-08powerpc/powernv: Split out set MSI IRQ chip codeIan Munsie1-18/+24
Some of the MSI IRQ code in pnv_pci_ioda_msi_setup() is generically useful so split it out. This will be used by some of the cxl PCIe code later. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-10-08powerpc/mm: Export mmu_kernel_ssize and mmu_linear_psizeIan Munsie1-0/+2
Export mmu_kernel_ssize and mmu_linear_psize. These are needed by the cxl driver which has it's own MMU. To setup the MMU cxl needs access to these. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-10-08powerpc/msi: Improve IRQ bitmap allocatorIan Munsie1-11/+25
Currently msi_bitmap_alloc_hwirqs() will round up any IRQ allocation requests to the nearest power of 2. eg. ask for 5 IRQs and you'll get 8. This wastes a lot of IRQs which can be a scarce resource. For cxl we may require multiple IRQs for every context that is attached to the accelerator. There may be 1000s of contexts attached, hence we can easily run out of IRQs, especially if we are needlessly wasting them. This changes the msi_bitmap_alloc_hwirqs() to allocate only the required number of IRQs, hence avoiding this wastage. It keeps the natural alignment requirement though. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-10-08powerpc/cell: Make spu_flush_all_slbs() genericIan Munsie4-14/+21
This moves spu_flush_all_slbs() into a generic call copro_flush_all_slbs(). This will be useful when we add cxl which also needs a similar SLB flush call. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-10-08powerpc/cell: Move data segment faulting code out of cell platformIan Munsie5-49/+69
__spu_trap_data_seg() currently contains code to determine the VSID and ESID required for a particular EA and mm struct. This code is generically useful for other co-processors. This moves the code of the cell platform so it can be used by other powerpc code. It also adds 1TB segment handling which Cell didn't support. The new function is called copro_calculate_slb(). This also moves the internal struct spu_slb to a generic struct copro_slb which is now used in the Cell and copro code. We use this new struct instead of passing around esid and vsid parameters. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-10-08powerpc/cell: Move spu_handle_mm_fault() out of cell platformIan Munsie8-14/+33
Currently spu_handle_mm_fault() is in the cell platform. This code is generically useful for other non-cell co-processors on powerpc. This patch moves this function out of the cell platform into arch/powerpc/mm so that others may use it. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-10-07powerpc/pseries: Use new defines when calling H_SET_MODEMichael Neuling1-6/+6
Now that we define these in the KVM code, use these defines when we call H_SET_MODE. No functional change. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-10-07powerpc: Update contact info in Documentation filessukadev@linux.vnet.ibm.com2-9/+9
Cody's email address has changed. Update the contact information for the 24x7 and GPCI counters to the PowerPC developers mailing list. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-10-07powerpc/perf/hv-24x7: Simplify catalog_read()sukadev@linux.vnet.ibm.com1-87/+14
catalog_read() implements the read interface for the sysfs file /sys/bus/event_source/devices/hv_24x7/interface/catalog It essentially takes a buffer, an offset and count as parameters to the read() call. It makes a hypervisor call to read a specific page from the catalog and copy the required bytes into the given buffer. Each call to catalog_read() returns at most one 4K page. Given these requirements, we should be able to simplify the catalog_read(). Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-10-07powerpc/perf/hv-24x7: use kmem_cache instead of aligned stack allocationsCody P Schafer1-18/+37
Ian pointed out the use of __aligned(4096) caused rather large stack consumption in single_24x7_request(), so use the kmem_cache hv_page_cache (which we've already got set up for other allocations) insead of allocating locally. CC: Haren Myneni <hbabu@us.ibm.com> Reported-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Cody P Schafer <dev@codyps.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-10-07powerpc/powernv: Fix endian bug in LPC bus debugfs accessorsBenjamin Herrenschmidt1-1/+3
When reading from the LPC, the OPAL FW calls return the value via pointer to a uint32_t which is always returned big endian. Our internal inb/outb implementation byteswaps that fine but our debugfs code is still broken. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> CC: <stable@vger.kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-10-04Merge branch 'next' of ↵Michael Ellerman27-204/+610
git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux.git Freescale updates from Scott (27 commits): "Highlights include DMA32 zone support (SATA, USB, etc now works on 64-bit FSL kernels), MSI changes, 8xx optimizations and cleanup, t104x board support, and PrPMC PCI enumeration."
2014-10-03powerpc: Enable CONFIG_CRASH_DUMP=y for ppc64_defconfigMichael Ellerman1-0/+1
It pulls in more code, including causing us to build a relocatable kernel, which is good for testing. The resulting kernel is still usable as a non-crash dump kernel. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-10-03powerpc/kdump: crash_dump.c needs to include io.hMichael Ellerman1-0/+1
For __ioremap(). Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-10-03powerpc: Don't build powernv for other platform defconfigsMichael Ellerman5-0/+5
Because powernv arrived after these other platforms, the defconfigs didn't have PPC_POWERNV disabled, and being default y it gets turned on. If we're going to bother having defconfigs for the specific platforms then they should only build the code required for those platforms. The grab bag of everything config is ppc64_defconfig. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-10-03powerpc/pci: remove duplicate declaration of pci_bus_find_capabilityWei Yang1-1/+0
pci_bus_find_capability() is decleared in pci.h, so it is not necessary to do it again. This patch removes it. Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-10-03powerpc/iommu/ddw: Fix endiannessAlexey Kardashevskiy1-23/+28
rtas_call() accepts and returns values in CPU endianness. The ddw_query_response and ddw_create_response structs members are defined and treated as BE but as they are passed to rtas_call() as (u32 *) and they get byteswapped automatically, the data is CPU-endian. This fixes ddw_query_response and ddw_create_response definitions and use. of_read_number() is designed to work with device tree cells - it assumes the input is big-endian and returns data in CPU-endian. However due to the ddw_create_response struct fix, create.addr_hi/lo are already CPU-endian so do not byteswap them. ddw_avail is a pointer to the "ibm,ddw-applicable" property which contains 3 cells which are big-endian as it is a device tree. rtas_call() accepts a RTAS token in CPU-endian. This makes use of of_property_read_u32_array to byte swap and avoid the need for a number of be32_to_cpu calls. Cc: stable@vger.kernel.org # v3.13+ Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> [aik: folded Anton's patch with of_property_read_u32_array] Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-10-02powerpc: Add printk levels to powerpc codeAnton Blanchard5-10/+10
Add printk levels to some places in the powerpc port. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-10-02powerpc: Add printk levels to powernv platform codeAnton Blanchard2-4/+4
Add printk levels to powernv platform code, and convert to pr_err() etc while here. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-10-02powerpc: Remove powerpc specific cmd_lineAnton Blanchard10-18/+12
There is no need for yet another copy of the command line, just use boot_command_line like everyone else. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-10-02powerpc: Use pr_fmt in module loader codeAnton Blanchard2-36/+31
Use pr_fmt to give some context to the error messages in the module code, and convert open coded debug printk to pr_debug. Use pr_err for error messages. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-10-02powerpc: Fill in si_addr_lsb siginfo fieldAnton Blanchard1-0/+8
Fill in the si_addr_lsb siginfo field so the hwpoison code can pass to userspace the length of memory that has been corrupted. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-10-02powerpc: Add VM_FAULT_HWPOISON handling to powerpc page fault handlerAnton Blanchard1-6/+11
do_page_fault was missing knowledge of HWPOISON, and we would oops if userspace tried to access a poisoned page: kernel BUG at arch/powerpc/mm/fault.c:180! Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-10-02powerpc: Simplify do_sigbusAnton Blanchard1-10/+10
Exit out early for a kernel fault, avoiding indenting of most of the function. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-10-02powerpc: Speed up clear_page by unrolling itAnton Blanchard1-11/+31
Unroll clear_page 8 times. A simple microbenchmark which allocates and frees a zeroed page: for (i = 0; i < iterations; i++) { unsigned long p = __get_free_page(GFP_KERNEL | __GFP_ZERO); free_page(p); } improves 20% on POWER8. This assumes cacheline sizes won't grow beyond 512 bytes or page sizes wont drop below 1kB, which is unlikely, but we could add a runtime check during early init if it makes people nervous. Michael found that some versions of gcc produce quite bad code (all multiplies), so we give gcc a hand by using shifts and adds. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-10-01powerpc/eeh: Show hex prefix for PE state sysfsGavin Shan1-1/+1
As Michael suggested, the hex prefix for the output of EEH PE state sysfs entry (/sys/bus/pci/devices/xxx/eeh_pe_state) is always informative to users. Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-09-30powerpc/powernv: Override dma_get_required_mask()Gavin Shan7-4/+62
The dma_get_required_mask() function is used by some drivers to query the platform about what DMA mask is needed to cover all of memory. This is a bit of a strange semantic when we have to choose between IOMMU translation or bypass, but essentially what it means is "what DMA mask will give best performances". Currently, our IOMMU backend always returns a 32-bit mask here, we don't do anything special to it when we have bypass available. This causes some drivers to choose a 32-bit mask, thus losing the ability to use the bypass window, thinking this is more efficient. The problem was reported from the driver of following device: 0004:03:00.0 0107: 1000:0087 (rev 05) 0004:03:00.0 Serial Attached SCSI controller: LSI Logic / Symbios \ Logic SAS2308 PCI-Express Fusion-MPT SAS-2 (rev 05) This patch adds an override of that function in order to, instead, return a 64-bit mask whenever a bypass window is available in order for drivers to prefer this configuration. Reported-by: Murali N. Iyer <mniyer@us.ibm.com> Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-09-30powerpc/powernv: Fetch frozen PE on top levelGavin Shan1-14/+34
It should have been part of commit 1ad7a72c5 ("powerpc/eeh: Report frozen parent PE prior to child PE"). There are 2 ways to report EEH errors: proactively polling because of 0xFF's returned from PCI config or IO read, or interrupt driven event. We missed to report and handle parent frozen PE prior to child frozen PE for the later case on PowerNV platform. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-09-30powerpc/eeh: Dump PCI config space for all child devicesGavin Shan1-15/+20
The PEs can be organized as nested. Current implementation doesn't dump PCI config space for subordinate devices of child PEs. However, the frozen PE could be caused by those subordinate devices of its child PEs. The patch dumps PCI config space for all subordinate devices of the problematic PE. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-09-30powerpc/eeh: Emulate EEH recovery for VFIO devicesGavin Shan3-6/+144
When enabling EEH functionality on passed through devices (PE) with VFIO, the devices in the PE would be removed permanently from guest side. In that case, the PE remains frozen state. When returning PE to host, or restarting the guest again, we had mechanism unfreezing the PE by clearing PESTA/B frozen bits. However, that's not enough for some adapters, which are indicated as following "lspci" shows. Those adapters require hot reset on the parent bus to bring their firmware back to workable state. Otherwise, those adaptrs won't be operative and the host (for returning case) or the guest will fail to load the drivers for those adapters without exception. 0000:01:00.0 Ethernet controller: Emulex Corporation OneConnect \ 10Gb NIC (be3) (rev 02) 0000:01:00.0 0200: 19a2:0710 (rev 02) 0001:03:00.0 Ethernet controller: Emulex Corporation OneConnect \ NIC (Lancer) (rev 10) 0001:03:00.0 0200: 10df:e220 (rev 10) The patch adds mechanism to emulate EEH recovery (for hot reset on parent PCI bus) on 3 gates to fix the issue: open/release one adapter of the PE, enable EEH functionality on one adapter of the PE. Reported-by: Murilo Fossa Vicentini <muvic@br.ibm.com> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-09-30powerpc/eeh: Tag reset state for user owned PEGavin Shan1-0/+2
PE would be owned by userland, which probably request PE reset done in host side. During the reset, we should drop the PCI config accesses to the PE with help of flag EEH_PE_RESET. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-09-30powerpc/powernv: Sync OpalPciResetScope with firmwareGavin Shan3-11/+14
The names of PCI reset scopes aren't sychronized with firmware. The patch fixes it. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-09-30powerpc/pseries: Decrease message level on EEH initializationGavin Shan1-25/+10
As Anton suggested, the patch decreases the message level on EEH initialization to avoid unnecessary messages if required. Also, we have unified hint if any of needful RTAS calls is missed, and then we can check /proc/device-tree to figure out the missed RTAS calls. Suggested-by: Anton Blanchard <anton@samba.org> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-09-30powerpc/eeh: Block PCI config access during resetGavin Shan1-0/+4
Function pcibios_set_pcie_reset_state() can be used to do PCI reset. PCI config access during the reset usually causes EEH errors unexpectedly. In order to avoid the EEH error, the patch blocks PCI config access during reset with the help of flag EEH_PE_RESET, which is similar to what we did in EEH PE reset path. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-09-30powerpc/eeh: Use eeh_unfreeze_pe()Gavin Shan3-50/+8
The patch uses eeh_unfreeze_pe() to replace the logic clearing frozen IO and DMA, in order to simplify the code. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-09-30powerpc/eeh: Unfreeze PE on enabling EEH functionalityGavin Shan2-28/+33
When passing through PE to guest, that's possibly in frozen state. The driver for the pass-through devices on guest side can't be loaded successfully as reported. We already had one gate in eeh_dev_open() to clear PE frozen state accordingly, but that's not enough because the function is only called at QEMU startup for once. The patch adds another gate in eeh_pe_set_option() so that the PE frozen state can be cleared at QEMU restart time. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-09-30powerpc/eeh: Fix improper condition in eeh_pci_enable()Gavin Shan1-16/+42
The function eeh_pci_enable() is called to apply various requests to one particular PE: Enabling EEH, Disabling EEH, Enabling IO, Enabling DMA, Freezing PE. When enabling IO or DMA on one specific PE, we need check that IO or DMA isn't enabled previously. But the condition used to do the check isn't completely correct because one PE would be in DMA frozen state with workable IO path, or vice versa. The patch fixes the improper condition. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-09-30powerpc/eeh: Clear frozen device state in timeGavin Shan1-3/+18
The problem was reported by Carol: In the scenario of passing mlx4 adapter to guest, EEH error could be recovered successfully. When returning the device back to host, the driver (mlx4_core.ko) couldn't be loaded successfully because of error number -5 (-EIO) returned from mlx4_get_ownership(), which hits offlined PCI device. The root cause is that we missed to put the affected devices into normal state on clearing PE isolated state right after PE reset. The patch fixes above issue by putting the affected devices to normal state when clearing PE isolated state in eeh_pe_state_clear(). Cc: stable@vger.kernel.org Reported-by: Carol L. Soto <clsoto@us.ibm.com> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-09-30powerpc/powernv: Clear PAPR error injection registersGavin Shan1-0/+25
The frozen state on one specific PE is probably caused by error injection, which is done with help of PAPR error injection registers. According to the hardware spec, those registers should be cleared automatically after one-shot frozen PE. However, that's not always true, at least on P7IOC of Firebird-L. So we have to clear them before doing PE reset to avoid recursive EEH errors at recovery stage. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-09-30powerpc/powernv: Add PCI error injection debugfs entryMike Qiu1-0/+52
The patch adds debugfs file (/sys/kernel/debug/powerpc/PCIxxxx/ err_injct), which accepts following formated string, to support error injection. It will be used to support userland utility "errinjct" in future. "pe_no:0:function:address:mask" - 32-bits PCI errors "pe_no:1:function:address:mask" - 64-bits PCI errors Signed-off-by: Mike Qiu <qiudayu@linux.vnet.ibm.com> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>