path: root/drivers/pci/pcie/err.c
Age | Commit message | Author | Files | Lines
2018-10-02 | PCI/ERR: Remove duplicated include from err.c | YueHaibing | 1 | -1/+0
Remove duplicated include.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-10-02 | PCI: Unify device inaccessible | Keith Busch | 1 | -6/+4
Bring surprise removals and permanent failures together so we no longer need separate flags. The implementation enforces that error handling will not be able to override a surprise removal's permanent channel failure.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>
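The "cannot override" rule in this commit can be illustrated with a minimal userspace sketch. It assumes a three-state channel model (normal, frozen, permanent failure); the enum, struct, and function names are hypothetical and are not the kernel's API. The only point shown is that once a device is marked permanently failed, a later recovery step cannot move it back to another state.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the channel states; not kernel definitions. */
enum chan_state { CHAN_NORMAL, CHAN_FROZEN, CHAN_PERM_FAILURE };

struct model_dev {
	enum chan_state state;
};

/*
 * Try to move the device to a new state. Returns true if the state was
 * applied (or was already held). Permanent failure is sticky: once set,
 * a request for any other state is refused.
 */
static bool set_chan_state(struct model_dev *dev, enum chan_state next)
{
	if (dev->state == CHAN_PERM_FAILURE && next != CHAN_PERM_FAILURE)
		return false;
	dev->state = next;
	return true;
}
```

In this model, a surprise removal would request CHAN_PERM_FAILURE, and any later attempt by error recovery to report CHAN_NORMAL simply returns false.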
2018-10-02 | PCI/ERR: Always report current recovery status for udev | Keith Busch | 1 | -3/+2
A device still participates in error recovery even if it doesn't have the error callbacks. Always provide the status for user event watchers.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>
2018-10-02 | PCI/ERR: Simplify broadcast callouts | Keith Busch | 1 | -69/+38
There is no point in having a generic broadcast function if it needs to have special cases for each callback it broadcasts. Abstract the error broadcast to only the necessary information and remove the now unnecessary helper to walk the bus.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>
2018-09-26 | PCI/ERR: Run error recovery callbacks for all affected devices | Keith Busch | 1 | -64/+21
If an Endpoint reported an error with ERR_FATAL, we previously ran driver error recovery callbacks only for the Endpoint's driver. But if we reset a Link to recover from the error, all downstream components are affected, including the Endpoint, any multi-function peers, and children of those peers.

Initiate the Link reset from the deepest Downstream Port that is reliable, and call the error recovery callbacks for all its children.

If a Downstream Port (including a Root Port) reports an error, we assume the Port itself is reliable and we need to reset its downstream Link. In all other cases (Switch Upstream Ports, Endpoints, Bridges, etc), we assume the Link leading to the component needs to be reset, so we initiate the reset at the parent Downstream Port.

This allows two other clean-ups. First, we currently only use a Link reset, which can only be initiated using a Downstream Port, so we can remove checks for Endpoints. Second, the Downstream Port where we initiate the Link reset is reliable (unlike components downstream from it), so the special cases for error detect and resume are no longer necessary.

Signed-off-by: Keith Busch <keith.busch@intel.com>
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>
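The port-selection rule described in this commit can be sketched in a few lines of standalone C. This is a simplified model, not the kernel code: the port types, the struct, and reset_port() are hypothetical names chosen for illustration, and the topology is reduced to a single parent pointer per device.

```c
#include <stddef.h>

/* Hypothetical, simplified port types; not the kernel's definitions. */
enum port_type { ROOT_PORT, DOWNSTREAM_PORT, UPSTREAM_PORT, ENDPOINT };

struct topo_dev {
	enum port_type type;
	struct topo_dev *parent;	/* upstream bridge, NULL at the top */
};

/*
 * Pick the port from which to initiate the Link reset: the reporting
 * device itself if it is a Root Port or Downstream Port (assumed
 * reliable), otherwise the Downstream Port directly above it.
 */
static struct topo_dev *reset_port(struct topo_dev *dev)
{
	if (dev->type == ROOT_PORT || dev->type == DOWNSTREAM_PORT)
		return dev;
	return dev->parent;
}
```

With a chain of Root Port, Switch Upstream Port, Switch Downstream Port, and Endpoint, an error reported by the Endpoint selects the Switch Downstream Port, while an error reported by the Switch Upstream Port selects the Root Port.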
2018-09-26 | PCI/ERR: Handle fatal error recovery | Keith Busch | 1 | -69/+6
We don't need to be paranoid about the topology changing while handling an error. If the device has changed in a hotplug capable slot, we can rely on the presence detection handling to react to a changing topology. Restore the fatal error handling behavior that existed before DPC was merged with AER by 7e9084b36740 ("PCI/AER: Handle ERR_FATAL with removal and re-enumeration of devices").

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>
2018-09-21 | PCI/ERR: Use slot reset if available | Keith Busch | 1 | -1/+1
The secondary bus reset may have link side effects that a hotplug capable port may incorrectly react to. Use the slot specific reset for hotplug ports, fixing the undesirable link down-up handling during error recovery.

Signed-off-by: Keith Busch <keith.busch@intel.com>
[bhelgaas: fold in https://lore.kernel.org/linux-pci/20180926152326.14821-1-keith.busch@intel.com for issue reported by Stephen Rothwell <sfr@canb.auug.org.au>]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>
2018-09-17 | PCI: Simplify disconnected marking | Lukas Wunner | 1 | -6/+2
Commit 89ee9f768003 ("PCI: Add device disconnected state") iterates over the devices on a parent bus, marks each as disconnected, then marks each device's children as disconnected using pci_walk_bus().

The same can be achieved more succinctly by calling pci_walk_bus() on the parent bus. Moreover, this does not need to wait until acquiring pci_lock_rescan_remove(), so move it out of that critical section.

The critical section in err.c contains a pci_dev_get() / pci_dev_put() pair which was apparently copy-pasted from pciehp_pci.c. In the latter it serves the purpose of holding the struct pci_dev in place until the Command register is updated. err.c doesn't do anything like that, hence the pair is unnecessary. Remove it.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Oza Pawandeep <poza@codeaurora.org>
Cc: Sinan Kaya <okaya@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
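The simplification in this commit relies on a single recursive walk reaching every device below the parent bus. The following standalone sketch models that idea with a small tree; struct node, walk_children(), and mark_disconnected() are hypothetical names, and the walker only imitates the general shape of a callback-driven bus walk rather than reproducing pci_walk_bus().

```c
#include <stddef.h>

#define MAX_CHILDREN 4

/* Hypothetical tree node: a device that may have devices below it. */
struct node {
	int disconnected;
	struct node *children[MAX_CHILDREN];	/* unused slots are NULL */
};

/* Callback type: return nonzero to stop the walk early. */
typedef int (*walk_fn)(struct node *n, void *data);

/* Visit every device below 'parent', depth-first. */
static int walk_children(struct node *parent, walk_fn fn, void *data)
{
	for (size_t i = 0; i < MAX_CHILDREN && parent->children[i]; i++) {
		struct node *child = parent->children[i];
		int ret = fn(child, data);

		if (!ret)
			ret = walk_children(child, fn, data);
		if (ret)
			return ret;
	}
	return 0;
}

/* Mark one device as disconnected and count how many were visited. */
static int mark_disconnected(struct node *n, void *data)
{
	int *count = data;

	n->disconnected = 1;
	(*count)++;
	return 0;
}
```

One call to walk_children() with mark_disconnected() covers both the immediate children and their descendants, which is why a separate outer loop over the parent's devices is not needed in this model.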
2018-08-15 | Merge branch 'pci/virtualization' | Bjorn Helgaas | 1 | -2/+4
  - To avoid bus errors, enable PASID only if entire path supports End-End TLP prefixes (Sinan Kaya)
  - Unify slot and bus reset functions and remove hotplug knowledge from callers (Sinan Kaya)
  - Add Function-Level Reset quirks for Intel and Samsung NVMe devices to fix guest reboot issues (Alex Williamson)
  - Add function 1 DMA alias quirk for Marvell 88SS9183 PCIe SSD Controller (Bjorn Helgaas)

* pci/virtualization:
  PCI: Add function 1 DMA alias quirk for Marvell 88SS9183
  PCI: Delay after FLR of Intel DC P3700 NVMe
  PCI: Disable Samsung SM961/PM961 NVMe before FLR
  PCI: Export pcie_has_flr()
  PCI: Rename pci_try_reset_bus() to pci_reset_bus()
  PCI: Deprecate pci_reset_bus() and pci_reset_slot() functions
  PCI: Unify try slot and bus reset API
  PCI: Hide pci_reset_bridge_secondary_bus() from drivers
  IB/hfi1: Use pci_try_reset_bus() for initiating PCI Secondary Bus Reset
  PCI: Handle error return from pci_reset_bridge_secondary_bus()
  PCI/IOV: Tidy pci_sriov_set_totalvfs()
  PCI: Enable PASID only if entire path supports End-End TLP prefixes

# Conflicts:
#	drivers/pci/hotplug/pciehp_hpc.c
2018-08-15 | Merge branch 'pci/aer' | Bjorn Helgaas | 1 | -9/+6
  - Decode AER errors with names similar to "lspci" (Tyler Baicar)
  - Expose AER statistics in sysfs (Rajat Jain)
  - Clear AER status bits selectively based on the type of recovery (Oza Pawandeep)
  - Honor "pcie_ports=native" even if HEST sets FIRMWARE_FIRST (Alexandru Gagniuc)
  - Don't clear AER status bits if we're using the "Firmware-First" strategy where firmware owns the registers (Alexandru Gagniuc)

* pci/aer:
  PCI/AER: Don't clear AER bits if error handling is Firmware-First
  PCI/AER: Remove duplicate PCI_EXP_AER_FLAGS definition
  PCI/portdrv: Remove pcie_portdrv_err_handler.slot_reset
  PCI/AER: Clear device status bits during ERR_COR handling
  PCI/AER: Clear device status bits during ERR_FATAL and ERR_NONFATAL
  PCI/AER: Remove ERR_FATAL code from ERR_NONFATAL path
  PCI/AER: Factor out ERR_NONFATAL status bit clearing
  PCI/AER: Clear only ERR_NONFATAL bits during non-fatal recovery
  PCI/AER: Clear only ERR_FATAL status bits during fatal recovery
  PCI/AER: Honor "pcie_ports=native" even if HEST sets FIRMWARE_FIRST
  PCI/AER: Add sysfs attributes for rootport cumulative stats
  PCI/AER: Add sysfs attributes to provide AER stats and breakdown
  PCI/AER: Define aer_stats structure for AER capable devices
  PCI/AER: Move internal declarations to drivers/pci/pci.h
  PCI/AER: Adopt lspci names for AER error decoding
  PCI/AER: Expose internal API for obtaining AER information

# Conflicts:
#	drivers/pci/pci.h
2018-07-26 | PCI/AER: Work around use-after-free in pcie_do_fatal_recovery() | Thomas Tai | 1 | -0/+2
When a fatal error is received by a non-bridge device, the device is removed, and pci_stop_and_remove_bus_device() deallocates the device structure. The freed device structure is used by subsequent code to send uevents and print messages.

Hold a reference on the device until we're finished using it. This is not an ideal fix because pcie_do_fatal_recovery() should not use the device at all after removing it, but that's too big a project for right now.

Fixes: 7e9084b36740 ("PCI/AER: Handle ERR_FATAL with removal and re-enumeration of devices")
Signed-off-by: Thomas Tai <thomas.tai@oracle.com>
[bhelgaas: changelog, reduce get/put coverage]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
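The workaround in this commit is the usual reference-counting pattern: take an extra reference before the step that may release the object, and drop it once the object is no longer needed. The sketch below models that in plain userspace C. The struct and the obj_new(), obj_get(), and obj_put() helpers are hypothetical and only stand in for the kernel's device reference helpers.

```c
#include <stdlib.h>

/* Hypothetical refcounted object standing in for a device structure. */
struct obj {
	int refs;
	int *freed_flag;	/* set to 1 when the object is released */
};

/* Allocate an object holding one initial reference. */
static struct obj *obj_new(int *freed_flag)
{
	struct obj *o = malloc(sizeof(*o));

	if (!o)
		return NULL;
	o->refs = 1;
	o->freed_flag = freed_flag;
	*freed_flag = 0;
	return o;
}

/* Take an additional reference. */
static struct obj *obj_get(struct obj *o)
{
	o->refs++;
	return o;
}

/* Drop a reference; the last drop releases the object. */
static void obj_put(struct obj *o)
{
	if (--o->refs == 0) {
		*o->freed_flag = 1;
		free(o);
	}
}
```

In this model, calling obj_get() before the "remove" step (an obj_put() of the original reference) keeps the object valid for later reporting, and the final obj_put() releases it.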
2018-07-20 | PCI/AER: Clear device status bits during ERR_FATAL and ERR_NONFATAL | Oza Pawandeep | 1 | -0/+2
Clear the device status bits while handling both ERR_FATAL and ERR_NONFATAL cases.

Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
[bhelgaas: rename to pci_aer_clear_device_status(), declare internal to PCI core instead of exposing it everywhere]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-07-20 | PCI/AER: Remove ERR_FATAL code from ERR_NONFATAL path | Oza Pawandeep | 1 | -8/+3
broadcast_error_message() is only used for ERR_NONFATAL events, when the state is always pci_channel_io_normal, so remove the unused alternate path.

Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-07-20 | PCI/AER: Clear only ERR_FATAL status bits during fatal recovery | Bjorn Helgaas | 1 | -1/+1
During recovery from fatal errors, we previously called pci_cleanup_aer_uncorrect_error_status(), which cleared *all* uncorrectable error status bits (both ERR_FATAL and ERR_NONFATAL).

Instead, call a new pci_aer_clear_fatal_status() that clears only the ERR_FATAL bits (as indicated by the PCI_ERR_UNCOR_SEVER register).

Based-on-patch-by: Oza Pawandeep <poza@codeaurora.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
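The selection of "only the ERR_FATAL bits" comes down to masking the uncorrectable status value with the severity value. The sketch below shows that arithmetic on plain integers. It is an illustration only: fatal_bits() and rw1c_write() are hypothetical helpers, the register values in the usage note are made up, and the write-1-to-clear behavior is modeled in software rather than read from real configuration space.

```c
#include <stdint.h>

/*
 * Select the status bits to clear: those that are set in the
 * uncorrectable status value AND flagged as fatal in the severity value.
 */
static uint32_t fatal_bits(uint32_t uncor_status, uint32_t uncor_sever)
{
	return uncor_status & uncor_sever;
}

/* Model a write-1-to-clear register: every 1 bit written is cleared. */
static uint32_t rw1c_write(uint32_t reg, uint32_t value)
{
	return reg & ~value;
}
```

For example, with a status of 0x00041010 and a severity of 0x00040020, fatal_bits() yields 0x00040000, and writing that back leaves the non-fatal bits 0x00001010 still set for the non-fatal path to handle.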
2018-07-19 | PCI: Hide pci_reset_bridge_secondary_bus() from drivers | Sinan Kaya | 1 | -1/+1
Rename pci_reset_bridge_secondary_bus() to pci_bridge_secondary_bus_reset() and move the declaration from linux/pci.h to drivers/pci.h to be used internally in PCI directory only.

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-07-19 | PCI: Handle error return from pci_reset_bridge_secondary_bus() | Sinan Kaya | 1 | -2/+4
Commit 01fd61c0b9bd ("PCI: Add a return type for pci_reset_bridge_secondary_bus()") added a return value that indicates whether a device is accessible following a reset, but callers are not checking the value. Pass the error code up the stack if the device is not accessible.

Fixes: 01fd61c0b9bd ("PCI: Add a return type for pci_reset_bridge_secondary_bus()")
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-06-02 | PCI/AER: Pass service type to pcie_do_fatal_recovery() | Oza Pawandeep | 1 | -5/+6
Pass the service type to pcie_do_fatal_recovery() instead of assuming AER. We will make DPC also use pcie_do_fatal_recovery(), and it needs to do things a little differently for AER and DPC.

Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
[bhelgaas: split to separate patch]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-05-17 | PCI/portdrv: Add generic pcie_port_find_service() | Oza Pawandeep | 1 | -3/+1
Add generic pcie_port_find_service() routine.

Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
2018-05-17 | PCI/AER: Factor out error reporting to drivers/pci/pcie/err.c | Oza Pawandeep | 1 | -0/+389
Move the error reporting callbacks from aerdrv_core.c to err.c, where they can be used by DPC in addition to AER.

As part of aerdrv_core.c, these callbacks were built under CONFIG_PCIEAER. Moving them to the new err.c means they will now be built under CONFIG_PCIEPORTBUS, so adjust the definition of pci_uevent_ers() to match.

Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
[bhelgaas: in reset_link(), initialize "driver" even if CONFIG_PCIEAER is unset, update pci_uevent_ers() #ifdef wrapper]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>