linux - Linux Kernel (branches are rebased on master from time to time)

Age	Commit message (Collapse)	Author	Files	Lines
2018-08-09	PCI: Add ACS Redirect disable quirk for Intel Sunrise Point	Logan Gunthorpe	1	-0/+25
	Intel Sunrise Point PCH hardware has an implementation of the ACS bits that does not comply with the PCIe standard. Add a device-specific quirk, pci_quirk_disable_intel_spt_pch_acs_redir() to disable ACS Redirection on this system. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> [bhelgaas: changelog, split to separate patch] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
2018-08-09	PCI: Add device-specific ACS Redirect disable infrastructure	Logan Gunthorpe	3	-8/+43
	Intel Sunrise Point (SPT) PCH hardware has an implementation of the ACS bits that does not comply with the PCIe standard. To deal with this we need device-specific quirks to disable ACS redirection. Add a new pci_dev_specific_disable_acs_redir() quirk and a new .disable_acs_redir() function pointer for use by non-compliant devices. No functional change intended. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> [bhelgaas: split to separate patch, move pci_dev_specific_disable_acs_redir() declarations to drivers/pci/pci.h] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
2018-08-09	PCI: Convert device-specific ACS quirks from NULL termination to ARRAY_SIZE	Logan Gunthorpe	1	-9/+9
	Convert the search for device-specific ACS enable quirks from searching a NULL-terminated array to iterating through the array, which is always fixed-size anyway. No functional change intended. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> [bhelgaas: changelog, split to separate patch for reviewability] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
2018-08-09	PCI: Add "pci=disable_acs_redir=" parameter for peer-to-peer support	Logan Gunthorpe	1	-2/+71
	To support peer-to-peer traffic on a segment of the PCI hierarchy, we must disable the ACS redirect bits for select PCI bridges. The bridges must be selected before the devices are discovered by the kernel and the IOMMU groups created. Therefore, add a kernel command line parameter to specify devices which must have their ACS bits disabled. The new parameter takes a list of devices separated by a semicolon. Each device specified will have its ACS redirect bits disabled. This is similar to the existing 'resource_alignment' parameter. The ACS Request P2P Request Redirect, P2P Completion Redirect and P2P Egress Control bits are disabled, which is sufficient to always allow passing P2P traffic uninterrupted. The bits are set after the kernel (optionally) enables the ACS bits itself. It is also done regardless of whether the kernel or platform firmware sets the bits. If the user tries to disable the ACS redirect for a device without the ACS capability, print a warning to dmesg. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> [bhelgaas: reorder to add the generic code first and move the device-specific quirk to subsequent patches] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Stephen Bates <sbates@raithlin.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Christian König <christian.koenig@amd.com>
2018-08-09	PCI: Allow specifying devices using a base bus and path of devfns	Logan Gunthorpe	1	-21/+97
	When specifying PCI devices on the kernel command line using a bus/device/function address, bus numbers can change when adding or replacing a device, changing motherboard firmware, or applying kernel parameters like "pci=assign-buses". When bus numbers change, it's likely the command line tweak will be applied to the wrong device. Therefore, it is useful to be able to specify devices with a base bus number and the path of devfns needed to get to it, similar to the "device scope" structure in the Intel VT-d spec, Section 8.3.1. Thus, we add an option to specify devices in the following format: [<domain>:]<bus>:<device>.<func>[/<device>.<func>]* The path can be any segment within the PCI hierarchy of any length and determined through the use of 'lspci -t'. When specified this way, it is less likely that a renumbered bus will result in a valid device specification and the tweak won't be applied to the wrong device. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> [bhelgaas: use "device" instead of "slot" in documentation since that's the usual language in the PCI specs] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Stephen Bates <sbates@raithlin.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Christian König <christian.koenig@amd.com>
2018-08-09	PCI: Make specifying PCI devices in kernel parameters reusable	Logan Gunthorpe	1	-54/+103
	Separate out the code to match a PCI device with a string (typically originating from a kernel parameter) from the pci_specified_resource_alignment() function into its own helper function. While we are at it, this change fixes the kernel style of the function (fixing a number of long lines and extra parentheses). Additionally, make the analogous change to the kernel parameter documentation: Separate the description of how to specify a PCI device into its own section at the head of the "pci=" parameter. This patch should have no functional alterations. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> [bhelgaas: use "device" instead of "slot" in documentation since that's the usual language in the PCI specs] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Stephen Bates <sbates@raithlin.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Christian König <christian.koenig@amd.com>
2018-08-09	PCI: Hide ACS quirk declarations inside PCI core	Bjorn Helgaas	1	-0/+14
	Move declarations for these functions: pci_dev_specific_acs_enabled() pci_dev_specific_enable_acs() from include/linux/pci.h to drivers/pci/pci.h because nothing outside the PCI core needs to use them. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-08-09	PCI: Delay after FLR of Intel DC P3700 NVMe	Alex Williamson	1	-0/+22
	Add a device-specific reset for Intel DC P3700 NVMe device which exhibits a timeout failure in drivers waiting for the ready status to update after NVMe enable if the driver interacts with the device too soon after FLR. As this has been observed in device assignment scenarios, resolve this with a device-specific reset quirk to add an additional, heuristically determined, delay after the FLR completes. Link: https://bugzilla.redhat.com/show_bug.cgi?id=1592654 Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-08-09	PCI: Disable Samsung SM961/PM961 NVMe before FLR	Alex Williamson	1	-0/+83
	The Samsung SM961/PM961 (960 EVO) sometimes fails to return from FLR with the PCI config space reading back as -1. A reproducible instance of this behavior is resolved by clearing the enable bit in the NVMe configuration register and waiting for the ready status to clear (disabling the NVMe controller) prior to FLR. Link: https://bugzilla.redhat.com/show_bug.cgi?id=1542494 Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-08-09	PCI: Export pcie_has_flr()	Alex Williamson	1	-1/+2
	pcie_flr() suggests pcie_has_flr() to ensure that PCIe FLR support is present prior to calling. pcie_flr() is exported while pcie_has_flr() is not. Resolve this. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-08-08	PCI: mvebu: Drop bogus comment above mvebu_pcie_map_registers()	Thomas Petazzoni	1	-5/+0
	This comment has been there since the driver was introduced, but seems to be a leftover from previous iterations of the driver. Indeed, we do not lookup in a list to find the register ranges that matches the given port/lane, as the "reg" property is in each sub-node representing a PCI port. There is no lookup involved at all. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
2018-08-08	PCI: mvebu: Convert to use pci_host_bridge directly	Thomas Petazzoni	1	-73/+63
	Rather than using the ARM-specific pci_common_init_dev() API, use the pci_host_bridge logic directly. Unfortunately, we can't use devm_of_pci_get_host_bridge_resources(), because the DT binding for describing PCIe apertures for this PCI controller is a bit special, and we cannot retrieve them from the 'ranges' property. Therefore, we still have some special code to handle this. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
2018-08-08	PCI: mvebu: Use resource_size() to remap I/O space	Thomas Petazzoni	1	-2/+2
	Instead of hardcoding the remapping of IO_SPACE_LIMIT - SZ_64K, use resource_size(). However, we cannot use just IO_SPACE_LIMIT, because pci_ioremap_io() has a bug and doesn't allow remapping the last 64 KB before IO_SPACE_LIMIT, so we ensure that we do not exceed this limit. When the pci_ioremap_io() issue is fixed, this work around can be dropped. Note that this workaround already existed, since we were mapping only up to IO_SPACE_LIMIT - SZ_64K. Suggested-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com> [lorenzo.pieralisi@arm.com: tweaked the commit log] Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
2018-08-08	PCI: mvebu: Only remap I/O space if configured	Thomas Petazzoni	1	-3/+3
	If there is no PCI I/O aperture configured in the Device Tree, it does not make sense to create the virtual mapping for the PCI I/O space, since we will anyway not create the MBus window that will allow to access it. Therefore, do the pci_ioremap_io() only if necessary. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
2018-08-08	PCI: mvebu: Fix I/O space end address calculation	Thomas Petazzoni	1	-1/+1
	pcie->realio.end should be the address of last byte of the area, therefore using resource_size() of another resource is not correct, we must substract 1 to get the address of the last byte. Fixes: 11be65472a427 ("PCI: mvebu: Adapt to the new device tree layout") Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
2018-08-08	PCI: mvebu: Remove redundant platform_set_drvdata() call	Thomas Petazzoni	1	-2/+0
	This is already done earlier in mvebu_pcie_probe(). Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
2018-08-06	PCI: Remove unnecessary include of <linux/pci-aspm.h>	Bjorn Helgaas	4	-4/+0
	Several PCI core files include pci-aspm.h even though they don't need anything provided by that file. Remove the unnecessary includes of it. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Sinan Kaya <okaya@kernel.org>
2018-08-06	PCI/ASPM: Convert to use sysfs_match_string() helper	Andy Shevchenko	1	-5/+3
	The sysfs_match_string() helper returns index of the matching string in an array. Use it in pcie_aspm_set_policy() to simplify the code. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> [bhelgaas: squash sysfs_match_string() fix into original patch for issue Reported-by: Heiner Kallweit <hkallweit1@gmail.com>] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-08-06	PCI/xilinx: Depend on OF instead of the ARCH	Christoph Hellwig	1	-1/+1
	There isn't a hard dependency of the Xilinx AXI-PCIe host bridge on any architecture. For example: at SiFive we map RISC-V cores to Xilinx FPGAs and connect the Xilinx IP via a TileLink adapter, so the RISC-V Linux port will need to be able to enable PCIE_XILINX in order to have PCIe support. This patch decouples the PCIE_XILINX support from ARCH. Instead it just depends on OF, which is the only true dependency. Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com> [hch: switch to OF instead of OF_PCI now that the latter is gone] Signed-off-by: Christoph Hellwig <hch@lst.de> [lorenzo.pieralisi@arm.com: trimmed the commit log] Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
2018-08-05	Merge 4.18-rc7 into master to pick up the KVM dependcy	Thomas Gleixner	23	-42/+159
	Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-08-05	x86: Don't include linux/irq.h from asm/hardirq.h	Nicolai Stange	1	-0/+1
	The next patch in this series will have to make the definition of irq_cpustat_t available to entering_irq(). Inclusion of asm/hardirq.h into asm/apic.h would cause circular header dependencies like asm/smp.h asm/apic.h asm/hardirq.h linux/irq.h linux/topology.h linux/smp.h asm/smp.h or linux/gfp.h linux/mmzone.h asm/mmzone.h asm/mmzone_64.h asm/smp.h asm/apic.h asm/hardirq.h linux/irq.h linux/irqdesc.h linux/kobject.h linux/sysfs.h linux/kernfs.h linux/idr.h linux/gfp.h and others. This causes compilation errors because of the header guards becoming effective in the second inclusion: symbols/macros that had been defined before wouldn't be available to intermediate headers in the #include chain anymore. A possible workaround would be to move the definition of irq_cpustat_t into its own header and include that from both, asm/hardirq.h and asm/apic.h. However, this wouldn't solve the real problem, namely asm/harirq.h unnecessarily pulling in all the linux/irq.h cruft: nothing in asm/hardirq.h itself requires it. Also, note that there are some other archs, like e.g. arm64, which don't have that #include in their asm/hardirq.h. Remove the linux/irq.h #include from x86' asm/hardirq.h. Fix resulting compilation errors by adding appropriate #includes to .c files as needed. Note that some of these .c files could be cleaned up a bit wrt. to their set of #includes, but that should better be done from separate patches, if at all. Signed-off-by: Nicolai Stange <nstange@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-08-02	Merge tag 'pci-v4.18-fixes-5' of ↵	Linus Torvalds	6	-9/+21
	git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fixes from Bjorn Helgaas: - Fix integer overflow in new mobiveil driver (Dan Carpenter) - Fix race during NVMe removal/rescan (Hari Vyas) * tag 'pci-v4.18-fixes-5' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: PCI: Fix is_added/is_busmaster race condition PCI: mobiveil: Avoid integer overflow in IB_WIN_SIZE
2018-07-31	PCI/AER: Remove duplicate PCI_EXP_AER_FLAGS definition	Bjorn Helgaas	1	-2/+0
	PCI_EXP_AER_FLAGS was defined twice (with identical definitions), once under #ifdef CONFIG_ACPI_APEI, and again at the top level. This looks like my merge error from these commits: fd3362cb73de ("PCI/AER: Squash aerdrv_core.c into aerdrv.c") 41cbc9eb1a82 ("PCI/AER: Squash ecrc.c into aerdrv.c") Remove the duplicate PCI_EXP_AER_FLAGS definition. Fixes: 41cbc9eb1a82 ("PCI/AER: Squash ecrc.c into aerdrv.c") Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Oza Pawandeep <poza@codeaurora.org>
2018-07-31	PCI: pciehp: Deduplicate presence check on probe & resume	Lukas Wunner	2	-31/+46
	On driver probe and on resume from system sleep, pciehp checks the Presence Detect State bit in the Slot Status register to bring up an occupied slot or bring down an unoccupied slot. Both code paths are identical, so deduplicate them per Mika's request. On probe, an additional check is performed to disable power of an unoccupied slot. This can e.g. happen if power was enabled by BIOS. It cannot happen once pciehp has taken control, hence is not necessary on resume: The Slot Control register is set to the same value that it had on suspend by pci_restore_state(), so if the slot was occupied, power is enabled and if it wasn't, power is disabled. Should occupancy have changed during the system sleep transition, power is adjusted by bringing up or down the slot per the paragraph above. To allow for deduplication of the presence check, move the power check to pcie_init(). This seems safer anyway, because right now it is performed while interrupts are already enabled, and although I can't think of a scenario where pciehp_power_off_slot() and the IRQ thread collide, it does feel brittle. However this means that pcie_init() may now write to the Slot Control register before the IRQ is requested. If both the CCIE and HPIE bits happen to be set, pcie_wait_cmd() will wait for an interrupt (instead of polling the Command Completed bit) and eventually emit a timeout message. Additionally, if a level-triggered INTx interrupt is used, the user may see a spurious interrupt splat. Avoid by disabling interrupts before disabling power. (Normally the HPIE and CCIE bits should be clear on probe, but conceivably they may already have been set e.g. by BIOS.) Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2018-07-31	PCI: pciehp: Avoid implicit fallthroughs in switch statements	Lukas Wunner	1	-0/+5
	Per Mika's request, add an explicit break to the last case of switch statements everywhere in pciehp to be more defensive towards future amendments. Per Gustavo's request, mark all non-empty implicit fallthroughs with a comment to silence warnings triggered by -Wimplicit-fallthrough=2. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Acked-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
2018-07-31	PCI: Fix is_added/is_busmaster race condition	Hari Vyas	5	-8/+20
	When a PCI device is detected, pdev->is_added is set to 1 and proc and sysfs entries are created. When the device is removed, pdev->is_added is checked for one and then device is detached with clearing of proc and sys entries and at end, pdev->is_added is set to 0. is_added and is_busmaster are bit fields in pci_dev structure sharing same memory location. A strange issue was observed with multiple removal and rescan of a PCIe NVMe device using sysfs commands where is_added flag was observed as zero instead of one while removing device and proc,sys entries are not cleared. This causes issue in later device addition with warning message "proc_dir_entry" already registered. Debugging revealed a race condition between the PCI core setting the is_added bit in pci_bus_add_device() and the NVMe driver reset work-queue setting the is_busmaster bit in pci_set_master(). As these fields are not handled atomically, that clears the is_added bit. Move the is_added bit to a separate private flag variable and use atomic functions to set and retrieve the device addition state. This avoids the race because is_added no longer shares a memory location with is_busmaster. Link: https://bugzilla.kernel.org/show_bug.cgi?id=200283 Signed-off-by: Hari Vyas <hari.vyas@broadcom.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Lukas Wunner <lukas@wunner.de> Acked-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-31	PCI: Whitelist Thunderbolt ports for runtime D3	Lukas Wunner	1	-1/+5
	Thunderbolt controllers can be runtime suspended to D3cold to save ~1.5W. This requires that runtime D3 is allowed on its PCIe ports, so whitelist them. The 2015 BIOS cutoff that we've instituted for runtime D3 on PCIe ports is unnecessary on Thunderbolt because we know that even the oldest controller, Light Ridge (2010), is able to suspend its ports to D3 just fine -- specifically including its hotplug ports. And the power saving should be afforded to machines even if their BIOS predates 2015. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Andreas Noever <andreas.noever@gmail.com>
2018-07-31	PCI: Whitelist native hotplug ports for runtime D3	Lukas Wunner	1	-5/+10
	Previously we blacklisted PCIe hotplug ports for runtime D3 because: (a) Ports handled by the firmware must not be transitioned to D3 by the OS behind the firmware's back: https://bugzilla.kernel.org/show_bug.cgi?id=53811 (b) Ports handled natively by the OS lacked runtime D3 support in the pciehp driver. We've just rectified the latter, so allow users to manually enable and test it by passing pcie_port_pm=force on the command line. Vendors are thus put in a position to validate hotplug ports for runtime D3 and perhaps we can someday enable it by default, but with a BIOS cutoff date. Ashok Raj tested runtime D3 on hotplug ports of a SkyLake Xeon-SP in 2017 and encountered Hardware Error NMIs, so this feature clearly cannot be enabled for everyone yet: https://lkml.kernel.org/r/20170503180426.GA4058@otc-nc-03 While at it, remove an erroneous code comment I added with 97a90aee5dab ("PCI: Consolidate conditions to allow runtime PM on PCIe ports") which claims that parents of a hotplug port must stay awake lest interrupts cannot be delivered. That has turned out to be wrong at least for Thunderbolt hotplug ports. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Yinghai Lu <yinghai@kernel.org>
2018-07-31	PCI: sysfs: Resume to D0 on function reset	Lukas Wunner	1	-0/+2
	When performing a function reset via sysfs, the device's config space is accessed in places such as pcie_flr() and its MMIO space is accessed e.g. in reset_ivb_igd(), so ensure accessibility by resuming the device to D0. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Yinghai Lu <yinghai@kernel.org>
2018-07-31	PCI: pciehp: Resume parent to D0 on config space access	Lukas Wunner	2	-0/+21
	Ensure accessibility of a hotplug port's config space when accessed via sysfs by resuming its parent to D0. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Yinghai Lu <yinghai@kernel.org>
2018-07-31	PCI: pciehp: Resume to D0 on enable/disable	Lukas Wunner	1	-0/+6
	pciehp's IRQ thread ensures accessibility of the port by runtime resuming its parent to D0. However when the slot is enabled/disabled, the port itself needs to be in D0 because its secondary bus is accessed in: pciehp_check_link_status(), pciehp_configure_device() (both called from board_added()) and pciehp_unconfigure_device() (called from remove_board()). Thus, acquire a runtime PM ref on enable/disablement of the slot. Yinghai Lu additionally discovered that some SkyLake servers feature a Power Controller for their PCIe hotplug ports (PCIe r3.1, sec 6.7.1.8) which requires the port to be in D0 when invoking pciehp_power_on_slot() (likewise called from board_added()). If slot power is turned on while in D3hot, link training later fails: https://lkml.kernel.org/r/20170205073454.GA253@wunner.de The spec is silent about such a requirement, but it seems prudent to assume that any hotplug port with a Power Controller may need this. The present commit holds a runtime PM ref whenever slot power is turned on and off, but it doesn't keep the port in D0 as long as slot power is on. If vendors determine that's necessary, they need to amend pciehp to acquire a runtime PM ref in pciehp_power_on_slot() and release one in pciehp_power_off_slot(). Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Yinghai Lu <yinghai@kernel.org>
2018-07-31	PCI: pciehp: Support interrupts sent from D3hot	Lukas Wunner	2	-2/+48
	If a hotplug port is able to send an interrupt, one would naively assume that it is accessible at that moment. After all, if it wouldn't be accessible, i.e. if its parent is in D3hot and the link to the hotplug port is thus down, how should an interrupt come through? It turns out that assumption is wrong at least for Thunderbolt: Even though its parents are in D3hot, a Thunderbolt hotplug port is able to signal interrupts. Because the port's config space is inaccessible and resuming the parents may sleep, the hard IRQ handler has to defer runtime resuming the parents and reading the Slot Status register to the IRQ thread. If the hotplug port uses a level-triggered INTx interrupt, it needs to be masked until the IRQ thread has cleared the signaled events. For simplicity, this commit also masks edge-triggered MSI/MSI-X interrupts. Note that if the interrupt is shared (which can only happen for INTx), other devices are starved from receiving interrupts until the IRQ thread is scheduled, has runtime resumed the hotplug port's parents and has read and cleared the Slot Status register. That delay is dominated by the 10 ms D3hot->D0 transition time of each parent port. The worst case is a Thunderbolt downstream port at the end of a daisy chain: There may be up to six Thunderbolt controllers in-between it and the root port, each comprising an upstream and downstream port, plus its own upstream port. That's 13 x 10 = 130 ms. Possible mitigations are polling the interrupt while it's disabled or reducing the d3_delay of Thunderbolt ports if possible. Open code masking of the interrupt instead of requesting it with the IRQF_ONESHOT flag to minimize the period during which it is masked. (IRQF_ONESHOT unmasks the IRQ only after the IRQ thread has finished.) PCIe r4.0 sec 6.7.3.4 states that "If wake generation is required by the associated form factor specification, a hotplug capable Downstream Port must support generation of a wakeup event (using the PME mechanism) on hotplug events that occur when the system is in a sleep state or the Port is in device state D1, D2, or D3Hot." This would seem to imply that PME needs to be enabled on the hotplug port when it is runtime suspended. pci_enable_wake() currently doesn't enable PME on bridges, it may be necessary to add an exemption for hotplug bridges there. On "Light Ridge" Thunderbolt controllers, the PME_Status bit is not set when an interrupt occurs while the hotplug port is in D3hot, even if PME is enabled. (I've tested this on a Mac and we hardcode the OSC_PCI_EXPRESS_PME_CONTROL bit to 0 on Macs in negotiate_os_control(), modifying it to 1 didn't change the behavior.) (Side note: Section 6.7.3.4 also states that "PME and Hot-Plug Event interrupts (when both are implemented) always share the same MSI or MSI-X vector". That would only seem to apply to Root Ports, however the section never mentions Root Ports, only Downstream Ports. This is explained in the definition of "Downstream Port" in the "Terms and Acronyms" section of the PCIe Base Spec: "The Ports on a Switch that are not the Upstream Port are Downstream Ports. All Ports on a Root Complex are Downstream Ports.") Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Yinghai Lu <yinghai@kernel.org>
2018-07-31	PCI: pciehp: Obey compulsory command delay after resume	Lukas Wunner	1	-0/+4
	Upon resume from system sleep, the Slot Control register is written via: pci_pm_resume_noirq() pci_pm_default_resume_early() pci_restore_state() pci_restore_pcie_state() PCIe r4.0, sec 6.7.3.2 says that after "issuing a write transaction that targets any portion of the Port's Slot Control register, [...] software must wait for [the] command to complete before issuing the next command". pciehp currently fails to enforce that rule after the above-mentioned write. Fix it. (Moving restoration of the Slot Control register to pciehp doesn't seem to make sense because the other PCIe hotplug drivers may need it as well.) Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-07-31	PCI: pciehp: Clear spurious events earlier on resume	Lukas Wunner	6	-16/+30
	Thunderbolt hotplug ports that were occupied before system sleep resume with their downstream link in "off" state. Only after the Thunderbolt controller has reestablished the PCIe tunnels does the link go up. As a result, a spurious Presence Detect Changed and/or Data Link Layer State Changed event occurs. The events are not immediately acted upon because tunnel reestablishment happens in the ->resume_noirq phase, when interrupts are still disabled. Also, notification of events may initially be disabled in the Slot Control register when coming out of system sleep and is reenabled in the ->resume_noirq phase through: pci_pm_resume_noirq() pci_pm_default_resume_early() pci_restore_state() pci_restore_pcie_state() It is not guaranteed that the events are acted upon at all: PCIe r4.0, sec 6.7.3.4 says that "a port may optionally send an MSI when there are hot-plug events that occur while interrupt generation is disabled, and interrupt generation is subsequently enabled." Note the "optionally". If an MSI is sent, pciehp will gratuitously turn the slot off and back on once the ->resume_early phase has commenced. If an MSI is not sent, the extant, unacknowledged events in the Slot Status register will prevent future notification of presence or link changes. Commit 13c65840feab ("PCI: pciehp: Clear Presence Detect and Data Link Layer Status Changed on resume") fixed the latter by clearing the events in the ->resume phase. Move this to the ->resume_noirq phase to also fix the gratuitous disable/enablement of the slot. The commit further restored the Slot Control register in the ->resume phase, but that's dispensable because as shown above it's already been done in the ->resume_noirq phase. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
2018-07-31	PCI: portdrv: Deduplicate PM callback iterator	Lukas Wunner	1	-18/+12
	Replace suspend_iter() and resume_iter() with a single function pm_iter() to allow addition of port service callbacks for further power management phases without having to add another iterator each time. No functional change intended. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-07-31	PCI: pciehp: Avoid slot access during reset	Lukas Wunner	3	-7/+14
	The ->reset_slot callback introduced by commits: 2e35afaefe64 ("PCI: pciehp: Add reset_slot() method") and 06a8d89af551 ("PCI: pciehp: Disable link notification across slot reset") disables notification of Presence Detect Changed and Data Link Layer State Changed events for the duration of a secondary bus reset. However a bus reset not only triggers these events, but may also clear the Presence Detect State bit in the Slot Status register and the Data Link Layer Link Active bit in the Link Status register momentarily. According to Sinan Kaya: "I know for a fact that bus reset clears the Data Link Layer Active bit as soon as link goes down. It gets set again following link up. Presence detect depends on the HW implementation. QDT root ports don't change presence detect for instance since nobody actually removed the card. If an implementation supports in-band presence detect, the answer is yes. As soon as the link goes down, presence detect bit will get cleared until recovery." https://lkml.kernel.org/r/42e72f83-3b24-f7ef-e5bc-290fae99259a@codeaurora.org In-band presence detect is also covered in Table 4-15 in PCIe r4.0, sec 4.2.6. pciehp should therefore ensure that any parts of the driver that access those bits do not run concurrently to a bus reset. The only precaution the commits took to that effect was to halt interrupt polling. They made no effort to drain the slot workqueue, cancel an outstanding Attention Button work, or block slot enable/disable requests via sysfs and in the ->probe hook. Now that pciehp is converted to enable/disable the slot exclusively from the IRQ thread, the only places accessing the two above-mentioned bits are the IRQ thread and the ->probe hook. Add locking to serialize them with a bus reset. This obviates the need to halt interrupt polling. Do not add locking to the ->get_adapter_status sysfs callback to afford users unfettered access to that bit. Use an rw_semaphore in lieu of a regular mutex to allow parallel execution of the non-reset code paths accessing the critical bits, i.e. the IRQ thread and the ->probe hook. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Rajat Jain <rajatja@google.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Sinan Kaya <okaya@kernel.org>
2018-07-31	PCI: Use IRQF_ONESHOT if pci_request_irq() called with no handler	Heiner Kallweit	1	-1/+5
	If we have a threaded interrupt with the handler being NULL, then request_threaded_irq() -> __setup_irq() will complain and bail out if the IRQF_ONESHOT flag isn't set. Therefore check for the handler being NULL and set IRQF_ONESHOT in this case. This change is needed to migrate the mei_me driver to pci_alloc_irq_vectors() and pci_request_irq(). Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2018-07-30	PCI: Call dma_debug_add_bus() for pci_bus_type from PCI core	Christoph Hellwig	1	-1/+1
	There is nothing arch-specific about PCI or dma-debug, so call dma_debug_add_bus() from the PCI core just after registering the bus type. Most of dma-debug is already generic; this just adds reporting of pending dma-allocations on driver unload for arches other than powerpc, sh, and x86. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
2018-07-30	PCI: mobiveil: Add Kconfig/Makefile entries	Lorenzo Pieralisi	2	-0/+11
	commit 9af6bcb11e12 ("PCI: mobiveil: Add Mobiveil PCIe Host Bridge IP driver") did not add the configuration and build infrastructure to configure and build the mobiveil controller driver, so at present the driver code is in the kernel but cannot be compiled. Add the mobiveil controller driver Kconfig/Makefile infrastructure. Fixes: 9af6bcb11e12 ("PCI: mobiveil: Add Mobiveil PCIe Host Bridge IP driver") Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Subrahmanya Lingappa <l.subrahmanya@mobiveil.co.in>
2018-07-30	PCI: mobiveil: Add missing ../pci.h include	Lorenzo Pieralisi	1	-0/+2
	PCI mobiveil host controller driver currently fails to compile with the following error: drivers/pci/controller/pcie-mobiveil.c: In function 'mobiveil_pcie_probe': drivers/pci/controller/pcie-mobiveil.c:788:8: error: implicit declaration of function 'devm_of_pci_get_host_bridge_resources'; did you mean 'pci_get_host_bridge_device'? [-Werror=implicit-function-declaration] ret = devm_of_pci_get_host_bridge_resources(dev, 0, 0xff, ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ pci_get_host_bridge_device Add the missing include file to pull in the required function declaration. Fixes: 9af6bcb11e12 ("PCI: mobiveil: Add Mobiveil PCIe Host Bridge IP driver") Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Subrahmanya Lingappa <l.subrahmanya@mobiveil.co.in>
2018-07-30	PCI: mobiveil: Fix struct mobiveil_pcie.pcie_reg_base address type	Lorenzo Pieralisi	1	-1/+1
	The field pcie_reg_base in struct mobiveil_pcie represents a physical address so it should be of phys_addr_t type rather than void __iomem; this results in the following compilation warnings: drivers/pci/controller/pcie-mobiveil.c: In function 'mobiveil_pcie_parse_dt': drivers/pci/controller/pcie-mobiveil.c:326:22: warning: assignment makes pointer from integer without a cast [-Wint-conversion] pcie->pcie_reg_base = res->start; ^ drivers/pci/controller/pcie-mobiveil.c: In function 'mobiveil_pcie_enable_msi': drivers/pci/controller/pcie-mobiveil.c:485:25: warning: initialization makes integer from pointer without a cast [-Wint-conversion] phys_addr_t msg_addr = pcie->pcie_reg_base; ^~~~ drivers/pci/controller/pcie-mobiveil.c: In function 'mobiveil_compose_msi_msg': drivers/pci/controller/pcie-mobiveil.c:640:21: warning: initialization makes integer from pointer without a cast [-Wint-conversion] phys_addr_t addr = pcie->pcie_reg_base + (data->hwirq sizeof(int)); Fix the type and with it the compilation warnings. Fixes: 9af6bcb11e12 ("PCI: mobiveil: Add Mobiveil PCIe Host Bridge IP driver") Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Subrahmanya Lingappa <l.subrahmanya@mobiveil.co.in>
2018-07-30	BackMerge v4.18-rc7 into drm-next	Dave Airlie	20	-35/+147
	rmk requested this for armada and I think we've had a few conflicts build up. Signed-off-by: Dave Airlie <airlied@redhat.com>
2018-07-27	PCI: mobiveil: Avoid integer overflow in IB_WIN_SIZE	Dan Carpenter	1	-1/+1
	IB_WIN_SIZE is larger than INT_MAX so we need to cast it to u64. Fixes: 9af6bcb11e12 ("PCI: mobiveil: Add Mobiveil PCIe Host Bridge IP driver") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-07-27	Merge tag 'pci-v4.18-fixes-4' of ↵	Linus Torvalds	1	-0/+2
	git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fix from Bjorn Helgaas: "Fix a use-after-free error in fatal error recovery (Thomas Tai)" * tag 'pci-v4.18-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: PCI/AER: Work around use-after-free in pcie_do_fatal_recovery()
2018-07-26	PCI/AER: Work around use-after-free in pcie_do_fatal_recovery()	Thomas Tai	1	-0/+2
	When an fatal error is received by a non-bridge device, the device is removed, and pci_stop_and_remove_bus_device() deallocates the device structure. The freed device structure is used by subsequent code to send uevents and print messages. Hold a reference on the device until we're finished using it. This is not an ideal fix because pcie_do_fatal_recovery() should not use the device at all after removing it, but that's too big a project for right now. Fixes: 7e9084b36740 ("PCI/AER: Handle ERR_FATAL with removal and re-enumeration of devices") Signed-off-by: Thomas Tai <thomas.tai@oracle.com> [bhelgaas: changelog, reduce get/put coverage] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-07-26	PCI: mobiveil: Integer overflow in IB_WIN_SIZE	Dan Carpenter	1	-1/+1
	IB_WIN_SIZE is larger than INT_MAX so we need to cast it to u64. Fixes: 9af6bcb11e12 ("PCI: mobiveil: Add Mobiveil PCIe Host Bridge IP driver") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
2018-07-23	PCI: pciehp: Always enable occupied slot on probe	Lukas Wunner	2	-15/+6
	Per PCIe r4.0, sec 6.7.3.4, a "port may optionally send an MSI when there are hot-plug events that occur while interrupt generation is disabled, and interrupt generation is subsequently enabled." On probe, we currently clear all event bits in the Slot Status register with the notable exception of the Presence Detect Changed bit. Thereby we seek to receive an interrupt for an already occupied slot once event notification is enabled. But because the interrupt is optional, users may have to specify the pciehp_force parameter on the command line, which is inconvenient. Moreover, now that pciehp's event handling has become resilient to missed events, a Presence Detect Changed interrupt for a slot which is powered on is interpreted as removal of the card. If the slot has already been brought up by the BIOS, receiving such an interrupt on probe causes the slot to be powered off and immediately back on, which is likewise undesirable. Avoid both issues by making the behavior of pciehp_force the default and clearing the Presence Detect Changed bit on probe. Note that the stated purpose of pciehp_force per the MODULE_PARM_DESC ("Force pciehp, even if OSHP is missing") seems nonsensical because the OSHP control method is only relevant for SHCP slots according to the PCI Firmware specification r3.0, sec 4.8. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
2018-07-23	PCI: pciehp: Become resilient to missed events	Lukas Wunner	3	-53/+40
	A hotplug port's Slot Status register does not count how often each type of event occurred, it only records the fact that an event has occurred. Previously pciehp queued a work item for each event. But if it missed an event, e.g. removal of a card in-between two back-to-back insertions, it queued up the wrong work item or no work item at all. Commit fad214b0aa72 ("PCI: pciehp: Process all hotplug events before looking for new ones") sought to improve the situation by shrinking the window during which events may be missed. But Stefan Roese reports unbalanced Card present and Link Up events, suggesting that we're still missing events if they occur very rapidly. Bjorn Helgaas responds that he considers pciehp's event handling "baroque" and calls for its simplification and rationalization: https://lkml.kernel.org/r/20180202192045.GA53759@bhelgaas-glaptop.roam.corp.google.com It gets worse once a hotplug port is runtime suspended: The port can signal an interrupt while it and its parents are in D3hot, i.e. while it is inaccessible. By the time we've runtime resumed all parents to D0 and read the port's Slot Status register, we may have missed an arbitrary number of events. Event handling therefore needs to be reworked to become resilient to missed events. Assume that a Presence Detect Changed event has occurred. Consider the following truth table: - Slot is in OFF_STATE and is currently empty. => Do nothing. (The event is trailing a Link Down or we've missed an insertion and subsequent removal.) - Slot is in OFF_STATE and is currently occupied. => Turn the slot on. - Slot is in ON_STATE and is currently empty. => Turn the slot off. - Slot is in ON_STATE and is currently occupied. => Turn the slot off, (Be cautious and assume the card in then back on. the slot isn't the same as before.) This leads to the following simple algorithm: 1 If the slot is in ON_STATE, turn it off unconditionally. 2 If the slot is currently occupied, turn it on. Because those actions are now carried out synchronously, rather than by scheduled work items, pciehp reacts to the current situation and missed events no longer matter. Data Link Layer State Changed events can be handled identically to Presence Detect Changed events. Note that in the above truth table, a Link Up trailing a Card present event didn't have to be accounted for: It is filtered out by pciehp_check_link_status(). As for Attention Button Pressed events, PCIe r4.0, sec 6.7.1.5 says: "Once the Power Indicator begins blinking, a 5-second abort interval exists during which a second depression of the Attention Button cancels the operation." In other words, the user can only expect the system to react to a button press after it starts blinking. Missed button presses that occur in-between are irrelevant. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Stefan Roese <sr@denx.de> Cc: Mayurkumar Patel <mayurkumar.patel@intel.com> Cc: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
2018-07-23	PCI: pciehp: Tolerate initially unstable link	Lukas Wunner	1	-0/+5
	When a device is hotplugged, Presence Detect and Link Up events often do not occur simultaneously, but with a lag of a few milliseconds. Only the first event received is relevant, the other one can be disregarded. Moreover, Stefan Roese reports that on certain platforms, Link State and Presence Detect may flap for up to 100 ms before stabilizing, suggesting that such events should be disregarded for at least this long: https://lkml.kernel.org/r/20180130084121.18653-1-sr@denx.de On slot enablement, pciehp_check_link_status() waits for 100 ms per PCIe r4.0, sec 6.7.3.3, then probes the hotplugged device's vendor register for up to 1 second. If this succeeds, the link is definitely up, so ignore any Presence Detect or Link State events that occurred up to this point. pciehp_check_link_status() then checks the Link Training bit in the Link Status register. This is the final opportunity to detect inaccessibility of the device and abort slot enablement. Any link or presence change that occurs afterwards will cause the slot to be disabled again immediately after attempting to enable it. The astute reviewer may appreciate that achieving this behavior would be more complicated had pciehp not just been converted to enable/disable the slot exclusively from the IRQ thread: When the slot is enabled via sysfs, each link or presence flap would otherwise cause the IRQ thread to run and it would have to sense that those events are belonging to a concurrent slot enablement operation and disregard them. It would be much more difficult than this mere 3 line change. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Stefan Roese <sr@denx.de>
2018-07-23	PCI: pciehp: Declare pciehp_enable/disable_slot() static	Lukas Wunner	2	-4/+5
	No callers of pciehp_enable/disable_slot() outside of pciehp_ctrl.c remain, so declare the functions static. For now this requires forward declarations. Those can be eliminated by reshuffling functions once the ongoing effort to refactor the driver has settled. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>