summaryrefslogtreecommitdiffstats
path: root/drivers/nvme/host
AgeCommit message (Collapse)AuthorFilesLines
2019-12-03Merge tag 'pci-v5.5-changes' of ↵Linus Torvalds1-10/+0
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "Enumeration: - Warn if a host bridge has no NUMA info (Yunsheng Lin) - Add PCI_STD_NUM_BARS for the number of standard BARs (Denis Efremov) Resource management: - Fix boot-time Embedded Controller GPE storm caused by incorrect resource assignment after ACPI Bus Check Notification (Mika Westerberg) - Protect pci_reassign_bridge_resources() against concurrent addition/removal (Benjamin Herrenschmidt) - Fix bridge dma_ranges resource list cleanup (Rob Herring) - Add "pci=hpmmiosize" and "pci=hpmmioprefsize" parameters to control the MMIO and prefetchable MMIO window sizes of hotplug bridges independently (Nicholas Johnson) - Fix MMIO/MMIO_PREF window assignment that assigned more space than desired (Nicholas Johnson) - Only enforce bus numbers from bridge EA if the bridge has EA devices downstream (Subbaraya Sundeep) - Consolidate DT "dma-ranges" parsing and convert all host drivers to use shared parsing (Rob Herring) Error reporting: - Restore AER capability after resume (Mayurkumar Patel) - Add PoisonTLPBlocked AER counter (Rajat Jain) - Use for_each_set_bit() to simplify AER code (Andy Shevchenko) - Fix AER kernel-doc (Andy Shevchenko) - Add "pcie_ports=dpc-native" parameter to allow native use of DPC even if platform didn't grant control over AER (Olof Johansson) Hotplug: - Avoid returning prematurely from sysfs requests to enable or disable a PCIe hotplug slot (Lukas Wunner) - Don't disable interrupts twice when suspending hotplug ports (Mika Westerberg) - Fix deadlocks when PCIe ports are hot-removed while suspended (Mika Westerberg) Power management: - Remove unnecessary ASPM locking (Bjorn Helgaas) - Add support for disabling L1 PM Substates (Heiner Kallweit) - Allow re-enabling Clock PM after it has been disabled (Heiner Kallweit) - Add sysfs attributes for controlling ASPM link states (Heiner Kallweit) - Remove CONFIG_PCIEASPM_DEBUG, including "link_state" and "clk_ctl" sysfs files (Heiner Kallweit) - Avoid AMD FCH XHCI USB PME# from D0 defect that prevents wakeup on USB 2.0 or 1.1 connect events (Kai-Heng Feng) - Move power state check out of pci_msi_supported() (Bjorn Helgaas) - Fix incorrect MSI-X masking on resume and revert related nvme quirk for Kingston NVME SSD running FW E8FK11.T (Jian-Hong Pan) - Always return devices to D0 when thawing to fix hibernation with drivers like mlx4 that used legacy power management (previously we only did it for drivers with new power management ops) (Dexuan Cui) - Clear PCIe PME Status even for legacy power management (Bjorn Helgaas) - Fix PCI PM documentation errors (Bjorn Helgaas) - Use dev_printk() for more power management messages (Bjorn Helgaas) - Apply D2 delay as milliseconds, not microseconds (Bjorn Helgaas) - Convert xen-platform from legacy to generic power management (Bjorn Helgaas) - Removed unused .resume_early() and .suspend_late() legacy power management hooks (Bjorn Helgaas) - Rearrange power management code for clarity (Rafael J. Wysocki) - Decode power states more clearly ("4" or "D4" really refers to "D3cold") (Bjorn Helgaas) - Notice when reading PM Control register returns an error (~0) instead of interpreting it as being in D3hot (Bjorn Helgaas) - Add missing link delays required by the PCIe spec (Mika Westerberg) Virtualization: - Move pci_prg_resp_pasid_required() to CONFIG_PCI_PRI (Bjorn Helgaas) - Allow VFs to use PRI (the PF PRI is shared by the VFs, but the code previously didn't recognize that) (Kuppuswamy Sathyanarayanan) - Allow VFs to use PASID (the PF PASID capability is shared by the VFs, but the code previously didn't recognize that) (Kuppuswamy Sathyanarayanan) - Disconnect PF and VF ATS enablement, since ATS in PFs and associated VFs can be enabled independently (Kuppuswamy Sathyanarayanan) - Cache PRI and PASID capability offsets (Kuppuswamy Sathyanarayanan) - Cache the PRI PRG Response PASID Required bit (Bjorn Helgaas) - Consolidate ATS declarations in linux/pci-ats.h (Krzysztof Wilczynski) - Remove unused PRI and PASID stubs (Bjorn Helgaas) - Removed unnecessary EXPORT_SYMBOL_GPL() from ATS, PRI, and PASID interfaces that are only used by built-in IOMMU drivers (Bjorn Helgaas) - Hide PRI and PASID state restoration functions used only inside the PCI core (Bjorn Helgaas) - Add a DMA alias quirk for the Intel VCA NTB (Slawomir Pawlowski) - Serialize sysfs sriov_numvfs reads vs writes (Pierre Crégut) - Update Cavium ACS quirk for ThunderX2 and ThunderX3 (George Cherian) - Fix the UPDCR register address in the Intel ACS quirk (Steffen Liebergeld) - Unify ACS quirk implementations (Bjorn Helgaas) Amlogic Meson host bridge driver: - Fix meson PERST# GPIO polarity problem (Remi Pommarel) - Add DT bindings for Amlogic Meson G12A (Neil Armstrong) - Fix meson clock names to match DT bindings (Neil Armstrong) - Add meson support for Amlogic G12A SoC with separate shared PHY (Neil Armstrong) - Add meson extended PCIe PHY functions for Amlogic G12A USB3+PCIe combo PHY (Neil Armstrong) - Add arm64 DT for Amlogic G12A PCIe controller node (Neil Armstrong) - Add commented-out description of VIM3 USB3/PCIe mux in arm64 DT (Neil Armstrong) Broadcom iProc host bridge driver: - Invalidate iProc PAXB address mapping before programming it (Abhishek Shah) - Fix iproc-msi and mvebu __iomem annotations (Ben Dooks) Cadence host bridge driver: - Refactor Cadence PCIe host controller to use as a library for both host and endpoint (Tom Joseph) Freescale Layerscape host bridge driver: - Add layerscape LS1028a support (Xiaowei Bao) Intel VMD host bridge driver: - Add VMD bus 224-255 restriction decode (Jon Derrick) - Add VMD 8086:9A0B device ID (Jon Derrick) - Remove Keith from VMD maintainer list (Keith Busch) Marvell ARMADA 3700 / Aardvark host bridge driver: - Use LTSSM state to build link training flag since Aardvark doesn't implement the Link Training bit (Remi Pommarel) - Delay before training Aardvark link in case PERST# was asserted before the driver probe (Remi Pommarel) - Fix Aardvark issues with Root Control reads and writes (Remi Pommarel) - Don't rely on jiffies in Aardvark config access path since interrupts may be disabled (Remi Pommarel) - Fix Aardvark big-endian support (Grzegorz Jaszczyk) Marvell ARMADA 370 / XP host bridge driver: - Make mvebu_pci_bridge_emul_ops static (Ben Dooks) Microsoft Hyper-V host bridge driver: - Add hibernation support for Hyper-V virtual PCI devices (Dexuan Cui) - Track Hyper-V pci_protocol_version per-hbus, not globally (Dexuan Cui) - Avoid kmemleak false positive on hv hbus buffer (Dexuan Cui) Mobiveil host bridge driver: - Change mobiveil csr_read()/write() function names that conflict with riscv arch functions (Kefeng Wang) NVIDIA Tegra host bridge driver: - Fix Tegra CLKREQ dependency programming (Vidya Sagar) Renesas R-Car host bridge driver: - Remove unnecessary header include from rcar (Andrew Murray) - Tighten register index checking for rcar inbound range programming (Marek Vasut) - Fix rcar inbound range alignment calculation to improve packing of multiple entries (Marek Vasut) - Update rcar MACCTLR setting to match documentation (Yoshihiro Shimoda) - Clear bit 0 of MACCTLR before PCIETCTLR.CFINIT per manual (Yoshihiro Shimoda) - Add Marek Vasut and Yoshihiro Shimoda as R-Car maintainers (Simon Horman) Rockchip host bridge driver: - Make rockchip 0V9 and 1V8 power regulators non-optional (Robin Murphy) Socionext UniPhier host bridge driver: - Set uniphier to host (RC) mode always (Kunihiko Hayashi) Endpoint drivers: - Fix endpoint driver sign extension problem when shifting page number to phys_addr_t (Alan Mikhak) Misc: - Add NumaChip SPDX header (Krzysztof Wilczynski) - Replace EXTRA_CFLAGS with ccflags-y (Krzysztof Wilczynski) - Remove unused includes (Krzysztof Wilczynski) - Removed unused sysfs attribute groups (Ben Dooks) - Remove PTM and ASPM dependencies on PCIEPORTBUS (Bjorn Helgaas) - Add PCIe Link Control 2 register field definitions to replace magic numbers in AMDGPU and Radeon CIK/SI (Bjorn Helgaas) - Fix incorrect Link Control 2 Transmit Margin usage in AMDGPU and Radeon CIK/SI PCIe Gen3 link training (Bjorn Helgaas) - Use pcie_capability_read_word() instead of pci_read_config_word() in AMDGPU and Radeon CIK/SI (Frederick Lawler) - Remove unused pci_irq_get_node() Greg Kroah-Hartman) - Make asm/msi.h mandatory and simplify PCI_MSI_IRQ_DOMAIN Kconfig (Palmer Dabbelt, Michal Simek) - Read all 64 bits of Switchtec part_event_bitmap (Logan Gunthorpe) - Fix erroneous intel-iommu dependency on CONFIG_AMD_IOMMU (Bjorn Helgaas) - Fix bridge emulation big-endian support (Grzegorz Jaszczyk) - Fix dwc find_next_bit() usage (Niklas Cassel) - Fix pcitest.c fd leak (Hewenliang) - Fix typos and comments (Bjorn Helgaas) - Fix Kconfig whitespace errors (Krzysztof Kozlowski)" * tag 'pci-v5.5-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (160 commits) PCI: Remove PCI_MSI_IRQ_DOMAIN architecture whitelist asm-generic: Make msi.h a mandatory include/asm header Revert "nvme: Add quirk for Kingston NVME SSD running FW E8FK11.T" PCI/MSI: Fix incorrect MSI-X masking on resume PCI/MSI: Move power state check out of pci_msi_supported() PCI/MSI: Remove unused pci_irq_get_node() PCI: hv: Avoid a kmemleak false positive caused by the hbus buffer PCI: hv: Change pci_protocol_version to per-hbus PCI: hv: Add hibernation support PCI: hv: Reorganize the code in preparation of hibernation MAINTAINERS: Remove Keith from VMD maintainer PCI/ASPM: Remove PCIEASPM_DEBUG Kconfig option and related code PCI/ASPM: Add sysfs attributes for controlling ASPM link states PCI: Fix indentation drm/radeon: Prefer pcie_capability_read_word() drm/radeon: Replace numbers with PCI_EXP_LNKCTL2 definitions drm/radeon: Correct Transmit Margin masks drm/amdgpu: Prefer pcie_capability_read_word() PCI: uniphier: Set mode register to host mode drm/amdgpu: Replace numbers with PCI_EXP_LNKCTL2 definitions ...
2019-12-01Merge tag 'compat-ioctl-5.5' of ↵Linus Torvalds1-1/+1
git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground Pull removal of most of fs/compat_ioctl.c from Arnd Bergmann: "As part of the cleanup of some remaining y2038 issues, I came to fs/compat_ioctl.c, which still has a couple of commands that need support for time64_t. In completely unrelated work, I spent time on cleaning up parts of this file in the past, moving things out into drivers instead. After Al Viro reviewed an earlier version of this series and did a lot more of that cleanup, I decided to try to completely eliminate the rest of it and move it all into drivers. This series incorporates some of Al's work and many patches of my own, but in the end stops short of actually removing the last part, which is the scsi ioctl handlers. I have patches for those as well, but they need more testing or possibly a rewrite" * tag 'compat-ioctl-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (42 commits) scsi: sd: enable compat ioctls for sed-opal pktcdvd: add compat_ioctl handler compat_ioctl: move SG_GET_REQUEST_TABLE handling compat_ioctl: ppp: move simple commands into ppp_generic.c compat_ioctl: handle PPPIOCGIDLE for 64-bit time_t compat_ioctl: move PPPIOCSCOMPRESS to ppp_generic compat_ioctl: unify copy-in of ppp filters tty: handle compat PPP ioctls compat_ioctl: move SIOCOUTQ out of compat_ioctl.c compat_ioctl: handle SIOCOUTQNSD af_unix: add compat_ioctl support compat_ioctl: reimplement SG_IO handling compat_ioctl: move WDIOC handling into wdt drivers fs: compat_ioctl: move FITRIM emulation into file systems gfs2: add compat_ioctl support compat_ioctl: remove unused convert_in_user macro compat_ioctl: remove last RAID handling code compat_ioctl: remove /dev/raw ioctl translation compat_ioctl: remove PCI ioctl translation compat_ioctl: remove joystick ioctl translation ...
2019-11-26Revert "nvme: Add quirk for Kingston NVME SSD running FW E8FK11.T"Jian-Hong Pan1-10/+0
Since e045fa29e893 ("PCI/MSI: Fix incorrect MSI-X masking on resume") is merged, we can revert the previous quirk now. This reverts commit 19ea025e1d28c629b369c3532a85b3df478cc5c6. Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=204887 Fixes: 19ea025e1d28 ("nvme: Add quirk for Kingston NVME SSD running FW E8FK11.T") Link: https://lore.kernel.org/r/20191031093408.9322-1-jian-hong@endlessm.com Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Christoph Hellwig <hch@lst.de> Cc: stable@vger.kernel.org
2019-11-25Merge tag 'for-5.5/drivers-post-20191122' of git://git.kernel.dk/linux-blockLinus Torvalds6-4/+300
Pull additional block driver updates from Jens Axboe: "Here's another block driver update, done to avoid conflicts with the zoned changes coming next. This contains: - Prepare SCSI sd for zone open/close/finish support - Small NVMe pull request - hwmon support (Akinobu) - add new co-maintainer (Christoph) - work-around for a discard issue on non-conformant drives (Eduard) - Small nbd leak fix" * tag 'for-5.5/drivers-post-20191122' of git://git.kernel.dk/linux-block: nbd: prevent memory leak nvme: hwmon: add quirk to avoid changing temperature threshold nvme: hwmon: provide temperature min and max values for each sensor nvmet: add another maintainer nvme: Discard workaround for non-conformant devices nvme: Add hardware monitoring support scsi: sd_zbc: add zone open, close, and finish support
2019-11-25Merge tag 'for-5.5/drivers-20191121' of git://git.kernel.dk/linux-blockLinus Torvalds7-55/+77
Pull block driver updates from Jens Axboe: "Here are the main block driver updates for 5.5. Nothing major in here, mostly just fixes. This contains: - a set of bcache changes via Coly - MD changes from Song - loop unmap write-zeroes fix (Darrick) - spelling fixes (Geert) - zoned additions cleanups to null_blk/dm (Ajay) - allow null_blk online submit queue changes (Bart) - NVMe changes via Keith, nothing major here either" * tag 'for-5.5/drivers-20191121' of git://git.kernel.dk/linux-block: (56 commits) Revert "bcache: fix fifo index swapping condition in journal_pin_cmp()" drivers/md/raid5-ppl.c: use the new spelling of RWH_WRITE_LIFE_NOT_SET drivers/md/raid5.c: use the new spelling of RWH_WRITE_LIFE_NOT_SET bcache: don't export symbols bcache: remove the extra cflags for request.o bcache: at least try to shrink 1 node in bch_mca_scan() bcache: add idle_max_writeback_rate sysfs interface bcache: add code comments in bch_btree_leaf_dirty() bcache: fix deadlock in bcache_allocator bcache: add code comment bch_keylist_pop() and bch_keylist_pop_front() bcache: deleted code comments for dead code in bch_data_insert_keys() bcache: add more accurate error messages in read_super() bcache: fix static checker warning in bcache_device_free() bcache: fix a lost wake-up problem caused by mca_cannibalize_lock bcache: fix fifo index swapping condition in journal_pin_cmp() md/raid10: prevent access of uninitialized resync_pages offset md: avoid invalid memory access for array sb->dev_roles md/raid1: avoid soft lockup under high load null_blk: add zone open, close, and finish support dm: add zone open, close and finish support ...
2019-11-22nvme: hwmon: add quirk to avoid changing temperature thresholdAkinobu Mita3-2/+12
This adds a new quirk NVME_QUIRK_NO_TEMP_THRESH_CHANGE to avoid changing the value of the temperature threshold feature for specific devices that show undesirable behavior. Guenter reported: "On my Intel NVME drive (SSDPEKKW512G7), writing any minimum limit on the Composite temperature sensor results in a temperature warning, and that warning is sticky until I reset the controller. It doesn't seem to matter which temperature I write; writing -273000 has the same result." The Intel NVMe has the latest firmware version installed, so this isn't a problem that was ever fixed. Reported-by: Guenter Roeck <linux@roeck-us.net> Cc: Keith Busch <kbusch@kernel.org> Cc: Jens Axboe <axboe@fb.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Jean Delvare <jdelvare@suse.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2019-11-22nvme: hwmon: provide temperature min and max values for each sensorAkinobu Mita1-16/+90
According to the NVMe specification, the over temperature threshold and under temperature threshold features shall be implemented for Composite Temperature if a non-zero WCTEMP field value is reported in the Identify Controller data structure. The features are also implemented for all implemented temperature sensors (i.e., all Temperature Sensor fields that report a non-zero value). This provides the over temperature threshold and under temperature threshold for each sensor as temperature min and max values of hwmon sysfs attributes. The WCTEMP is already provided as a temperature max value for Composite Temperature, but this change isn't incompatible. Because the default value of the over temperature threshold for Composite Temperature is the WCTEMP. Now the alarm attribute for Composite Temperature indicates one of the temperature is outside of a temperature threshold. Because there is only a single bit in Critical Warning field that indicates a temperature is outside of a threshold. Example output from the "sensors" command: nvme-pci-0100 Adapter: PCI adapter Composite: +33.9°C (low = -273.1°C, high = +69.8°C) (crit = +79.8°C) Sensor 1: +34.9°C (low = -273.1°C, high = +65261.8°C) Sensor 2: +31.9°C (low = -273.1°C, high = +65261.8°C) Sensor 5: +47.9°C (low = -273.1°C, high = +65261.8°C) This also adds helper macros for kelvin from/to milli Celsius conversion, and replaces the repeated code in hwmon.c. Cc: Keith Busch <kbusch@kernel.org> Cc: Jens Axboe <axboe@fb.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Jean Delvare <jdelvare@suse.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2019-11-13nvme: Discard workaround for non-conformant devicesEduard Hasenleithner1-3/+9
Users observe IOMMU related errors when performing discard on nvme from non-compliant nvme devices reading beyond the end of the DMA mapped ranges to discard. Two different variants of this behavior have been observed: SM22XX controllers round up the read size to a multiple of 512 bytes, and Phison E12 unconditionally reads the maximum discard size allowed by the spec (256 segments or 4kB). Make nvme_setup_discard unconditionally allocate the maximum DSM buffer so the driver DMA maps a memory range that will always succeed. Link: https://bugzilla.kernel.org/show_bug.cgi?id=202665 many Signed-off-by: Eduard Hasenleithner <eduard@hasenleithner.at> [changelog, use existing define, kernel coding style] Signed-off-by: Keith Busch <kbusch@kernel.org>
2019-11-12nvme: Add hardware monitoring supportGuenter Roeck5-0/+206
nvme devices report temperature information in the controller information (for limits) and in the smart log. Currently, the only means to retrieve this information is the nvme command line interface, which requires super-user privileges. At the same time, it would be desirable to be able to use NVMe temperature information for thermal control. This patch adds support to read NVMe temperatures from the kernel using the hwmon API and adds temperature zones for NVMe drives. The thermal subsystem can use this information to set thermal policies, and userspace can access it using libsensors and/or the "sensors" command. Example output from the "sensors" command: nvme0-pci-0100 Adapter: PCI adapter Composite: +39.0°C (high = +85.0°C, crit = +85.0°C) Sensor 1: +39.0°C Sensor 2: +41.0°C Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Keith Busch <kbusch@kernel.org>
2019-11-08Merge tag 'for-linus-2019-11-08' of git://git.kernel.dk/linux-blockLinus Torvalds2-0/+10
Pull block fixes from Jens Axboe: - Two NVMe device removal crash fixes, and a compat fixup for for an ioctl that was introduced in this release (Anton, Charles, Max - via Keith) - Missing error path mutex unlock for drbd (Dan) - cgroup writeback fixup on dead memcg (Tejun) - blkcg online stats print fix (Tejun) * tag 'for-linus-2019-11-08' of git://git.kernel.dk/linux-block: cgroup,writeback: don't switch wbs immediately on dead wbs if the memcg is dead block: drbd: remove a stray unlock in __drbd_send_protocol() blkcg: make blkcg_print_stat() print stats only for online blkgs nvme: change nvme_passthru_cmd64 to explicitly mark rsvd nvme-multipath: fix crash in nvme_mpath_clear_ctrl_paths nvme-rdma: fix a segmentation fault during module unload
2019-11-06nvme-multipath: fix crash in nvme_mpath_clear_ctrl_pathsAnton Eidelman1-0/+2
nvme_mpath_clear_ctrl_paths() iterates through the ctrl->namespaces list while holding ctrl->scan_lock. This does not seem to be the correct way of protecting from concurrent list modification. Specifically, nvme_scan_work() sorts ctrl->namespaces AFTER unlocking scan_lock. This may result in the following (rare) crash in ctrl disconnect during scan_work: BUG: kernel NULL pointer dereference, address: 0000000000000050 Oops: 0000 [#1] SMP PTI CPU: 0 PID: 3995 Comm: nvme 5.3.5-050305-generic RIP: 0010:nvme_mpath_clear_current_path+0xe/0x90 [nvme_core] ... Call Trace: nvme_mpath_clear_ctrl_paths+0x3c/0x70 [nvme_core] nvme_remove_namespaces+0x35/0xe0 [nvme_core] nvme_do_delete_ctrl+0x47/0x90 [nvme_core] nvme_sysfs_delete+0x49/0x60 [nvme_core] dev_attr_store+0x17/0x30 sysfs_kf_write+0x3e/0x50 kernfs_fop_write+0x11e/0x1a0 __vfs_write+0x1b/0x40 vfs_write+0xb9/0x1a0 ksys_write+0x67/0xe0 __x64_sys_write+0x1a/0x20 do_syscall_64+0x5a/0x130 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f8d02bfb154 Fix: After taking scan_lock in nvme_mpath_clear_ctrl_paths() down_read(&ctrl->namespaces_rwsem) as well to make list traversal safe. This will not cause deadlocks because taking scan_lock never happens while holding the namespaces_rwsem. Moreover, scan work downs namespaces_rwsem in the same order. Alternative: sort ctrl->namespaces in nvme_scan_work() while still holding the scan_lock. This would leave nvme_mpath_clear_ctrl_paths() without correct protection against ctrl->namespaces modification by anyone other than scan_work. Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anton Eidelman <anton@lightbitslabs.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2019-11-06nvme-rdma: fix a segmentation fault during module unloadMax Gurtovoy1-0/+8
In case there are controllers that are not associated with any RDMA device (e.g. during unsuccessful reconnection) and the user will unload the module, these controllers will not be freed and will access already freed memory. The same logic appears in other fabric drivers as well. Fixes: 87fd125344d6 ("nvme-rdma: remove redundant reference between ib_device and tagset") Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2019-11-04nvme: Fix parsing of ANA log pagePrabhath Sajeepa1-4/+8
Check validity of offset into ANA log buffer before accessing nvme_ana_group_desc. This check ensures the size of ANA log buffer >= offset + sizeof(nvme_ana_group_desc) Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Prabhath Sajeepa <psajeepa@purestorage.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-04nvme-pci: Spelling s/resdicovered/rediscovered/Geert Uytterhoeven1-1/+1
Fix misspelling of "rediscovered". Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-04nvme: Introduce nvme_lba_to_sect()Damien Le Moal2-7/+15
Introduce the new helper function nvme_lba_to_sect() to convert a device logical block number to a 512B sector number. Use this new helper in obvious places, cleaning up the code. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-04nvme: Cleanup and rename nvme_block_nr()Damien Le Moal2-5/+8
Rename nvme_block_nr() to nvme_sect_to_lba() and use SECTOR_SHIFT instead of its hard coded value 9. Also add a comment to decribe this helper. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-04nvme: move common call to nvme_cleanup_cmd to core layerMax Gurtovoy4-10/+8
nvme_cleanup_cmd should be called for each call to nvme_setup_cmd (symmetrical functions). Move the call for nvme_cleanup_cmd to the common core layer and call it during nvme_complete_rq for the good flow. For error flow, each transport will call nvme_cleanup_cmd independently. Also take care of a special case of path failure, where we call nvme_complete_rq without doing nvme_setup_cmd. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-04nvme: introduce "Command Aborted By host" status codeMax Gurtovoy2-1/+2
Fix the status code of canceled requests initiated by the host according to TP4028 (Status Code 0x371): "Command Aborted By host: The command was aborted as a result of host action (e.g., the host disconnected the Fabric connection)." Also in a multipath environment, unless otherwise specified, errors of this type (path related) should be retried using a different path, if one is available. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-04nvme: introduce nvme_is_aen_req functionIsrael Rukshin4-6/+10
This function improves code readability and reduces code duplication. Signed-off-by: Israel Rukshin <israelr@mellanox.com> Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-04nvme-fc: ensure association_id is cleared regardless of a Disconnect LSJames Smart1-3/+3
Code today only clears the association_id if a Disconnect LS is transmit. Remove ambiguity and unconditionally clear the association_id if the association has been terminated. Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-04nvme-fc: clarify error messagesJames Smart1-5/+6
Change wording on a couple of messages to clarify what happened. Signed-off-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-04nvme-fc: Set new cmd set indicator in nvme-fc cmnd iuJames Smart1-0/+5
Set the new category field in the FC-NVME CMND_IU based on queue number. Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-04nvme-fc and nvmet-fc: sync with FC-NVME-2 header changesJames Smart1-13/+11
Sync sources with revised structure and field names to correspond with FC-NVME-2 header sync-up. Tested interoperability with success: - prior initiator with new target - prior target with new initiator - new on new Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds1-1/+1
Pull networking fixes from David Miller: 1) Fix free/alloc races in batmanadv, from Sven Eckelmann. 2) Several leaks and other fixes in kTLS support of mlx5 driver, from Tariq Toukan. 3) BPF devmap_hash cost calculation can overflow on 32-bit, from Toke Høiland-Jørgensen. 4) Add an r8152 device ID, from Kazutoshi Noguchi. 5) Missing include in ipv6's addrconf.c, from Ben Dooks. 6) Use siphash in flow dissector, from Eric Dumazet. Attackers can easily infer the 32-bit secret otherwise etc. 7) Several netdevice nesting depth fixes from Taehee Yoo. 8) Fix several KCSAN reported errors, from Eric Dumazet. For example, when doing lockless skb_queue_empty() checks, and accessing sk_napi_id/sk_incoming_cpu lockless as well. 9) Fix jumbo packet handling in RXRPC, from David Howells. 10) Bump SOMAXCONN and tcp_max_syn_backlog values, from Eric Dumazet. 11) Fix DMA synchronization in gve driver, from Yangchun Fu. 12) Several bpf offload fixes, from Jakub Kicinski. 13) Fix sk_page_frag() recursion during memory reclaim, from Tejun Heo. 14) Fix ping latency during high traffic rates in hisilicon driver, from Jiangfent Xiao. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (146 commits) net: fix installing orphaned programs net: cls_bpf: fix NULL deref on offload filter removal selftests: bpf: Skip write only files in debugfs selftests: net: reuseport_dualstack: fix uninitalized parameter r8169: fix wrong PHY ID issue with RTL8168dp net: dsa: bcm_sf2: Fix IMP setup for port different than 8 net: phylink: Fix phylink_dbg() macro gve: Fixes DMA synchronization. inet: stop leaking jiffies on the wire ixgbe: Remove duplicate clear_bit() call Documentation: networking: device drivers: Remove stray asterisks e1000: fix memory leaks i40e: Fix receive buffer starvation for AF_XDP igb: Fix constant media auto sense switching when no cable is connected net: ethernet: arc: add the missed clk_disable_unprepare igb: Enable media autosense for the i350. igb/igc: Don't warn on fatal read failures when the device is removed tcp: increase tcp_max_syn_backlog max value net: increase SOMAXCONN to 4096 netdevsim: Fix use-after-free during device dismantle ...
2019-10-29nvme-multipath: remove unused groups_only mode in ana logAnton Eidelman1-5/+4
groups_only mode in nvme_read_ana_log() is no longer used: remove it. Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Anton Eidelman <anton@lightbitslabs.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-29nvme-multipath: fix possible io hang after ctrl reconnectAnton Eidelman1-1/+1
The following scenario results in an IO hang: 1) ctrl completes a request with NVME_SC_ANA_TRANSITION. NVME_NS_ANA_PENDING bit in ns->flags is set and ana_work is triggered. 2) ana_work: nvme_read_ana_log() tries to get the ANA log page from the ctrl. This fails because ctrl disconnects. Therefore nvme_update_ns_ana_state() is not called and NVME_NS_ANA_PENDING bit in ns->flags is not cleared. 3) ctrl reconnects: nvme_mpath_init(ctrl,...) calls nvme_read_ana_log(ctrl, groups_only=true). However, nvme_update_ana_state() does not update namespaces because nr_nsids = 0 (due to groups_only mode). 4) scan_work calls nvme_validate_ns() finds the ns and re-validates OK. Result: The ctrl is now live but NVME_NS_ANA_PENDING bit in ns->flags is still set. Consequently ctrl will never be considered a viable path by __nvme_find_path(). IO will hang if ctrl is the only or the last path to the namespace. More generally, while ctrl is reconnecting, its ANA state may change. And because nvme_mpath_init() requests ANA log in groups_only mode, these changes are not propagated to the existing ctrl namespaces. This may result in a mal-function or an IO hang. Solution: nvme_mpath_init() will nvme_read_ana_log() with groups_only set to false. This will not harm the new ctrl case (no namespaces present), and will make sure the ANA state of namespaces gets updated after reconnect. Note: Another option would be for nvme_mpath_init() to invoke nvme_parse_ana_log(..., nvme_set_ns_ana_state) for each existing namespace. Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Anton Eidelman <anton@lightbitslabs.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-28net: use skb_queue_empty_lockless() in busy poll contextsEric Dumazet1-1/+1
Busy polling usually runs without locks. Let's use skb_queue_empty_lockless() instead of skb_queue_empty() Also uses READ_ONCE() in __skb_try_recv_datagram() to address a similar potential problem. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-23compat_ioctl: move more drivers to compat_ptr_ioctlArnd Bergmann1-1/+1
The .ioctl and .compat_ioctl file operations have the same prototype so they can both point to the same function, which works great almost all the time when all the commands are compatible. One exception is the s390 architecture, where a compat pointer is only 31 bit wide, and converting it into a 64-bit pointer requires calling compat_ptr(). Most drivers here will never run in s390, but since we now have a generic helper for it, it's easy enough to use it consistently. I double-checked all these drivers to ensure that all ioctl arguments are used as pointers or are ignored, but are not interpreted as integer values. Acked-by: Jason Gunthorpe <jgg@mellanox.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: David Sterba <dsterba@suse.com> Acked-by: Darren Hart (VMware) <dvhart@infradead.org> Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org> Acked-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-10-18nvme-pci: Set the prp2 correctly when using more than 4k pageKevin Hao1-1/+2
In the current code, the nvme is using a fixed 4k PRP entry size, but if the kernel use a page size which is more than 4k, we should consider the situation that the bv_offset may be larger than the dev->ctrl.page_size. Otherwise we may miss setting the prp2 and then cause the command can't be executed correctly. Fixes: dff824b2aadb ("nvme-pci: optimize mapping of small single segment requests") Cc: stable@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Kevin Hao <haokexin@gmail.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2019-10-15nvme-tcp: fix possible leakage during error flowMax Gurtovoy1-0/+1
During nvme_tcp_setup_cmd_pdu error flow, one must call nvme_cleanup_cmd since it's symmetric to nvme_setup_cmd. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2019-10-14nvme-tcp: Initialize sk->sk_ll_usec only with NET_RX_BUSY_POLLSebastian Andrzej Siewior1-0/+2
The access to sk->sk_ll_usec should be hidden behind CONFIG_NET_RX_BUSY_POLL like the definition of sk_ll_usec. Put access to ->sk_ll_usec behind CONFIG_NET_RX_BUSY_POLL. Fixes: 1a9460cef5711 ("nvme-tcp: support simple polling") Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2019-10-14nvme: Wait for reset state when requiredKeith Busch3-17/+74
Prevent simultaneous controller disabling/enabling tasks from interfering with each other through a function to wait until the task successfully transitioned the controller to the RESETTING state. This ensures disabling the controller will not be interrupted by another reset path, otherwise a concurrent reset may leave the controller in the wrong state. Tested-by: Edmund Nadolski <edmund.nadolski@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2019-10-14nvme: Prevent resets during paused controller stateKeith Busch1-4/+25
A paused controller is doing critical internal activation work in the background. Prevent subsequent controller resets from occurring during this period by setting the controller state to RESETTING first. A helper function, nvme_try_sched_reset_work(), is introduced for these paths so they may continue with scheduling the reset_work after they've completed their uninterruptible critical section. Tested-by: Edmund Nadolski <edmund.nadolski@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2019-10-14nvme: Restart request timers in resetting stateKeith Busch2-0/+16
A controller in the resetting state has not yet completed its recovery actions. The pci and fc transports were already handling this, so update the remaining transports to not attempt additional recovery in this state. Instead, just restart the request timer. Tested-by: Edmund Nadolski <edmund.nadolski@intel.com> Reviewed-by: James Smart <james.smart@broadcom.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2019-10-14nvme: Remove ADMIN_ONLY stateKeith Busch4-37/+11
The admin only state was intended to fence off actions that don't apply to a non-IO capable controller. The only actual user of this is the scan_work, and pci was the only transport to ever set this state. The consequence of having this state is placing an additional burden on every other action that applies to both live and admin only controllers. Remove the admin only state and place the admin only burden on the only place that actually cares: scan_work. This also prepares to make it easier to temporarily pause a LIVE state so that we don't need to remember which state the controller had been in prior to the pause. Tested-by: Edmund Nadolski <edmund.nadolski@intel.com> Reviewed-by: James Smart <james.smart@broadcom.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2019-10-14nvme-pci: Free tagset if no IO queuesKeith Busch1-2/+9
If a controller becomes degraded after a reset, we will not be able to perform any IO. We currently teardown previously created request queues and namespaces, but we had kept the unusable tagset. Free it after all queues using it have been released. Tested-by: Edmund Nadolski <edmund.nadolski@intel.com> Reviewed-by: James Smart <james.smart@broadcom.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2019-10-04nvme: retain split access workaround for capability readsArd Biesheuvel1-1/+1
Commit 7fd8930f26be4 "nvme: add a common helper to read Identify Controller data" has re-introduced an issue that we have attempted to work around in the past, in commit a310acd7a7ea ("NVMe: use split lo_hi_{read,write}q"). The problem is that some PCIe NVMe controllers do not implement 64-bit outbound accesses correctly, which is why the commit above switched to using lo_hi_[read|write]q for all 64-bit BAR accesses occuring in the code. In the mean time, the NVMe subsystem has been refactored, and now calls into the PCIe support layer for NVMe via a .reg_read64() method, which fails to use lo_hi_readq(), and thus reintroduces the problem that the workaround above aimed to address. Given that, at the moment, .reg_read64() is only used to read the capability register [which is known to tolerate split reads], let's switch .reg_read64() to lo_hi_readq() as well. This fixes a boot issue on some ARM boxes with NVMe behind a Synopsys DesignWare PCIe host controller. Fixes: 7fd8930f26be4 ("nvme: add a common helper to read Identify Controller data") Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2019-10-04nvme: fix possible deadlock when nvme_update_formats failsSagi Grimberg1-2/+1
nvme_update_formats may fail to revalidate the namespace and attempt to remove the namespace. This may lead to a deadlock as nvme_ns_remove will attempt to acquire the subsystem lock which is already acquired by the passthru command with effects. Move the invalid namepsace removal to after the passthru command releases the subsystem lock. Reported-by: Judy Brock <judy.brock@samsung.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2019-09-27Merge branch 'nvme-5.4' of git://git.infradead.org/nvme into for-linusJens Axboe5-35/+142
Pull NVMe changes from Sagi: "This set consists of various fixes and cleanups: - controller removal race fix from Balbir - quirk additions from Gabriel and Jian-Hong - nvme-pci power state save fix from Mario - Add 64bit user commands (for 64bit registers) from Marta - nvme-rdma/nvme-tcp fixes from Max, Mark and Me - Minor cleanups and nits from James, Dan and John" * 'nvme-5.4' of git://git.infradead.org/nvme: nvme-rdma: fix possible use-after-free in connect timeout nvme: Move ctrl sqsize to generic space nvme: Add ctrl attributes for queue_count and sqsize nvme: allow 64-bit results in passthru commands nvme: Add quirk for Kingston NVME SSD running FW E8FK11.T nvmet-tcp: remove superflous check on request sgl Added QUIRKs for ADATA XPG SX8200 Pro 512GB nvme-rdma: Fix max_hw_sectors calculation nvme: fix an error code in nvme_init_subsystem() nvme-pci: Save PCI state before putting drive into deepest state nvme-tcp: fix wrong stop condition in io_work nvme-pci: Fix a race in controller removal nvmet: change ppl to lpp
2019-09-27nvme-rdma: fix possible use-after-free in connect timeoutSagi Grimberg1-1/+2
If the connect times out, we may have already destroyed the queue in the timeout handler, so test if the queue is still allocated in the connect error handler. Reported-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2019-09-26nvme: Move ctrl sqsize to generic spaceKeith Busch1-1/+1
This isn't specific to fabrics. Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2019-09-25nvme: Add ctrl attributes for queue_count and sqsizeJames Smart1-0/+4
Current controller interrogation requires a lot of guesswork on how many io queues were created and what the io sq size is. The numbers are dependent upon core/fabric defaults, connect arguments, and target responses. Add sysfs attributes for queue_count and sqsize. Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2019-09-25nvme: allow 64-bit results in passthru commandsMarta Rybczynska1-16/+92
It is not possible to get 64-bit results from the passthru commands, what prevents from getting for the Capabilities (CAP) property value. As a result, it is not possible to implement IOL's NVMe Conformance test 4.3 Case 1 for Fabrics targets [1] (page 123). This issue has been already discussed [2], but without a solution. This patch solves the problem by adding new ioctls with a new passthru structure, including 64-bit results. The older ioctls stay unchanged. [1] https://www.iol.unh.edu/sites/default/files/testsuites/nvme/UNH-IOL_NVMe_Conformance_Test_Suite_v11.0.pdf [2] http://lists.infradead.org/pipermail/linux-nvme/2018-June/018791.html Signed-off-by: Marta Rybczynska <marta.rybczynska@kalray.eu> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2019-09-25nvme: Add quirk for Kingston NVME SSD running FW E8FK11.TJian-Hong Pan1-0/+10
Kingston NVME SSD with firmware version E8FK11.T has no interrupt after resume with actions related to suspend to idle. This patch applied NVME_QUIRK_SIMPLE_SUSPEND quirk to fix this issue. Fixes: d916b1be94b6 ("nvme-pci: use host managed power state for suspend") Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=204887 Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2019-09-25Added QUIRKs for ADATA XPG SX8200 Pro 512GBGabriel Craciunescu1-0/+3
Booting with default_ps_max_latency_us >6000 makes the device fail. Also SUBNQN is NULL and gives a warning on each boot/resume. $ nvme id-ctrl /dev/nvme0 | grep ^subnqn subnqn : (null) I use this device with an Acer Nitro 5 (AN515-43-R8BF) Laptop. To be sure is not a Laptop issue only, I tested the device on my server board with the same results. ( with 2x,4x link on the board and 4x link on a PCI-E card ). Signed-off-by: Gabriel Craciunescu <nix.or.die@gmail.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2019-09-25nvme-rdma: Fix max_hw_sectors calculationMax Gurtovoy1-5/+11
By default, the NVMe/RDMA driver should support max io_size of 1MiB (or upto the maximum supported size by the HCA). Currently, one will see that /sys/class/block/<bdev>/queue/max_hw_sectors_kb is 1020 instead of 1024. A non power of 2 value can cause performance degradation due to unnecessary splitting of IO requests and unoptimized allocation units. The number of pages per MR has been fixed here, so there is no longer any need to reduce max_sectors by 1. Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2019-09-25nvme: fix an error code in nvme_init_subsystem()Dan Carpenter1-2/+3
"ret" should be a negative error code here, but it's either success or possibly uninitialized. Fixes: 32fd90c40768 ("nvme: change locking for the per-subsystem controller list") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2019-09-25nvme-pci: Save PCI state before putting drive into deepest stateMario Limonciello1-7/+10
The action of saving the PCI state will cause numerous PCI configuration space reads which depending upon the vendor implementation may cause the drive to exit the deepest NVMe state. In these cases ASPM will typically resolve the PCIe link state and APST may resolve the NVMe power state. However it has also been observed that this register access after quiesced will cause PC10 failure on some device combinations. To resolve this, move the PCI state saving to before SetFeatures has been called. This has been proven to resolve the issue across a 5000 sample test on previously failing disk/system combinations. Signed-off-by: Mario Limonciello <mario.limonciello@dell.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2019-09-25nvme-tcp: fix wrong stop condition in io_workWunderlich, Mark1-2/+2
Allow the do/while statement to continue if current time is not after the proposed time 'deadline'. Intent is to allow loop to proceed for a specific time period. Currently the loop, as coded, will exit after first pass. Signed-off-by: Mark Wunderlich <mark.wunderlich@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2019-09-24Merge tag 'for-5.4/post-2019-09-24' of git://git.kernel.dk/linux-blockLinus Torvalds1-9/+0
Pull more block updates from Jens Axboe: "Some later additions that weren't quite done for the first pull request, and also a few fixes that have arrived since. This contains: - Kill silly pktcdvd warning on attempting to register a non-scsi passthrough device (me) - Use symbolic constants for the block t10 protection types, and switch to handling it in core rather than in the drivers (Max) - libahci platform missing node put fix (Nishka) - Small series of fixes for BFQ (Paolo) - Fix possible nbd crash (Xiubo)" * tag 'for-5.4/post-2019-09-24' of git://git.kernel.dk/linux-block: block: drop device references in bsg_queue_rq() block: t10-pi: fix -Wswitch warning pktcdvd: remove warning on attempting to register non-passthrough dev ata: libahci_platform: Add of_node_put() before loop exit nbd: fix possible page fault for nbd disk nbd: rename the runtime flags as NBD_RT_ prefixed block, bfq: push up injection only after setting service time block, bfq: increase update frequency of inject limit block, bfq: reduce upper bound for inject limit to max_rq_in_driver+1 block, bfq: update inject limit only after injection occurred block: centralize PI remapping logic to the block layer block: use symbolic constants for t10_pi type