linux - Linux Kernel (branches are rebased on master from time to time)

Age	Commit message (Collapse)	Author	Files	Lines
2016-11-11	lightnvm: invalid offset calculation for lba_shift	Matias Bjørling	1	-1/+1
	The ns->lba_shift assumes its value to be the logarithmic of the LA size. A previous patch duplicated the lba_shift calculation into lightnvm. It prematurely also subtracted a 512byte shift, which commonly is applied per-command. The 512byte shift being subtracted twice led to data loss when restoring the logical to physical mapping table from device and when issuing I/O commands using rrpc. Fix offset by removing the 512byte shift subtraction when calculating lba_shift. Fixes: b0b4e09c1ae7 "lightnvm: control life of nvm_dev in driver" Reported-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-10-21	Merge branch 'for-linus' of git://git.kernel.dk/linux-block	Linus Torvalds	3	-31/+64
	Pull block fixes from Jens Axboe: "A set of fixes that missed the merge window, mostly due to me being away around that time. Nothing major here, a mix of nvme cleanups and fixes, and one fix for the badblocks handling" * 'for-linus' of git://git.kernel.dk/linux-block: nvmet: use symbolic constants for CNS values nvme: use symbolic constants for CNS values nvme.h: add an enum for cns values nvme.h: don't use uuid_be nvme.h: resync with nvme-cli nvme: Add tertiary number to NVME_VS nvme : Add sysfs entry for NVMe CMBs when appropriate nvme: don't schedule multiple resets nvme: Delete created IO queues on reset nvme: Stop probing a removed device badblocks: fix overlapping check for clearing
2016-10-19	nvme: use symbolic constants for CNS values	Christoph Hellwig	1	-2/+2
	Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-10-19	nvme: Add tertiary number to NVME_VS	Gabriel Krisman Bertazi	3	-8/+8
	NVMe 1.2.1 specification adds a tertiary element to the version number. This updates the macro and its callers to include the final number and fixup a single place in nvmet where the version was generated manually. Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-10-12	nvme : Add sysfs entry for NVMe CMBs when appropriate	Stephen Bates	1	-9/+35
	Add a sysfs attribute that contains salient information about the NVMe Controller Memory Buffer when one is present. For now, just display the information about the CMB available from the control registers. We attach the CMB attribute file to the existing nvme_ctrl sysfs group so it can handle the sysfs teardown. Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Signed-off-by: Stephen Bates <sbates@raithlin.com> Acked-by Jon Derrick: <jonathan.derrick@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-10-12	nvme: don't schedule multiple resets	Keith Busch	1	-9/+13
	The queue_work only fails if the work is pending, but not yet running. If the work is running, the work item would get requeued, triggering a double reset. If the first reset fails for any reason, the second reset triggers: WARN_ON(dev->ctrl.state == NVME_CTRL_RESETTING) Hitting that schedules controller deletion for a second time, which potentially takes a reference on the device that is being deleted. If the reset occurs at the same time as a hot removal event, this causes a double-free. This patch has the reset helper function check if the work is busy prior to queueing, and changes all places that schedule resets to use this function. Since most users don't want to sync with that work, the "flush_work" is moved to the only caller that wants to sync. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg<sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-10-12	nvme: Delete created IO queues on reset	Keith Busch	1	-4/+5
	The driver was decrementing the online_queues prior to attempting to delete those IO queues, so the driver ended up not requesting the controller delete any. This patch saves the online_queues prior to suspending them, and adds that parameter for deleting io queues. Fixes: c21377f8 ("nvme: Suspend all queues before deletion") Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-10-12	nvme: Stop probing a removed device	Keith Busch	1	-0/+2
	There is no reason the nvme controller can ever return all 1's from reading the CSTS register. This patch returns an error if we observe that status. Without this, we may incorrectly proceed with controller initialization and unnecessarilly rely on error handling to clean this. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-10-11	nvme: use the DMA_ATTR_NO_WARN attribute	Mauricio Faria de Oliveira	1	-1/+2
	Use the DMA_ATTR_NO_WARN attribute for the dma_map_sg() call of the nvme driver that returns BLK_MQ_RQ_QUEUE_BUSY (not for BLK_MQ_RQ_QUEUE_ERROR). Link: http://lkml.kernel.org/r/1470092390-25451-4-git-send-email-mauricfo@linux.vnet.ibm.com Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Reviewed-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Jens Axboe <axboe@fb.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-09	Merge branch 'for-4.9/block-irq' of git://git.kernel.dk/linux-block	Linus Torvalds	4	-79/+36
	Pull blk-mq irq/cpu mapping updates from Jens Axboe: "This is the block-irq topic branch for 4.9-rc. It's mostly from Christoph, and it allows drivers to specify their own mappings, and more importantly, to share the blk-mq mappings with the IRQ affinity mappings. It's a good step towards making this work better out of the box" * 'for-4.9/block-irq' of git://git.kernel.dk/linux-block: blk_mq: linux/blk-mq.h does not include all the headers it depends on blk-mq: kill unused blk_mq_create_mq_map() blk-mq: get rid of the cpumask in struct blk_mq_tags nvme: remove the post_scan callout nvme: switch to use pci_alloc_irq_vectors blk-mq: provide a default queue mapping for PCI device blk-mq: allow the driver to pass in a queue mapping blk-mq: remove ->map_queue blk-mq: only allocate a single mq_map per tag_set blk-mq: don't redistribute hardware queues on a CPU hotplug event
2016-10-09	Merge tag 'for-linus' of ↵	Linus Torvalds	1	-20/+5
	git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull main rdma updates from Doug Ledford: "This is the main pull request for the rdma stack this release. The code has been through 0day and I had it tagged for linux-next testing for a couple days. Summary: - updates to mlx5 - updates to mlx4 (two conflicts, both minor and easily resolved) - updates to iw_cxgb4 (one conflict, not so obvious to resolve, proper resolution is to keep the code in cxgb4_main.c as it is in Linus' tree as attach_uld was refactored and moved into cxgb4_uld.c) - improvements to uAPI (moved vendor specific API elements to uAPI area) - add hns-roce driver and hns and hns-roce ACPI reset support - conversion of all rdma code away from deprecated create_singlethread_workqueue - security improvement: remove unsafe ib_get_dma_mr (breaks lustre in staging)" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (75 commits) staging/lustre: Disable InfiniBand support iw_cxgb4: add fast-path for small REG_MR operations cxgb4: advertise support for FR_NSMR_TPTE_WR IB/core: correctly handle rdma_rw_init_mrs() failure IB/srp: Fix infinite loop when FMR sg[0].offset != 0 IB/srp: Remove an unused argument IB/core: Improve ib_map_mr_sg() documentation IB/mlx4: Fix possible vl/sl field mismatch in LRH header in QP1 packets IB/mthca: Move user vendor structures IB/nes: Move user vendor structures IB/ocrdma: Move user vendor structures IB/mlx4: Move user vendor structures IB/cxgb4: Move user vendor structures IB/cxgb3: Move user vendor structures IB/mlx5: Move and decouple user vendor structures IB/{core,hw}: Add constant for node_desc ipoib: Make ipoib_warn ratelimited IB/mlx4/alias_GUID: Remove deprecated create_singlethread_workqueue IB/ipoib_verbs: Remove deprecated create_singlethread_workqueue IB/ipoib: Remove deprecated create_singlethread_workqueue ...
2016-10-07	Merge branch 'for-4.9/block' of git://git.kernel.dk/linux-block	Linus Torvalds	6	-156/+178
	Pull block layer updates from Jens Axboe: "This is the main pull request for block layer changes in 4.9. As mentioned at the last merge window, I've changed things up and now do just one branch for core block layer changes, and driver changes. This avoids dependencies between the two branches. Outside of this main pull request, there are two topical branches coming as well. This pull request contains: - A set of fixes, and a conversion to blk-mq, of nbd. From Josef. - Set of fixes and updates for lightnvm from Matias, Simon, and Arnd. Followup dependency fix from Geert. - General fixes from Bart, Baoyou, Guoqing, and Linus W. - CFQ async write starvation fix from Glauber. - Add supprot for delayed kick of the requeue list, from Mike. - Pull out the scalable bitmap code from blk-mq-tag.c and make it generally available under the name of sbitmap. Only blk-mq-tag uses it for now, but the blk-mq scheduling bits will use it as well. From Omar. - bdev thaw error progagation from Pierre. - Improve the blk polling statistics, and allow the user to clear them. From Stephen. - Set of minor cleanups from Christoph in block/blk-mq. - Set of cleanups and optimizations from me for block/blk-mq. - Various nvme/nvmet/nvmeof fixes from the various folks" * 'for-4.9/block' of git://git.kernel.dk/linux-block: (54 commits) fs/block_dev.c: return the right error in thaw_bdev() nvme: Pass pointers, not dma addresses, to nvme_get/set_features() nvme/scsi: Remove power management support nvmet: Make dsm number of ranges zero based nvmet: Use direct IO for writes admin-cmd: Added smart-log command support. nvme-fabrics: Add host_traddr options field to host infrastructure nvme-fabrics: revise host transport option descriptions nvme-fabrics: rework nvmf_get_address() for variable options nbd: use BLK_MQ_F_BLOCKING blkcg: Annotate blkg_hint correctly cfq: fix starvation of asynchronous writes blk-mq: add flag for drivers wanting blocking ->queue_rq() blk-mq: remove non-blocking pass in blk_mq_map_request blk-mq: get rid of manual run of queue with __blk_mq_run_hw_queue() block: export bio_free_pages to other modules lightnvm: propagate device_add() error code lightnvm: expose device geometry through sysfs lightnvm: control life of nvm_dev in driver blk-mq: register device instead of disk ...
2016-09-24	nvme: Pass pointers, not dma addresses, to nvme_get/set_features()	Andy Lutomirski	3	-13/+11
	Any user I can imagine that needs a buffer at all will want to pass a pointer directly. There are no currently callers that use buffers, so this change is painless, and it will make it much easier to start using features that use buffers (e.g. APST). Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Tested-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-24	nvme/scsi: Remove power management support	Andy Lutomirski	1	-71/+3
	As far as I can tell, there is basically nothing correct about this code. It misinterprets npss (off-by-one). It hardcodes a bunch of power states, which is nonsense, because they're all just indices into a table that software needs to parse. It completely ignores the distinction between operational and non-operational states. And, until 4.8, if all of the above magically succeeded, it would dereference a NULL pointer and OOPS. Since this code appears to be useless, just delete it. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Tested-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-23	nvme-fabrics: Add host_traddr options field to host infrastructure	James Smart	2	-0/+17
	Add the host_traddr field to allow specification of the host-port connection info for the transport. Will be used by FC transport. Signed-off-by: James Smart <james.smart@broadcom.com> Acked-by: Johannes Thumshirn <jth@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2016-09-23	nvme-fabrics: revise host transport option descriptions	James Smart	1	-3/+4
	Revise some of the comments so not so ethernet-network centric Signed-off-by: James Smart <james.smart@broadcom.com> Acked-by: Johannes Thumshirn <jth@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2016-09-23	nvme-fabrics: rework nvmf_get_address() for variable options	James Smart	1	-2/+10
	Revise nvmf_get_address() string to account for not all options being present. Signed-off-by: James Smart <james.smart@broadcom.com> Acked-by: Johannes Thumshirn <jth@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2016-09-23	nvme-rdma: use IB_PD_UNSAFE_GLOBAL_RKEY	Christoph Hellwig	1	-20/+5
	Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-23	IB/core: add support to create a unsafe global rkey to ib_create_pd	Christoph Hellwig	1	-1/+1
	Instead of exposing ib_get_dma_mr to ULPs and letting them use it more or less unchecked, this moves the capability of creating a global rkey into the RDMA core, where it can be easily audited. It also prints a warning everytime this feature is used as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-22	nvme-rdma: only clear queue flags after successful connect	Sagi Grimberg	1	-1/+1
	Otherwise, nvme_rdma_stop_and_clear_queue() will incorrectly try to stop/free rdma qps/cm_ids that are already freed. Fixes: e89ca58f9c90 ("nvme-rdma: add DELETING queue flag") Reported-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-21	lightnvm: expose device geometry through sysfs	Simon A. F. Lund	3	-10/+30
	For a host to access an Open-Channel SSD, it has to know its geometry, so that it writes and reads at the appropriate device bounds. Currently, the geometry information is kept within the kernel, and not exported to user-space for consumption. This patch exposes the configuration through sysfs and enables user-space libraries, such as liblightnvm, to use the sysfs implementation to get the geometry of an Open-Channel SSD. The sysfs entries are stored within the device hierarchy, and can be found using the "lightnvm" device type. An example configuration looks like this: /sys/class/nvme/ └── nvme0n1 ├── capabilities: 3 ├── device_mode: 1 ├── erase_max: 1000000 ├── erase_typ: 1000000 ├── flash_media_type: 0 ├── media_capabilities: 0x00000001 ├── media_type: 0 ├── multiplane: 0x00010101 ├── num_blocks: 1022 ├── num_channels: 1 ├── num_luns: 4 ├── num_pages: 64 ├── num_planes: 1 ├── page_size: 4096 ├── prog_max: 100000 ├── prog_typ: 100000 ├── read_max: 10000 ├── read_typ: 10000 ├── sector_oob_size: 0 ├── sector_size: 4096 ├── media_manager: gennvm ├── ppa_format: 0x380830082808001010102008 ├── vendor_opcode: 0 ├── max_phys_secs: 64 └── version: 1 Signed-off-by: Simon A. F. Lund <slund@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-21	lightnvm: control life of nvm_dev in driver	Matias Bjørling	3	-33/+46
	LightNVM compatible device drivers does not have a method to expose LightNVM specific sysfs entries. To enable LightNVM sysfs entries to be exposed, lightnvm device drivers require a struct device to attach it to. To allow both the actual device driver and lightnvm sysfs entries to coexist, the device driver tracks the lifetime of the nvm_dev structure. This patch refactors NVMe and null_blk to handle the lifetime of struct nvm_dev, which eliminates the need for struct gendisk when a lightnvm compatible device is provided. Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-21	nvme: refactor namespaces to support non-gendisk devices	Matias Bjørling	2	-43/+76
	With LightNVM enabled namespaces, the gendisk structure is not exposed to the user. This prevents LightNVM users from accessing the NVMe device driver specific sysfs entries, and LightNVM namespace geometry. Refactor the revalidation process, so that a namespace, instead of a gendisk, is revalidated. This later allows patches to wire up the sysfs entries up to a non-gendisk namespace. Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-15	Merge branch 'for-linus' of git://git.kernel.dk/linux-block	Linus Torvalds	2	-74/+79
	Pull block fixes from Jens Axboe: "A set of fixes for the current series in the realm of block. Like the previous pull request, the meat of it are fixes for the nvme fabrics/target code. Outside of that, just one fix from Gabriel for not doing a queue suspend if we didn't get the admin queue setup in the first place" * 'for-linus' of git://git.kernel.dk/linux-block: nvme-rdma: add back dependency on CONFIG_BLOCK nvme-rdma: fix null pointer dereference on req->mr nvme-rdma: use ib_client API to detect device removal nvme-rdma: add DELETING queue flag nvme/quirk: Add a delay before checking device ready for memblaze device nvme: Don't suspend admin queue that wasn't created nvme-rdma: destroy nvme queue rdma resources on connect failure nvme_rdma: keep a ref on the ctrl during delete/flush iw_cxgb4: block module unload until all ep resources are released iw_cxgb4: call dev_put() on l2t allocation failure
2016-09-15	nvme: remove the post_scan callout	Christoph Hellwig	2	-4/+0
	No need now that we don't have to reverse engineer the irq affinity. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-15	nvme: switch to use pci_alloc_irq_vectors	Christoph Hellwig	1	-71/+36
	Use the new helper to automatically select the right interrupt type, as well as to use the automatic interupt affinity assignment. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-15	blk-mq: remove ->map_queue	Christoph Hellwig	2	-4/+0
	All drivers use the default, so provide an inline version of it. If we ever need other queue mapping we can add an optional method back, although supporting will also require major changes to the queue setup code. This provides better code generation, and better debugability as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-13	Merge branch 'nvmf-4.8-rc' of git://git.infradead.org/nvme-fabrics into ↵	Jens Axboe	2	-73/+72
	for-linus Sagi writes: Here we have: - Kconfig dependencies fix from Arnd - nvme-rdma device removal fixes from Steve - possible bad deref fix from Colin
2016-09-12	nvme-rdma: add back dependency on CONFIG_BLOCK	Arnd Bergmann	1	-0/+1
	A recent change removed the dependency on BLK_DEV_NVME, which implies the dependency on PCI and BLOCK. We don't need CONFIG_PCI, but without CONFIG_BLOCK we get tons of build errors, e.g. In file included from drivers/nvme/host/core.c:16:0: linux/blk-mq.h:182:33: error: 'struct gendisk' declared inside parameter list will not be visible outside of this definition or declaration [-Werror] drivers/nvme/host/core.c: In function 'nvme_setup_rw': drivers/nvme/host/core.c:295:21: error: implicit declaration of function 'rq_data_dir' [-Werror=implicit-function-declaration] drivers/nvme/host/nvme.h: In function 'nvme_map_len': drivers/nvme/host/nvme.h:217:6: error: implicit declaration of function 'req_op' [-Werror=implicit-function-declaration] drivers/nvme/host/scsi.c: In function 'nvme_trans_bdev_limits_page': drivers/nvme/host/scsi.c:768:85: error: implicit declaration of function 'queue_max_hw_sectors' [-Werror=implicit-function-declaration] This adds back the specific CONFIG_BLOCK dependency to avoid broken configurations. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: aa71987472a9 ("nvme: fabrics drivers don't need the nvme-pci driver") Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2016-09-12	nvme-rdma: fix null pointer dereference on req->mr	Colin Ian King	1	-0/+1
	If there is an error on req->mr, req->mr is set to null, however the following statement sets req->mr->need_inval causing a null pointer dereference. Fix this by bailing out to label 'out' to immediately return and hence skip over the offending null pointer dereference. Fixes: f5b7b559e1488 ("nvme-rdma: Get rid of duplicate variable") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2016-09-12	nvme-rdma: use ib_client API to detect device removal	Steve Wise	1	-68/+40
	Change nvme-rdma to use the IB Client API to detect device removal. This has the wonderful benefit of being able to blow away all the ib/rdma_cm resources for the device being removed. No craziness about not destroying the cm_id handling the event. No deadlocks due to broken iw_cm/rdma_cm/iwarp dependencies. And no need to have a bound cm_id around during controller recovery/reconnect to catch device removal events. We don't use the device_add aspect of the ib_client service since we only want to create resources for an IB device if we have a target utilizing that device. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2016-09-12	nvme-rdma: add DELETING queue flag	Sagi Grimberg	1	-2/+9
	When we get a surprise disconnect from the target we queue a periodic reconnect (which is the sane thing to do...). We only move the queues out of CONNECTED when we retry to reconnect (after 10 seconds in the default case) but we stop the blk queues immediately so we are not bothered with traffic from now on. If delete() is kicking off in this period the queues are still in CONNECTED state. Part of the delete sequence is trying to issue ctrl shutdown if the admin queue is CONNECTED (which it is!). This request is issued but stuck in blk-mq waiting for the queues to start again. This might be the one preventing us from forward progress... The patch separates the queue flags to CONNECTED and DELETING. Now we will move out of CONNECTED as soon as error recovery kicks in (before stopping the queues) and DELETING is on when we start the queue deletion. Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2016-09-11	nvme: make NVME_RDMA depend on BLOCK	Linus Torvalds	1	-1/+1
	Commit aa71987472a9 ("nvme: fabrics drivers don't need the nvme-pci driver") removed the dependency on BLK_DEV_NVME, but the cdoe does depend on the block layer (which used to be an implicit dependency through BLK_DEV_NVME). Otherwise you get various errors from the kbuild test robot random config testing when that happens to hit a configuration with BLOCK device support disabled. Cc: Christoph Hellwig <hch@lst.de> Cc: Jay Freyensee <james_p_freyensee@linux.intel.com> Cc: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-08	nvme/quirk: Add a delay before checking device ready for memblaze device	Wenbo Wang	1	-0/+2
	Signed-off-by: Wenbo Wang <wenbo.wang@memblaze.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-07	nvme: Don't suspend admin queue that wasn't created	Gabriel Krisman Bertazi	1	-1/+6
	This fixes a regression in my previous commit c21377f8366c ("nvme: Suspend all queues before deletion"), which provoked an Oops in the removal path when removing a device that became IO incapable very early at probe (i.e. after a failed EEH recovery). Turns out, if the error occurred very early at the probe path, before even configuring the admin queue, we might try to suspend the uninitialized admin queue, accessing bad memory. Fixes: c21377f8366c ("nvme: Suspend all queues before deletion") Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com> Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-04	nvme-rdma: destroy nvme queue rdma resources on connect failure	Steve Wise	1	-3/+11
	After address resolution, the nvme_rdma_queue rdma resources are allocated. If rdma route resolution or the connect fails, or the controller reconnect times out and gives up, then the rdma resources need to be freed. Otherwise, rdma resources are leaked. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimbrg.me> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2016-09-04	nvme_rdma: keep a ref on the ctrl during delete/flush	Steve Wise	1	-8/+18
	Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimbrg.me> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2016-08-29	Merge branch 'nvmf-4.8-rc' of git://git.infradead.org/nvme-fabrics into ↵	Jens Axboe	4	-27/+46
	for-linus Sagi writes: Mostly stability fixes and cleanups: - NQN endianess fix from Daniel - possible use-after-free fix from Vincent - nvme-rdma connect semantics fixes from Jay - Remove redundant variables in rdma driver - Kbuild fix from Christoph - nvmf_host referencing fix from Christoph - uninit variable fix from Colin
2016-08-28	nvme-rdma: Get rid of redundant defines	Sagi Grimberg	1	-4/+0
	Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-08-28	nvme-rdma: Get rid of duplicate variable	Sagi Grimberg	1	-8/+7
	We already have need_inval in ib_mr, lets use that instead. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-08-24	nvme: Fix nvme_get/set_features() with a NULL result pointer	Andy Lutomirski	1	-2/+2
	nvme_set_features() callers seem to expect that passing NULL as the result pointer is acceptable. Teach nvme_set_features() not to try to write to the NULL address. For symmetry, make the same change to nvme_get_features(), despite the fact that all current callers pass a valid result pointer. I assume that this bug hasn't been reported in practice because the callers that pass NULL are all in the SCSI translation layer and no one uses the relevant operations. Cc: stable@vger.kernel.org Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-08-19	nvme: fabrics drivers don't need the nvme-pci driver	Christoph Hellwig	1	-1/+1
	So select the NVME_CORE symbol instead of depending on BLK_DEV_NVME. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2016-08-19	nvme-fabrics: get a reference when reusing a nvme_host structure	Christoph Hellwig	1	-1/+3
	Without this we'll get a use after free after connecting two controller using the same hostnqn and then disconnecting one of them. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2016-08-19	nvme-fabrics: change NQN UUID to big-endian format	Daniel Verkamp	2	-6/+6
	NVM Express 1.2.1 section 7.9, NVMe Qualified Names, specifies that the UUID format of NQN uses a UUID based on RFC 4122. RFC 4122 specifies that the UUID is encoded in big-endian byte order. Switch the NVMe over Fabrics host ID field from little-endian UUID to big-endian UUID to match the specification. Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com> Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2016-08-18	nvme-rdma: fix sqsize/hsqsize per spec	Jay Freyensee	1	-4/+10
	Per NVMe-over-Fabrics 1.0 spec, sqsize is represented as a 0-based value. Also per spec, the RDMA binding values shall be set to sqsize, which makes hsqsize 0-based values. Thus, the sqsize during NVMf connect() is now: [root@fedora23-fabrics-host1 for-48]# dmesg [ 318.720645] nvme_fabrics: nvmf_connect_admin_queue(): sqsize for admin queue: 31 [ 318.720884] nvme nvme0: creating 16 I/O queues. [ 318.810114] nvme_fabrics: nvmf_connect_io_queue(): sqsize for i/o queue: 127 Finally, current interpretation implies hrqsize is 1's based so set it appropriately. Reported-by: Daniel Verkamp <daniel.verkamp@intel.com> Signed-off-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2016-08-18	fabrics: define admin sqsize min default, per spec	Jay Freyensee	2	-3/+19
	Upon admin queue connect(), the rdma qp was being set based on NVMF_AQ_DEPTH. However, the fabrics layer was using the sqsize field value set for I/O queues for the admin queue, which threw the nvme layer and rdma layer off-whack: root@fedora23-fabrics-host1 nvmf]# dmesg [ 3507.798642] nvme_fabrics: nvmf_connect_admin_queue():admin sqsize being sent is: 128 [ 3507.798858] nvme nvme0: creating 16 I/O queues. [ 3507.896407] nvme nvme0: new ctrl: NQN "nullside-nqn", addr 192.168.1.3:4420 Thus, to have a different admin queue value, we use NVMF_AQ_DEPTH for connect() and RDMA private data as the minimum depth specified in the NVMe-over-Fabrics 1.0 spec (and in that RDMA private data we treat hrqsize as 1's-based value, per current understanding of the fabrics spec). Reported-by: Daniel Verkamp <daniel.verkamp@intel.com> Signed-off-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2016-08-16	nvme-rdma: initialize ret to zero to avoid returning garbage	Colin Ian King	1	-1/+1
	ret is not initialized so it contains garbage. Ensure garbage is not returned by initializing rc to 0. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2016-08-15	nvme: Prevent controller state invalid transition	Gabriel Krisman Bertazi	1	-2/+5
	Acquiring the nvme_ctrl lock before reading ctrl->state in nvme_change_ctrl_state() should prevent a theoretical invalid state transition, in the event of two threads racing inside that function. I haven't been able to observe this happening with the current code, and the current state machine seems to be simple enough to not be affected by these invalid transitions, but future modifications could make it more likely to happen. Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com> Reviewed-by: Sagi Grimberg <sag@grimberg.me> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-08-11	nvme: Suspend all queues before deletion	Gabriel Krisman Bertazi	1	-12/+8
	When nvme_delete_queue fails in the first pass of the nvme_disable_io_queues() loop, we return early, failing to suspend all of the IO queues. Later, on the nvme_pci_disable path, this causes us to disable MSI without actually having freed all the IRQs, which triggers the BUG_ON in free_msi_irqs(), as show below. This patch refactors nvme_disable_io_queues to suspend all queues before start submitting delete queue commands. This way, we ensure that we have at least returned every IRQ before continuing with the removal path. [ 487.529200] kernel BUG at ../drivers/pci/msi.c:368! cpu 0x46: Vector: 700 (Program Check) at [c0000078c5b83650] pc: c000000000627a50: free_msi_irqs+0x90/0x200 lr: c000000000627a40: free_msi_irqs+0x80/0x200 sp: c0000078c5b838d0 msr: 9000000100029033 current = 0xc0000078c5b40000 paca = 0xc000000002bd7600 softe: 0 irq_happened: 0x01 pid = 1376, comm = kworker/70:1H kernel BUG at ../drivers/pci/msi.c:368! Linux version 4.7.0.mainline+ (root@iod76) (gcc version 5.3.1 20160413 (Ubuntu/IBM 5.3.1-14ubuntu2.1) ) #104 SMP Fri Jul 29 09:20:17 CDT 2016 enter ? for help [c0000078c5b83920] d0000000363b0cd8 nvme_dev_disable+0x208/0x4f0 [nvme] [c0000078c5b83a10] d0000000363b12a4 nvme_timeout+0xe4/0x250 [nvme] [c0000078c5b83ad0] c0000000005690e4 blk_mq_rq_timed_out+0x64/0x110 [c0000078c5b83b40] c00000000056c930 bt_for_each+0x160/0x170 [c0000078c5b83bb0] c00000000056d928 blk_mq_queue_tag_busy_iter+0x78/0x110 [c0000078c5b83c00] c0000000005675d8 blk_mq_timeout_work+0xd8/0x1b0 [c0000078c5b83c50] c0000000000e8cf0 process_one_work+0x1e0/0x590 [c0000078c5b83ce0] c0000000000e9148 worker_thread+0xa8/0x660 [c0000078c5b83d80] c0000000000f2090 kthread+0x110/0x130 [c0000078c5b83e30] c0000000000095f0 ret_from_kernel_thread+0x5c/0x6c Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com> Cc: Brian King <brking@linux.vnet.ibm.com> Cc: Keith Busch <keith.busch@intel.com> Cc: linux-nvme@lists.infradead.org Signed-off-by: Jens Axboe <axboe@fb.com>
2016-08-08	Merge branch 'nvmf-4.8-rc' of git://git.infradead.org/nvme-fabrics into ↵	Jens Axboe	1	-38/+45
	for-linus Sagi writes: Mostly stability fixes for nvmet, rdma: - fix uninitialized rdma_cm private data from Roland. - rdma device removal handling (host and target). - fix controller disconnect during active mounts. - fix namespaces lost after fabric reconnects. - remove redundant calls to namespace removal (rdma, loop). - actually send controller shutdown when disconnecting. - reconnect fixes (ns rescan and aen requeue) - nvmet controller serial number inconsistency fix.