summaryrefslogtreecommitdiffstats
path: root/drivers/nvme/target/tcp.c
AgeCommit message (Collapse)AuthorFilesLines
2022-03-29nvmet: use a private workqueue instead of the system workqueueSagi Grimberg1-5/+5
Any attempt to flush kernel-global WQs has possibility of deadlock so we should simply stop using them, instead introduce nvmet_wq which is the generic nvmet workqueue for work elements that don't explicitly require a dedicated workqueue (by the mere fact that they are using the system_wq). Changes were done using the following replaces: - s/schedule_work(/queue_work(nvmet_wq, /g - s/schedule_delayed_work(/queue_delayed_work(nvmet_wq, /g - s/flush_scheduled_work()/flush_workqueue(nvmet_wq)/g Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-02-28nvmet-tcp: replace ida_simple[get|remove] with the simler ida_[alloc|free]Sagi Grimberg1-3/+3
ida_simple_[get|remove] are wrappers anyways. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-12-08nvmet-tcp: fix possible list corruption for unexpected command failureSagi Grimberg1-1/+8
nvmet_tcp_handle_req_failure needs to understand weather to prepare for incoming data or the next pdu. However if we misidentify this, we will wait for 0-length data, and queue the response although nvmet_req_init already did that. The particular command was namespace management command with no data, which was incorrectly categorized as a command with incapsule data. Also, add a code comment of what we are trying to do here. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-11-23nvmet-tcp: fix incomplete data digest sendVarun Prakash1-1/+6
Current nvmet_try_send_ddgst() code does not check whether all data digest bytes are transmitted, fix this by returning -EAGAIN if all data digest bytes are not transmitted. Fixes: 872d26a391da ("nvmet-tcp: add NVMe over TCP target driver") Signed-off-by: Varun Prakash <varun@chelsio.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-11-23nvmet-tcp: fix memory leak when performing a controller resetMaurizio Lombardi1-1/+4
If a reset controller is executed while the initiator is performing some I/O the driver may leak the memory allocated for the commands' iovec. Make sure that nvmet_tcp_uninit_data_in_cmds() releases all the memory. Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: John Meneghini <jmeneghi@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-11-23nvmet-tcp: add an helper to free the cmd buffersMaurizio Lombardi1-9/+19
Makes the code easier to read and to debug. Sets the freed pointers to NULL, it will be useful when destroying the queues to understand if the commands' buffers have been released already or not. Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: John Meneghini <jmeneghi@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-11-23nvmet-tcp: fix a race condition between release_queue and io_workMaurizio Lombardi1-1/+3
If the initiator executes a reset controller operation while performing I/O, the target kernel will crash because of a race condition between release_queue and io_work; nvmet_tcp_uninit_data_in_cmds() may be executed while io_work is running, calling flush_work() was not sufficient to prevent this because io_work could requeue itself. Fix this bug by using cancel_work_sync() to prevent io_work from requeuing itself and set rcv_state to NVMET_TCP_RECV_ERR to make sure we don't receive any more data from the socket. Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: John Meneghini <jmeneghi@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-11-01Merge tag 'for-5.16/drivers-2021-10-29' of git://git.kernel.dk/linux-blockLinus Torvalds1-0/+16
Pull block driver updates from Jens Axboe: - paride driver cleanups (Christoph) - Remove cryptoloop support (Christoph) - null_blk poll support (me) - Now that add_disk() supports proper error handling, add it to various drivers (Luis) - Make ataflop actually work again (Michael) - s390 dasd fixes (Stefan, Heiko) - nbd fixes (Yu, Ye) - Remove redundant wq flush in mtip32xx (Christophe) - NVMe updates - fix a multipath partition scanning deadlock (Hannes Reinecke) - generate uevent once a multipath namespace is operational again (Hannes Reinecke) - support unique discovery controller NQNs (Hannes Reinecke) - fix use-after-free when a port is removed (Israel Rukshin) - clear shadow doorbell memory on resets (Keith Busch) - use struct_size (Len Baker) - add error handling support for add_disk (Luis Chamberlain) - limit the maximal queue size for RDMA controllers (Max Gurtovoy) - use a few more symbolic names (Max Gurtovoy) - fix error code in nvme_rdma_setup_ctrl (Max Gurtovoy) - add support for ->map_queues on FC (Saurav Kashyap) - support the current discovery subsystem entry (Hannes Reinecke) - use flex_array_size and struct_size (Len Baker) - bcache fixes (Christoph, Coly, Chao, Lin, Qing) - MD updates (Christoph, Guoqing, Xiao) - Misc fixes (Dan, Ding, Jiapeng, Shin'ichiro, Ye) * tag 'for-5.16/drivers-2021-10-29' of git://git.kernel.dk/linux-block: (117 commits) null_blk: Fix handling of submit_queues and poll_queues attributes block: ataflop: Fix warning comparing pointer to 0 bcache: replace snprintf in show functions with sysfs_emit bcache: move uapi header bcache.h to bcache code directory nvmet: use flex_array_size and struct_size nvmet: register discovery subsystem as 'current' nvmet: switch check for subsystem type nvme: add new discovery log page entry definitions block: ataflop: more blk-mq refactoring fixes block: remove support for cryptoloop and the xor transfer mtd: add add_disk() error handling rnbd: add error handling support for add_disk() um/drivers/ubd_kern: add error handling support for add_disk() m68k/emu/nfblock: add error handling support for add_disk() xen-blkfront: add error handling support for add_disk() bcache: add error handling support for add_disk() dm: add add_disk() error handling block: aoe: fixup coccinelle warnings nvmet: use struct_size over open coded arithmetic nvme: drop scan_lock and always kick requeue list when removing namespaces ...
2021-10-27nvmet-tcp: fix header digest verificationAmit Engel1-1/+1
Pass the correct length to nvmet_tcp_verify_hdgst, which is the pdu header length. This fixes a wrong behaviour where header digest verification passes although the digest is wrong. Signed-off-by: Amit Engel <amit.engel@dell.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-10-27nvmet-tcp: fix data digest pointer calculationVarun Prakash1-1/+1
exp_ddgst is of type __le32, &cmd->exp_ddgst + cmd->offset increases &cmd->exp_ddgst by 4 * cmd->offset, fix this by type casting &cmd->exp_ddgst to u8 *. Fixes: 872d26a391da ("nvmet-tcp: add NVMe over TCP target driver") Signed-off-by: Varun Prakash <varun@chelsio.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-10-26nvmet-tcp: fix a memory leak when releasing a queueMaurizio Lombardi1-0/+3
page_frag_free() won't completely release the memory allocated for the commands, the cache page must be explicitly freed by calling __page_frag_cache_drain(). This bug can be easily reproduced by repeatedly executing the following command on the initiator: $echo 1 > /sys/devices/virtual/nvme-fabrics/ctl/nvme0/reset_controller Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: John Meneghini <jmeneghi@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-10-20nvmet-tcp: fix use-after-free when a port is removedIsrael Rukshin1-0/+16
When removing a port, all its controllers are being removed, but there are queues on the port that doesn't belong to any controller (during connection time). This causes a use-after-free bug for any command that dereferences req->port (like in nvmet_alloc_ctrl). Those queues should be destroyed before freeing the port via configfs. Destroy the remaining queues after the accept_work was cancelled guarantees that no new queue will be created. Signed-off-by: Israel Rukshin <israelr@nvidia.com> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-07-05nvme-tcp: can't set sk_user_data without write_lockMaurizio Lombardi1-1/+0
The sk_user_data pointer is supposed to be modified only while holding the write_lock "sk_callback_lock", otherwise we could race with other threads and crash the kernel. we can't take the write_lock in nvmet_tcp_state_change() because it would cause a deadlock, but the release_work queue will set the pointer to NULL later so we can simply remove the assignment. Fixes: b5332a9f3f3d ("nvmet-tcp: fix incorrect locking in state_change sk callback") Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-05-26nvmet-tcp: fix inline data size comparison in nvmet_tcp_queue_responseHou Pu1-1/+1
Using "<=" instead "<" to compare inline data size. Fixes: bdaf13279192 ("nvmet-tcp: fix a segmentation fault during io parsing error") Signed-off-by: Hou Pu <houpu.main@gmail.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-28Merge tag 'for-5.13/drivers-2021-04-27' of git://git.kernel.dk/linux-blockLinus Torvalds1-12/+67
Pull block driver updates from Jens Axboe: - MD changes via Song: - raid5 POWER fix - raid1 failure fix - UAF fix for md cluster - mddev_find_or_alloc() clean up - Fix NULL pointer deref with external bitmap - Performance improvement for raid10 discard requests - Fix missing information of /proc/mdstat - rsxx const qualifier removal (Arnd) - Expose allocated brd pages (Calvin) - rnbd via Gioh Kim: - Change maintainer - Change domain address of maintainers' email - Add polling IO mode and document update - Fix memory leak and some bug detected by static code analysis tools - Code refactoring - Series of floppy cleanups/fixes (Denis) - s390 dasd fixes (Julian) - kerneldoc fixes (Lee) - null_blk double free (Lv) - null_blk virtual boundary addition (Max) - Remove xsysace driver (Michal) - umem driver removal (Davidlohr) - ataflop fixes (Dan) - Revalidate disk removal (Christoph) - Bounce buffer cleanups (Christoph) - Mark lightnvm as deprecated (Christoph) - mtip32xx init cleanups (Shixin) - Various fixes (Tian, Gustavo, Coly, Yang, Zhang, Zhiqiang) * tag 'for-5.13/drivers-2021-04-27' of git://git.kernel.dk/linux-block: (143 commits) async_xor: increase src_offs when dropping destination page drivers/block/null_blk/main: Fix a double free in null_init. md/raid1: properly indicate failure when ending a failed write request md-cluster: fix use-after-free issue when removing rdev nvme: introduce generic per-namespace chardev nvme: cleanup nvme_configure_apst nvme: do not try to reconfigure APST when the controller is not live nvme: add 'kato' sysfs attribute nvme: sanitize KATO setting nvmet: avoid queuing keep-alive timer if it is disabled brd: expose number of allocated pages in debugfs ataflop: fix off by one in ataflop_probe() ataflop: potential out of bounds in do_format() drbd: Fix fall-through warnings for Clang block/rnbd: Use strscpy instead of strlcpy block/rnbd-clt-sysfs: Remove copy buffer overlap in rnbd_clt_get_path_name block/rnbd-clt: Remove max_segment_size block/rnbd-clt: Generate kobject_uevent when the rnbd device state changes block/rnbd-srv: Remove unused arguments of rnbd_srv_rdma_ev Documentation/ABI/rnbd-clt: Add description for nr_poll_queues ...
2021-04-15nvmet-tcp: fix a segmentation fault during io parsing errorElad Grupi1-8/+31
In case there is an io that contains inline data and it goes to parsing error flow, command response will free command and iov before clearing the data on the socket buffer. This will delay the command response until receive flow is completed. Fixes: 872d26a391da ("nvmet-tcp: add NVMe over TCP target driver") Signed-off-by: Elad Grupi <elad.grupi@dell.com> Signed-off-by: Hou Pu <houpu.main@gmail.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-02nvmet-tcp: enable optional queue idle period trackingWunderlich, Mark1-2/+34
Add 'idle_poll_period_usecs' option used by io_work() to support network devices enabled with advanced interrupt moderation supporting a relaxed interrupt model. It was discovered that such a NIC used on the target was unable to support initiator connection establishment, caused by the existing io_work() flow that immediately exits after a loop with no activity and does not re-queue itself. With this new option a queue is assigned a period of time that no activity must occur in order to become 'idle'. Until the queue is idle the work item is requeued. The new module option is defined as changeable making it flexible for testing purposes. The pre-existing legacy behavior is preserved when no module option for idle_poll_period_usecs is specified. Signed-off-by: Mark Wunderlich <mark.wunderlich@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-02nvmet-tcp: fix incorrect locking in state_change sk callbackSagi Grimberg1-2/+2
We are not changing anything in the TCP connection state so we should not take a write_lock but rather a read lock. This caused a deadlock when running nvmet-tcp and nvme-tcp on the same system, where state_change callbacks on the host and on the controller side have causal relationship and made lockdep report on this with blktests: ================================ WARNING: inconsistent lock state 5.12.0-rc3 #1 Tainted: G I -------------------------------- inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-R} usage. nvme/1324 [HC0[0]:SC0[0]:HE1:SE1] takes: ffff888363151000 (clock-AF_INET){++-?}-{2:2}, at: nvme_tcp_state_change+0x21/0x150 [nvme_tcp] {IN-SOFTIRQ-W} state was registered at: __lock_acquire+0x79b/0x18d0 lock_acquire+0x1ca/0x480 _raw_write_lock_bh+0x39/0x80 nvmet_tcp_state_change+0x21/0x170 [nvmet_tcp] tcp_fin+0x2a8/0x780 tcp_data_queue+0xf94/0x1f20 tcp_rcv_established+0x6ba/0x1f00 tcp_v4_do_rcv+0x502/0x760 tcp_v4_rcv+0x257e/0x3430 ip_protocol_deliver_rcu+0x69/0x6a0 ip_local_deliver_finish+0x1e2/0x2f0 ip_local_deliver+0x1a2/0x420 ip_rcv+0x4fb/0x6b0 __netif_receive_skb_one_core+0x162/0x1b0 process_backlog+0x1ff/0x770 __napi_poll.constprop.0+0xa9/0x5c0 net_rx_action+0x7b3/0xb30 __do_softirq+0x1f0/0x940 do_softirq+0xa1/0xd0 __local_bh_enable_ip+0xd8/0x100 ip_finish_output2+0x6b7/0x18a0 __ip_queue_xmit+0x706/0x1aa0 __tcp_transmit_skb+0x2068/0x2e20 tcp_write_xmit+0xc9e/0x2bb0 __tcp_push_pending_frames+0x92/0x310 inet_shutdown+0x158/0x300 __nvme_tcp_stop_queue+0x36/0x270 [nvme_tcp] nvme_tcp_stop_queue+0x87/0xb0 [nvme_tcp] nvme_tcp_teardown_admin_queue+0x69/0xe0 [nvme_tcp] nvme_do_delete_ctrl+0x100/0x10c [nvme_core] nvme_sysfs_delete.cold+0x8/0xd [nvme_core] kernfs_fop_write_iter+0x2c7/0x460 new_sync_write+0x36c/0x610 vfs_write+0x5c0/0x870 ksys_write+0xf9/0x1d0 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xae irq event stamp: 10687 hardirqs last enabled at (10687): [<ffffffff9ec376bd>] _raw_spin_unlock_irqrestore+0x2d/0x40 hardirqs last disabled at (10686): [<ffffffff9ec374d8>] _raw_spin_lock_irqsave+0x68/0x90 softirqs last enabled at (10684): [<ffffffff9f000608>] __do_softirq+0x608/0x940 softirqs last disabled at (10649): [<ffffffff9cdedd31>] do_softirq+0xa1/0xd0 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(clock-AF_INET); <Interrupt> lock(clock-AF_INET); *** DEADLOCK *** 5 locks held by nvme/1324: #0: ffff8884a01fe470 (sb_writers#4){.+.+}-{0:0}, at: ksys_write+0xf9/0x1d0 #1: ffff8886e435c090 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x216/0x460 #2: ffff888104d90c38 (kn->active#255){++++}-{0:0}, at: kernfs_remove_self+0x22d/0x330 #3: ffff8884634538d0 (&queue->queue_lock){+.+.}-{3:3}, at: nvme_tcp_stop_queue+0x52/0xb0 [nvme_tcp] #4: ffff888363150d30 (sk_lock-AF_INET){+.+.}-{0:0}, at: inet_shutdown+0x59/0x300 stack backtrace: CPU: 26 PID: 1324 Comm: nvme Tainted: G I 5.12.0-rc3 #1 Hardware name: Dell Inc. PowerEdge R640/06NR82, BIOS 2.10.0 11/12/2020 Call Trace: dump_stack+0x93/0xc2 mark_lock_irq.cold+0x2c/0xb3 ? verify_lock_unused+0x390/0x390 ? stack_trace_consume_entry+0x160/0x160 ? lock_downgrade+0x100/0x100 ? save_trace+0x88/0x5e0 ? _raw_spin_unlock_irqrestore+0x2d/0x40 mark_lock+0x530/0x1470 ? mark_lock_irq+0x1d10/0x1d10 ? enqueue_timer+0x660/0x660 mark_usage+0x215/0x2a0 __lock_acquire+0x79b/0x18d0 ? tcp_schedule_loss_probe.part.0+0x38c/0x520 lock_acquire+0x1ca/0x480 ? nvme_tcp_state_change+0x21/0x150 [nvme_tcp] ? rcu_read_unlock+0x40/0x40 ? tcp_mtu_probe+0x1ae0/0x1ae0 ? kmalloc_reserve+0xa0/0xa0 ? sysfs_file_ops+0x170/0x170 _raw_read_lock+0x3d/0xa0 ? nvme_tcp_state_change+0x21/0x150 [nvme_tcp] nvme_tcp_state_change+0x21/0x150 [nvme_tcp] ? sysfs_file_ops+0x170/0x170 inet_shutdown+0x189/0x300 __nvme_tcp_stop_queue+0x36/0x270 [nvme_tcp] nvme_tcp_stop_queue+0x87/0xb0 [nvme_tcp] nvme_tcp_teardown_admin_queue+0x69/0xe0 [nvme_tcp] nvme_do_delete_ctrl+0x100/0x10c [nvme_core] nvme_sysfs_delete.cold+0x8/0xd [nvme_core] kernfs_fop_write_iter+0x2c7/0x460 new_sync_write+0x36c/0x610 ? new_sync_read+0x600/0x600 ? lock_acquire+0x1ca/0x480 ? rcu_read_unlock+0x40/0x40 ? lock_is_held_type+0x9a/0x110 vfs_write+0x5c0/0x870 ksys_write+0xf9/0x1d0 ? __ia32_sys_read+0xa0/0xa0 ? lockdep_hardirqs_on_prepare.part.0+0x198/0x340 ? syscall_enter_from_user_mode+0x27/0x70 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: 872d26a391da ("nvmet-tcp: add NVMe over TCP target driver") Reported-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-03-18nvmet-tcp: fix kmap leak when data digest in useElad Grupi1-1/+1
When data digest is enabled we should unmap pdu iovec before handling the data digest pdu. Signed-off-by: Elad Grupi <elad.grupi@dell.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-21Merge tag 'for-5.12/drivers-2021-02-17' of git://git.kernel.dk/linux-blockLinus Torvalds1-17/+42
Pull block driver updates from Jens Axboe: - Remove the skd driver. It's been EOL for a long time (Damien) - NVMe pull requests - fix multipath handling of ->queue_rq errors (Chao Leng) - nvmet cleanups (Chaitanya Kulkarni) - add a quirk for buggy Amazon controller (Filippo Sironi) - avoid devm allocations in nvme-hwmon that don't interact well with fabrics (Hannes Reinecke) - sysfs cleanups (Jiapeng Chong) - fix nr_zones for multipath (Keith Busch) - nvme-tcp crash fix for no-data commands (Sagi Grimberg) - nvmet-tcp fixes (Sagi Grimberg) - add a missing __rcu annotation (Christoph) - failed reconnect fixes (Chao Leng) - various tracing improvements (Michal Krakowiak, Johannes Thumshirn) - switch the nvmet-fc assoc_list to use RCU protection (Leonid Ravich) - resync the status codes with the latest spec (Max Gurtovoy) - minor nvme-tcp improvements (Sagi Grimberg) - various cleanups (Rikard Falkeborn, Minwoo Im, Chaitanya Kulkarni, Israel Rukshin) - Floppy O_NDELAY fix (Denis) - MD pull request - raid5 chunk_sectors fix (Guoqing) - Use lore links (Kees) - Use DEFINE_SHOW_ATTRIBUTE for nbd (Liao) - loop lock scaling (Pavel) - mtip32xx PCI fixes (Bjorn) - bcache fixes (Kai, Dongdong) - Misc fixes (Tian, Yang, Guoqing, Joe, Andy) * tag 'for-5.12/drivers-2021-02-17' of git://git.kernel.dk/linux-block: (64 commits) lightnvm: pblk: Replace guid_copy() with export_guid()/import_guid() lightnvm: fix unnecessary NULL check warnings nvme-tcp: fix crash triggered with a dataless request submission block: Replace lkml.org links with lore nbd: Convert to DEFINE_SHOW_ATTRIBUTE nvme: add 48-bit DMA address quirk for Amazon NVMe controllers nvme-hwmon: rework to avoid devm allocation nvmet: remove else at the end of the function nvmet: add nvmet_req_subsys() helper nvmet: use min of device_path and disk len nvmet: use invalid cmd opcode helper nvmet: use invalid cmd opcode helper nvmet: add helper to report invalid opcode nvmet: remove extra variable in id-ns handler nvmet: make nvmet_find_namespace() req based nvmet: return uniform error for invalid ns nvmet: set status to 0 in case for invalid nsid nvmet-fc: add a missing __rcu annotation to nvmet_fc_tgt_assoc.queues nvme-multipath: set nr_zones for zoned namespaces nvmet-tcp: fix potential race of tcp socket closing accept_work ...
2021-02-10nvmet-tcp: fix potential race of tcp socket closing accept_workSagi Grimberg1-10/+18
When we accept a TCP connection and allocate an nvmet-tcp queue we should make sure not to fully establish it or reference it as the connection may be already closing, which triggers queue release work, which does not fence against queue establishment. In order to address such a race, we make sure to check the sk_state and contain the queue reference to be done underneath the sk_callback_lock such that the queue release work correctly fences against it. Fixes: 872d26a391da ("nvmet-tcp: add NVMe over TCP target driver") Reported-by: Elad Grupi <elad.grupi@dell.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-10nvmet-tcp: fix receive data digest calculation for multiple h2cdata PDUsSagi Grimberg1-7/+24
When a host sends multiple h2cdata PDUs for a single command, we should verify the data digest calculation per PDU and not per command. Fixes: 872d26a391da ("nvmet-tcp: add NVMe over TCP target driver") Reported-by: Narayan Ayalasomayajula <Narayan.Ayalasomayajula@wdc.com> Tested-by: Narayan Ayalasomayajula <Narayan.Ayalasomayajula@wdc.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-03nvmet-tcp: fix out-of-bounds access when receiving multiple h2cdata PDUsSagi Grimberg1-1/+2
When the host sends multiple h2cdata PDUs, we keep track on the receive progress and calculate the scatterlist index and offsets. The issue is that sg_offset should only be kept for the first iov entry we map in the iovec as this is the difference between our cursor and the sg entry offset itself. In addition, the sg index was calculated wrong because we should not round up when dividing the command byte offset with PAG_SIZE. Fixes: 872d26a391da ("nvmet-tcp: add NVMe over TCP target driver") Reported-by: Narayan Ayalasomayajula <Narayan.Ayalasomayajula@wdc.com> Tested-by: Narayan Ayalasomayajula <Narayan.Ayalasomayajula@wdc.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-09-27nvmet-tcp: have queue io_work context run on sock incoming cpuMark Wunderlich1-11/+10
No real good need to spread queues artificially. Usually the target will serve multiple hosts, and it's better to run on the socket incoming cpu for better affinitization rather than spread queues on all online cpus. We rely on RSS to spread the work around sufficiently. Signed-off-by: Mark Wunderlich <mark.wunderlich@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-08-28nvmet-tcp: Fix NULL dereference when a connect data comes in h2cdata pduZiye Yang1-1/+9
When handling commands without in-capsule data, we assign the ttag assuming we already have the queue commands array allocated (based on the queue size information in the connect data payload). However if the connect itself did not send the connect data in-capsule we have yet to allocate the queue commands,and we will assign a bogus ttag and suffer a NULL dereference when we receive the corresponding h2cdata pdu. Fix this by checking if we already allocated commands before dereferencing it when handling h2cdata, if we didn't, its for sure a connect and we should use the preallocated connect command. Signed-off-by: Ziye Yang <ziye.yang@intel.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2020-07-08nvmet-tcp: simplify nvmet_process_resp_listSagi Grimberg1-9/+3
We can make it shorter and simpler without some redundant checks. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-08nvmet-tcp: remove has_keyed_sgls initializationMax Gurtovoy1-1/+0
Since the nvmet_tcp_ops is static, there is no need to initialize values to zero. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-06-11Merge tag 'block-5.8-2020-06-11' of git://git.kernel.dk/linux-blockLinus Torvalds1-2/+2
Pull block fixes from Jens Axboe: "Some followup fixes for this merge window. In particular: - Seqcount write missing preemption disable for stats (Ahmed) - blktrace fixes (Chaitanya) - Redundant initializations (Colin) - Various small NVMe fixes (Chaitanya, Christoph, Daniel, Max, Niklas, Rikard) - loop flag bug regression fix (Martijn) - blk-mq tagging fixes (Christoph, Ming)" * tag 'block-5.8-2020-06-11' of git://git.kernel.dk/linux-block: umem: remove redundant initialization of variable ret pktcdvd: remove redundant initialization of variable ret nvmet: fail outstanding host posted AEN req nvme-pci: use simple suspend when a HMB is enabled nvme-fc: don't call nvme_cleanup_cmd() for AENs nvmet-tcp: constify nvmet_tcp_ops nvme-tcp: constify nvme_tcp_mq_ops and nvme_tcp_admin_mq_ops nvme: do not call del_gendisk() on a disk that was never added blk-mq: fix blk_mq_all_tag_iter blk-mq: split out a __blk_mq_get_driver_tag helper blktrace: fix endianness for blk_log_remap() blktrace: fix endianness in get_pdu_int() blktrace: use errno instead of bi_status block: nr_sects_write(): Disable preemption on seqcount write block: remove the error argument to the block_bio_complete tracepoint loop: Fix wrong masking of status flags block/bio-integrity: don't free 'buf' if bio_integrity_add_page() failed
2020-06-11nvmet-tcp: constify nvmet_tcp_opsMax Gurtovoy1-2/+2
nvmet_tcp_ops is never modified and can be made const to allow the compiler to put it in read-only memory, as done in other transports. Before: text data bss dec hex filename 16164 160 12 16336 3fd0 drivers/nvme/target/tcp.o After: text data bss dec hex filename 16277 64 12 16353 3fe1 drivers/nvme/target/tcp.o Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Reviewed-by: Israel Rukshin <israelr@mellanox.com> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-nextLinus Torvalds1-44/+10
Pull networking updates from David Miller: 1) Allow setting bluetooth L2CAP modes via socket option, from Luiz Augusto von Dentz. 2) Add GSO partial support to igc, from Sasha Neftin. 3) Several cleanups and improvements to r8169 from Heiner Kallweit. 4) Add IF_OPER_TESTING link state and use it when ethtool triggers a device self-test. From Andrew Lunn. 5) Start moving away from custom driver versions, use the globally defined kernel version instead, from Leon Romanovsky. 6) Support GRO vis gro_cells in DSA layer, from Alexander Lobakin. 7) Allow hard IRQ deferral during NAPI, from Eric Dumazet. 8) Add sriov and vf support to hinic, from Luo bin. 9) Support Media Redundancy Protocol (MRP) in the bridging code, from Horatiu Vultur. 10) Support netmap in the nft_nat code, from Pablo Neira Ayuso. 11) Allow UDPv6 encapsulation of ESP in the ipsec code, from Sabrina Dubroca. Also add ipv6 support for espintcp. 12) Lots of ReST conversions of the networking documentation, from Mauro Carvalho Chehab. 13) Support configuration of ethtool rxnfc flows in bcmgenet driver, from Doug Berger. 14) Allow to dump cgroup id and filter by it in inet_diag code, from Dmitry Yakunin. 15) Add infrastructure to export netlink attribute policies to userspace, from Johannes Berg. 16) Several optimizations to sch_fq scheduler, from Eric Dumazet. 17) Fallback to the default qdisc if qdisc init fails because otherwise a packet scheduler init failure will make a device inoperative. From Jesper Dangaard Brouer. 18) Several RISCV bpf jit optimizations, from Luke Nelson. 19) Correct the return type of the ->ndo_start_xmit() method in several drivers, it's netdev_tx_t but many drivers were using 'int'. From Yunjian Wang. 20) Add an ethtool interface for PHY master/slave config, from Oleksij Rempel. 21) Add BPF iterators, from Yonghang Song. 22) Add cable test infrastructure, including ethool interfaces, from Andrew Lunn. Marvell PHY driver is the first to support this facility. 23) Remove zero-length arrays all over, from Gustavo A. R. Silva. 24) Calculate and maintain an explicit frame size in XDP, from Jesper Dangaard Brouer. 25) Add CAP_BPF, from Alexei Starovoitov. 26) Support terse dumps in the packet scheduler, from Vlad Buslov. 27) Support XDP_TX bulking in dpaa2 driver, from Ioana Ciornei. 28) Add devm_register_netdev(), from Bartosz Golaszewski. 29) Minimize qdisc resets, from Cong Wang. 30) Get rid of kernel_getsockopt and kernel_setsockopt in order to eliminate set_fs/get_fs calls. From Christoph Hellwig. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2517 commits) selftests: net: ip_defrag: ignore EPERM net_failover: fixed rollback in net_failover_open() Revert "tipc: Fix potential tipc_aead refcnt leak in tipc_crypto_rcv" Revert "tipc: Fix potential tipc_node refcnt leak in tipc_rcv" vmxnet3: allow rx flow hash ops only when rss is enabled hinic: add set_channels ethtool_ops support selftests/bpf: Add a default $(CXX) value tools/bpf: Don't use $(COMPILE.c) bpf, selftests: Use bpf_probe_read_kernel s390/bpf: Use bcr 0,%0 as tail call nop filler s390/bpf: Maintain 8-byte stack alignment selftests/bpf: Fix verifier test selftests/bpf: Fix sample_cnt shared between two threads bpf, selftests: Adapt cls_redirect to call csum_level helper bpf: Add csum_level helper for fixing up csum levels bpf: Fix up bpf_skb_adjust_room helper's skb csum setting sfc: add missing annotation for efx_ef10_try_update_nic_stats_vf() crypto/chtls: IPv6 support for inline TLS Crypto/chcr: Fixes a coccinile check error Crypto/chcr: Fixes compilations warnings ...
2020-05-28ipv4: add ip_sock_set_tosChristoph Hellwig1-8/+2
Add a helper to directly set the IP_TOS sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28tcp: add tcp_sock_set_nodelayChristoph Hellwig1-10/+2
Add a helper to directly set the TCP_NODELAY sockopt from kernel space without going through a fake uaccess. Cleanup the callers to avoid pointless wrappers now that this is a simple function call. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Sagi Grimberg <sagi@grimberg.me> Acked-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28net: add sock_set_priorityChristoph Hellwig1-14/+4
Add a helper to directly set the SO_PRIORITY sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28net: add sock_no_lingerChristoph Hellwig1-5/+1
Add a helper to directly set the SO_LINGER sockopt from kernel space with onoff set to true and a linger time of 0 without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28net: add sock_set_reuseaddrChristoph Hellwig1-7/+1
Add a helper to directly set the SO_REUSEADDR sockopt from kernel space without going through a fake uaccess. For this the iscsi target now has to formally depend on inet to avoid a mostly theoretical compile failure. For actual operation it already did depend on having ipv4 or ipv6 support. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-27nvmet-tcp: move send/recv error handling in the send/recv methods instead of ↵Sagi Grimberg1-19/+24
call-sites Have routines handle errors and just bail out of the poll loop. This simplifies the code and will help as we may enhance the poll loop logic and these are somewhat in the way. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-05-27nvmet-tcp: set MSG_EOR if we send last payload in the batchSagi Grimberg1-0/+2
when trying to send the pdu data digest, we should set this flag. Reported-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-05-27nvmet-tcp: set MSG_SENDPAGE_NOTLAST with MSG_MORE when we have more to sendSagi Grimberg1-4/+4
We can signal the stack that this is not the last page coming and the stack can build a larger tso segment, so go ahead and use it. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-03-30Merge tag 'for-5.7/drivers-2020-03-29' of git://git.kernel.dk/linux-blockLinus Torvalds1-3/+32
Pull block driver updates from Jens Axboe: - floppy driver cleanup series from Willy - NVMe updates and fixes (Various) - null_blk trace improvements (Chaitanya) - bcache fixes (Coly) - md fixes (via Song) - loop block size change optimizations (Martijn) - scnprintf() use (Takashi) * tag 'for-5.7/drivers-2020-03-29' of git://git.kernel.dk/linux-block: (81 commits) null_blk: add trace in null_blk_zoned.c null_blk: add tracepoint helpers for zoned mode block: add a zone condition debug helper nvme: cleanup namespace identifier reporting in nvme_init_ns_head nvme: rename __nvme_find_ns_head to nvme_find_ns_head nvme: refactor nvme_identify_ns_descs error handling nvme-tcp: Add warning on state change failure at nvme_tcp_setup_ctrl nvme-rdma: Add warning on state change failure at nvme_rdma_setup_ctrl nvme: Fix controller creation races with teardown flow nvme: Make nvme_uninit_ctrl symmetric to nvme_init_ctrl nvme: Fix ctrl use-after-free during sysfs deletion nvme-pci: Re-order nvme_pci_free_ctrl nvme: Remove unused return code from nvme_delete_ctrl_sync nvme: Use nvme_state_terminal helper nvme: release ida resources nvme: Add compat_ioctl handler for NVME_IOCTL_SUBMIT_IO nvmet-tcp: optimize tcp stack TX when data digest is used nvme-fabrics: Use scnprintf() for avoiding potential buffer overflow nvme-multipath: do not reset on unknown status nvmet-rdma: allocate RW ctxs according to mdts ...
2020-03-26nvmet-tcp: optimize tcp stack TX when data digest is usedSagi Grimberg1-2/+5
If we have a 4-byte data digest to send to the wire, but we have more data to send, set MSG_MORE to tell the stack that more is coming. Reviewed-by: Mark Wunderlich <mark.wunderlich@intel.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>
2020-03-26nvmet-tcp: fix maxh2cdata icresp parameterSagi Grimberg1-1/+1
MAXH2CDATA is not zero based. Also no reason to limit ourselves to 1M transfers as we can do more easily. Make this an arbitrary limit of 16M. Reported-by: Wenhua Liu <liuw@vmware.com> Cc: stable@vger.kernel.org # v5.0+ Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>
2020-03-21nvmet-tcp: set MSG_MORE only if we actually have more to sendSagi Grimberg1-3/+9
When we send PDU data, we want to optimize the tcp stack operation if we have more data to send. So when we set MSG_MORE when: - We have more fragments coming in the batch, or - We have a more data to send in this PDU - We don't have a data digest trailer - We optimize with the SUCCESS flag and omit the NVMe completion (used if sq_head pointer update is disabled) This addresses a regression in QD=1 with SUCCESS flag optimization as we unconditionally set MSG_MORE when we didn't actually have more data to send. Fixes: 70583295388a ("nvmet-tcp: implement C2HData SUCCESS optimization") Reported-by: Mark Wunderlich <mark.wunderlich@intel.com> Tested-by: Mark Wunderlich <mark.wunderlich@intel.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>
2020-03-04nvmet-tcp: set SO_PRIORITY for accepted socketsWunderlich, Mark1-0/+26
Enable ability to associate all sockets related to NVMf TCP traffic to a priority group that will perform optimized network processing for this traffic class. Maintain initial default behavior of using priority of zero. Signed-off-by: Kiran Patil <kiran.patil@intel.com> Signed-off-by: Mark Wunderlich <mark.wunderlich@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>
2019-11-04nvmet: Open code nvmet_req_execute()Christoph Hellwig1-3/+3
Now that nvmet_req_execute does nothing, open code it. Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> [split patch, update changelog] Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-04nvmet-tcp: Don't set the request's data_lenLogan Gunthorpe1-4/+2
It's not apprporiate for the transports to set the data_len field of the request which is only used by the core. In this case, just use a variable on the stack to store the length of the sgl for comparison. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-04nvmet-tcp: Don't check data_len in nvmet_tcp_map_data()Logan Gunthorpe1-1/+1
None of the other transports check data_len which is verified in core code. The function should instead check that the sgl length is non-zero. Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-25nvmet-tcp: remove superflous check on request sglSagi Grimberg1-8/+4
Now that sgl_free is null safe, drop the superflous check. Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2019-08-29nvmet-tcp: Add TOS for tcp transportIsrael Rukshin1-0/+11
Set the outgoing packets type of service (TOS) according to the receiving TOS. Signed-off-by: Israel Rukshin <israelr@mellanox.com> Suggested-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2019-08-29nvmet-tcp: fix possible memory leakSagi Grimberg1-0/+1
when we uninit a command in error flow we also need to free an iovec if it was allocated. Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2019-08-29nvmet-tcp: fix possible NULL derefSagi Grimberg1-4/+8
We must only call sgl_free for sgl that we actually allocated. Signed-off-by: Sagi Grimberg <sagi@grimberg.me>