linux - Linux Kernel (branches are rebased on master from time to time)

Age	Commit message (Collapse)	Author	Files	Lines
2022-07-18	ublk_drv: fix an IS_ERR() vs NULL check	Dan Carpenter	1	-2/+2
	The blk_mq_alloc_disk_for_queue() doesn't return error pointers, it returns NULL on error. Fixes: cebbe577cb17 ("ublk_drv: fix request queue leak") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/YtVAgedTsQVK1oTM@kili Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-18	ublk: remove UBLK_IO_F_INTEGRITY	Christoph Hellwig	1	-3/+0
	The ublk protocol has no mechanism to actually transfer the integrity metadata, so don't define this flag, which requires that an integrity payload is attached to a bio. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220718063013.335531-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-18	ublk_drv: remove unneeded semicolon	Yang Li	1	-2/+2
	Eliminate the following coccicheck warnings: ./drivers/block/ublk_drv.c:1467:2-3: Unneeded semicolon ./drivers/block/ublk_drv.c:1528:2-3: Unneeded semicolon Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20220718015431.40185-1-yang.lee@linux.alibaba.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-18	ublk_drv: fix missing error return code in ublk_add_dev()	Yang Yingliang	1	-1/+3
	If blk_mq_init_queue() fails, it should return error code in ublk_add_dev() Fixes: cebbe577cb17 ("ublk_drv: fix request queue leak") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20220718042408.3132835-1-yangyingliang@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-17	zram: fix unused 'zram_wb_devops' warning	Kefeng Wang	1	-0/+2
	drivers/block/zram/zram_drv.c:55:45: warning: 'zram_wb_devops' defined but not used [-Wunused-const-variable=] Fix the above warning if CONFIG_ZRAM_WRITEBACK not enabled. Link: https://lkml.kernel.org/r/20220608072534.68850-1-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-16	ublk_drv: fix build warning with -Wmaybe-uninitialized and one sparse warning	Ming Lei	1	-4/+2
	After applying -Wmaybe-uninitialized manually, two build warnings are triggered: drivers/block/ublk_drv.c:940:11: warning: ‘io’ may be used uninitialized [-Wmaybe-uninitialized] 940 \| io->flags &= ~UBLK_IO_FLAG_ACTIVE; drivers/block/ublk_drv.c: In function ‘ublk_ctrl_uring_cmd’: drivers/block/ublk_drv.c:1531:9: warning: ‘ret’ may be used uninitialized [-Wmaybe-uninitialized] Fix the 1st one by removing 'io->flags &= ~UBLK_IO_FLAG_ACTIVE;' which isn't needed since the function always return successfully after setting this flag. Fix the 2nd one by always initializing 'ret'. Also fix another sparse warning of 'sparse: sparse: incorrect type in return expression' by changing return type of ublk_setup_iod(). Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20220716095344.222674-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14	block/zram: Use enum req_op where appropriate	Bart Van Assche	1	-1/+1
	Improve static type checking by using the enum req_op type where appropriate. Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-19-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14	xen-blkback: Use the enum req_op and blk_opf_t types	Bart Van Assche	1	-3/+3
	Improve static type checking by using the enum req_op type for request operations and the new blk_opf_t type for request flags. Acked-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-18-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14	block/rnbd: Use blk_opf_t where appropriate	Bart Van Assche	1	-3/+4
	Improve static type checking by using the new blk_opf_t type to represent the combination of a request and request flags. Acked-by: Jack Wang <jinpu.wang@ionos.com> Cc: Md. Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-17-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14	block/floppy: Fix a sparse warning	Bart Van Assche	1	-1/+1
	Since the type of request.cmd_flags has been changed from u32 into blk_opf_t, use the __force keyword when casting to an integer type to prevent that sparse warns about this cast. Cc: Denis Efremov <efremov@linux.com> Cc: Willy Tarreau <w@1wt.eu> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-16-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14	block/drbd: Combine two drbd_submit_peer_request() arguments	Bart Van Assche	3	-11/+9
	Combine the drbd_submit_peer_request() 'op' and 'op_flags' arguments into a single argument. This patch does not change any functionality. Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Cc: Lars Ellenberg <lars.ellenberg@linbit.com> Cc: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-15-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14	block/drbd: Use the enum req_op and blk_opf_t types	Bart Van Assche	4	-13/+15
	Improve static type checking by using the enum req_op type for variables that represent a request operation and the new blk_opf_t type for variables that represent request flags. Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Cc: Lars Ellenberg <lars.ellenberg@linbit.com> Cc: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-14-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14	block/brd: Use the enum req_op type	Bart Van Assche	1	-1/+1
	Improve static type checking by using the enum req_op type for a function argument that represents a request operation. Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-13-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14	block: Change the type of req_op() and bio_op() into enum req_op	Bart Van Assche	1	-0/+2
	Improve static type checking by changing the type of the value returned by req_op() and bio_op() from unsigned int into enum req_op. Insert 'default: break;' in switch statements on the enum req_op type to prevent that the compiler warns about these switch statements. Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Damien Le Moal <damien.lemoal@wdc.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Tim Waugh <tim@cyberelk.net> Cc: Alasdair Kergon <agk@redhat.com> Cc: Mike Snitzer <snitzer@kernel.org> Cc: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-5-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14	block: Change the type of the last .rw_page() argument	Bart Van Assche	2	-2/+2
	All .rw_page() callers pass an enum req_op value as last argument. Make this explicit by changing the type of the last argument into enum req_op. See also commit 3f289dcb4b26 ("block: make bdev_ops->rw_page() take a REQ_OP instead of bool"). Cc: Tejun Heo <tj@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Damien Le Moal <damien.lemoal@wdc.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-4-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14	treewide: Rename enum req_opf into enum req_op	Bart Van Assche	4	-15/+12
	The type name enum req_opf is misleading since it suggests that values of this type include both an operation type and flags. Since values of this type represent an operation only, change the type name into enum req_op. Convert the enum req_op documentation into kernel-doc format. Move a few definitions such that the enum req_op documentation occurs just above the enum req_op definition. The name "req_opf" was introduced by commit ef295ecf090d ("block: better op and flags encoding"). Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Damien Le Moal <damien.lemoal@wdc.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-2-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14	rnbd-srv: remove the name field from struct rnbd_dev	Christoph Hellwig	5	-12/+7
	Just print the block device name directly using the %pg format specifier. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220713055317.1888500-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14	pktcdvd: stop using bdevname in pkt_new_dev	Christoph Hellwig	1	-4/+2
	Just use the %pg format specifier instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220713055317.1888500-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14	pktcdvd: stop using bdevname in pkt_seq_show	Christoph Hellwig	1	-3/+1
	Just use the %pg format specifier instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220713055317.1888500-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14	drbd: stop using bdevname in drbd_report_io_error	Christoph Hellwig	1	-4/+2
	Just use the %pg format specifier instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220713055317.1888500-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14	ublk_drv: fix request queue leak	Ming Lei	1	-6/+15
	Call blk_cleanup_queue() in release code path for fixing request queue leak. Also for-5.20/block has cleaned up blk_cleanup_queue(), which is basically merged to del_gendisk() if blk_mq_alloc_disk() is used for allocating disk and queue. However, ublk may not add disk in case of starting device failure, then del_gendisk() won't be called when removing ublk device, so blk_mq_exit_queue will not be callsed, and it can be bit hard to deal with this kind of merge conflict. Turns out ublk's queue/disk use model is very similar with scsi, so switch to scsi's model by allocating disk and queue independently, then it can be quite easy to handle v5.20 merge conflict by replacing blk_cleanup_queue with blk_mq_destroy_queue. Reported-by: Jens Axboe <axboe@kernel.dk> Fixes: 71f28f3136af ("ublk_drv: add io_uring based userspace block driver") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20220714103201.131648-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14	ublk_drv: support to complete io command via task_work_add	Ming Lei	1	-8/+67
	Use task_work_add if it is available, since task_work_add can bring up better performance, especially batching signaling ->ubq_daemon can be done. It is observed that task_work_add() can boost iops by +4% on random 4k io test. Also except for completing io command, all other code paths are same with completing io command via io_uring_cmd_complete_in_task. Meantime add one flag of UBLK_F_URING_CMD_COMP_IN_TASK for comparing the mode easily. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20220713140711.97356-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14	ublk_drv: add io_uring based userspace block driver	Ming Lei	3	-0/+1541
	This is the driver part of userspace block driver(ublk driver), the other part is userspace daemon part(ublksrv)[1]. The two parts communicate by io_uring's IORING_OP_URING_CMD with one shared cmd buffer for storing io command, and the buffer is read only for ublksrv, each io command is indexed by io request tag directly, and is written by ublk driver. For example, when one READ io request is submitted to ublk block driver, ublk driver stores the io command into cmd buffer first, then completes one IORING_OP_URING_CMD for notifying ublksrv, and the URING_CMD is issued to ublk driver beforehand by ublksrv for getting notification of any new io request, and each URING_CMD is associated with one io request by tag. After ublksrv gets the io command, it translates and handles the ublk io request, such as, for the ublk-loop target, ublksrv translates the request into same request on another file or disk, like the kernel loop block driver. In ublksrv's implementation, the io is still handled by io_uring, and share same ring with IORING_OP_URING_CMD command. When the target io request is done, the same IORING_OP_URING_CMD is issued to ublk driver for both committing io request result and getting future notification of new io request. Another thing done by ublk driver is to copy data between kernel io request and ublksrv's io buffer: 1) before ubsrv handles WRITE request, copy the request's data into ublksrv's userspace io buffer, so that ublksrv can handle the write request 2) after ubsrv handles READ request, copy ublksrv's userspace io buffer into this READ request, then ublk driver can complete the READ request Zero copy may be switched if mm is ready to support it. ublk driver doesn't handle any logic of the specific user space driver, so it is small/simple enough. [1] ublksrv https://github.com/ming1/ubdsrv Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20220713140711.97356-2-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-06	block: move zone related fields to struct gendisk	Christoph Hellwig	1	-1/+1
	Move the zone related fields that are currently stored in struct request_queue to struct gendisk as these are part of the highlevel block layer API and are only used for non-passthrough I/O that requires the gendisk. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220706070350.1703384-17-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-06	block: replace blkdev_nr_zones with bdev_nr_zones	Christoph Hellwig	1	-1/+1
	Pass a block_device instead of a request_queue as that is what most callers have at hand. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Link: https://lore.kernel.org/r/20220706070350.1703384-12-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-06	block: pass a gendisk to blk_queue_max_open_zones and blk_queue_max_active_zones	Christoph Hellwig	1	-2/+2
	Switch to a gendisk based API in preparation for moving all zone related fields from the request_queue to the gendisk. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220706070350.1703384-11-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-06	block: pass a gendisk to blk_queue_set_zoned	Christoph Hellwig	1	-1/+1
	Prepare for storing the zone related field in struct gendisk instead of struct request_queue. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220706070350.1703384-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-06	blk-mq: Drop 'reserved' arg of busy_tag_iter_fn	John Garry	2	-3/+3
	We no longer use the 'reserved' arg in busy_tag_iter_fn for any iter function so it may be dropped. Signed-off-by: John Garry <john.garry@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> #nvme Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/1657109034-206040-6-git-send-email-john.garry@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-06	blk-mq: Drop blk_mq_ops.timeout 'reserved' arg	John Garry	3	-6/+4
	With new API blk_mq_is_reserved_rq() we can tell if a request is from the reserved pool, so stop passing 'reserved' arg. There is actually only a single user of that arg for all the callback implementations, which can use blk_mq_is_reserved_rq() instead. This will also allow us to stop passing the same 'reserved' around the blk-mq iter functions next. Signed-off-by: John Garry <john.garry@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # For MMC Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/1657109034-206040-4-git-send-email-john.garry@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-03	zram: do not lookup algorithm in backends table	Sergey Senozhatsky	1	-6/+5
	Always use crypto_has_comp() so that crypto can lookup module, call usermodhelper to load the modules, wait for usermodhelper to finish and so on. Otherwise crypto will do all of these steps under CPU hot-plug lock and this looks like too much stuff to handle under the CPU hot-plug lock. Besides this can end up in a deadlock when usermodhelper triggers a code path that attempts to lock the CPU hot-plug lock, that zram already holds. An example of such deadlock: - path A. zram grabs CPU hot-plug lock, execs /sbin/modprobe from crypto and waits for modprobe to finish disksize_store zcomp_create __cpuhp_state_add_instance __cpuhp_state_add_instance_cpuslocked zcomp_cpu_up_prepare crypto_alloc_base crypto_alg_mod_lookup call_usermodehelper_exec wait_for_completion_killable do_wait_for_common schedule - path B. async work kthread that brings in scsi device. It wants to register CPUHP states at some point, and it needs the CPU hot-plug lock for that, which is owned by zram. async_run_entry_fn scsi_probe_and_add_lun scsi_mq_alloc_queue blk_mq_init_queue blk_mq_init_allocated_queue blk_mq_realloc_hw_ctxs __cpuhp_state_add_instance __cpuhp_state_add_instance_cpuslocked mutex_lock schedule - path C. modprobe sleeps, waiting for all aync works to finish. load_module do_init_module async_synchronize_full async_synchronize_cookie_domain schedule [senozhatsky@chromium.org: add comment] Link: https://lkml.kernel.org/r/20220624060606.1014474-1-senozhatsky@chromium.org Link: https://lkml.kernel.org/r/20220622023501.517125-1-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-01	xen/blkfront: force data bouncing when backend is untrusted	Roger Pau Monne	1	-15/+34
	Split the current bounce buffering logic used with persistent grants into it's own option, and allow enabling it independently of persistent grants. This allows to reuse the same code paths to perform the bounce buffering required to avoid leaking contiguous data in shared pages not part of the request fragments. Reporting whether the backend is to be trusted can be done using a module parameter, or from the xenstore frontend path as set by the toolstack when adding the device. This is CVE-2022-33742, part of XSA-403. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2022-07-01	xen/blkfront: fix leaking data in shared pages	Roger Pau Monne	1	-2/+3
	When allocating pages to be used for shared communication with the backend always zero them, this avoids leaking unintended data present on the pages. This is CVE-2022-26365, part of XSA-403. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2022-06-28	block: remove blk_cleanup_disk	Christoph Hellwig	28	-45/+45
	blk_cleanup_disk is nothing but a trivial wrapper for put_disk now, so remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20220619060552.1850436-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-28	block: simplify disk shutdown	Christoph Hellwig	7	-9/+3
	Set the queue dying flag and call blk_mq_exit_queue from del_gendisk for all disks that do not have separately allocated queues, and thus remove the need to call blk_cleanup_queue for them. Rename blk_cleanup_disk to blk_mq_destroy_queue to make it clear that this function is intended only for separately allocated blk-mq queues. This saves an extra queue freeze for devices without a separately allocated queue. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20220619060552.1850436-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-28	mtip32xx: fix device removal	Christoph Hellwig	2	-114/+44
	Use the proper helper to mark a surpise removal, remove the gendisk as soon as possible when removing the device and implement the ->free_disk callback to ensure the private data is alive as long as the gendisk has references. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20220619060552.1850436-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-28	mtip32xx: remove the device_status debugfs file	Christoph Hellwig	2	-144/+1
	This file is a huge mess that iterates over all devices and is in the way of fixing the device removal in this driver, so remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20220619060552.1850436-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-21	xen-blkfront: Handle NULL gendisk	Jason Andryuk	1	-7/+12
	When a VBD is not fully created and then closed, the kernel can have a NULL pointer dereference: The reproducer is trivial: [user@dom0 ~]$ sudo xl block-attach work backend=sys-usb vdev=xvdi target=/dev/sdz [user@dom0 ~]$ xl block-list work Vdev BE handle state evt-ch ring-ref BE-path 51712 0 241 4 -1 -1 /local/domain/0/backend/vbd/241/51712 51728 0 241 4 -1 -1 /local/domain/0/backend/vbd/241/51728 51744 0 241 4 -1 -1 /local/domain/0/backend/vbd/241/51744 51760 0 241 4 -1 -1 /local/domain/0/backend/vbd/241/51760 51840 3 241 3 -1 -1 /local/domain/3/backend/vbd/241/51840 ^ note state, the /dev/sdz doesn't exist in the backend [user@dom0 ~]$ sudo xl block-detach work xvdi [user@dom0 ~]$ xl block-list work Vdev BE handle state evt-ch ring-ref BE-path work is an invalid domain identifier And its console has: BUG: kernel NULL pointer dereference, address: 0000000000000050 PGD 80000000edebb067 P4D 80000000edebb067 PUD edec2067 PMD 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 1 PID: 52 Comm: xenwatch Not tainted 5.16.18-2.43.fc32.qubes.x86_64 #1 RIP: 0010:blk_mq_stop_hw_queues+0x5/0x40 Code: 00 48 83 e0 fd 83 c3 01 48 89 85 a8 00 00 00 41 39 5c 24 50 77 c0 5b 5d 41 5c 41 5d c3 c3 0f 1f 80 00 00 00 00 0f 1f 44 00 00 <8b> 47 50 85 c0 74 32 41 54 49 89 fc 55 53 31 db 49 8b 44 24 48 48 RSP: 0018:ffffc90000bcfe98 EFLAGS: 00010293 RAX: ffffffffc0008370 RBX: 0000000000000005 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000005 RDI: 0000000000000000 RBP: ffff88800775f000 R08: 0000000000000001 R09: ffff888006e620b8 R10: ffff888006e620b0 R11: f000000000000000 R12: ffff8880bff39000 R13: ffff8880bff39000 R14: 0000000000000000 R15: ffff88800604be00 FS: 0000000000000000(0000) GS:ffff8880f3300000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000050 CR3: 00000000e932e002 CR4: 00000000003706e0 Call Trace: <TASK> blkback_changed+0x95/0x137 [xen_blkfront] ? read_reply+0x160/0x160 xenwatch_thread+0xc0/0x1a0 ? do_wait_intr_irq+0xa0/0xa0 kthread+0x16b/0x190 ? set_kthread_struct+0x40/0x40 ret_from_fork+0x22/0x30 </TASK> Modules linked in: snd_seq_dummy snd_hrtimer snd_seq snd_seq_device snd_timer snd soundcore ipt_REJECT nf_reject_ipv4 xt_state xt_conntrack nft_counter nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables nfnetlink intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel xen_netfront pcspkr xen_scsiback target_core_mod xen_netback xen_privcmd xen_gntdev xen_gntalloc xen_blkback xen_evtchn ipmi_devintf ipmi_msghandler fuse bpf_preload ip_tables overlay xen_blkfront CR2: 0000000000000050 ---[ end trace 7bc9597fd06ae89d ]--- RIP: 0010:blk_mq_stop_hw_queues+0x5/0x40 Code: 00 48 83 e0 fd 83 c3 01 48 89 85 a8 00 00 00 41 39 5c 24 50 77 c0 5b 5d 41 5c 41 5d c3 c3 0f 1f 80 00 00 00 00 0f 1f 44 00 00 <8b> 47 50 85 c0 74 32 41 54 49 89 fc 55 53 31 db 49 8b 44 24 48 48 RSP: 0018:ffffc90000bcfe98 EFLAGS: 00010293 RAX: ffffffffc0008370 RBX: 0000000000000005 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000005 RDI: 0000000000000000 RBP: ffff88800775f000 R08: 0000000000000001 R09: ffff888006e620b8 R10: ffff888006e620b0 R11: f000000000000000 R12: ffff8880bff39000 R13: ffff8880bff39000 R14: 0000000000000000 R15: ffff88800604be00 FS: 0000000000000000(0000) GS:ffff8880f3300000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000050 CR3: 00000000e932e002 CR4: 00000000003706e0 Kernel panic - not syncing: Fatal exception Kernel Offset: disabled info->rq and info->gd are only set in blkfront_connect(), which is called for state 4 (XenbusStateConnected). Guard against using NULL variables in blkfront_closing() to avoid the issue. The rest of blkfront_closing looks okay. If info->nr_rings is 0, then for_each_rinfo won't do anything. blkfront_remove also needs to check for non-NULL pointers before cleaning up the gendisk and request queue. Fixes: 05d69d950d9d "xen-blkfront: sanitize the removal state machine" Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20220601195341.28581-1-jandryuk@gmail.com Signed-off-by: Juergen Gross <jgross@suse.com>
2022-06-04	Merge tag 'for-linus-5.19-rc1b-tag' of ↵	Linus Torvalds	1	-3/+3
	git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull more xen updates from Juergen Gross: "Two cleanup patches for Xen related code and (more important) an update of MAINTAINERS for Xen, as Boris Ostrovsky decided to step down" * tag 'for-linus-5.19-rc1b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: replace xen_remap() with memremap() MAINTAINERS: Update Xen maintainership xen: switch gnttab_end_foreign_access() to take a struct page pointer
2022-06-03	Merge tag 'for-5.19/drivers-2022-06-02' of git://git.kernel.dk/linux-block	Linus Torvalds	5	-59/+82
	Pull more block driver updates from Jens Axboe: "A collection of stragglers that were late on sending in their changes and just followup fixes. - NVMe fixes pull request via Christoph: - set controller enable bit in a separate write (Niklas Cassel) - disable namespace identifiers for the MAXIO MAP1001 (Christoph) - fix a comment typo (Julia Lawall)" - MD fixes pull request via Song: - Remove uses of bdevname (Christoph Hellwig) - Bug fixes (Guoqing Jiang, and Xiao Ni) - bcache fixes series (Coly) - null_blk zoned write fix (Damien) - nbd fixes (Yu, Zhang) - Fix for loop partition scanning (Christoph)" * tag 'for-5.19/drivers-2022-06-02' of git://git.kernel.dk/linux-block: (23 commits) block: null_blk: Fix null_zone_write() nvmet: fix typo in comment nvme: set controller enable bit in a separate write nvme-pci: disable namespace identifiers for the MAXIO MAP1001 bcache: avoid unnecessary soft lockup in kworker update_writeback_rate() nbd: use pr_err to output error message nbd: fix possible overflow on 'first_minor' in nbd_dev_add() nbd: fix io hung while disconnecting device nbd: don't clear 'NBD_CMD_INFLIGHT' flag if request is not completed nbd: fix race between nbd_alloc_config() and module removal nbd: call genl_unregister_family() first in nbd_cleanup() md: bcache: check the return value of kzalloc() in detached_dev_do_request() bcache: memset on stack variables in bch_btree_check() and bch_sectors_dirty_init() block, loop: support partitions without scanning bcache: avoid journal no-space deadlock by reserving 1 journal bucket bcache: remove incremental dirty sector counting for bch_sectors_dirty_init() bcache: improve multithreaded bch_sectors_dirty_init() bcache: improve multithreaded bch_btree_check() md: fix double free of io_acct_set bioset md: Don't set mddev private to NULL in raid0 pers->free ...
2022-06-03	Merge tag 'for-5.19/block-exec-2022-06-02' of git://git.kernel.dk/linux-block	Linus Torvalds	1	-2/+2
	Pull block request execute cleanups from Jens Axboe: "This change was advertised in the initial core block pull request, but didn't actually make that branch as we deferred it to a post-merge pull request to avoid a bunch of cross branch issues. This series cleans up the block execute path quite nicely" * tag 'for-5.19/block-exec-2022-06-02' of git://git.kernel.dk/linux-block: blk-mq: remove the done argument to blk_execute_rq_nowait blk-mq: avoid a mess of casts for blk_end_sync_rq blk-mq: remove __blk_execute_rq_nowait
2022-06-03	Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost	Linus Torvalds	1	-19/+201
	Pull virtio updates from Michael Tsirkin: "vhost,virtio and vdpa features, fixes, and cleanups: - mac vlan filter and stats support in mlx5 vdpa - irq hardening in virtio - performance improvements in virtio crypto - polling i/o support in virtio blk - ASID support in vhost - fixes, cleanups all over the place" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (64 commits) vdpa: ifcvf: set pci driver data in probe vdpa/mlx5: Add RX MAC VLAN filter support vdpa/mlx5: Remove flow counter from steering vhost: rename vhost_work_dev_flush vhost-test: drop flush after vhost_dev_cleanup vhost-scsi: drop flush after vhost_dev_cleanup vhost_vsock: simplify vhost_vsock_flush() vhost_test: remove vhost_test_flush_vq() vhost_net: get rid of vhost_net_flush_vq() and extra flush calls vhost: flush dev once during vhost_dev_stop vhost: get rid of vhost_poll_flush() wrapper vhost-vdpa: return -EFAULT on copy_to_user() failure vdpasim: Off by one in vdpasim_set_group_asid() virtio: Directly use ida_alloc()/free() virtio: use WARN_ON() to warning illegal status value virtio: harden vring IRQ virtio: allow to unbreak virtqueue virtio-ccw: implement synchronize_cbs() virtio-mmio: implement synchronize_cbs() virtio-pci: implement synchronize_cbs() ...
2022-06-02	Merge tag 'ceph-for-5.19-rc1' of https://github.com/ceph/ceph-client	Linus Torvalds	1	-7/+6
	Pull ceph updates from Ilya Dryomov: "A big pile of assorted fixes and improvements for the filesystem with nothing in particular standing out, except perhaps that the fact that the MDS never really maintained atime was made official and thus it's no longer updated on the client either. We also have a MAINTAINERS update: Jeff is transitioning his filesystem maintainership duties to Xiubo" * tag 'ceph-for-5.19-rc1' of https://github.com/ceph/ceph-client: (23 commits) MAINTAINERS: move myself from ceph "Maintainer" to "Reviewer" ceph: fix decoding of client session messages flags ceph: switch TASK_INTERRUPTIBLE to TASK_KILLABLE ceph: remove redundant variable ino ceph: try to queue a writeback if revoking fails ceph: fix statfs for subdir mounts ceph: fix possible deadlock when holding Fwb to get inline_data ceph: redirty the page for writepage on failure ceph: try to choose the auth MDS if possible for getattr ceph: disable updating the atime since cephfs won't maintain it ceph: flush the mdlog for filesystem sync ceph: rename unsafe_request_wait() libceph: use swap() macro instead of taking tmp variable ceph: fix statx AT_STATX_DONT_SYNC vs AT_STATX_FORCE_SYNC check ceph: no need to invalidate the fscache twice ceph: replace usage of found with dedicated list iterator variable ceph: use dedicated list iterator variable ceph: update the dlease for the hashed dentry when removing ceph: stop retrying the request when exceeding 256 times ceph: stop forwarding the request when exceeding 256 times ...
2022-06-02	block: null_blk: Fix null_zone_write()	Damien Le Moal	3	-9/+10
	The bio and rq fields of struct nullb_cmd are now overlapping in a union. So we cannot use a test on ->bio being non-NULL to detect the NULL_Q_BIO queue mode. null_zone_write() use such broken test to set the sector position of a zone append write in the command bio or request. When the null_blk device uses the NULL_Q_MQ queue mode, null_zone_write() wrongly end up setting the bio sector position, resulting in the command request to be broken and random crashes following. Fix this by testing the device queue mode directly. Fixes: 8ba816b23abd ("null-blk: save memory footprint for struct nullb_cmd") Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220602120344.1365329-1-damien.lemoal@opensource.wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-31	virtio-blk: support mq_ops->queue_rqs()	Suwan Kim	1	-16/+98
	This patch supports mq_ops->queue_rqs() hook. It has an advantage of batch submission to virtio-blk driver. It also helps polling I/O because polling uses batched completion of block layer. Batch submission in queue_rqs() can boost polling performance. In queue_rqs(), it iterates plug->mq_list, collects requests that belong to same HW queue until it encounters a request from other HW queue or sees the end of the list. Then, virtio-blk adds requests into virtqueue and kicks virtqueue to submit requests. If there is an error, it inserts error request to requeue_list and passes it to ordinary block layer path. For verification, I did fio test. (io_uring, randread, direct=1, bs=4K, iodepth=64 numjobs=N) I set 4 vcpu and 2 virtio-blk queues for VM and run fio test 5 times. It shows about 2% improvement. \| numjobs=2 \| numjobs=4 ----------------------------------------------------------- fio without queue_rqs() \| 291K IOPS \| 238K IOPS ----------------------------------------------------------- fio with queue_rqs() \| 295K IOPS \| 243K IOPS For polling I/O performance, I also did fio test as below. (io_uring, hipri, randread, direct=1, bs=512, iodepth=64 numjobs=4) I set 4 vcpu and 2 poll queues for VM. It shows about 2% improvement in polling I/O. \| IOPS \| avg latency ----------------------------------------------------------- fio poll without queue_rqs() \| 424K \| 613.05 usec ----------------------------------------------------------- fio poll with queue_rqs() \| 435K \| 601.01 usec Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Suwan Kim <suwan.kim027@gmail.com> Message-Id: <20220406153207.163134-3-suwan.kim027@gmail.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
2022-05-31	virtio-blk: support polling I/O	Suwan Kim	1	-3/+103
	This patch supports polling I/O via virtio-blk driver. Polling feature is enabled by module parameter "poll_queues" and it sets dedicated polling queues for virtio-blk. This patch improves the polling I/O throughput and latency. The virtio-blk driver doesn't not have a poll function and a poll queue and it has been operating in interrupt driven method even if the polling function is called in the upper layer. virtio-blk polling is implemented upon 'batched completion' of block layer. virtblk_poll() queues completed request to io_comp_batch->req_list and later, virtblk_complete_batch() calls unmap function and ends the requests in batch. virtio-blk reads the number of poll queues from module parameter "poll_queues". If VM sets queue parameter as below, ("num-queues=N" [QEMU property], "poll_queues=M" [module parameter]) It allocates N virtqueues to virtio_blk->vqs[N] and it uses [0..(N-M-1)] as default queues and [(N-M)..(N-1)] as poll queues. Unlike the default queues, the poll queues have no callback function. Regarding HW-SW queue mapping, the default queue mapping uses the existing method that condsiders MSI irq vector. But the poll queue doesn't have an irq, so it uses the regular blk-mq cpu mapping. For verifying the improvement, I did Fio polling I/O performance test with io_uring engine with the options below. (io_uring, hipri, randread, direct=1, bs=512, iodepth=64 numjobs=N) I set 4 vcpu and 4 virtio-blk queues - 2 default queues and 2 poll queues for VM. As a result, IOPS and average latency improved about 10%. Test result: - Fio io_uring poll without virtio-blk poll support -- numjobs=1 : IOPS = 339K, avg latency = 188.33us -- numjobs=2 : IOPS = 367K, avg latency = 347.33us -- numjobs=4 : IOPS = 383K, avg latency = 682.06us - Fio io_uring poll with virtio-blk poll support -- numjobs=1 : IOPS = 385K, avg latency = 165.94us -- numjobs=2 : IOPS = 408K, avg latency = 313.28us -- numjobs=4 : IOPS = 424K, avg latency = 613.05us Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Suwan Kim <suwan.kim027@gmail.com> Message-Id: <20220406153207.163134-2-suwan.kim027@gmail.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
2022-05-28	blk-mq: remove the done argument to blk_execute_rq_nowait	Christoph Hellwig	1	-2/+2
	Let the caller set it together with the end_io_data instead of passing a pointless argument. Note the the target code did in fact already set it and then just overrode it again by calling blk_execute_rq_nowait. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220524121530.943123-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-27	nbd: use pr_err to output error message	Yu Kuai	1	-22/+18
	Instead of using the long printk(KERN_ERR "nbd: ...") to output error message, defining pr_fmt and using the short pr_err("") to do that. The replacemen is done by using the following command: sed -i 's/printk(KERN_ERR "nbd: /pr_err("/g' \ drivers/block/nbd.c This patch also rewrap to 80 columns where possible. Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20220521073749.3146892-7-yukuai3@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-27	nbd: fix possible overflow on 'first_minor' in nbd_dev_add()	Zhang Wensheng	1	-11/+12
	When 'index' is a big numbers, it may become negative which forced to 'int'. then 'index << part_shift' might overflow to a positive value that is not greater than '0xfffff', then sysfs might complains about duplicate creation. Because of this, move the 'index' judgment to the front will fix it and be better. Fixes: b0d9111a2d53 ("nbd: use an idr to keep track of nbd devices") Fixes: 940c264984fd ("nbd: fix possible overflow for 'first_minor' in nbd_dev_add()") Signed-off-by: Zhang Wensheng <zhangwensheng5@huawei.com> Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20220521073749.3146892-6-yukuai3@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-27	nbd: fix io hung while disconnecting device	Yu Kuai	1	-1/+1
	In our tests, "qemu-nbd" triggers a io hung: INFO: task qemu-nbd:11445 blocked for more than 368 seconds. Not tainted 5.18.0-rc3-next-20220422-00003-g2176915513ca #884 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:qemu-nbd state:D stack: 0 pid:11445 ppid: 1 flags:0x00000000 Call Trace: <TASK> __schedule+0x480/0x1050 ? _raw_spin_lock_irqsave+0x3e/0xb0 schedule+0x9c/0x1b0 blk_mq_freeze_queue_wait+0x9d/0xf0 ? ipi_rseq+0x70/0x70 blk_mq_freeze_queue+0x2b/0x40 nbd_add_socket+0x6b/0x270 [nbd] nbd_ioctl+0x383/0x510 [nbd] blkdev_ioctl+0x18e/0x3e0 __x64_sys_ioctl+0xac/0x120 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7fd8ff706577 RSP: 002b:00007fd8fcdfebf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 0000000040000000 RCX: 00007fd8ff706577 RDX: 000000000000000d RSI: 000000000000ab00 RDI: 000000000000000f RBP: 000000000000000f R08: 000000000000fbe8 R09: 000055fe497c62b0 R10: 00000002aff20000 R11: 0000000000000246 R12: 000000000000006d R13: 0000000000000000 R14: 00007ffe82dc5e70 R15: 00007fd8fcdff9c0 "qemu-ndb -d" will call ioctl 'NBD_DISCONNECT' first, however, following message was found: block nbd0: Send disconnect failed -32 Which indicate that something is wrong with the server. Then, "qemu-nbd -d" will call ioctl 'NBD_CLEAR_SOCK', however ioctl can't clear requests after commit 2516ab1543fd("nbd: only clear the queue on device teardown"). And in the meantime, request can't complete through timeout because nbd_xmit_timeout() will always return 'BLK_EH_RESET_TIMER', which means such request will never be completed in this situation. Now that the flag 'NBD_CMD_INFLIGHT' can make sure requests won't complete multiple times, switch back to call nbd_clear_sock() in nbd_clear_sock_ioctl(), so that inflight requests can be cleared. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20220521073749.3146892-5-yukuai3@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-27	nbd: don't clear 'NBD_CMD_INFLIGHT' flag if request is not completed	Yu Kuai	1	-4/+14
	Otherwise io will hung because request will only be completed if the cmd has the flag 'NBD_CMD_INFLIGHT'. Fixes: 07175cb1baf4 ("nbd: make sure request completion won't concurrent") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20220521073749.3146892-4-yukuai3@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>