summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2022-11-25io_uring/poll: fix poll_refs race with cancelationLin Ma1-1/+2
There is an interesting race condition of poll_refs which could result in a NULL pointer dereference. The crash trace is like: KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f] CPU: 0 PID: 30781 Comm: syz-executor.2 Not tainted 6.0.0-g493ffd6605b2 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 RIP: 0010:io_poll_remove_entry io_uring/poll.c:154 [inline] RIP: 0010:io_poll_remove_entries+0x171/0x5b4 io_uring/poll.c:190 Code: ... RSP: 0018:ffff88810dfefba0 EFLAGS: 00010202 RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000040000 RDX: ffffc900030c4000 RSI: 000000000003ffff RDI: 0000000000040000 RBP: 0000000000000008 R08: ffffffff9764d3dd R09: fffffbfff3836781 R10: fffffbfff3836781 R11: 0000000000000000 R12: 1ffff11003422d60 R13: ffff88801a116b04 R14: ffff88801a116ac0 R15: dffffc0000000000 FS: 00007f9c07497700(0000) GS:ffff88811a600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffb5c00ea98 CR3: 0000000105680005 CR4: 0000000000770ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> io_apoll_task_func+0x3f/0xa0 io_uring/poll.c:299 handle_tw_list io_uring/io_uring.c:1037 [inline] tctx_task_work+0x37e/0x4f0 io_uring/io_uring.c:1090 task_work_run+0x13a/0x1b0 kernel/task_work.c:177 get_signal+0x2402/0x25a0 kernel/signal.c:2635 arch_do_signal_or_restart+0x3b/0x660 arch/x86/kernel/signal.c:869 exit_to_user_mode_loop kernel/entry/common.c:166 [inline] exit_to_user_mode_prepare+0xc2/0x160 kernel/entry/common.c:201 __syscall_exit_to_user_mode_work kernel/entry/common.c:283 [inline] syscall_exit_to_user_mode+0x58/0x160 kernel/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x63/0xcd The root cause for this is a tiny overlooking in io_poll_check_events() when cocurrently run with poll cancel routine io_poll_cancel_req(). The interleaving to trigger use-after-free: CPU0 | CPU1 | io_apoll_task_func() | io_poll_cancel_req() io_poll_check_events() | // do while first loop | v = atomic_read(...) | // v = poll_refs = 1 | ... | io_poll_mark_cancelled() | atomic_or() | // poll_refs = IO_POLL_CANCEL_FLAG | 1 | atomic_sub_return(...) | // poll_refs = IO_POLL_CANCEL_FLAG | // loop continue | | | io_poll_execute() | io_poll_get_ownership() | // poll_refs = IO_POLL_CANCEL_FLAG | 1 | // gets the ownership v = atomic_read(...) | // poll_refs not change | | if (v & IO_POLL_CANCEL_FLAG) | return -ECANCELED; | // io_poll_check_events return | // will go into | // io_req_complete_failed() free req | | | io_apoll_task_func() | // also go into io_req_complete_failed() And the interleaving to trigger the kernel WARNING: CPU0 | CPU1 | io_apoll_task_func() | io_poll_cancel_req() io_poll_check_events() | // do while first loop | v = atomic_read(...) | // v = poll_refs = 1 | ... | io_poll_mark_cancelled() | atomic_or() | // poll_refs = IO_POLL_CANCEL_FLAG | 1 | atomic_sub_return(...) | // poll_refs = IO_POLL_CANCEL_FLAG | // loop continue | | v = atomic_read(...) | // v = IO_POLL_CANCEL_FLAG | | io_poll_execute() | io_poll_get_ownership() | // poll_refs = IO_POLL_CANCEL_FLAG | 1 | // gets the ownership | WARN_ON_ONCE(!(v & IO_POLL_REF_MASK))) | // v & IO_POLL_REF_MASK = 0 WARN | | | io_apoll_task_func() | // also go into io_req_complete_failed() By looking up the source code and communicating with Pavel, the implementation of this atomic poll refs should continue the loop of io_poll_check_events() just to avoid somewhere else to grab the ownership. Therefore, this patch simply adds another AND operation to make sure the loop will stop if it finds the poll_refs is exactly equal to IO_POLL_CANCEL_FLAG. Since io_poll_cancel_req() grabs ownership and will finally make its way to io_req_complete_failed(), the req will be reclaimed as expected. Fixes: aa43477b0402 ("io_uring: poll rework") Signed-off-by: Lin Ma <linma@zju.edu.cn> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> [axboe: tweak description and code style] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-25io_uring/filetable: fix file reference underflowLin Ma1-2/+0
There is an interesting reference bug when -ENOMEM occurs in calling of io_install_fixed_file(). KASan report like below: [ 14.057131] ================================================================== [ 14.059161] BUG: KASAN: use-after-free in unix_get_socket+0x10/0x90 [ 14.060975] Read of size 8 at addr ffff88800b09cf20 by task kworker/u8:2/45 [ 14.062684] [ 14.062768] CPU: 2 PID: 45 Comm: kworker/u8:2 Not tainted 6.1.0-rc4 #1 [ 14.063099] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 [ 14.063666] Workqueue: events_unbound io_ring_exit_work [ 14.063936] Call Trace: [ 14.064065] <TASK> [ 14.064175] dump_stack_lvl+0x34/0x48 [ 14.064360] print_report+0x172/0x475 [ 14.064547] ? _raw_spin_lock_irq+0x83/0xe0 [ 14.064758] ? __virt_addr_valid+0xef/0x170 [ 14.064975] ? unix_get_socket+0x10/0x90 [ 14.065167] kasan_report+0xad/0x130 [ 14.065353] ? unix_get_socket+0x10/0x90 [ 14.065553] unix_get_socket+0x10/0x90 [ 14.065744] __io_sqe_files_unregister+0x87/0x1e0 [ 14.065989] ? io_rsrc_refs_drop+0x1c/0xd0 [ 14.066199] io_ring_exit_work+0x388/0x6a5 [ 14.066410] ? io_uring_try_cancel_requests+0x5bf/0x5bf [ 14.066674] ? try_to_wake_up+0xdb/0x910 [ 14.066873] ? virt_to_head_page+0xbe/0xbe [ 14.067080] ? __schedule+0x574/0xd20 [ 14.067273] ? read_word_at_a_time+0xe/0x20 [ 14.067492] ? strscpy+0xb5/0x190 [ 14.067665] process_one_work+0x423/0x710 [ 14.067879] worker_thread+0x2a2/0x6f0 [ 14.068073] ? process_one_work+0x710/0x710 [ 14.068284] kthread+0x163/0x1a0 [ 14.068454] ? kthread_complete_and_exit+0x20/0x20 [ 14.068697] ret_from_fork+0x22/0x30 [ 14.068886] </TASK> [ 14.069000] [ 14.069088] Allocated by task 289: [ 14.069269] kasan_save_stack+0x1e/0x40 [ 14.069463] kasan_set_track+0x21/0x30 [ 14.069652] __kasan_slab_alloc+0x58/0x70 [ 14.069899] kmem_cache_alloc+0xc5/0x200 [ 14.070100] __alloc_file+0x20/0x160 [ 14.070283] alloc_empty_file+0x3b/0xc0 [ 14.070479] path_openat+0xc3/0x1770 [ 14.070689] do_filp_open+0x150/0x270 [ 14.070888] do_sys_openat2+0x113/0x270 [ 14.071081] __x64_sys_openat+0xc8/0x140 [ 14.071283] do_syscall_64+0x3b/0x90 [ 14.071466] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 14.071791] [ 14.071874] Freed by task 0: [ 14.072027] kasan_save_stack+0x1e/0x40 [ 14.072224] kasan_set_track+0x21/0x30 [ 14.072415] kasan_save_free_info+0x2a/0x50 [ 14.072627] __kasan_slab_free+0x106/0x190 [ 14.072858] kmem_cache_free+0x98/0x340 [ 14.073075] rcu_core+0x427/0xe50 [ 14.073249] __do_softirq+0x110/0x3cd [ 14.073440] [ 14.073523] Last potentially related work creation: [ 14.073801] kasan_save_stack+0x1e/0x40 [ 14.074017] __kasan_record_aux_stack+0x97/0xb0 [ 14.074264] call_rcu+0x41/0x550 [ 14.074436] task_work_run+0xf4/0x170 [ 14.074619] exit_to_user_mode_prepare+0x113/0x120 [ 14.074858] syscall_exit_to_user_mode+0x1d/0x40 [ 14.075092] do_syscall_64+0x48/0x90 [ 14.075272] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 14.075529] [ 14.075612] Second to last potentially related work creation: [ 14.075900] kasan_save_stack+0x1e/0x40 [ 14.076098] __kasan_record_aux_stack+0x97/0xb0 [ 14.076325] task_work_add+0x72/0x1b0 [ 14.076512] fput+0x65/0xc0 [ 14.076657] filp_close+0x8e/0xa0 [ 14.076825] __x64_sys_close+0x15/0x50 [ 14.077019] do_syscall_64+0x3b/0x90 [ 14.077199] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 14.077448] [ 14.077530] The buggy address belongs to the object at ffff88800b09cf00 [ 14.077530] which belongs to the cache filp of size 232 [ 14.078105] The buggy address is located 32 bytes inside of [ 14.078105] 232-byte region [ffff88800b09cf00, ffff88800b09cfe8) [ 14.078685] [ 14.078771] The buggy address belongs to the physical page: [ 14.079046] page:000000001bd520e7 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88800b09de00 pfn:0xb09c [ 14.079575] head:000000001bd520e7 order:1 compound_mapcount:0 compound_pincount:0 [ 14.079946] flags: 0x100000000010200(slab|head|node=0|zone=1) [ 14.080244] raw: 0100000000010200 0000000000000000 dead000000000001 ffff88800493cc80 [ 14.080629] raw: ffff88800b09de00 0000000080190018 00000001ffffffff 0000000000000000 [ 14.081016] page dumped because: kasan: bad access detected [ 14.081293] [ 14.081376] Memory state around the buggy address: [ 14.081618] ffff88800b09ce00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 14.081974] ffff88800b09ce80: 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc [ 14.082336] >ffff88800b09cf00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 14.082690] ^ [ 14.082909] ffff88800b09cf80: fb fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc [ 14.083266] ffff88800b09d000: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb [ 14.083622] ================================================================== The actual tracing of this bug is shown below: commit 8c71fe750215 ("io_uring: ensure fput() called correspondingly when direct install fails") adds an additional fput() in io_fixed_fd_install() when io_file_bitmap_get() returns error values. In that case, the routine will never make it to io_install_fixed_file() due to an early return. static int io_fixed_fd_install(...) { if (alloc_slot) { ... ret = io_file_bitmap_get(ctx); if (unlikely(ret < 0)) { io_ring_submit_unlock(ctx, issue_flags); fput(file); return ret; } ... } ... ret = io_install_fixed_file(req, file, issue_flags, file_slot); ... } In the above scenario, the reference is okay as io_fixed_fd_install() ensures the fput() is called when something bad happens, either via bitmap or via inner io_install_fixed_file(). However, the commit 61c1b44a21d7 ("io_uring: fix deadlock on iowq file slot alloc") breaks the balance because it places fput() into the common path for both io_file_bitmap_get() and io_install_fixed_file(). Since io_install_fixed_file() handles the fput() itself, the reference underflow come across then. There are some extra commits make the current code into io_fixed_fd_install() -> __io_fixed_fd_install() -> io_install_fixed_file() However, the fact that there is an extra fput() is called if io_install_fixed_file() calls fput(). Traversing through the code, I find that the existing two callers to __io_fixed_fd_install(): io_fixed_fd_install() and io_msg_send_fd() have fput() when handling error return, this patch simply removes the fput() in io_install_fixed_file() to fix the bug. Fixes: 61c1b44a21d7 ("io_uring: fix deadlock on iowq file slot alloc") Signed-off-by: Lin Ma <linma@zju.edu.cn> Link: https://lore.kernel.org/r/be4ba4b.5d44.184a0a406a4.Coremail.linma@zju.edu.cn Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-25io_uring: make poll refs more robustPavel Begunkov1-1/+35
poll_refs carry two functions, the first is ownership over the request. The second is notifying the io_poll_check_events() that there was an event but wake up couldn't grab the ownership, so io_poll_check_events() should retry. We want to make poll_refs more robust against overflows. Instead of always incrementing it, which covers two purposes with one atomic, check if poll_refs is elevated enough and if so set a retry flag without attempts to grab ownership. The gap between the bias check and following atomics may seem racy, but we don't need it to be strict. Moreover there might only be maximum 4 parallel updates: by the first and the second poll entries, __io_arm_poll_handler() and cancellation. From those four, only poll wake ups may be executed multiple times, but they're protected by a spin. Cc: stable@vger.kernel.org Reported-by: Lin Ma <linma@zju.edu.cn> Fixes: aa43477b04025 ("io_uring: poll rework") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/c762bc31f8683b3270f3587691348a7119ef9c9d.1668963050.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-25io_uring: cmpxchg for poll arm refs releasePavel Begunkov1-5/+3
Replace atomically substracting the ownership reference at the end of arming a poll with a cmpxchg. We try to release ownership by setting 0 assuming that poll_refs didn't change while we were arming. If it did change, we keep the ownership and use it to queue a tw, which is fully capable to process all events and (even tolerates spurious wake ups). It's a bit more elegant as we reduce races b/w setting the cancellation flag and getting refs with this release, and with that we don't have to worry about any kinds of underflows. It's not the fastest path for polling. The performance difference b/w cmpxchg and atomic dec is usually negligible and it's not the fastest path. Cc: stable@vger.kernel.org Fixes: aa43477b04025 ("io_uring: poll rework") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/0c95251624397ea6def568ff040cad2d7926fd51.1668963050.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-25io_uring: keep unlock_post inlined in hot pathPavel Begunkov1-2/+9
This partially reverts 6c16fe3c16bdc ("io_uring: kill io_cqring_ev_posted() and __io_cq_unlock_post()") The redundancy of __io_cq_unlock_post() was always to keep it inlined into __io_submit_flush_completions(). Inline it back and rename with hope of clarifying the intention behind it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/372a16c485fca44c069be2e92fc5e7332a1d7fd7.1669310258.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-25io_uring: don't use complete_post in kbufPavel Begunkov1-9/+5
Now we're handling IOPOLL completions more generically, get rid uses of _post() and send requests through the normal path. It may have some extra mertis performance wise, but we don't care much as there is a better interface for selected buffers. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/4deded706587f55b006dc33adf0c13cfc3b2319f.1669310258.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-25io_uring: spelling fixDylan Yudaken1-1/+1
s/pushs/pushes/ Signed-off-by: Dylan Yudaken <dylany@meta.com> Link: https://lore.kernel.org/r/20221125103412.1425305-3-dylany@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-25io_uring: remove io_req_complete_post_twDylan Yudaken2-8/+1
It's only used in one place. Inline it. Signed-off-by: Dylan Yudaken <dylany@meta.com> Link: https://lore.kernel.org/r/20221125103412.1425305-2-dylany@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-25io_uring: allow multishot polled reqs to defer completionDylan Yudaken1-1/+2
Until now there was no reason for multishot polled requests to defer completions as there was no functional difference. However now this will actually defer the completions, for a performance win. Signed-off-by: Dylan Yudaken <dylany@meta.com> Link: https://lore.kernel.org/r/20221124093559.3780686-10-dylany@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-25io_uring: remove overflow param from io_post_aux_cqeDylan Yudaken4-10/+13
The only call sites which would not allow overflow are also call sites which would use the io_aux_cqe as they care about ordering. So remove this parameter from io_post_aux_cqe. Signed-off-by: Dylan Yudaken <dylany@meta.com> Link: https://lore.kernel.org/r/20221124093559.3780686-9-dylany@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-25io_uring: add lockdep assertion in io_fill_cqe_auxDylan Yudaken1-0/+2
Add an assertion for the completion lock to io_fill_cqe_aux Signed-off-by: Dylan Yudaken <dylany@meta.com> Link: https://lore.kernel.org/r/20221124093559.3780686-8-dylany@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-25io_uring: make io_fill_cqe_aux staticDylan Yudaken2-4/+2
This is only used in io_uring.c Signed-off-by: Dylan Yudaken <dylany@meta.com> Link: https://lore.kernel.org/r/20221124093559.3780686-7-dylany@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-25io_uring: add io_aux_cqe which allows deferred completionDylan Yudaken4-5/+42
Use the just introduced deferred post cqe completion state when possible in io_aux_cqe. If not possible fallback to io_post_aux_cqe. This introduces a complication because of allow_overflow. For deferred completions we cannot know without locking the completion_lock if it will overflow (and even if we locked it, another post could sneak in and cause this cqe to be in overflow). However since overflow protection is mostly a best effort defence in depth to prevent infinite loops of CQEs for poll, just checking the overflow bit is going to be good enough and will result in at most 16 (array size of deferred cqes) overflows. Suggested-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Dylan Yudaken <dylany@meta.com> Link: https://lore.kernel.org/r/20221124093559.3780686-6-dylany@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-25io_uring: allow defer completion for aux posted cqesDylan Yudaken2-3/+26
Multishot ops cannot use the compl_reqs list as the request must stay in the poll list, but that means they need to run each completion without benefiting from batching. Here introduce batching infrastructure for only small (ie 16 byte) CQEs. This restriction is ok because there are no use cases posting 32 byte CQEs. In the ring keep a batch of up to 16 posted results, and flush in the same way as compl_reqs. 16 was chosen through experimentation on a microbenchmark ([1]), as well as trying not to increase the size of the ring too much. This increases the size to 1472 bytes from 1216. [1]: https://github.com/DylanZA/liburing/commit/9ac66b36bcf4477bfafeff1c5f107896b7ae31cf Run with $ make -j && ./benchmark/reg.b -s 1 -t 2000 -r 10 Gives results: baseline 8309 k/s 8 18807 k/s 16 19338 k/s 32 20134 k/s Suggested-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Dylan Yudaken <dylany@meta.com> Link: https://lore.kernel.org/r/20221124093559.3780686-5-dylany@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-25io_uring: defer all io_req_complete_failedDylan Yudaken3-11/+10
All failures happen under lock now, and can be deferred. To be consistent when the failure has happened after some multishot cqe has been deferred (and keep ordering), always defer failures. To make this obvious at the caller (and to help prevent a future bug) rename io_req_complete_failed to io_req_defer_failed. Signed-off-by: Dylan Yudaken <dylany@meta.com> Link: https://lore.kernel.org/r/20221124093559.3780686-4-dylany@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-25io_uring: always lock in io_apoll_task_funcDylan Yudaken1-1/+2
This is required for the failure case (io_req_complete_failed) and is missing. The alternative would be to only lock in the failure path, however all of the non-error paths in io_poll_check_events that do not do not return IOU_POLL_NO_ACTION end up locking anyway. The only extraneous lock would be for the multishot poll overflowing the CQE ring, however multishot poll would probably benefit from being locked as it will allow completions to be batched. So it seems reasonable to lock always. Signed-off-by: Dylan Yudaken <dylany@meta.com> Link: https://lore.kernel.org/r/20221124093559.3780686-3-dylany@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-25zonefs: Fix active zone accountingDamien Le Moal2-2/+15
If a file zone transitions to the offline or readonly state from an active state, we must clear the zone active flag and decrement the active seq file counter. Do so in zonefs_account_active() using the new zonefs inode flags ZONEFS_ZONE_OFFLINE and ZONEFS_ZONE_READONLY. These flags are set if necessary in zonefs_check_zone_condition() based on the result of report zones operation after an IO error. Fixes: 87c9ce3ffec9 ("zonefs: Add active seq file accounting") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
2022-11-25vfs: fix copy_file_range() averts filesystem freeze protectionAmir Goldstein4-9/+28
Commit 868f9f2f8e00 ("vfs: fix copy_file_range() regression in cross-fs copies") removed fallback to generic_copy_file_range() for cross-fs cases inside vfs_copy_file_range(). To preserve behavior of nfsd and ksmbd server-side-copy, the fallback to generic_copy_file_range() was added in nfsd and ksmbd code, but that call is missing sb_start_write(), fsnotify hooks and more. Ideally, nfsd and ksmbd would pass a flag to vfs_copy_file_range() that will take care of the fallback, but that code would be subtle and we got vfs_copy_file_range() logic wrong too many times already. Instead, add a flag to explicitly request vfs_copy_file_range() to perform only generic_copy_file_range() and let nfsd and ksmbd use this flag only in the fallback path. This choise keeps the logic changes to minimum in the non-nfsd/ksmbd code paths to reduce the risk of further regressions. Fixes: 868f9f2f8e00 ("vfs: fix copy_file_range() regression in cross-fs copies") Tested-by: Namjae Jeon <linkinjeon@kernel.org> Tested-by: Luis Henriques <lhenriques@suse.de> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-11-25Merge tag 'amd-drm-fixes-6.1-2022-11-23' of ↵Dave Airlie32-294/+440
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-6.1-2022-11-23: amdgpu: - DCN 3.1.4 fixes - DP MST DSC deadlock fixes - HMM userptr fixes - Fix Aldebaran CU occupancy reporting - GFX11 fixes - PSP suspend/resume fix - DCE12 KASAN fix - DCN 3.2.x fixes - Rotated cursor fix - SMU 13.x fix - DELL platform suspend/resume fixes - VCN4 SR-IOV fix - Display regression fix for polled connectors Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221123143453.8977-1-alexander.deucher@amd.com
2022-11-25Merge tag 'drm-intel-fixes-2022-11-24' of ↵Dave Airlie3-9/+11
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes - Fix GVT KVM reference count handling (Sean Christopherson) - Never purge busy TTM objects (Matthew Auld) - Fix warn in intel_display_power_*_domain() functions (Imre Deak) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/Y38u44hb1LZfZC+M@tursulin-desk
2022-11-25Merge tag 'drm-misc-fixes-2022-11-24' of ↵Dave Airlie4-23/+36
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes drm-misc-fixes for v6.1-rc7: - Another amdgpu gang submit fix. - Use dma_fence_unwrap_for_each when importing sync files. - Fix race in dma_heap_add(). - Fix use of uninitialized memory in logo. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/a5721505-4823-98ef-7d6f-0ea478221391@linux.intel.com
2022-11-24Merge tag 'net-6.1-rc7' of ↵Linus Torvalds119-483/+917
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from rxrpc, netfilter and xfrm. Current release - regressions: - dccp/tcp: fix bhash2 issues related to WARN_ON() in inet_csk_get_port() - l2tp: don't sleep and disable BH under writer-side sk_callback_lock - eth: ice: fix handling of burst tx timestamps Current release - new code bugs: - xfrm: squelch kernel warning in case XFRM encap type is not available - eth: mlx5e: fix possible race condition in macsec extended packet number update routine Previous releases - regressions: - neigh: decrement the family specific qlen - netfilter: fix ipset regression - rxrpc: fix race between conn bundle lookup and bundle removal [ZDI-CAN-15975] - eth: iavf: do not restart tx queues after reset task failure - eth: nfp: add port from netdev validation for EEPROM access - eth: mtk_eth_soc: fix potential memory leak in mtk_rx_alloc() Previous releases - always broken: - tipc: set con sock in tipc_conn_alloc - nfc: - fix potential memory leaks - fix incorrect sizing calculations in EVT_TRANSACTION - eth: octeontx2-af: fix pci device refcount leak - eth: bonding: fix ICMPv6 header handling when receiving IPv6 messages - eth: prestera: add missing unregister_netdev() in prestera_port_create() - eth: tsnep: fix rotten packets Misc: - usb: qmi_wwan: add support for LARA-L6" * tag 'net-6.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (95 commits) net: thunderx: Fix the ACPI memory leak octeontx2-af: Fix reference count issue in rvu_sdp_init() net: altera_tse: release phylink resources in tse_shutdown() virtio_net: Fix probe failed when modprobe virtio_net net: wwan: t7xx: Fix the ACPI memory leak octeontx2-pf: Add check for devm_kcalloc net: enetc: preserve TX ring priority across reconfiguration net: marvell: prestera: add missing unregister_netdev() in prestera_port_create() nfc: st-nci: fix incorrect sizing calculations in EVT_TRANSACTION nfc: st-nci: fix memory leaks in EVT_TRANSACTION nfc: st-nci: fix incorrect validating logic in EVT_TRANSACTION Documentation: networking: Update generic_netlink_howto URL net/cdc_ncm: Fix multicast RX support for CDC NCM devices with ZLP net: usb: qmi_wwan: add u-blox 0x1342 composition l2tp: Don't sleep and disable BH under writer-side sk_callback_lock net: dm9051: Fix missing dev_kfree_skb() in dm9051_loop_rx() arcnet: fix potential memory leak in com20020_probe() ipv4: Fix error return code in fib_table_insert() net: ethernet: mtk_eth_soc: fix memory leak in error path net: ethernet: mtk_eth_soc: fix resource leak in error path ...
2022-11-24Merge tag 'soc-fixes-6.1-4' of ↵Linus Torvalds45-110/+114
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC fixes from Arnd Bergmann: "There are a bunch of late fixes that just came in, in particular a longer series for Rockchips devicetree files, but most of those just address cosmetic errors that were found during the binding validation. There are a couple of code changes: - A regression fix to the IXP42x PCI bus - A fix for a memory leak on optee, and another one for mach-mxs - Two fixes for the sunxi rsb bus driver, to address problems with the shutdown logic The rest are small but important devicetree fixes for a number of individual boards, addressing issues across all platforms: - arm global timer on older rockchip SoCs is unstable and needs to be disabled in favor of a more reliable clocksource - Corrections to fix bluetooth, mmc, and networking on a few Rockchip boards - at91/sam9g20ek UDC needs a pin controller config change - an omap board runs into mmc probe errors because of regulator nodes in the wrong place - imx8mp-evk has a minor inaccuracy with its pin config, but without user visible impact - The Allwinner H6 Hantro G2 video decoder needs an IOMMU reference to prevent the driver from crashing" * tag 'soc-fixes-6.1-4' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (30 commits) bus: ixp4xx: Don't touch bit 7 on IXP42x ARM: dts: imx6q-prti6q: Fix ref/tcxo-clock-frequency properties arm64: dts: imx8mp-evk: correct pcie pad settings ARM: mxs: fix memory leak in mxs_machine_init() ARM: dts: at91: sam9g20ek: enable udc vbus gpio pinctrl tee: optee: fix possible memory leak in optee_register_device() arm64: dts: allwinner: h6: Add IOMMU reference to Hantro G2 media: dt-bindings: allwinner: h6-vpu-g2: Add IOMMU reference property bus: sunxi-rsb: Support atomic transfers bus: sunxi-rsb: Remove the shutdown callback ARM: dts: rockchip: disable arm_global_timer on rk3066 and rk3188 arm64: dts: rockchip: Fix Pine64 Quartz4-B PMIC interrupt ARM: dts: am335x-pcm-953: Define fixed regulators in root node ARM: dts: rockchip: rk3188: fix lcdc1-rgb24 node name arm64: dts: rockchip: fix ir-receiver node names ARM: dts: rockchip: fix ir-receiver node names arm64: dts: rockchip: fix adc-keys sub node names ARM: dts: rockchip: fix adc-keys sub node names arm: dts: rockchip: remove clock-frequency from rtc arm: dts: rockchip: fix node name for hym8563 rtc ...
2022-11-24Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-armLinus Torvalds4-14/+29
Pull ARM fixes from Russell King: "Two fixes for 6.1: - fix stacktraces for tracepoint events in Thumb2 mode - fix for noMMU ZERO_PAGE() implementation" * tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm: ARM: 9266/1: mm: fix no-MMU ZERO_PAGE() implementation ARM: 9251/1: perf: Fix stacktraces for tracepoint events in THUMB2 kernels
2022-11-24Merge tag 'loongarch-fixes-6.1-2' of ↵Linus Torvalds11-73/+71
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson Pull LoongArch fixes from Huacai Chen: "Fix two build warnings, a copy_thread() bug, two page table manipulation bugs, and some trivial cleanups" * tag 'loongarch-fixes-6.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: docs/zh_CN/LoongArch: Fix wrong description of FPRs Note LoongArch: Fix unsigned comparison with less than zero LoongArch: Set _PAGE_DIRTY only if _PAGE_MODIFIED is set in {pmd,pte}_mkwrite() LoongArch: Set _PAGE_DIRTY only if _PAGE_WRITE is set in {pmd,pte}_mkdirty() LoongArch: Clear FPU/SIMD thread info flags for kernel thread LoongArch: SMP: Change prefix from loongson3 to loongson LoongArch: Combine acpi_boot_table_init() and acpi_boot_init() LoongArch: Makefile: Use "grep -E" instead of "egrep"
2022-11-24Merge tag 'ext4_for_linus_stable2' of ↵Linus Torvalds2-16/+32
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "Fix a regression in the lazytime code that was introduced in v6.1-rc1, and a use-after-free that can be triggered by a maliciously corrupted file system" * tag 'ext4_for_linus_stable2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: fs: do not update freeing inode i_io_list ext4: fix use-after-free in ext4_ext_shift_extents
2022-11-24Merge tag 'v6.2-rockchip-dts32-1' of ↵Arnd Bergmann2-1/+7
git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into arm/fixes Disabling of the unreliable arm-global-timer on earliest Rockchip SoCs, due to its frequency being bound to the changing cpu clock. * tag 'v6.2-rockchip-dts32-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip: ARM: dts: rockchip: disable arm_global_timer on rk3066 and rk3188
2022-11-24MAINTAINERS: add S390 MM sectionHeiko Carstens1-1/+10
Alexander Gordeev and Gerald Schaefer are covering the whole s390 specific memory management code. Reflect that by adding a new S390 MM section to MAINTAINERS. Also rename the S390 section to S390 ARCHITECTURE to be a bit more precise. Acked-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com> Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Acked-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2022-11-24s390/crashdump: fix TOD programmable field sizeHeiko Carstens1-1/+1
The size of the TOD programmable field was incorrectly increased from four to eight bytes with commit 1a2c5840acf9 ("s390/dump: cleanup CPU save area handling"). This leads to an elf notes section NT_S390_TODPREG which has a size of eight instead of four bytes in case of kdump, however even worse is that the contents is incorrect: it is supposed to contain only the contents of the TOD programmable field, but in fact contains a mix of the TOD programmable field (32 bit upper bits) and parts of the CPU timer register (lower 32 bits). Fix this by simply changing the size of the todpreg field within the save area structure. This will implicitly also fix the size of the corresponding elf notes sections. This also gets rid of this compile time warning: in function ‘fortify_memcpy_chk’, inlined from ‘save_area_add_regs’ at arch/s390/kernel/crash_dump.c:99:2: ./include/linux/fortify-string.h:413:25: error: call to ‘__read_overflow2_field’ declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Werror=attribute-warning] 413 | __read_overflow2_field(q_size_field, size); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fixes: 1a2c5840acf9 ("s390/dump: cleanup CPU save area handling") Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2022-11-24net: thunderx: Fix the ACPI memory leakYu Liao1-1/+3
The ACPI buffer memory (string.pointer) should be freed as the buffer is not used after returning from bgx_acpi_match_id(), free it to prevent memory leak. Fixes: 46b903a01c05 ("net, thunder, bgx: Add support to get MAC address from ACPI.") Signed-off-by: Yu Liao <liaoyu15@huawei.com> Link: https://lore.kernel.org/r/20221123082237.1220521-1-liaoyu15@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-11-24perf: Consider OS filter failPeter Zijlstra1-2/+22
Some PMUs (notably the traditional hardware kind) have boundary issues with the OS filter. Specifically, it is possible for perf_event_attr::exclude_kernel=1 events to trigger in-kernel due to SKID or errata. This can upset the sigtrap logic some and trigger the WARN. However, if this invalid sample is the first we must not loose the SIGTRAP, OTOH if it is the second, it must not override the pending_addr with a (possibly) invalid one. Fixes: ca6c21327c6a ("perf: Fix missing SIGTRAPs") Reported-by: Pengfei Xu <pengfei.xu@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Marco Elver <elver@google.com> Tested-by: Pengfei Xu <pengfei.xu@intel.com> Link: https://lkml.kernel.org/r/Y3hDYiXwRnJr8RYG@xpf.sh.intel.com
2022-11-24perf: Fixup SIGTRAP and sample_flags interactionPeter Zijlstra1-1/+4
The perf_event_attr::sigtrap functionality relies on data->addr being set. However commit 7b0846301531 ("perf: Use sample_flags for addr") changed this to only initialize data->addr when not 0. Fixes: 7b0846301531 ("perf: Use sample_flags for addr") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/Y3426b4OimE%2FI5po%40hirez.programming.kicks-ass.net
2022-11-24octeontx2-af: Fix reference count issue in rvu_sdp_init()Xiongfeng Wang1-2/+5
pci_get_device() will decrease the reference count for the *from* parameter. So we don't need to call put_device() to decrease the reference. Let's remove the put_device() in the loop and only decrease the reference count of the returned 'pdev' for the last loop because it will not be passed to pci_get_device() as input parameter. We don't need to check if 'pdev' is NULL because it is already checked inside pci_dev_put(). Also add pci_dev_put() for the error path. Fixes: fe1939bb2340 ("octeontx2-af: Add SDP interface support") Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Reviewed-by: Saeed Mahameed <saeed@kernel.org> Link: https://lore.kernel.org/r/20221123065919.31499-1-wangxiongfeng2@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-11-24net: altera_tse: release phylink resources in tse_shutdown()Liu Jian1-0/+1
Call phylink_disconnect_phy() in tse_shutdown() to release the resources occupied by phylink_of_phy_connect() in the tse_open(). Fixes: fef2998203e1 ("net: altera: tse: convert to phylink") Signed-off-by: Liu Jian <liujian56@huawei.com> Link: https://lore.kernel.org/r/20221123011617.332302-1-liujian56@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-11-24virtio_net: Fix probe failed when modprobe virtio_netLi Zetao1-2/+1
When doing the following test steps, an error was found: step 1: modprobe virtio_net succeeded # modprobe virtio_net <-- OK step 2: fault injection in register_netdevice() # modprobe -r virtio_net <-- OK # ... FAULT_INJECTION: forcing a failure. name failslab, interval 1, probability 0, space 0, times 0 CPU: 0 PID: 3521 Comm: modprobe Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), Call Trace: <TASK> ... should_failslab+0xa/0x20 ... dev_set_name+0xc0/0x100 netdev_register_kobject+0xc2/0x340 register_netdevice+0xbb9/0x1320 virtnet_probe+0x1d72/0x2658 [virtio_net] ... </TASK> virtio_net: probe of virtio0 failed with error -22 step 3: modprobe virtio_net failed # modprobe virtio_net <-- failed virtio_net: probe of virtio0 failed with error -2 The root cause of the problem is that the queues are not disable on the error handling path when register_netdevice() fails in virtnet_probe(), resulting in an error "-ENOENT" returned in the next modprobe call in setup_vq(). virtio_pci_modern_device uses virtqueues to send or receive message, and "queue_enable" records whether the queues are available. In vp_modern_find_vqs(), all queues will be selected and activated, but once queues are enabled there is no way to go back except reset. Fix it by reset virtio device on error handling path. This makes error handling follow the same order as normal device cleanup in virtnet_remove() which does: unregister, destroy failover, then reset. And that flow is better tested than error handling so we can be reasonably sure it works well. Fixes: 024655555021 ("virtio_net: fix use after free on allocation failure") Signed-off-by: Li Zetao <lizetao1@huawei.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/20221122150046.3910638-1-lizetao1@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-11-24net: wwan: t7xx: Fix the ACPI memory leakHanjun Guo1-0/+2
The ACPI buffer memory (buffer.pointer) should be freed as the buffer is not used after acpi_evaluate_object(), free it to prevent memory leak. Fixes: 13e920d93e37 ("net: wwan: t7xx: Add core components") Signed-off-by: Hanjun Guo <guohanjun@huawei.com> Link: https://lore.kernel.org/r/1669119580-28977-1-git-send-email-guohanjun@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-11-24octeontx2-pf: Add check for devm_kcallocJiasheng Jiang1-0/+2
As the devm_kcalloc may return NULL pointer, it should be better to add check for the return value, as same as the others. Fixes: e8e095b3b370 ("octeontx2-af: cn10k: Bandwidth profiles config support") Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/r/20221122055449.31247-1-jiasheng@iscas.ac.cn Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-11-23net: enetc: preserve TX ring priority across reconfigurationVladimir Oltean3-11/+19
In the blamed commit, a rudimentary reallocation procedure for RX buffer descriptors was implemented, for the situation when their format changes between normal (no PTP) and extended (PTP). enetc_hwtstamp_set() calls enetc_close() and enetc_open() in a sequence, and this sequence loses information which was previously configured in the TX BDR Mode Register, specifically via the enetc_set_bdr_prio() call. The TX ring priority is configured by tc-mqprio and tc-taprio, and affects important things for TSN such as the TX time of packets. The issue manifests itself most visibly by the fact that isochron --txtime reports premature packet transmissions when PTP is first enabled on an enetc interface. Save the TX ring priority in a new field in struct enetc_bdr (occupies a 2 byte hole on arm64) in order to make this survive a ring reconfiguration. Fixes: 434cebabd3a2 ("enetc: Add dynamic allocation of extended Rx BD rings") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Link: https://lore.kernel.org/r/20221122130936.1704151-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-23net: marvell: prestera: add missing unregister_netdev() in ↵Zhang Changzhong1-0/+1
prestera_port_create() If prestera_port_sfp_bind() fails, unregister_netdev() should be called in error handling path. Compile tested only. Fixes: 52323ef75414 ("net: marvell: prestera: add phylink support") Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/r/1669115432-36841-1-git-send-email-zhangchangzhong@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-23Merge branch 'nfc-st-nci-restructure-validating-logic-in-evt_transaction'Jakub Kicinski1-13/+36
Martin Faltesek says: ==================== nfc: st-nci: Restructure validating logic in EVT_TRANSACTION These are the same 3 patches that were applied in st21nfca here: https://lore.kernel.org/netdev/20220607025729.1673212-1-mfaltesek@google.com with a couple minor differences. st-nci has nearly identical code to that of st21nfca for EVT_TRANSACTION, except that there are two extra validation checks that are not present in the st-nci code. The 3/3 patch as coded for st21nfca pulls those checks in, bringing both drivers into parity. ==================== Link: https://lore.kernel.org/r/20221122004246.4186422-1-mfaltesek@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-23nfc: st-nci: fix incorrect sizing calculations in EVT_TRANSACTIONMartin Faltesek1-15/+36
The transaction buffer is allocated by using the size of the packet buf, and subtracting two which seems intended to remove the two tags which are not present in the target structure. This calculation leads to under counting memory because of differences between the packet contents and the target structure. The aid_len field is a u8 in the packet, but a u32 in the structure, resulting in at least 3 bytes always being under counted. Further, the aid data is a variable length field in the packet, but fixed in the structure, so if this field is less than the max, the difference is added to the under counting. To fix, perform validation checks progressively to safely reach the next field, to determine the size of both buffers and verify both tags. Once all validation checks pass, allocate the buffer and copy the data. This eliminates freeing memory on the error path, as validation checks are moved ahead of memory allocation. Reported-by: Denis Efremov <denis.e.efremov@oracle.com> Reviewed-by: Guenter Roeck <groeck@google.com> Fixes: 5d1ceb7f5e56 ("NFC: st21nfcb: Add HCI transaction event support") Signed-off-by: Martin Faltesek <mfaltesek@google.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-23nfc: st-nci: fix memory leaks in EVT_TRANSACTIONMartin Faltesek1-1/+3
Error path does not free previously allocated memory. Add devm_kfree() to the failure path. Reported-by: Denis Efremov <denis.e.efremov@oracle.com> Reviewed-by: Guenter Roeck <groeck@google.com> Fixes: 5d1ceb7f5e56 ("NFC: st21nfcb: Add HCI transaction event support") Signed-off-by: Martin Faltesek <mfaltesek@google.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-23nfc: st-nci: fix incorrect validating logic in EVT_TRANSACTIONMartin Faltesek1-1/+1
The first validation check for EVT_TRANSACTION has two different checks tied together with logical AND. One is a check for minimum packet length, and the other is for a valid aid_tag. If either condition is true (fails), then an error should be triggered. The fix is to change && to ||. Reported-by: Denis Efremov <denis.e.efremov@oracle.com> Reviewed-by: Guenter Roeck <groeck@google.com> Fixes: 5d1ceb7f5e56 ("NFC: st21nfcb: Add HCI transaction event support") Signed-off-by: Martin Faltesek <mfaltesek@google.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-23ublk_drv: don't forward io commands in reserve orderMing Lei1-44/+38
Either ublk_can_use_task_work() is true or not, io commands are forwarded to ublk server in reverse order, since llist_add() is always to add one element to the head of the list. Even though block layer doesn't guarantee request dispatch order, requests should be sent to hardware in the sequence order generated from io scheduler, which usually considers the request's LBA, and order is often important for HDD. So forward io commands in the sequence made from io scheduler by aligning task work with current io_uring command's batch handling, and it has been observed that both can get similar performance data if IORING_SETUP_COOP_TASKRUN is set from ublk server. Reported-by: Andreas Hindborg <andreas.hindborg@wdc.com> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: ZiyangZhang <ZiyangZhang@linux.alibaba.com> Link: https://lore.kernel.org/r/20221121155645.396272-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-23Merge branch 'master' of ↵Jakub Kicinski8-15/+57
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== ipsec 2022-11-23 1) Fix "disable_policy" on ipv4 early demuxP Packets after the initial packet in a flow might be incorectly dropped on early demux if there are no matching policies. From Eyal Birger. 2) Fix a kernel warning in case XFRM encap type is not available. From Eyal Birger. 3) Fix ESN wrap around for GSO to avoid a double usage of a sequence number. From Christian Langrock. 4) Fix a send_acquire race with pfkey_register. From Herbert Xu. 5) Fix a list corruption panic in __xfrm_state_delete(). Thomas Jarosch. 6) Fix an unchecked return value in xfrm6_init(). Chen Zhongjin. * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec: xfrm: Fix ignored return value in xfrm6_init() xfrm: Fix oops in __xfrm_state_delete() af_key: Fix send_acquire race with pfkey_register xfrm: replay: Fix ESN wrap around for GSO xfrm: lwtunnel: squelch kernel warning in case XFRM encap type is not available xfrm: fix "disable_policy" on ipv4 early demux ==================== Link: https://lore.kernel.org/r/20221123093117.434274-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nfJakub Kicinski3-6/+8
Pablo Neira Ayuso says: ==================== Netfilter fixes for net 1) Fix regression in ipset hash:ip with IPv4 range, from Vishwanath Pai. This is fixing up a bug introduced in the 6.0 release. 2) The "netfilter: ipset: enforce documented limit to prevent allocating huge memory" patch contained a wrong condition which makes impossible to add up to 64 clashing elements to a hash:net,iface type of set while it is the documented feature of the set type. The patch fixes the condition and thus makes possible to add the elements while keeps preventing allocating huge memory, from Jozsef Kadlecsik. This has been broken for several releases. 3) Missing locking when updating the flow block list which might lead a reader to crash. This has been broken since the introduction of the flowtable hardware offload support. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: flowtable_offload: add missing locking netfilter: ipset: restore allowing 64 clashing elements in hash:net,iface netfilter: ipset: regression in ip_set_hash_ip.c ==================== Link: https://lore.kernel.org/r/20221122212814.63177-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-23Documentation: networking: Update generic_netlink_howto URLNir Levy1-1/+1
The documentation refers to invalid web page under www.linuxfoundation.org The patch refers to a working URL under wiki.linuxfoundation.org Signed-off-by: Nir Levy <bhr166@gmail.com> Link: https://lore.kernel.org/all/20221120220630.7443-1-bhr166@gmail.com/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-24scripts: add rust in scripts/Makefile.packageParan Lee1-2/+2
Add rust argument at TAR_CONTENT in scripts/Makefile.package script with alphabetical order. Signed-off-by: Paran Lee <p4ranlee@gmail.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2022-11-24kbuild: fix "cat: .version: No such file or directory"Masahiro Yamada2-3/+3
Since commit 2df8220cc511 ("kbuild: build init/built-in.a just once"), the .version file is not touched at all when KBUILD_BUILD_VERSION is given. If KBUILD_BUILD_VERSION is specified and the .version file is missing (for example right after 'make mrproper'), "No such file or director" is shown. Even if the .version exists, it is irrelevant to the version of the current build. $ make -j$(nproc) KBUILD_BUILD_VERSION=100 mrproper defconfig all [ snip ] BUILD arch/x86/boot/bzImage cat: .version: No such file or directory Kernel: arch/x86/boot/bzImage is ready (#) Show KBUILD_BUILD_VERSION if it is given. Fixes: 2df8220cc511 ("kbuild: build init/built-in.a just once") Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
2022-11-23Merge branch 'kvm-dwmw2-fixes' into HEADPaolo Bonzini2-10/+29
This brings in a few important fixes for Xen emulation. While nobody should be enabling it, the bug effectively allows userspace to read arbitrary memory. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>