summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2022-12-07RDMA/cma: Change RoCE packet life time from 18 to 16Chao Leng1-1/+1
The ack timeout retransmission time is affected by the following two factors: one is packet life time, another is the HCA processing time. Now the default packet lifetime(CMA_IBOE_PACKET_LIFETIME) is 18. That means the minimum ack timeout is 2 seconds (2^(18+1)*4us=2.097seconds). The packet lifetime means the maximum transmission time of packets on the network, 2 seconds is too long. Assume the network is a clos topology with three layers, every packet will pass through five hops of switches. Assume the buffer of every switch is 128MB and the port transmission rate is 25 Gbit/s, the maximum transmission time of the packet is 200ms(128MB*5/25Gbit/s). Add double redundancy, it is less than 500ms. So change the CMA_IBOE_PACKET_LIFETIME to 16, the maximum transmission time of the packet will be about 500+ms, it is long enough. Link: https://lore.kernel.org/r/20221125010026.755-1-lengchao@huawei.com Signed-off-by: Chao Leng <lengchao@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-07RDMA/hfi1: use sysfs_emit() to instead of scnprintf()ye xingchen1-1/+1
Follow the advice of the Documentation/filesystems/sysfs.rst and show() should only use sysfs_emit() or sysfs_emit_at() when formatting the value to be returned to user space. Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn> Link: https://lore.kernel.org/r/202212071632188074249@zte.com.cn Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-12-04RDMA/mlx5: no need to kfree NULL pointerLi Zhijian1-4/+2
Goto label 'free' where it will kfree the 'in' is not needed though it's safe to kfree NULL. Return err code directly to simplify the code. 1973 free: 1974 kfree(in); 1975 return err; Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Link: https://lore.kernel.org/r/20221203033714.25870-1-lizhijian@fujitsu.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-12-04RDMA/srp: Fix error return code in srp_parse_options()Wang Yufen1-14/+82
In the previous iteration of the while loop, the "ret" may have been assigned a value of 0, so the error return code -EINVAL may have been incorrectly set to 0. To fix set valid return code before calling to goto. Also investigate each case separately as Andy suggessted. Fixes: e711f968c49c ("IB/srp: replace custom implementation of hex2bin()") Fixes: 2a174df0c602 ("IB/srp: Use kstrtoull() instead of simple_strtoull()") Fixes: 19f313438c77 ("IB/srp: Add RDMA/CM support") Signed-off-by: Wang Yufen <wangyufen@huawei.com> Link: https://lore.kernel.org/r/1669953638-11747-2-git-send-email-wangyufen@huawei.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-12-04RDMA/hfi1: Fix error return code in parse_platform_config()Wang Yufen1-0/+6
In the previous iteration of the while loop, the "ret" may have been assigned a value of 0, so the error return code -EINVAL may have been incorrectly set to 0. To fix set valid return code before calling to goto. Fixes: 97167e813415 ("staging/rdma/hfi1: Tune for unknown channel if configuration file is absent") Signed-off-by: Wang Yufen <wangyufen@huawei.com> Link: https://lore.kernel.org/r/1669953638-11747-1-git-send-email-wangyufen@huawei.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-12-04RDMA: Disable IB HW for UMLRandy Dunlap1-0/+2
Disable all of drivers/infiniband/hw/ and rdmavt for UML builds until someone needs it and provides patches to support it. This prevents build errors in hw/qib/qib_wc_x86_64.c. Fixes: 68f5d3f3b654 ("um: add PCI over virtio emulation driver") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: linux-rdma@vger.kernel.org Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: linux-um@lists.infradead.org Link: https://lore.kernel.org/r/20221202211940.29111-1-rdunlap@infradead.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-12-01RDMA/rxe: Enable atomic write capability for rxe deviceXiao Yang1-0/+5
The capability shows that rxe device supports atomic write operation. Link: https://lore.kernel.org/r/1669905568-62-4-git-send-email-yangx.jy@fujitsu.com Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-01RDMA/rxe: Implement atomic write completionXiao Yang1-0/+4
Generate an atomic write completion when the atomic write request has been finished. Link: https://lore.kernel.org/r/1669905568-62-3-git-send-email-yangx.jy@fujitsu.com Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-01RDMA/rxe: Make responder support atomic write on RC serviceXiao Yang1-5/+79
Make responder process an atomic write request and send a read response on RC service. Link: https://lore.kernel.org/r/1669905568-62-2-git-send-email-yangx.jy@fujitsu.com Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-01RDMA/rxe: Make requester support atomic write on RC serviceXiao Yang1-2/+13
Make requester process and send an atomic write request on RC service. Link: https://lore.kernel.org/r/1669905568-62-1-git-send-email-yangx.jy@fujitsu.com Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-01RDMA/rxe: Extend rxe packet format to support atomic writeXiao Yang2-0/+21
Extend rxe_wr_opcode_info[] and rxe_opcode[] for new atomic write opcode. Link: https://lore.kernel.org/r/1669905432-14-5-git-send-email-yangx.jy@fujitsu.com Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-01RDMA/rxe: Extend rxe user ABI to support atomic writeXiao Yang1-0/+1
Define an atomic_wr array to store 8-byte value. Link: https://lore.kernel.org/r/1669905432-14-4-git-send-email-yangx.jy@fujitsu.com Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-01RDMA: Extend RDMA kernel ABI to support atomic writeXiao Yang2-0/+5
1) Define new atomic write request/completion in kernel. 2) Define new atomic write capability in kernel. 3) Define new atomic write opcode for RC service in packet. Link: https://lore.kernel.org/r/1669905432-14-3-git-send-email-yangx.jy@fujitsu.com Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-01RDMA: Extend RDMA user ABI to support atomic writeXiao Yang1-0/+4
1) Define new atomic write request/completion in userspace. 2) Define new atomic write capability in userspace. Link: https://lore.kernel.org/r/1669905432-14-2-git-send-email-yangx.jy@fujitsu.com Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-01IB/hfi1: Switch to netif_napi_add()Yang Yang1-1/+1
There is no need to use netif_napi_add_weight() when the weight argument is 64. See "net: drop the weight argument from netif_napi_add". Signed-off-by: Yang Yang <yang.yang29@zte.com.cn> Link: https://lore.kernel.org/r/202211301744378304494@zte.com.cn Reviewed-by: xu xin <xu.xin16@zte.com.cn> Reviewed-by: Zhang Yunkai <zhang.yunkai@zte.com.cn> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-30RDMA/nldev: Fix failure to send large messagesMark Zhang1-1/+1
Return "-EMSGSIZE" instead of "-EINVAL" when filling a QP entry, so that new SKBs will be allocated if there's not enough room in current SKB. Fixes: 65959522f806 ("RDMA: Add support to dump resource tracker in RAW format") Signed-off-by: Mark Zhang <markzhang@nvidia.com> Reviewed-by: Patrisious Haddad <phaddad@nvidia.com> Link: https://lore.kernel.org/r/b5e9c62f6b8369acab5648b661bf539cbceeffdc.1669636336.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-30RDMA/nldev: Add NULL check to silence false warningsOr Har-Toov1-8/+36
Using nlmsg_put causes static analysis tools to many false positives of not checking the return value of nlmsg_put. In all uses in nldev.c, payload parameter is 0 so NULL will never be returned. So let's add useless checks to silence the warnings. Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://lore.kernel.org/r/bd924da89d5b4f5291a4a01d9b5ae47c0a9b6a3f.1669636336.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-29RDMA/mlx4: Remove NULL check before dev_{put, hold}zhang songyi1-8/+4
The call netdev_{put, hold} of dev_{put, hold} will check NULL, so there is no need to check before using dev_{put, hold}. Fix the following coccicheck warnings: /drivers/infiniband/hw/mlx4/main.c:1311:2-10: WARNING: WARNING NULL check before dev_{put, hold} functions is not needed. /drivers/infiniband/hw/mlx4/main.c:148:2-10: WARNING: WARNING NULL check before dev_{put, hold} functions is not needed. /drivers/infiniband/hw/mlx4/main.c:1959:3-11: WARNING: WARNING NULL check before dev_{put, hold} functions is not needed. /drivers/infiniband/hw/mlx4/main.c:1962:3-10: WARNING: WARNING NULL check before dev_{put, hold} functions is not needed. Signed-off-by: zhang songyi <zhang.songyi@zte.com.cn> Link: https://lore.kernel.org/r/202211291554079687539@zte.com.cn Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-28RDMA/nldev: Add checks for nla_nest_start() in fill_stat_counter_qps()Yuan Can1-0/+2
As the nla_nest_start() may fail with NULL returned, the return value needs to be checked. Fixes: c4ffee7c9bdb ("RDMA/netlink: Implement counter dumpit calback") Signed-off-by: Yuan Can <yuancan@huawei.com> Link: https://lore.kernel.org/r/20221126043410.85632-1-yuancan@huawei.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-28RDMA: Add netdevice_tracker to ib_device_set_netdev()Jason Gunthorpe2-2/+5
This will cause an informative backtrace to print if the user of ib_device_set_netdev() isn't careful about tearing down the ibdevice before its the netdevice parent is destroyed. Such as like this: unregister_netdevice: waiting for vlan0 to become free. Usage count = 2 leaked reference. ib_device_set_netdev+0x266/0x730 siw_newlink+0x4e0/0xfd0 nldev_newlink+0x35c/0x5c0 rdma_nl_rcv_msg+0x36d/0x690 rdma_nl_rcv+0x2ee/0x430 netlink_unicast+0x543/0x7f0 netlink_sendmsg+0x918/0xe20 sock_sendmsg+0xcf/0x120 ____sys_sendmsg+0x70d/0x8b0 ___sys_sendmsg+0x11d/0x1b0 __sys_sendmsg+0xfa/0x1d0 do_syscall_64+0x35/0xb0 entry_SYSCALL_64_after_hwframe+0x63/0xcd This will help debug the issues syzkaller is seeing. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/0-v1-a7c81b3842ce+e5-netdev_tracker_jgg@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-24RDMA/erdma: Notify the latest PI to FW for reflushing when necessaryCheng Xu2-9/+26
Firmware is responsible for flushing WRs in HW, and it's a little difficult for firmware to get the latest PI of QPs, especially for RQs after QP state being changed to ERROR. So we introduce a new CMDQ command, by which driver can notify to latest PI to FW, and then FW can flush all posted WRs. Link: https://lore.kernel.org/r/20221116023107.82835-4-chengyou@linux.alibaba.com Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-24RDMA/erdma: Implement the lifecycle of reflushing work for each QPCheng Xu3-0/+28
Each QP has a work for reflushing purpose. In the work, driver will report the latest pi to hardware. Link: https://lore.kernel.org/r/20221116023107.82835-3-chengyou@linux.alibaba.com Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-24RDMA/erdma: Add a workqueue for WRs reflushingCheng Xu2-2/+13
ERDMA driver use a workqueue for asynchronous reflush command posting. Implement the lifecycle of this workqueue. Link: https://lore.kernel.org/r/20221116023107.82835-2-chengyou@linux.alibaba.com Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-24RDMA/erdma: Fix a typo in annotationCheng Xu1-1/+1
A non-ASCII character was wrongly put in a comment, use the ACSII version. Fixes: bee85e0e31ec ("RDMA/erdma: Add main include file") Link: https://lore.kernel.org/r/20221124114933.77250-1-chengyou@linux.alibaba.com Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-22RDMA/rxe: Fix NULL-ptr-deref in rxe_qp_do_cleanup() when socket create failedZhang Xiaoxu1-3/+3
There is a null-ptr-deref when mount.cifs over rdma: BUG: KASAN: null-ptr-deref in rxe_qp_do_cleanup+0x2f3/0x360 [rdma_rxe] Read of size 8 at addr 0000000000000018 by task mount.cifs/3046 CPU: 2 PID: 3046 Comm: mount.cifs Not tainted 6.1.0-rc5+ #62 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc3 Call Trace: <TASK> dump_stack_lvl+0x34/0x44 kasan_report+0xad/0x130 rxe_qp_do_cleanup+0x2f3/0x360 [rdma_rxe] execute_in_process_context+0x25/0x90 __rxe_cleanup+0x101/0x1d0 [rdma_rxe] rxe_create_qp+0x16a/0x180 [rdma_rxe] create_qp.part.0+0x27d/0x340 ib_create_qp_kernel+0x73/0x160 rdma_create_qp+0x100/0x230 _smbd_get_connection+0x752/0x20f0 smbd_get_connection+0x21/0x40 cifs_get_tcp_session+0x8ef/0xda0 mount_get_conns+0x60/0x750 cifs_mount+0x103/0xd00 cifs_smb3_do_mount+0x1dd/0xcb0 smb3_get_tree+0x1d5/0x300 vfs_get_tree+0x41/0xf0 path_mount+0x9b3/0xdd0 __x64_sys_mount+0x190/0x1d0 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 The root cause of the issue is the socket create failed in rxe_qp_init_req(). So move the reset rxe_qp_do_cleanup() after the NULL ptr check. Fixes: 8700e3e7c485 ("Soft RoCE driver") Link: https://lore.kernel.org/r/20221122151437.1057671-1-zhangxiaoxu5@huawei.com Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-22RDMA/rxe: Do not NULL deref on debugging failure pathJason Gunthorpe1-1/+1
Correct the mistake, mr is obviously NULL in this code path. Fixes: 2778b72b1df0 ("RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_mr.c") Link: https://lore.kernel.org/r/Y3eeJW0AdyJYhYyQ@kili Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-22RDMA/hns: fix memory leak in hns_roce_alloc_mr()Zhengchao Shao1-2/+2
When hns_roce_mr_enable() failed in hns_roce_alloc_mr(), mr_key is not released. Compiled test only. Fixes: 9b2cf76c9f05 ("RDMA/hns: Optimize PBL buffer allocation process") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Link: https://lore.kernel.org/r/20221119070834.48502-1-shaozhengchao@huawei.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-22RDMA/irdma: Initialize net_type before checking itMustafa Ismail1-0/+1
The av->net_type is not initialized before it is checked in irdma_modify_qp_roce. This leads to an incorrect update to the ARP cache and QP context. RoCEv2 connections might fail as result. Set the net_type using rdma_gid_attr_network_type. Fixes: 80005c43d4c8 ("RDMA/irdma: Use net_type to check network type") Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Link: https://lore.kernel.org/r/20221122004410.1471-1-shiraz.saleem@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-22RDMA/hfi: Decrease PCI device reference count in error pathXiongfeng Wang1-0/+2
pci_get_device() will increase the reference count for the returned pci_dev, and also decrease the reference count for the input parameter *from* if it is not NULL. If we break out the loop in node_affinity_init() with 'dev' not NULL, we need to call pci_dev_put() to decrease the reference count. Add missing pci_dev_put() in error path. Fixes: c513de490f80 ("IB/hfi1: Invalid NUMA node information can cause a divide by zero") Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Link: https://lore.kernel.org/r/20221117131546.113280-1-wangxiongfeng2@huawei.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-21IB/isert: use the ISCSI_LOGIN_CURRENT_STAGE macroMaurizio Lombardi1-3/+2
Use the proper macro to get the current_stage value. Link: https://lore.kernel.org/r/20221116094535.138298-1-mlombard@redhat.com Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Reviewed-by: Mike Christie <michael.christie@oracle.com> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-18RDMA/hns: Fix incorrect sge nums calculationLuoyouming5-30/+125
The user usually configures the number of sge through the max_send_sge parameter when creating qp, and configures the maximum size of inline data that can be sent through max_inline_data. Inline uses sge to fill data to send. Expect the following: 1) When the sge space cannot hold inline data, the sge space needs to be expanded to accommodate all inline data 2) When the sge space is enough to accommodate inline data, the upper limit of inline data can be increased so that users can send larger inline data Currently case one is not implemented. When the inline data is larger than the sge space, an error of insufficient sge space occurs. This part of the code needs to be reimplemented according to the expected rules. The calculation method of sge num is modified to take the maximum value of max_send_sge and the sge for max_inline_data to solve this problem. Fixes: 05201e01be93 ("RDMA/hns: Refactor process of setting extended sge") Fixes: 30b707886aeb ("RDMA/hns: Support inline data in extented sge space for RC") Link: https://lore.kernel.org/r/20221108133847.2304539-3-xuhaoyue1@hisilicon.com Signed-off-by: Luoyouming <luoyouming@huawei.com> Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-18RDMA/hns: Fix ext_sge num error when post sendLuoyouming1-2/+11
In the HNS ROCE driver, The sge is divided into standard sge and extended sge. There are 2 standard sge in RC/XRC, and the UD standard sge is 0. In the scenario of RC SQ inline, if the data does not exceed 32bytes, the standard sge will be used. If it exceeds, only the extended sge will be used to fill the data. Currently, when filling the extended sge, max_gs is directly used as the number of the extended sge, which did not subtract the number of standard sge. There is a logical error. The new algorithm subtracts the number of standard sge from max_gs to get the actual number of extended sge. Fixes: 30b707886aeb ("RDMA/hns: Support inline data in extented sge space for RC") Link: https://lore.kernel.org/r/20221108133847.2304539-2-xuhaoyue1@hisilicon.com Signed-off-by: Luoyouming <luoyouming@huawei.com> Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-18RDMA/rxe: Fix mr->map double freeLi Zhijian1-7/+2
rxe_mr_cleanup() which tries to free mr->map again will be called when rxe_mr_init_user() fails: CPU: 0 PID: 4917 Comm: rdma_flush_serv Kdump: loaded Not tainted 6.1.0-rc1-roce-flush+ #25 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x45/0x5d panic+0x19e/0x349 end_report.part.0+0x54/0x7c kasan_report.cold+0xa/0xf rxe_mr_cleanup+0x9d/0xf0 [rdma_rxe] __rxe_cleanup+0x10a/0x1e0 [rdma_rxe] rxe_reg_user_mr+0xb7/0xd0 [rdma_rxe] ib_uverbs_reg_mr+0x26a/0x480 [ib_uverbs] ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x1a2/0x250 [ib_uverbs] ib_uverbs_cmd_verbs+0x1397/0x15a0 [ib_uverbs] This issue was firstly exposed since commit b18c7da63fcb ("RDMA/rxe: Fix memory leak in error path code") and then we fixed it in commit 8ff5f5d9d8cf ("RDMA/rxe: Prevent double freeing rxe_map_set()") but this fix was reverted together at last by commit 1e75550648da (Revert "RDMA/rxe: Create duplicate mapping tables for FMRs") Simply let rxe_mr_cleanup() always handle freeing the mr->map once it is successfully allocated. Fixes: 1e75550648da ("Revert "RDMA/rxe: Create duplicate mapping tables for FMRs"") Link: https://lore.kernel.org/r/1667099073-2-1-git-send-email-lizhijian@fujitsu.com Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-18RDMA/rxe: Remove reliable datagram supportZhu Yanjun1-1/+0
The rdma_rxe driver does not actually support the reliable datagram transport but contains a variable with RD opcodes in driver code. And this variable is never used. So remove it. Link: https://lore.kernel.org/r/20221112023537.432912-1-yanjun.zhu@intel.com Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-18IB/hfi1: Replace 1-element array with singletonKees Cook1-11/+11
Zero-length arrays are deprecated[1] and are being replaced with flexible array members in support of the ongoing efforts to tighten the FORTIFY_SOURCE routines on memcpy(), correctly instrument array indexing with UBSAN_BOUNDS, and to globally enable -fstrict-flex-arrays=3. Replace zero-length array with flexible-array member "lvs" in struct opa_port_data_counters_msg and struct opa_port_error_counters64_msg. Additionally, the "port" member of several structs is defined as a single-element, but is only ever accessed at index 0. Replace it with a singleton so that flexible array usage is sane. This results in no differences in binary output. [1] https://github.com/KSPP/linux/issues/78 Link: https://lore.kernel.org/r/20221118215847.never.416-kees@kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-17RDMA/rtrs-srv: Remove kobject_del from rtrs_srv_destroy_once_sysfs_root_foldersGuoqing Jiang1-1/+0
The kobj_paths which is created dynamically by kobject_create_and_add, and per the comment above kobject_create_and_add, we only need to call kobject_put which is not same as other kobjs such as stats->kobj_stats and srv_path->kobj. Acked-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20221117101945.6317-9-guoqing.jiang@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-17RDMA/rtrs-srv: Fix several issues in rtrs_srv_destroy_path_filesGuoqing Jiang1-3/+9
There are several issues in the function which is supposed to be paired with rtrs_srv_create_path_files. 1. rtrs_srv_stats_attr_group is not removed though it is created in rtrs_srv_create_stats_files. 2. it makes more sense to check kobj_stats.state_in_sysfs before destroy kobj_stats instead of rely on kobj.state_in_sysfs. 3. kobject_init_and_add is used for both kobjs (srv_path->kobj and srv_path->stats->kobj_stats), however we missed to call kobject_del for srv_path->kobj which was called in free_path. 4. rtrs_srv_destroy_once_sysfs_root_folders is independent of either kobj or kobj_stats. Acked-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20221117101945.6317-8-guoqing.jiang@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-17RDMA/rtrs: Clean up rtrs_rdma_dev_pd_opsGuoqing Jiang2-21/+4
Let's remove them since the three members are not used. Acked-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20221117101945.6317-7-guoqing.jiang@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-17RDMA/rtrs-srv: Remove outdated comments from create_conGuoqing Jiang1-6/+0
Remove the orphan comments. Acked-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20221117101945.6317-6-guoqing.jiang@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-17RDMA/rtrs-clt: Correct the checking of ib_map_mr_sgGuoqing Jiang1-4/+2
We should check with count, also the only successful case is that all sg elements are mapped, so make it explicitly. Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20221117101945.6317-5-guoqing.jiang@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-17RDMA/rtrs-srv: Correct the checking of ib_map_mr_sgGuoqing Jiang1-1/+1
We should check with nr_sgt, also the only successful case is that all sg elements are mapped, so make it explicitly. Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20221117101945.6317-4-guoqing.jiang@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-17RDMA/rtrs-srv: Refactor the handling of failure case in map_cont_bufsGuoqing Jiang1-27/+20
Let's call unmap_cont_bufs when failure happens, and also only update mrs_num after everything is settled which means we can remove 'mri'. Acked-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20221117101945.6317-3-guoqing.jiang@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-17RDMA/rtrs-srv: Refactor rtrs_srv_rdma_cm_handlerGuoqing Jiang1-9/+8
The RDMA_CM_EVENT_CONNECT_REQUEST is quite different to other types, let's check it separately at the beginning of routine, then we can avoid the indentation accordingly. Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20221117101945.6317-2-guoqing.jiang@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-17RDMA/irdma: Do not request 2-level PBLEs for CQ allocMustafa Ismail1-10/+5
When allocating PBLE's for a large CQ, it is possible that a 2-level PBLE is returned which would cause the CQ allocation to fail since 1-level is assumed and checked for. Fix this by requesting a level one PBLE only. Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs") Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Link: https://lore.kernel.org/r/20221115011701.1379-4-shiraz.saleem@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-17RDMA/irdma: Fix RQ completion opcodeMustafa Ismail5-45/+71
The opcode written by HW, in the RQ CQE, is the RoCEv2/iWARP protocol opcode from the received packet and not the SW opcode as currently assumed. Fix this by returning the raw operation type and queue type in the CQE to irdma_process_cqe and add 2 helpers set_ib_wc_op_sq set_ib_wc_op_rq to map IRDMA HW op types to IB op types. Note that for iWARP, only Write with Immediate is supported so the opcode can only be IB_WC_RECV_RDMA_WITH_IMM when there is immediate data present. Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs") Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Link: https://lore.kernel.org/r/20221115011701.1379-3-shiraz.saleem@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-17RDMA/irdma: Fix inline for multiple SGE'sMustafa Ismail3-104/+119
Currently, inline send and inline write assume a single SGE and only copy data from the first one. Add support for multiple SGE's. Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs") Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Link: https://lore.kernel.org/r/20221115011701.1379-2-shiraz.saleem@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-15RDMA/siw: Set defined status for work completion with undefined statusBernard Metzler1-2/+2
A malicious user may write undefined values into memory mapped completion queue elements status or opcode. Undefined status or opcode values will result in out-of-bounds access to an array mapping siw internal representation of opcode and status to RDMA core representation when reaping CQ elements. While siw detects those undefined values, it did not correctly set completion status to a defined value, thus defeating the whole purpose of the check. This bug leads to the following Smatch static checker warning: drivers/infiniband/sw/siw/siw_cq.c:96 siw_reap_cqe() error: buffer overflow 'map_cqe_status' 10 <= 21 Fixes: bdf1da5df9da ("RDMA/siw: Fix immediate work request flush to completion queue") Link: https://lore.kernel.org/r/20221115170747.1263298-1-bmt@zurich.ibm.com Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-15RDMA/nldev: Return "-EAGAIN" if the cm_id isn't from expected portMark Zhang1-1/+1
When filling a cm_id entry, return "-EAGAIN" instead of 0 if the cm_id doesn'the have the same port as requested, otherwise an incomplete entry may be returned, which causes "rdam res show cm_id" to return an error. For example on a machine with two rdma devices with "rping -C 1 -v -s" running background, the "rdma" command fails: $ rdma -V rdma utility, iproute2-5.19.0 $ rdma res show cm_id link mlx5_0/- cm-idn 0 state LISTEN ps TCP pid 28056 comm rping src-addr 0.0.0.0:7174 error: Protocol not available While with this fix it succeeds: $ rdma res show cm_id link mlx5_0/- cm-idn 0 state LISTEN ps TCP pid 26395 comm rping src-addr 0.0.0.0:7174 link mlx5_1/- cm-idn 0 state LISTEN ps TCP pid 26395 comm rping src-addr 0.0.0.0:7174 Fixes: 00313983cda6 ("RDMA/nldev: provide detailed CM_ID information") Signed-off-by: Mark Zhang <markzhang@nvidia.com> Link: https://lore.kernel.org/r/a08e898cdac5e28428eb749a99d9d981571b8ea7.1667810736.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-15RDMA/core: Make sure "ib_port" is valid when access sysfs nodeMark Zhang1-6/+11
The "ib_port" structure must be set before adding the sysfs kobject, and reset after removing it, otherwise it may crash when accessing the sysfs node: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000050 Mem abort info: ESR = 0x96000006 Exception class = DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000006 CM = 0, WnR = 0 user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000e85f5ba5 [0000000000000050] pgd=0000000848fd9003, pud=000000085b387003, pmd=0000000000000000 Internal error: Oops: 96000006 [#2] PREEMPT SMP Modules linked in: ib_umad(O) mlx5_ib(O) nfnetlink_cttimeout(E) nfnetlink(E) act_gact(E) cls_flower(E) sch_ingress(E) openvswitch(E) nsh(E) nf_nat_ipv6(E) nf_nat_ipv4(E) nf_conncount(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) mst_pciconf(O) ipmi_devintf(E) ipmi_msghandler(E) ipmb_dev_int(OE) mlx5_core(O) mlxfw(O) mlxdevm(O) auxiliary(O) ib_uverbs(O) ib_core(O) mlx_compat(O) psample(E) sbsa_gwdt(E) uio_pdrv_genirq(E) uio(E) mlxbf_pmc(OE) mlxbf_gige(OE) mlxbf_tmfifo(OE) gpio_mlxbf2(OE) pwr_mlxbf(OE) mlx_trio(OE) i2c_mlxbf(OE) mlx_bootctl(OE) bluefield_edac(OE) knem(O) ip_tables(E) ipv6(E) crc_ccitt(E) [last unloaded: mst_pci] Process grep (pid: 3372, stack limit = 0x0000000022055c92) CPU: 5 PID: 3372 Comm: grep Tainted: G D OE 4.19.161-mlnx.47.gadcd9e3 #1 Hardware name: https://www.mellanox.com BlueField SoC/BlueField SoC, BIOS BlueField:3.9.2-15-ga2403ab Sep 8 2022 pstate: 40000005 (nZcv daif -PAN -UAO) pc : hw_stat_port_show+0x4c/0x80 [ib_core] lr : port_attr_show+0x40/0x58 [ib_core] sp : ffff000029f43b50 x29: ffff000029f43b50 x28: 0000000019375000 x27: ffff8007b821a540 x26: ffff000029f43e30 x25: 0000000000008000 x24: ffff000000eaa958 x23: 0000000000001000 x22: ffff8007a4ce3000 x21: ffff8007baff8000 x20: ffff8007b9066ac0 x19: ffff8007bae97578 x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000 x8 : ffff8007a4ce4000 x7 : 0000000000000000 x6 : 000000000000003f x5 : ffff000000e6a280 x4 : ffff8007a4ce3000 x3 : 0000000000000000 x2 : aaaaaaaaaaaaaaab x1 : ffff8007b9066a10 x0 : ffff8007baff8000 Call trace: hw_stat_port_show+0x4c/0x80 [ib_core] port_attr_show+0x40/0x58 [ib_core] sysfs_kf_seq_show+0x8c/0x150 kernfs_seq_show+0x44/0x50 seq_read+0x1b4/0x45c kernfs_fop_read+0x148/0x1d8 __vfs_read+0x58/0x180 vfs_read+0x94/0x154 ksys_read+0x68/0xd8 __arm64_sys_read+0x28/0x34 el0_svc_common+0x88/0x18c el0_svc_handler+0x78/0x94 el0_svc+0x8/0xe8 Code: f2955562 aa1603e4 aa1503e0 f9405683 (f9402861) Fixes: d8a5883814b9 ("RDMA/core: Replace the ib_port_data hw_stats pointers with a ib_port pointer") Signed-off-by: Mark Zhang <markzhang@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://lore.kernel.org/r/88867e705c42c1cd2011e45201c25eecdb9fef94.1667810736.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-15RDMA/restrack: Release MR restrack when deleteMark Zhang1-2/+0
The MR restrack also needs to be released when delete it, otherwise it cause memory leak as the task struct won't be released. Fixes: 13ef5539def7 ("RDMA/restrack: Count references to the verbs objects") Signed-off-by: Mark Zhang <markzhang@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://lore.kernel.org/r/703db18e8d4ef628691fb93980a709be673e62e3.1667810736.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>