linux - Linux Kernel (branches are rebased on master from time to time)

Age	Commit message (Collapse)	Author	Files	Lines
2022-08-25	Bluetooth: ISO: Fix not handling shutdown condition	Luiz Augusto von Dentz	1	-10/+25
	In order to properly handle shutdown syscall the code shall not assume that the how argument is always SHUT_RDWR resulting in SHUTDOWN_MASK as that would result in poll to immediately report EPOLLHUP instead of properly waiting for disconnect_cfm (Disconnect Complete) which is rather important for the likes of BAP as the CIG may need to be reprogrammed. Fixes: ccf74f2390d6 ("Bluetooth: Add BTPROTO_ISO socket type") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-08-25	Bluetooth: hci_sync: fix double mgmt_pending_free() in remove_adv_monitor()	Tetsuo Handa	1	-1/+0
	syzbot is reporting double kfree() at remove_adv_monitor() [1], for commit 7cf5c2978f23fdbb ("Bluetooth: hci_sync: Refactor remove Adv Monitor") forgot to remove duplicated mgmt_pending_remove() when merging "if (err) {" path and "if (!pending) {" path. Link: https://syzkaller.appspot.com/bug?extid=915a8416bf15895b8e07 [1] Reported-by: syzbot <syzbot+915a8416bf15895b8e07@syzkaller.appspotmail.com> Fixes: 7cf5c2978f23fdbb ("Bluetooth: hci_sync: Refactor remove Adv Monitor") Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-08-25	Bluetooth: MGMT: Fix Get Device Flags	Luiz Augusto von Dentz	1	-29/+42
	Get Device Flags don't check if device does actually use an RPA in which case it shall only set HCI_CONN_FLAG_REMOTE_WAKEUP if LL Privacy is enabled. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-08-25	Bluetooth: L2CAP: Fix build errors in some archs	Luiz Augusto von Dentz	1	-5/+5
	This attempts to fix the follow errors: In function 'memcmp', inlined from 'bacmp' at ./include/net/bluetooth/bluetooth.h:347:9, inlined from 'l2cap_global_chan_by_psm' at net/bluetooth/l2cap_core.c:2003:15: ./include/linux/fortify-string.h:44:33: error: '__builtin_memcmp' specified bound 6 exceeds source size 0 [-Werror=stringop-overread] 44 \| #define __underlying_memcmp __builtin_memcmp \| ^ ./include/linux/fortify-string.h:420:16: note: in expansion of macro '__underlying_memcmp' 420 \| return __underlying_memcmp(p, q, size); \| ^~~~~~~~~~~~~~~~~~~ In function 'memcmp', inlined from 'bacmp' at ./include/net/bluetooth/bluetooth.h:347:9, inlined from 'l2cap_global_chan_by_psm' at net/bluetooth/l2cap_core.c:2004:15: ./include/linux/fortify-string.h:44:33: error: '__builtin_memcmp' specified bound 6 exceeds source size 0 [-Werror=stringop-overread] 44 \| #define __underlying_memcmp __builtin_memcmp \| ^ ./include/linux/fortify-string.h:420:16: note: in expansion of macro '__underlying_memcmp' 420 \| return __underlying_memcmp(p, q, size); \| ^~~~~~~~~~~~~~~~~~~ Fixes: 332f1795ca20 ("Bluetooth: L2CAP: Fix l2cap_global_chan_by_psm regression") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-08-25	Bluetooth: hci_sync: Fix suspend performance regression	Luiz Augusto von Dentz	1	-10/+14
	This attempts to fix suspend performance when there is no connections by not updating the event mask. Fixes: ef61b6ea1544 ("Bluetooth: Always set event mask on suspend") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-08-25	Bluetooth: hci_event: Fix vendor (unknown) opcode status handling	Hans de Goede	1	-0/+11
	Commit c8992cffbe74 ("Bluetooth: hci_event: Use of a function table to handle Command Complete") was (presumably) meant to only refactor things without any functional changes. But it does have one undesirable side-effect, before status would always be set to skb->data[0] and it might be overridden by some of the opcode specific handling. While now it always set by the opcode specific handlers. This means that if the opcode is not known status does not get set any more at all! This behavior change has broken bluetooth support for BCM4343A0 HCIs, the hci_bcm.c code tries to configure UART attached HCIs at a higher baudraute using vendor specific opcodes. The BCM4343A0 does not support this and this used to simply fail: [ 25.646442] Bluetooth: hci0: BCM: failed to write clock (-56) [ 25.646481] Bluetooth: hci0: Failed to set baudrate After which things would continue with the initial baudraute. But now that hci_cmd_complete_evt() no longer sets status for unknown opcodes status is left at 0. This causes the hci_bcm.c code to think the baudraute has been changed on the HCI side and to also adjust the UART baudrate, after which communication with the HCI is broken, leading to: [ 28.579042] Bluetooth: hci0: command 0x0c03 tx timeout [ 36.961601] Bluetooth: hci0: BCM: Reset failed (-110) And non working bluetooth. Fix this by restoring the previous default "status = skb->data[0]" handling for unknown opcodes. Fixes: c8992cffbe74 ("Bluetooth: hci_event: Use of a function table to handle Command Complete") Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-08-25	Bluetooth: convert hci_update_adv_data to hci_sync	Brian Gix	6	-68/+23
	hci_update_adv_data() is called from hci_event and hci_core due to events from the controller. The prior function used the deprecated hci_request method, and the new one uses hci_sync.c Signed-off-by: Brian Gix <brian.gix@intel.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-08-25	Bluetooth: move hci_get_random_address() to hci_sync	Brian Gix	4	-172/+170
	This function has no dependencies on the deprecated hci_request mechanism, so has been moved unchanged to hci_sync.c Signed-off-by: Brian Gix <brian.gix@intel.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-08-25	Bluetooth: Delete unreferenced hci_request code	Brian Gix	2	-768/+2
	This patch deletes a whole bunch of code no longer reached because the functionality was recoded using hci_sync.c Signed-off-by: Brian Gix <brian.gix@intel.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-08-25	Bluetooth: Move Adv Instance timer to hci_sync	Brian Gix	5	-127/+125
	The Advertising Instance expiration timer adv_instance_expire was handled with the deprecated hci_request mechanism, rather than it's replacement: hci_sync. Signed-off-by: Brian Gix <brian.gix@intel.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-08-25	Bluetooth: Convert SCO configure_datapath to hci_sync	Brian Gix	3	-60/+75
	Recoding HCI cmds to offload SCO codec to use hci_sync mechanism rather than deprecated hci_request mechanism. Signed-off-by: Brian Gix <brian.gix@intel.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-08-25	Bluetooth: Delete unused hci_req_stop_discovery()	Brian Gix	2	-50/+0
	hci_req_stop_discovery has been deprecated in favor of hci_stop_discovery_sync() as part of transition to hci_sync.c Signed-off-by: Brian Gix <brian.gix@intel.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-08-25	Bluetooth: Rework le_scan_restart for hci_sync	Brian Gix	2	-89/+75
	le_scan_restart delayed work queue was running as a deprecated hci_request instead of on the newer thread-safe hci_sync mechanism. Signed-off-by: Brian Gix <brian.gix@intel.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-08-25	Bluetooth: Convert le_scan_disable timeout to hci_sync	Brian Gix	2	-97/+74
	The le_scan_disable timeout was being performed on the deprecated hci_request.c mechanism. This timeout is performed in hci_sync.c Signed-off-by: Brian Gix <brian.gix@intel.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-08-25	Bluetooth: Add VID/PID 0489/e0e0 for MediaTek MT7921	Fae	1	-0/+3
	Tested on HP Envy ey0xxx output from /sys/kernel/debug/usb/devices: T: Bus=01 Lev=01 Prnt=01 Port=02 Cnt=01 Dev#= 2 Spd=480 MxCh= 0 D: Ver= 2.10 Cls=ef(misc ) Sub=02 Prot=01 MxPS=64 #Cfgs= 1 P: Vendor=0489 ProdID=e0e0 Rev= 1.00 S: Manufacturer=MediaTek Inc. S: Product=Wireless_Device S: SerialNumber=000000000 C:* #Ifs= 3 Cfg#= 1 Atr=e0 MxPwr=100mA A: FirstIf#= 0 IfCount= 3 Cls=e0(wlcon) Sub=01 Prot=01 I:* If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=81(I) Atr=03(Int.) MxPS= 16 Ivl=125us E: Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=83(I) Atr=01(Isoc) MxPS= 0 Ivl=1ms E: Ad=03(O) Atr=01(Isoc) MxPS= 0 Ivl=1ms I: If#= 1 Alt= 1 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=83(I) Atr=01(Isoc) MxPS= 9 Ivl=1ms E: Ad=03(O) Atr=01(Isoc) MxPS= 9 Ivl=1ms I: If#= 1 Alt= 2 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=83(I) Atr=01(Isoc) MxPS= 17 Ivl=1ms E: Ad=03(O) Atr=01(Isoc) MxPS= 17 Ivl=1ms I: If#= 1 Alt= 3 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=83(I) Atr=01(Isoc) MxPS= 25 Ivl=1ms E: Ad=03(O) Atr=01(Isoc) MxPS= 25 Ivl=1ms I: If#= 1 Alt= 4 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=83(I) Atr=01(Isoc) MxPS= 33 Ivl=1ms E: Ad=03(O) Atr=01(Isoc) MxPS= 33 Ivl=1ms I: If#= 1 Alt= 5 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=83(I) Atr=01(Isoc) MxPS= 49 Ivl=1ms E: Ad=03(O) Atr=01(Isoc) MxPS= 49 Ivl=1ms I: If#= 1 Alt= 6 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=83(I) Atr=01(Isoc) MxPS= 63 Ivl=1ms E: Ad=03(O) Atr=01(Isoc) MxPS= 63 Ivl=1ms I:* If#= 2 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=(none) E: Ad=8a(I) Atr=03(Int.) MxPS= 64 Ivl=125us E: Ad=0a(O) Atr=03(Int.) MxPS= 64 Ivl=125us I: If#= 2 Alt= 1 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=(none) E: Ad=8a(I) Atr=03(Int.) MxPS= 512 Ivl=125us E: Ad=0a(O) Atr=03(Int.) MxPS= 512 Ivl=125us Signed-off-by: Fae <faenkhauser@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-08-25	netdev: Use try_cmpxchg in napi_if_scheduled_mark_missed	Uros Bizjak	1	-2/+2
	Use try_cmpxchg instead of cmpxchg (ptr, old, new) == old in napi_if_scheduled_mark_missed. x86 CMPXCHG instruction returns success in ZF flag, so this change saves a compare after cmpxchg (and related move instruction in front of cmpxchg). Also, try_cmpxchg implicitly assigns old ptr value to "old" when cmpxchg fails, enabling further code simplifications. Cc: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Link: https://lore.kernel.org/r/20220822143243.2798-1-ubizjak@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-25	Merge branch 'mlxsw-remove-some-unused-code'	Jakub Kicinski	3	-138/+0
	Petr Machata says: ==================== mlxsw: Remove some unused code This patchset removes code that is not used anymore after the following two commits removed all users: - commit b0d80c013b04 ("mlxsw: Remove Mellanox SwitchX-2 ASIC support") - commit 9b43fbb8ce24 ("mlxsw: Remove Mellanox SwitchIB ASIC support") ==================== Link: https://lore.kernel.org/r/cover.1661350629.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-25	mlxsw: Remove unused mlxsw_core_port_type_get()	Jiri Pirko	2	-14/+0
	Function mlxsw_core_port_type_get() is no longer used. So remove it. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-25	mlxsw: Remove unused port_type_set devlink op	Jiri Pirko	2	-18/+0
	port_type_set devlink op is no longer used by any mlxsw driver, so remove it. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-25	mlxsw: Remove unused IB stuff	Jiri Pirko	3	-106/+0
	There are some IB leftovers that are no longer used in the code. So remove them. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-25	Merge branch 'net-devlink-sync-flash-and-dev-info-commands'	Jakub Kicinski	3	-26/+147
	Jiri Pirko says: ==================== net: devlink: sync flash and dev info commands Purpose of this patchset is to introduce consistency between two devlink commands: devlink dev info Shows versions of running default flash target and components. devlink dev flash Flashes default flash target or component name (if specified on cmdline). Currently it is up to the driver what versions to expose and what flash update component names to accept. This is inconsistent. Thankfully, only netdevsim currently using components so it is still time to sanitize this. This patchset makes sure, that devlink.c calls into driver for component flash update only in case the driver exposes the same version name. Example: $ devlink dev info netdevsim/netdevsim10: driver netdevsim versions: running: fw.mgmt 10.20.30 stored: fw.mgmt 10.20.30 $ devlink dev flash netdevsim/netdevsim10 file somefile.bin [fw.mgmt] Preparing to flash [fw.mgmt] Flashing 100% [fw.mgmt] Flash select [fw.mgmt] Flashing done $ devlink dev flash netdevsim/netdevsim10 file somefile.bin component fw.mgmt [fw.mgmt] Preparing to flash [fw.mgmt] Flashing 100% [fw.mgmt] Flash select [fw.mgmt] Flashing done $ devlink dev flash netdevsim/netdevsim10 file somefile.bin component dummy Error: selected component is not supported by this device. ==================== Link: https://lore.kernel.org/r/20220824122011.1204330-1-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-25	net: devlink: limit flash component name to match version returned by info_get()	Jiri Pirko	3	-21/+90
	Limit the acceptance of component name passed to cmd_flash_update() to match one of the versions returned by info_get(), marked by version type. This makes things clearer and enforces 1:1 mapping between exposed version and accepted flash component. Check VERSION_TYPE_COMPONENT version type during cmd_flash_update() execution by calling info_get() with different "req" context. That causes info_get() to lookup the component name instead of filling-up the netlink message. Remove "UPDATE_COMPONENT" flag which becomes used. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-25	netdevsim: add version fw.mgmt info info_get() and mark as a component	Jiri Pirko	1	-1/+11
	Fix the only component user which is netdevsim. It uses component named "fw.mgmt" in selftests. So add this version to info_get() output with version type component. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-25	net: devlink: extend info_get() version put to indicate a flash component	Jiri Pirko	2	-4/+46
	Whenever the driver is called by his info_get() op, it may put multiple version names and values to the netlink message. Extend by additional helper devlink_info_version_running/stored_put_ext() that allows to specify a version type that indicates when particular version name represents a flash component. This is going to be used in follow-up patch calling info_get() during flash update command checking if version with this the version type exists. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-25	selftests/net: fix reinitialization of TEST_PROGS in net self tests.	Adel Abouchaev	1	-1/+1
	Assinging will drop all previous tests. Fixes: b690842d12fd ("selftests/net: test l2 tunnel TOS/TTL inheriting") Signed-off-by: Adel Abouchaev <adel.abushaev@gmail.com> Reviewed-by: Shuah Khan <skhan@linuxfoundation.org> Link: https://lore.kernel.org/r/20220824184351.3759862-1-adel.abushaev@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-25	net: sched: delete duplicate cleanup of backlog and qlen	Zhengchao Shao	21	-40/+0
	qdisc_reset() is clearing qdisc->q.qlen and qdisc->qstats.backlog _after_ calling qdisc->ops->reset. There is no need to clear them again in the specific reset function. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Link: https://lore.kernel.org/r/20220824005231.345727-1-shaozhengchao@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-08-25	nfp: flower: support case of match on ct_state(0/0x3f)	Wenjuan Geng	1	-2/+7
	is_post_ct_flow() function will process only ct_state ESTABLISHED, then offload_pre_check() function will check FLOW_DISSECTOR_KEY_CT flag. When config tc filter match ct_state(0/0x3f), dissector->used_keys with FLOW_DISSECTOR_KEY_CT bit, function offload_pre_check() will return false, so not offload. This is a special case that can be handled safely. Therefore, modify to let initial packet which won't go through conntrack can be offloaded, as long as the cared ct fields are all zero. Signed-off-by: Wenjuan Geng <wenjuan.geng@corigine.com> Reviewed-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20220823090122.403631-1-simon.horman@corigine.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-08-25	net: gro: skb_gro_header helper function	Richard Gobert	10	-69/+45
	Introduce a simple helper function to replace a common pattern. When accessing the GRO header, we fetch the pointer from frag0, then test its validity and fetch it from the skb when necessary. This leads to the pattern skb_gro_header_fast -> skb_gro_header_hard -> skb_gro_header_slow recurring many times throughout GRO code. This patch replaces these patterns with a single inlined function call, improving code readability. Signed-off-by: Richard Gobert <richardbgobert@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220823071034.GA56142@debian Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-08-24	Documentation: devlink: fix the locking section	Jiri Pirko	1	-4/+2
	As all callbacks are converted now, fix the text reflecting that change. Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20220823070213.1008956-1-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-24	Merge branch 'add-a-second-bind-table-hashed-by-port-and-address'	Jakub Kicinski	18	-95/+1061
	Joanne Koong says: ==================== Add a second bind table hashed by port and address Currently, there is one bind hashtable (bhash) that hashes by port only. This patchset adds a second bind table (bhash2) that hashes by port and address. The motivation for adding bhash2 is to expedite bind requests in situations where the port has many sockets in its bhash table entry (eg a large number of sockets bound to different addresses on the same port), which makes checking bind conflicts costly especially given that we acquire the table entry spinlock while doing so, which can cause softirq cpu lockups and can prevent new tcp connections. We ran into this problem at Meta where the traffic team binds a large number of IPs to port 443 and the bind() call took a significant amount of time which led to cpu softirq lockups, which caused packet drops and other failures on the machine. When experimentally testing this on a local server for ~24k sockets bound to the port, the results seen were: ipv4: before - 0.002317 seconds with bhash2 - 0.000020 seconds ipv6: before - 0.002431 seconds with bhash2 - 0.000021 seconds The additions to the initial bhash2 submission [0] are: * Updating bhash2 in the cases where a socket's rcv saddr changes after it has * been bound * Adding locks for bhash2 hashbuckets [0] https://lore.kernel.org/netdev/20220520001834.2247810-1-kuba@kernel.org/ v3: https://lore.kernel.org/netdev/20220722195406.1304948-2-joannelkoong@gmail.com/ v2: https://lore.kernel.org/netdev/20220712235310.1935121-1-joannelkoong@gmail.com/ v1: https://lore.kernel.org/netdev/20220623234242.2083895-2-joannelkoong@gmail.com/ ==================== Link: https://lore.kernel.org/r/20220822181023.3979645-1-joannelkoong@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-24	selftests/net: Add sk_bind_sendto_listen and sk_connect_zero_addr	Joanne Koong	4	-0/+146
	This patch adds 2 new tests: sk_bind_sendto_listen and sk_connect_zero_addr. The sk_bind_sendto_listen test exercises the path where a socket's rcv saddr changes after it has been added to the binding tables, and then a listen() on the socket is invoked. The listen() should succeed. The sk_bind_sendto_listen test is copied over from one of syzbot's tests: https://syzkaller.appspot.com/x/repro.c?x=1673a38df00000 The sk_connect_zero_addr test exercises the path where the socket was never previously added to the binding tables and it gets assigned a saddr upon a connect() to address 0. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-24	selftests/net: Add test for timing a bind request to a port with a populated ↵	Joanne Koong	4	-1/+215
	bhash entry This test populates the bhash table for a given port with MAX_THREADS * MAX_CONNECTIONS sockets, and then times how long a bind request on the port takes. When populating the bhash table, we create the sockets and then bind the sockets to the same address and port (SO_REUSEADDR and SO_REUSEPORT are set). When timing how long a bind on the port takes, we bind on a different address without SO_REUSEPORT set. We do not set SO_REUSEPORT because we are interested in the case where the bind request does not go through the tb->fastreuseport path, which is fragile (eg tb->fastreuseport path does not work if binding with a different uid). To run the script: Usage: ./bind_bhash.sh [-6 \| -4] [-p port] [-a address] 6: use ipv6 4: use ipv4 port: Port number address: ip address Without any arguments, ./bind_bhash.sh defaults to ipv6 using ip address "2001:0db8:0:f101::1" on port 443. On my local machine, I see: ipv4: before - 0.002317 seconds with bhash2 - 0.000020 seconds ipv6: before - 0.002431 seconds with bhash2 - 0.000021 seconds Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-24	net: Add a bhash2 table hashed by port and address	Joanne Koong	12	-94/+700
	The current bind hashtable (bhash) is hashed by port only. In the socket bind path, we have to check for bind conflicts by traversing the specified port's inet_bind_bucket while holding the hashbucket's spinlock (see inet_csk_get_port() and inet_csk_bind_conflict()). In instances where there are tons of sockets hashed to the same port at different addresses, the bind conflict check is time-intensive and can cause softirq cpu lockups, as well as stops new tcp connections since __inet_inherit_port() also contests for the spinlock. This patch adds a second bind table, bhash2, that hashes by port and sk->sk_rcv_saddr (ipv4) and sk->sk_v6_rcv_saddr (ipv6). Searching the bhash2 table leads to significantly faster conflict resolution and less time holding the hashbucket spinlock. Please note a few things: * There can be the case where the a socket's address changes after it has been bound. There are two cases where this happens: 1) The case where there is a bind() call on INADDR_ANY (ipv4) or IPV6_ADDR_ANY (ipv6) and then a connect() call. The kernel will assign the socket an address when it handles the connect() 2) In inet_sk_reselect_saddr(), which is called when rebuilding the sk header and a few pre-conditions are met (eg rerouting fails). In these two cases, we need to update the bhash2 table by removing the entry for the old address, and add a new entry reflecting the updated address. * The bhash2 table must have its own lock, even though concurrent accesses on the same port are protected by the bhash lock. Bhash2 must have its own lock to protect against cases where sockets on different ports hash to different bhash hashbuckets but to the same bhash2 hashbucket. This brings up a few stipulations: 1) When acquiring both the bhash and the bhash2 lock, the bhash2 lock will always be acquired after the bhash lock and released before the bhash lock is released. 2) There are no nested bhash2 hashbucket locks. A bhash2 lock is always acquired+released before another bhash2 lock is acquired+released. * The bhash table cannot be superseded by the bhash2 table because for bind requests on INADDR_ANY (ipv4) or IPV6_ADDR_ANY (ipv6), every socket bound to that port must be checked for a potential conflict. The bhash table is the only source of port->socket associations. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-24	netlink: fix some kernel-doc comments	Zhengchao Shao	1	-1/+3
	Modify the comment of input parameter of nlmsg_ and nla_ function. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Link: https://lore.kernel.org/r/20220824013621.365103-1-shaozhengchao@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-24	net: ethernet: ti: davinci_mdio: fix build for mdio bitbang uses	Randy Dunlap	1	-0/+1
	davinci_mdio.c uses mdio bitbang APIs, so it should select MDIO_BITBANG to prevent build errors. arm-linux-gnueabi-ld: drivers/net/ethernet/ti/davinci_mdio.o: in function `davinci_mdio_remove': drivers/net/ethernet/ti/davinci_mdio.c:649: undefined reference to `free_mdio_bitbang' arm-linux-gnueabi-ld: drivers/net/ethernet/ti/davinci_mdio.o: in function `davinci_mdio_probe': drivers/net/ethernet/ti/davinci_mdio.c:545: undefined reference to `alloc_mdio_bitbang' arm-linux-gnueabi-ld: drivers/net/ethernet/ti/davinci_mdio.o: in function `davinci_mdiobb_read': drivers/net/ethernet/ti/davinci_mdio.c:236: undefined reference to `mdiobb_read' arm-linux-gnueabi-ld: drivers/net/ethernet/ti/davinci_mdio.o: in function `davinci_mdiobb_write': drivers/net/ethernet/ti/davinci_mdio.c:253: undefined reference to `mdiobb_write' Fixes: d04807b80691 ("net: ethernet: ti: davinci_mdio: Add workaround for errata i2329") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Grygorii Strashko <grygorii.strashko@ti.com> Cc: Ravi Gunasekaran <r-gunasekaran@ti.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Naresh Kamboju <naresh.kamboju@linaro.org> Cc: Sudip Mukherjee (Codethink) <sudipm.mukherjee@gmail.com> Link: https://lore.kernel.org/r/20220824024216.4939-1-rdunlap@infradead.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-24	Documentation: sysctl: align cells in second content column	Bagas Sanjaya	1	-9/+9
	Stephen Rothwell reported htmldocs warning when merging net-next tree: Documentation/admin-guide/sysctl/net.rst:37: WARNING: Malformed table. Text in column margin in table line 4. ========= =================== = ========== ================== Directory Content Directory Content ========= =================== = ========== ================== 802 E802 protocol mptcp Multipath TCP appletalk Appletalk protocol netfilter Network Filter ax25 AX25 netrom NET/ROM bridge Bridging rose X.25 PLP layer core General parameter tipc TIPC ethernet Ethernet protocol unix Unix domain sockets ipv4 IP version 4 x25 X.25 protocol ipv6 IP version 6 ========= =================== = ========== ================== The warning above is caused by cells in second "Content" column of /proc/sys/net subdirectory table which are in column margin. Align these cells against the column header to fix the warning. Link: https://lore.kernel.org/linux-next/20220823134905.57ed08d5@canb.auug.org.au/ Fixes: 1202cdd665315c ("Remove DECnet support from kernel") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Link: https://lore.kernel.org/r/20220824035804.204322-1-bagasdotme@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-24	Merge branch 'r8169-next'	David S. Miller	3	-248/+18
	Heiner Kallweit says: ==================== r8169: remove support for few unused chip versions There's a number of chip versions that apparently never made it to the mass market. Detection of these chip versions has been disabled for few kernel versions now and nobody complained. Therefore remove support for these chip versions. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24	r8169: remove support for chip version 60	Heiner Kallweit	3	-90/+8
	Detection of this chip version has been disabled for few kernel versions now. Nobody complained, so remove support for this chip version. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24	r8169: remove support for chip version 50	Heiner Kallweit	3	-26/+3
	Detection of this chip version has been disabled for few kernel versions now. Nobody complained, so remove support for this chip version. v3: - rebase patch Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24	r8169: remove support for chip version 49	Heiner Kallweit	3	-47/+3
	Detection of this chip version has been disabled for few kernel versions now. Nobody complained, so remove support for this chip version. v2: - fix a typo: RTL_GIGA_MAC_VER_40 -> RTL_GIGA_MAC_VER_50 Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24	r8169: remove support for chip versions 45 and 47	Heiner Kallweit	3	-82/+5
	Detection of these chip versions has been disabled for few kernel versions now. Nobody complained, so remove support for this chip version. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24	r8169: remove support for chip version 41	Heiner Kallweit	3	-5/+1
	Detection of this chip version has been disabled for few kernel versions now. Nobody complained, so remove support for this chip version. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24	Merge tag 'mlx5-updates-2022-08-22' of ↵	David S. Miller	29	-765/+1246
	git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux mlx5-updates-2022-08-22 Roi Dayan Says: =============== Add support for SF tunnel offload Mlx5 driver only supports VF tunnel offload. To add support for SF tunnel offload the driver needs to: 1. Add send-to-vport metadata matching rules like done for VFs. 2. Set an indirect table for SF vport, same as VF vport. info smaller sub functions for better maintainability. rules from esw init phase to representor load phase. SFs could be created after esw initialized and thus the send-to-vport meta rules would not be created for those SFs. By moving the creation of the rules to representor load phase we ensure creating the rules also for SFs created later. =============== Lama Kayal Says: ================ Make flow steering API loosely coupled from mlx5e_priv, in a manner to introduce more readable and maintainable modules. Make TC's private, let mlx5e_flow_steering struct be dynamically allocated, and introduce its API to maintain the code via setters and getters instead of publicly exposing it. Introduce flow steering debug macros to provide an elegant finish to the decoupled flow steering API, where errors related to flow steering shall be reported via them. All flow steering related files will drop any coupling to mlx5e_priv, instead they will get the relevant members as input. Among these, fs_tt_redirect, fs_tc, and arfs. ================
2022-08-24	micrel: ksz8851: fixes struct pointer issue	Jerry Ray	1	-3/+2
	Issue found during code review. This bug has no impact as long as the ks8851_net structure is the first element of the ks8851_net_spi structure. As long as the offset to the ks8851_net struct is zero, the container_of() macro is subtracting 0 and therefore no damage done. But if the ks8851_net_spi struct is ever modified such that the ks8851_net struct within it is no longer the first element of the struct, then the bug would manifest itself and cause problems. struct ks8851_net is contained within ks8851_net_spi. ks is contained within kss. kss is the priv_data of the netdev structure. Signed-off-by: Jerry Ray <jerry.ray@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24	tcp: annotate data-race around tcp_md5sig_pool_populated	Eric Dumazet	1	-4/+10
	tcp_md5sig_pool_populated can be read while another thread changes its value. The race has no consequence because allocations are protected with tcp_md5sig_mutex. This patch adds READ_ONCE() and WRITE_ONCE() to document the race and silence KCSAN. Reported-by: Abhishek Shah <abhishek.shah@columbia.edu> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24	net: marvell: prestera: implement br_port_locked flag offloading	Oleksandr Mazur	5	-1/+34
	Both <port> br_port_locked and <lag> interfaces's flag offloading is supported. No new ABI is being added, rather existing (port_param_set) API call gets extended. Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu> V2: add missing receipents (linux-kernel, netdev) Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24	Merge branch 'j7200-support'	David S. Miller	3	-9/+55
	Siddharth Vadapalli says: ==================== J7200: CPSW5G: Add support for QSGMII mode to am65-cpsw driver Add support for QSGMII mode to am65-cpsw driver. Change log: v4-> v5: 1. Move ti,j7200-cpswxg-nuss compatible to the line above the ti,j721e-cpsw-nuss compatible. 2. Add allOf and move if-then statements within it to allow future if-then statements to be added easily. v3 -> v4: 1. Update bindings to disallow ports based on compatible, instead of adding a new if/then statement for the new compatible. 2. Add Else-If condition for RMII mode in the set of supported interfaces. Support for RMII mode is already present in the driver and I had missed out adding a condition for RMII mode in the previous patches. v2 -> v3: 1. In ti,k3-am654-cpsw-nuss.yaml, restrict if/then statement to port nodes. v1 -> v2: 1. Add new compatible for CPSW5G in ti,k3-am654-cpsw-nuss.yaml and extend properties for new compatible. 2. Add extra_modes member to struct am65_cpsw_pdata to be used for QSGMII mode by new compatible. 3. Add check for phylink supported modes to ensure that only one phy mode is advertised as supported. 4. Check if extra_modes supports QSGMII mode in am65_cpsw_nuss_mac_config() for register write. 5. Add check for assigning port->sgmii_base only when extra_modes is valid. v4: https://lore.kernel.org/r/20220816060139.111934-1-s-vadapalli@ti.com/ v3: https://lore.kernel.org/r/20220606110443.30362-1-s-vadapalli@ti.com/ v2: https://lore.kernel.org/r/20220602114558.6204-1-s-vadapalli@ti.com/ v1: https://lore.kernel.org/r/20220531113058.23708-1-s-vadapalli@ti.com/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24	net: ethernet: ti: am65-cpsw: Move phy_set_mode_ext() to correct location	Siddharth Vadapalli	1	-5/+4
	In TI's J7200 SoC CPSW5G ports, each of the 4 ports can be configured as a QSGMII main or QSGMII-SUB port. This configuration is performed by phy-gmii-sel driver on invoking the phy_set_mode_ext() function. It is necessary for the QSGMII main port to be configured before any of the QSGMII-SUB interfaces are brought up. Currently, the QSGMII-SUB interfaces come up before the QSGMII main port is configured. Fix this by moving the call to phy_set_mode_ext() from am65_cpsw_nuss_ndo_slave_open() to am65_cpsw_nuss_init_slave_ports(), thereby ensuring that the QSGMII main port is configured before any of the QSGMII-SUB ports are brought up. Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24	net: ethernet: ti: am65-cpsw: Add support for J7200 CPSW5G	Siddharth Vadapalli	2	-2/+35
	CPSW5G in J7200 supports additional modes like QSGMII and SGMII. Add new compatible for J7200 and enable QSGMII mode in am65-cpsw driver. Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24	dt-bindings: net: ti: k3-am654-cpsw-nuss: Update bindings for J7200 CPSW5G	Siddharth Vadapalli	1	-2/+16
	Update bindings for TI K3 J7200 SoC which contains 5 ports (4 external ports) CPSW5G module and add compatible for it. Changes made: - Add new compatible ti,j7200-cpswxg-nuss for CPSW5G. - Extend pattern properties for new compatible. - Change maximum number of CPSW ports to 4 for new compatible. Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>