From a063f2fba3fa633a599253b62561051ac185fa99 Mon Sep 17 00:00:00 2001 From: Sven Eckelmann Date: Sat, 16 Apr 2022 13:51:10 +0200 Subject: batman-adv: Don't skb_split skbuffs with frag_list The receiving interface might have used GRO to receive more fragments than MAX_SKB_FRAGS fragments. In this case, these will not be stored in skb_shinfo(skb)->frags but merged into the frag list. batman-adv relies on the function skb_split to split packets up into multiple smaller packets which are not larger than the MTU on the outgoing interface. But this function cannot handle frag_list entries and is only operating on skb_shinfo(skb)->frags. If it is still trying to split such an skb and xmit'ing it on an interface without support for NETIF_F_FRAGLIST, then validate_xmit_skb() will try to linearize it. But this fails due to inconsistent information. And __pskb_pull_tail will trigger a BUG_ON after skb_copy_bits() returns an error. In case of entries in frag_list, just linearize the skb before operating on it with skb_split(). Reported-by: Felix Kaechele Fixes: c6c8fea29769 ("net: Add batman-adv meshing protocol") Signed-off-by: Sven Eckelmann Tested-by: Felix Kaechele Signed-off-by: Simon Wunderlich --- net/batman-adv/fragmentation.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/net/batman-adv/fragmentation.c b/net/batman-adv/fragmentation.c index 0899a729a23f..c120c7c6d25f 100644 --- a/net/batman-adv/fragmentation.c +++ b/net/batman-adv/fragmentation.c @@ -475,6 +475,17 @@ int batadv_frag_send_packet(struct sk_buff *skb, goto free_skb; } + /* GRO might have added fragments to the fragment list instead of + * frags[]. But this is not handled by skb_split and must be + * linearized to avoid incorrect length information after all + * batman-adv fragments were created and submitted to the + * hard-interface + */ + if (skb_has_frag_list(skb) && __skb_linearize(skb)) { + ret = -ENOMEM; + goto free_skb; + } + /* Create one header to be copied to all fragments */ frag_header.packet_type = BATADV_UNICAST_FRAG; frag_header.version = BATADV_COMPAT_VERSION; -- cgit v1.2.3 From 7635a1ad8d92dcc8247b53f949e37795154b5b6f Mon Sep 17 00:00:00 2001 From: Guenter Roeck Date: Mon, 11 Apr 2022 08:42:10 -0700 Subject: iwlwifi: iwl-dbg: Use del_timer_sync() before freeing In Chrome OS, a large number of crashes is observed due to corrupted timer lists. Steven Rostedt pointed out that this usually happens when a timer is freed while still active, and that the problem is often triggered by code calling del_timer() instead of del_timer_sync() just before freeing. Steven also identified the iwlwifi driver as one of the possible culprits since it does exactly that. Reported-by: Steven Rostedt Cc: Steven Rostedt Cc: Johannes Berg Cc: Gregory Greenman Fixes: 60e8abd9d3e91 ("iwlwifi: dbg_ini: add periodic trigger new API support") Signed-off-by: Guenter Roeck Acked-by: Gregory Greenman Tested-by: Sedat Dilek # Linux v5.17.3-rc1 and Debian LLVM-14 Signed-off-by: Kalle Valo Link: https://lore.kernel.org/r/20220411154210.1870008-1-linux@roeck-us.net --- drivers/net/wireless/intel/iwlwifi/iwl-dbg-tlv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/wireless/intel/iwlwifi/iwl-dbg-tlv.c b/drivers/net/wireless/intel/iwlwifi/iwl-dbg-tlv.c index 866a33f49915..3237d4b528b5 100644 --- a/drivers/net/wireless/intel/iwlwifi/iwl-dbg-tlv.c +++ b/drivers/net/wireless/intel/iwlwifi/iwl-dbg-tlv.c @@ -371,7 +371,7 @@ void iwl_dbg_tlv_del_timers(struct iwl_trans *trans) struct iwl_dbg_tlv_timer_node *node, *tmp; list_for_each_entry_safe(node, tmp, timer_list, list) { - del_timer(&node->timer); + del_timer_sync(&node->timer); list_del(&node->list); kfree(node); } -- cgit v1.2.3 From 4dd4e6f659850f2df20b9612593f5a0f040549e1 Mon Sep 17 00:00:00 2001 From: Gregory Greenman Date: Tue, 12 Apr 2022 22:01:41 +0300 Subject: MAINTAINERS: update iwlwifi driver maintainer Set myself as a maintainer of iwlwifi driver as Luca is moving to a new role. Signed-off-by: Gregory Greenman Acked-by: Luca Coelho Signed-off-by: Kalle Valo Link: https://lore.kernel.org/r/20220412190141.4543-1-gregory.greenman@intel.com --- MAINTAINERS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index c8fe97a8d60e..6a7a839acfe3 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -10129,7 +10129,7 @@ S: Supported F: drivers/net/wireless/intel/iwlegacy/ INTEL WIRELESS WIFI LINK (iwlwifi) -M: Luca Coelho +M: Gregory Greenman L: linux-wireless@vger.kernel.org S: Supported W: https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi -- cgit v1.2.3 From bb300130e47fcefbe938f06dbacaef0312e28416 Mon Sep 17 00:00:00 2001 From: Wen Gong Date: Wed, 27 Apr 2022 14:16:19 +0300 Subject: ath11k: reduce the wait time of 11d scan and hw scan while add interface (cherry picked from commit 1f682dc9fb3790aa7ec27d3d122ff32b1eda1365 in wireless-next) Currently ath11k will wait 11d scan complete while add interface in ath11k_mac_op_add_interface(), when system resume without enable wowlan, ath11k_mac_op_add_interface() is called for each resume, thus it increase the resume time of system. And ath11k_mac_op_hw_scan() after ath11k_mac_op_add_interface() also needs some time cost because the previous 11d scan need more than 5 seconds when 6 GHz is enabled, then the scan started event will indicated to ath11k after the 11d scan completed. While 11d scan/hw scan is running in firmware, if ath11k update channel list to firmware by WMI_SCAN_CHAN_LIST_CMDID, then firmware will cancel the current scan which is running, it lead the scan failed. The patch commit 9dcf6808b253 ("ath11k: add 11d scan offload support") used finish_11d_scan/finish_11d_ch_list/pending_11d to synchronize the 11d scan/hw scan/channel list between ath11k/firmware/mac80211 and to avoid the scan fail. Add wait operation before ath11k update channel list, function ath11k_reg_update_chan_list() will wait until the current 11d scan/hw scan completed. And remove the wait operation of start 11d scan and waiting channel list complete in hw scan. After these changes, resume time cost reduce about 5 seconds and also hw scan time cost reduced obviously, and scan failed not seen. The 11d scan is sent to firmware only one time for each interface added in mac.c, and it is moved after the 1st hw scan because 11d scan will cost some time and thus leads the AP scan result update to UI delay. Currently priority of ath11k's hw scan is WMI_SCAN_PRIORITY_LOW, and priority of 11d scan in firmware is WMI_SCAN_PRIORITY_MEDIUM, then the 11d scan which sent after hw scan will cancel the hw scan in firmware, so change the priority to WMI_SCAN_PRIORITY_MEDIUM for the hw scan which is in front of the 11d scan, thus it will not happen scan cancel in firmware. Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3 Fixes: 9dcf6808b253 ("ath11k: add 11d scan offload support") Link: https://bugzilla.kernel.org/show_bug.cgi?id=215777 Cc: Signed-off-by: Wen Gong Signed-off-by: Kalle Valo Link: https://lore.kernel.org/r/20220328035832.14122-1-quic_wgong@quicinc.com Signed-off-by: Kalle Valo Link: https://lore.kernel.org/r/20220427111619.9758-1-kvalo@kernel.org --- drivers/net/wireless/ath/ath11k/core.c | 1 + drivers/net/wireless/ath/ath11k/core.h | 13 +++++-- drivers/net/wireless/ath/ath11k/mac.c | 71 ++++++++++++++-------------------- drivers/net/wireless/ath/ath11k/mac.h | 2 +- drivers/net/wireless/ath/ath11k/reg.c | 43 +++++++++++++------- drivers/net/wireless/ath/ath11k/reg.h | 2 +- drivers/net/wireless/ath/ath11k/wmi.c | 16 +++++++- 7 files changed, 84 insertions(+), 64 deletions(-) diff --git a/drivers/net/wireless/ath/ath11k/core.c b/drivers/net/wireless/ath/ath11k/core.c index 71eb7d04c3bf..90a5df1fbdbd 100644 --- a/drivers/net/wireless/ath/ath11k/core.c +++ b/drivers/net/wireless/ath/ath11k/core.c @@ -1288,6 +1288,7 @@ static void ath11k_core_restart(struct work_struct *work) ieee80211_stop_queues(ar->hw); ath11k_mac_drain_tx(ar); + complete(&ar->completed_11d_scan); complete(&ar->scan.started); complete(&ar->scan.completed); complete(&ar->peer_assoc_done); diff --git a/drivers/net/wireless/ath/ath11k/core.h b/drivers/net/wireless/ath/ath11k/core.h index c0228e91a596..b8634eddf49a 100644 --- a/drivers/net/wireless/ath/ath11k/core.h +++ b/drivers/net/wireless/ath/ath11k/core.h @@ -38,6 +38,8 @@ extern unsigned int ath11k_frame_mode; +#define ATH11K_SCAN_TIMEOUT_HZ (20 * HZ) + #define ATH11K_MON_TIMER_INTERVAL 10 enum ath11k_supported_bw { @@ -189,6 +191,12 @@ enum ath11k_scan_state { ATH11K_SCAN_ABORTING, }; +enum ath11k_11d_state { + ATH11K_11D_IDLE, + ATH11K_11D_PREPARING, + ATH11K_11D_RUNNING, +}; + enum ath11k_dev_flags { ATH11K_CAC_RUNNING, ATH11K_FLAG_CORE_REGISTERED, @@ -607,9 +615,8 @@ struct ath11k { bool dfs_block_radar_events; struct ath11k_thermal thermal; u32 vdev_id_11d_scan; - struct completion finish_11d_scan; - struct completion finish_11d_ch_list; - bool pending_11d; + struct completion completed_11d_scan; + enum ath11k_11d_state state_11d; bool regdom_set_by_user; int hw_rate_code; u8 twt_enabled; diff --git a/drivers/net/wireless/ath/ath11k/mac.c b/drivers/net/wireless/ath/ath11k/mac.c index e6b34b0d61bd..58ff761393db 100644 --- a/drivers/net/wireless/ath/ath11k/mac.c +++ b/drivers/net/wireless/ath/ath11k/mac.c @@ -3601,26 +3601,6 @@ static int ath11k_mac_op_hw_scan(struct ieee80211_hw *hw, if (ret) goto exit; - /* Currently the pending_11d=true only happened 1 time while - * wlan interface up in ath11k_mac_11d_scan_start(), it is called by - * ath11k_mac_op_add_interface(), after wlan interface up, - * pending_11d=false always. - * If remove below wait, it always happened scan fail and lead connect - * fail while wlan interface up, because it has a 11d scan which is running - * in firmware, and lead this scan failed. - */ - if (ar->pending_11d) { - long time_left; - unsigned long timeout = 5 * HZ; - - if (ar->supports_6ghz) - timeout += 5 * HZ; - - time_left = wait_for_completion_timeout(&ar->finish_11d_ch_list, timeout); - ath11k_dbg(ar->ab, ATH11K_DBG_MAC, - "mac wait 11d channel list time left %ld\n", time_left); - } - memset(&arg, 0, sizeof(arg)); ath11k_wmi_start_scan_init(ar, &arg); arg.vdev_id = arvif->vdev_id; @@ -3686,6 +3666,10 @@ exit: kfree(arg.extraie.ptr); mutex_unlock(&ar->conf_mutex); + + if (ar->state_11d == ATH11K_11D_PREPARING) + ath11k_mac_11d_scan_start(ar, arvif->vdev_id); + return ret; } @@ -5814,7 +5798,7 @@ static int ath11k_mac_op_start(struct ieee80211_hw *hw) /* TODO: Do we need to enable ANI? */ - ath11k_reg_update_chan_list(ar); + ath11k_reg_update_chan_list(ar, false); ar->num_started_vdevs = 0; ar->num_created_vdevs = 0; @@ -5881,6 +5865,11 @@ static void ath11k_mac_op_stop(struct ieee80211_hw *hw) cancel_work_sync(&ar->ab->update_11d_work); cancel_work_sync(&ar->ab->rfkill_work); + if (ar->state_11d == ATH11K_11D_PREPARING) { + ar->state_11d = ATH11K_11D_IDLE; + complete(&ar->completed_11d_scan); + } + spin_lock_bh(&ar->data_lock); list_for_each_entry_safe(ppdu_stats, tmp, &ar->ppdu_stats_info, list) { list_del(&ppdu_stats->list); @@ -6051,7 +6040,7 @@ static bool ath11k_mac_vif_ap_active_any(struct ath11k_base *ab) return false; } -void ath11k_mac_11d_scan_start(struct ath11k *ar, u32 vdev_id, bool wait) +void ath11k_mac_11d_scan_start(struct ath11k *ar, u32 vdev_id) { struct wmi_11d_scan_start_params param; int ret; @@ -6079,28 +6068,22 @@ void ath11k_mac_11d_scan_start(struct ath11k *ar, u32 vdev_id, bool wait) ath11k_dbg(ar->ab, ATH11K_DBG_MAC, "mac start 11d scan\n"); - if (wait) - reinit_completion(&ar->finish_11d_scan); - ret = ath11k_wmi_send_11d_scan_start_cmd(ar, ¶m); if (ret) { ath11k_warn(ar->ab, "failed to start 11d scan vdev %d ret: %d\n", vdev_id, ret); } else { ar->vdev_id_11d_scan = vdev_id; - if (wait) { - ar->pending_11d = true; - ret = wait_for_completion_timeout(&ar->finish_11d_scan, - 5 * HZ); - ath11k_dbg(ar->ab, ATH11K_DBG_MAC, - "mac 11d scan left time %d\n", ret); - - if (!ret) - ar->pending_11d = false; - } + if (ar->state_11d == ATH11K_11D_PREPARING) + ar->state_11d = ATH11K_11D_RUNNING; } fin: + if (ar->state_11d == ATH11K_11D_PREPARING) { + ar->state_11d = ATH11K_11D_IDLE; + complete(&ar->completed_11d_scan); + } + mutex_unlock(&ar->ab->vdev_id_11d_lock); } @@ -6123,12 +6106,15 @@ void ath11k_mac_11d_scan_stop(struct ath11k *ar) vdev_id = ar->vdev_id_11d_scan; ret = ath11k_wmi_send_11d_scan_stop_cmd(ar, vdev_id); - if (ret) + if (ret) { ath11k_warn(ar->ab, "failed to stopt 11d scan vdev %d ret: %d\n", vdev_id, ret); - else + } else { ar->vdev_id_11d_scan = ATH11K_11D_INVALID_VDEV_ID; + ar->state_11d = ATH11K_11D_IDLE; + complete(&ar->completed_11d_scan); + } } mutex_unlock(&ar->ab->vdev_id_11d_lock); } @@ -6324,8 +6310,10 @@ static int ath11k_mac_op_add_interface(struct ieee80211_hw *hw, goto err_peer_del; } - ath11k_mac_11d_scan_start(ar, arvif->vdev_id, true); - + if (test_bit(WMI_TLV_SERVICE_11D_OFFLOAD, ab->wmi_ab.svc_map)) { + reinit_completion(&ar->completed_11d_scan); + ar->state_11d = ATH11K_11D_PREPARING; + } break; case WMI_VDEV_TYPE_MONITOR: set_bit(ATH11K_FLAG_MONITOR_VDEV_CREATED, &ar->monitor_flags); @@ -7190,7 +7178,7 @@ ath11k_mac_op_unassign_vif_chanctx(struct ieee80211_hw *hw, } if (arvif->vdev_type == WMI_VDEV_TYPE_STA) - ath11k_mac_11d_scan_start(ar, arvif->vdev_id, false); + ath11k_mac_11d_scan_start(ar, arvif->vdev_id); mutex_unlock(&ar->conf_mutex); } @@ -8671,8 +8659,7 @@ int ath11k_mac_allocate(struct ath11k_base *ab) ar->monitor_vdev_id = -1; clear_bit(ATH11K_FLAG_MONITOR_VDEV_CREATED, &ar->monitor_flags); ar->vdev_id_11d_scan = ATH11K_11D_INVALID_VDEV_ID; - init_completion(&ar->finish_11d_scan); - init_completion(&ar->finish_11d_ch_list); + init_completion(&ar->completed_11d_scan); } return 0; diff --git a/drivers/net/wireless/ath/ath11k/mac.h b/drivers/net/wireless/ath/ath11k/mac.h index 0e6c870b09c8..29b523af66dd 100644 --- a/drivers/net/wireless/ath/ath11k/mac.h +++ b/drivers/net/wireless/ath/ath11k/mac.h @@ -130,7 +130,7 @@ extern const struct htt_rx_ring_tlv_filter ath11k_mac_mon_status_filter_default; #define ATH11K_SCAN_11D_INTERVAL 600000 #define ATH11K_11D_INVALID_VDEV_ID 0xFFFF -void ath11k_mac_11d_scan_start(struct ath11k *ar, u32 vdev_id, bool wait); +void ath11k_mac_11d_scan_start(struct ath11k *ar, u32 vdev_id); void ath11k_mac_11d_scan_stop(struct ath11k *ar); void ath11k_mac_11d_scan_stop_all(struct ath11k_base *ab); diff --git a/drivers/net/wireless/ath/ath11k/reg.c b/drivers/net/wireless/ath/ath11k/reg.c index 81e11cde31d7..80a697771393 100644 --- a/drivers/net/wireless/ath/ath11k/reg.c +++ b/drivers/net/wireless/ath/ath11k/reg.c @@ -102,7 +102,7 @@ ath11k_reg_notifier(struct wiphy *wiphy, struct regulatory_request *request) ar->regdom_set_by_user = true; } -int ath11k_reg_update_chan_list(struct ath11k *ar) +int ath11k_reg_update_chan_list(struct ath11k *ar, bool wait) { struct ieee80211_supported_band **bands; struct scan_chan_list_params *params; @@ -111,7 +111,32 @@ int ath11k_reg_update_chan_list(struct ath11k *ar) struct channel_param *ch; enum nl80211_band band; int num_channels = 0; - int i, ret; + int i, ret, left; + + if (wait && ar->state_11d != ATH11K_11D_IDLE) { + left = wait_for_completion_timeout(&ar->completed_11d_scan, + ATH11K_SCAN_TIMEOUT_HZ); + if (!left) { + ath11k_dbg(ar->ab, ATH11K_DBG_REG, + "failed to receive 11d scan complete: timed out\n"); + ar->state_11d = ATH11K_11D_IDLE; + } + ath11k_dbg(ar->ab, ATH11K_DBG_REG, + "reg 11d scan wait left time %d\n", left); + } + + if (wait && + (ar->scan.state == ATH11K_SCAN_STARTING || + ar->scan.state == ATH11K_SCAN_RUNNING)) { + left = wait_for_completion_timeout(&ar->scan.completed, + ATH11K_SCAN_TIMEOUT_HZ); + if (!left) + ath11k_dbg(ar->ab, ATH11K_DBG_REG, + "failed to receive hw scan complete: timed out\n"); + + ath11k_dbg(ar->ab, ATH11K_DBG_REG, + "reg hw scan wait left time %d\n", left); + } bands = hw->wiphy->bands; for (band = 0; band < NUM_NL80211_BANDS; band++) { @@ -193,11 +218,6 @@ int ath11k_reg_update_chan_list(struct ath11k *ar) ret = ath11k_wmi_send_scan_chan_list_cmd(ar, params); kfree(params); - if (ar->pending_11d) { - complete(&ar->finish_11d_ch_list); - ar->pending_11d = false; - } - return ret; } @@ -263,15 +283,8 @@ int ath11k_regd_update(struct ath11k *ar) goto err; } - if (ar->pending_11d) - complete(&ar->finish_11d_scan); - rtnl_lock(); wiphy_lock(ar->hw->wiphy); - - if (ar->pending_11d) - reinit_completion(&ar->finish_11d_ch_list); - ret = regulatory_set_wiphy_regd_sync(ar->hw->wiphy, regd_copy); wiphy_unlock(ar->hw->wiphy); rtnl_unlock(); @@ -282,7 +295,7 @@ int ath11k_regd_update(struct ath11k *ar) goto err; if (ar->state == ATH11K_STATE_ON) { - ret = ath11k_reg_update_chan_list(ar); + ret = ath11k_reg_update_chan_list(ar, true); if (ret) goto err; } diff --git a/drivers/net/wireless/ath/ath11k/reg.h b/drivers/net/wireless/ath/ath11k/reg.h index 5fb9dc03a74e..2f284f26378d 100644 --- a/drivers/net/wireless/ath/ath11k/reg.h +++ b/drivers/net/wireless/ath/ath11k/reg.h @@ -32,5 +32,5 @@ struct ieee80211_regdomain * ath11k_reg_build_regd(struct ath11k_base *ab, struct cur_regulatory_info *reg_info, bool intersect); int ath11k_regd_update(struct ath11k *ar); -int ath11k_reg_update_chan_list(struct ath11k *ar); +int ath11k_reg_update_chan_list(struct ath11k *ar, bool wait); #endif diff --git a/drivers/net/wireless/ath/ath11k/wmi.c b/drivers/net/wireless/ath/ath11k/wmi.c index b4f86c45d81f..2751fe8814df 100644 --- a/drivers/net/wireless/ath/ath11k/wmi.c +++ b/drivers/net/wireless/ath/ath11k/wmi.c @@ -2015,7 +2015,10 @@ void ath11k_wmi_start_scan_init(struct ath11k *ar, { /* setup commonly used values */ arg->scan_req_id = 1; - arg->scan_priority = WMI_SCAN_PRIORITY_LOW; + if (ar->state_11d == ATH11K_11D_PREPARING) + arg->scan_priority = WMI_SCAN_PRIORITY_MEDIUM; + else + arg->scan_priority = WMI_SCAN_PRIORITY_LOW; arg->dwell_time_active = 50; arg->dwell_time_active_2g = 0; arg->dwell_time_passive = 150; @@ -6350,8 +6353,10 @@ static void ath11k_wmi_op_ep_tx_credits(struct ath11k_base *ab) static int ath11k_reg_11d_new_cc_event(struct ath11k_base *ab, struct sk_buff *skb) { const struct wmi_11d_new_cc_ev *ev; + struct ath11k *ar; + struct ath11k_pdev *pdev; const void **tb; - int ret; + int ret, i; tb = ath11k_wmi_tlv_parse_alloc(ab, skb->data, skb->len, GFP_ATOMIC); if (IS_ERR(tb)) { @@ -6377,6 +6382,13 @@ static int ath11k_reg_11d_new_cc_event(struct ath11k_base *ab, struct sk_buff *s kfree(tb); + for (i = 0; i < ab->num_radios; i++) { + pdev = &ab->pdevs[i]; + ar = pdev->ar; + ar->state_11d = ATH11K_11D_IDLE; + complete(&ar->completed_11d_scan); + } + queue_work(ab->workqueue, &ab->update_11d_work); return 0; -- cgit v1.2.3 From 5e469ed9764d4722c59562da13120bd2dc6834c5 Mon Sep 17 00:00:00 2001 From: Felix Fietkau Date: Wed, 20 Apr 2022 12:50:38 +0200 Subject: mac80211: fix rx reordering with non explicit / psmp ack policy When the QoS ack policy was set to non explicit / psmp ack, frames are treated as not being part of a BA session, which causes extra latency on reordering. Fix this by only bypassing reordering for packets with no-ack policy Signed-off-by: Felix Fietkau Link: https://lore.kernel.org/r/20220420105038.36443-1-nbd@nbd.name Signed-off-by: Johannes Berg --- net/mac80211/rx.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c index beb6b92eb780..88d797fa82ff 100644 --- a/net/mac80211/rx.c +++ b/net/mac80211/rx.c @@ -1405,8 +1405,7 @@ static void ieee80211_rx_reorder_ampdu(struct ieee80211_rx_data *rx, goto dont_reorder; /* not part of a BA session */ - if (ack_policy != IEEE80211_QOS_CTL_ACK_POLICY_BLOCKACK && - ack_policy != IEEE80211_QOS_CTL_ACK_POLICY_NORMAL) + if (ack_policy == IEEE80211_QOS_CTL_ACK_POLICY_NOACK) goto dont_reorder; /* new, potentially un-ordered, ampdu frame - process it */ -- cgit v1.2.3 From 5d087aa759eb82b8208411913f6c2158bd85abc0 Mon Sep 17 00:00:00 2001 From: Kieran Frewen Date: Wed, 20 Apr 2022 04:13:21 +0000 Subject: nl80211: validate S1G channel width Validate the S1G channel width input by user to ensure it matches that of the requested channel Signed-off-by: Kieran Frewen Signed-off-by: Bassem Dawood Link: https://lore.kernel.org/r/20220420041321.3788789-2-kieran.frewen@morsemicro.com Signed-off-by: Johannes Berg --- net/wireless/nl80211.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c index 21e808fcb676..aa6094c3c9b0 100644 --- a/net/wireless/nl80211.c +++ b/net/wireless/nl80211.c @@ -3173,6 +3173,15 @@ int nl80211_parse_chandef(struct cfg80211_registered_device *rdev, } else if (attrs[NL80211_ATTR_CHANNEL_WIDTH]) { chandef->width = nla_get_u32(attrs[NL80211_ATTR_CHANNEL_WIDTH]); + if (chandef->chan->band == NL80211_BAND_S1GHZ) { + /* User input error for channel width doesn't match channel */ + if (chandef->width != ieee80211_s1g_channel_width(chandef->chan)) { + NL_SET_ERR_MSG_ATTR(extack, + attrs[NL80211_ATTR_CHANNEL_WIDTH], + "bad channel width"); + return -EINVAL; + } + } if (attrs[NL80211_ATTR_CENTER_FREQ1]) { chandef->center_freq1 = nla_get_u32(attrs[NL80211_ATTR_CENTER_FREQ1]); -- cgit v1.2.3 From e847ffe2d146cfd52980ca688d84358e024a6e70 Mon Sep 17 00:00:00 2001 From: Kieran Frewen Date: Wed, 20 Apr 2022 04:13:20 +0000 Subject: cfg80211: retrieve S1G operating channel number When retrieving the S1G channel number from IEs, we should retrieve the operating channel instead of the primary channel. The S1G operation element specifies the main channel of operation as the oper channel, unlike for HT and HE which specify their main channel of operation as the primary channel. Signed-off-by: Kieran Frewen Signed-off-by: Bassem Dawood Link: https://lore.kernel.org/r/20220420041321.3788789-1-kieran.frewen@morsemicro.com Signed-off-by: Johannes Berg --- net/wireless/scan.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/wireless/scan.c b/net/wireless/scan.c index 4a6d86432910..6d82bd9eaf8c 100644 --- a/net/wireless/scan.c +++ b/net/wireless/scan.c @@ -1829,7 +1829,7 @@ int cfg80211_get_ies_channel_number(const u8 *ie, size_t ielen, if (tmp && tmp->datalen >= sizeof(struct ieee80211_s1g_oper_ie)) { struct ieee80211_s1g_oper_ie *s1gop = (void *)tmp->data; - return s1gop->primary_ch; + return s1gop->oper_ch; } } else { tmp = cfg80211_find_elem(WLAN_EID_DS_PARAMS, ie, ielen); -- cgit v1.2.3 From 86af062f40a73bf63321694e6bf637144f0383fe Mon Sep 17 00:00:00 2001 From: Manikanta Pubbisetty Date: Thu, 28 Apr 2022 10:57:44 +0530 Subject: mac80211: Reset MBSSID parameters upon connection Currently MBSSID parameters in struct ieee80211_bss_conf are not reset upon connection. This could be problematic with some drivers in a scenario where the device first connects to a non-transmit BSS and then connects to a transmit BSS of a Multi BSS AP. The MBSSID parameters which are set after connecting to a non-transmit BSS will not be reset and the same parameters will be passed on to the driver during the subsequent connection to a transmit BSS of a Multi BSS AP. For example, firmware running on the ath11k device uses the Multi BSS data for tracking the beacon of a non-transmit BSS and reports the driver when there is a beacon miss. If we do not reset the MBSSID parameters during the subsequent connection to a transmit BSS, then the driver would have wrong MBSSID data and FW would be looking for an incorrect BSSID in the MBSSID beacon of a Multi BSS AP and reports beacon loss leading to an unstable connection. Reset the MBSSID parameters upon every connection to solve this problem. Fixes: 78ac51f81532 ("mac80211: support multi-bssid") Signed-off-by: Manikanta Pubbisetty Link: https://lore.kernel.org/r/20220428052744.27040-1-quic_mpubbise@quicinc.com Signed-off-by: Johannes Berg --- net/mac80211/mlme.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/net/mac80211/mlme.c b/net/mac80211/mlme.c index 1b30c724ca8d..dc8aec1a5d3d 100644 --- a/net/mac80211/mlme.c +++ b/net/mac80211/mlme.c @@ -3657,6 +3657,12 @@ static bool ieee80211_assoc_success(struct ieee80211_sub_if_data *sdata, cbss->transmitted_bss->bssid); bss_conf->bssid_indicator = cbss->max_bssid_indicator; bss_conf->bssid_index = cbss->bssid_index; + } else { + bss_conf->nontransmitted = false; + memset(bss_conf->transmitter_bssid, 0, + sizeof(bss_conf->transmitter_bssid)); + bss_conf->bssid_indicator = 0; + bss_conf->bssid_index = 0; } /* -- cgit v1.2.3 From 6997fbd7a3dafa754f81d541498ace35b43246d8 Mon Sep 17 00:00:00 2001 From: Tetsuo Handa Date: Thu, 5 May 2022 10:53:53 +0900 Subject: net: rds: use maybe_get_net() when acquiring refcount on TCP sockets Eric Dumazet is reporting addition on 0 problem at rds_tcp_tune(), for delayed works queued in rds_wq might be invoked after a net namespace's refcount already reached 0. Since rds_tcp_exit_net() from cleanup_net() calls flush_workqueue(rds_wq), it is guaranteed that we can instead use maybe_get_net() from delayed work functions until rds_tcp_exit_net() returns. Note that I'm not convinced that all works which might access a net namespace are already queued in rds_wq by the moment rds_tcp_exit_net() calls flush_workqueue(rds_wq). If some race is there, rds_tcp_exit_net() will fail to wait for work functions, and kmem_cache_free() could be called from net_free() before maybe_get_net() is called from rds_tcp_tune(). Reported-by: Eric Dumazet Fixes: 3a58f13a881ed351 ("net: rds: acquire refcount on TCP sockets") Signed-off-by: Tetsuo Handa Reviewed-by: Eric Dumazet Link: https://lore.kernel.org/r/41d09faf-bc78-1a87-dfd1-c6d1b5984b61@I-love.SAKURA.ne.jp Signed-off-by: Jakub Kicinski --- net/rds/tcp.c | 12 +++++++++--- net/rds/tcp.h | 2 +- net/rds/tcp_connect.c | 5 ++++- net/rds/tcp_listen.c | 5 ++++- 4 files changed, 18 insertions(+), 6 deletions(-) diff --git a/net/rds/tcp.c b/net/rds/tcp.c index 2f638f8b7b1e..73ee2771093d 100644 --- a/net/rds/tcp.c +++ b/net/rds/tcp.c @@ -487,11 +487,11 @@ struct rds_tcp_net { /* All module specific customizations to the RDS-TCP socket should be done in * rds_tcp_tune() and applied after socket creation. */ -void rds_tcp_tune(struct socket *sock) +bool rds_tcp_tune(struct socket *sock) { struct sock *sk = sock->sk; struct net *net = sock_net(sk); - struct rds_tcp_net *rtn = net_generic(net, rds_tcp_netid); + struct rds_tcp_net *rtn; tcp_sock_set_nodelay(sock->sk); lock_sock(sk); @@ -499,10 +499,15 @@ void rds_tcp_tune(struct socket *sock) * a process which created this net namespace terminated. */ if (!sk->sk_net_refcnt) { + if (!maybe_get_net(net)) { + release_sock(sk); + return false; + } sk->sk_net_refcnt = 1; - get_net_track(net, &sk->ns_tracker, GFP_KERNEL); + netns_tracker_alloc(net, &sk->ns_tracker, GFP_KERNEL); sock_inuse_add(net, 1); } + rtn = net_generic(net, rds_tcp_netid); if (rtn->sndbuf_size > 0) { sk->sk_sndbuf = rtn->sndbuf_size; sk->sk_userlocks |= SOCK_SNDBUF_LOCK; @@ -512,6 +517,7 @@ void rds_tcp_tune(struct socket *sock) sk->sk_userlocks |= SOCK_RCVBUF_LOCK; } release_sock(sk); + return true; } static void rds_tcp_accept_worker(struct work_struct *work) diff --git a/net/rds/tcp.h b/net/rds/tcp.h index dc8d745d6857..f8b5930d7b34 100644 --- a/net/rds/tcp.h +++ b/net/rds/tcp.h @@ -49,7 +49,7 @@ struct rds_tcp_statistics { }; /* tcp.c */ -void rds_tcp_tune(struct socket *sock); +bool rds_tcp_tune(struct socket *sock); void rds_tcp_set_callbacks(struct socket *sock, struct rds_conn_path *cp); void rds_tcp_reset_callbacks(struct socket *sock, struct rds_conn_path *cp); void rds_tcp_restore_callbacks(struct socket *sock, diff --git a/net/rds/tcp_connect.c b/net/rds/tcp_connect.c index 5461d77fff4f..f0c477c5d1db 100644 --- a/net/rds/tcp_connect.c +++ b/net/rds/tcp_connect.c @@ -124,7 +124,10 @@ int rds_tcp_conn_path_connect(struct rds_conn_path *cp) if (ret < 0) goto out; - rds_tcp_tune(sock); + if (!rds_tcp_tune(sock)) { + ret = -EINVAL; + goto out; + } if (isv6) { sin6.sin6_family = AF_INET6; diff --git a/net/rds/tcp_listen.c b/net/rds/tcp_listen.c index 09cadd556d1e..7edf2e69d3fe 100644 --- a/net/rds/tcp_listen.c +++ b/net/rds/tcp_listen.c @@ -133,7 +133,10 @@ int rds_tcp_accept_one(struct socket *sock) __module_get(new_sock->ops->owner); rds_tcp_keepalive(new_sock); - rds_tcp_tune(new_sock); + if (!rds_tcp_tune(new_sock)) { + ret = -EINVAL; + goto out; + } inet = inet_sk(new_sock->sk); -- cgit v1.2.3 From e333eed63a091a09bd0db191b7710c594c6e995b Mon Sep 17 00:00:00 2001 From: Fabio Estevam Date: Wed, 4 May 2022 11:31:03 -0300 Subject: net: phy: micrel: Do not use kszphy_suspend/resume for KSZ8061 Since commit f1131b9c23fb ("net: phy: micrel: use kszphy_suspend()/kszphy_resume for irq aware devices") the following NULL pointer dereference is observed on a board with KSZ8061: # udhcpc -i eth0 udhcpc: started, v1.35.0 8<--- cut here --- Unable to handle kernel NULL pointer dereference at virtual address 00000008 pgd = f73cef4e [00000008] *pgd=00000000 Internal error: Oops: 5 [#1] SMP ARM Modules linked in: CPU: 0 PID: 196 Comm: ifconfig Not tainted 5.15.37-dirty #94 Hardware name: Freescale i.MX6 SoloX (Device Tree) PC is at kszphy_config_reset+0x10/0x114 LR is at kszphy_resume+0x24/0x64 ... The KSZ8061 phy_driver structure does not have the .probe/..driver_data fields, which means that priv is not allocated. This causes the NULL pointer dereference inside kszphy_config_reset(). Fix the problem by using the generic suspend/resume functions as before. Another alternative would be to provide the .probe and .driver_data information into the structure, but to be on the safe side, let's just restore Ethernet functionality by using the generic suspend/resume. Cc: stable@vger.kernel.org Fixes: f1131b9c23fb ("net: phy: micrel: use kszphy_suspend()/kszphy_resume for irq aware devices") Signed-off-by: Fabio Estevam Reviewed-by: Andrew Lunn Link: https://lore.kernel.org/r/20220504143104.1286960-1-festevam@gmail.com Signed-off-by: Jakub Kicinski --- drivers/net/phy/micrel.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c index fc53b71dc872..7c243cedde9f 100644 --- a/drivers/net/phy/micrel.c +++ b/drivers/net/phy/micrel.c @@ -2782,8 +2782,8 @@ static struct phy_driver ksphy_driver[] = { .config_init = ksz8061_config_init, .config_intr = kszphy_config_intr, .handle_interrupt = kszphy_handle_interrupt, - .suspend = kszphy_suspend, - .resume = kszphy_resume, + .suspend = genphy_suspend, + .resume = genphy_resume, }, { .phy_id = PHY_ID_KSZ9021, .phy_id_mask = 0x000ffffe, -- cgit v1.2.3 From 15f03ffe4bb951e982457f44b6cf6b06ef4cbb93 Mon Sep 17 00:00:00 2001 From: Fabio Estevam Date: Wed, 4 May 2022 11:31:04 -0300 Subject: net: phy: micrel: Pass .probe for KS8737 Since commit f1131b9c23fb ("net: phy: micrel: use kszphy_suspend()/kszphy_resume for irq aware devices") the kszphy_suspend/ resume hooks are used. These functions require the probe function to be called so that priv can be allocated. Otherwise, a NULL pointer dereference happens inside kszphy_config_reset(). Cc: stable@vger.kernel.org Fixes: f1131b9c23fb ("net: phy: micrel: use kszphy_suspend()/kszphy_resume for irq aware devices") Reported-by: Andrew Lunn Signed-off-by: Fabio Estevam Reviewed-by: Andrew Lunn Link: https://lore.kernel.org/r/20220504143104.1286960-2-festevam@gmail.com Signed-off-by: Jakub Kicinski --- drivers/net/phy/micrel.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c index 7c243cedde9f..9d7dafed3931 100644 --- a/drivers/net/phy/micrel.c +++ b/drivers/net/phy/micrel.c @@ -2657,6 +2657,7 @@ static struct phy_driver ksphy_driver[] = { .name = "Micrel KS8737", /* PHY_BASIC_FEATURES */ .driver_data = &ks8737_type, + .probe = kszphy_probe, .config_init = kszphy_config_init, .config_intr = kszphy_config_intr, .handle_interrupt = kszphy_handle_interrupt, -- cgit v1.2.3 From e1a7ac6f3ba6e157adcd0ca94d92a401f1943f56 Mon Sep 17 00:00:00 2001 From: Nicolas Dichtel Date: Wed, 4 May 2022 11:07:38 +0200 Subject: ping: fix address binding wrt vrf When ping_group_range is updated, 'ping' uses the DGRAM ICMP socket, instead of an IP raw socket. In this case, 'ping' is unable to bind its socket to a local address owned by a vrflite. Before the patch: $ sysctl -w net.ipv4.ping_group_range='0 2147483647' $ ip link add blue type vrf table 10 $ ip link add foo type dummy $ ip link set foo master blue $ ip link set foo up $ ip addr add 192.168.1.1/24 dev foo $ ip addr add 2001::1/64 dev foo $ ip vrf exec blue ping -c1 -I 192.168.1.1 192.168.1.2 ping: bind: Cannot assign requested address $ ip vrf exec blue ping6 -c1 -I 2001::1 2001::2 ping6: bind icmp socket: Cannot assign requested address CC: stable@vger.kernel.org Fixes: 1b69c6d0ae90 ("net: Introduce L3 Master device abstraction") Signed-off-by: Nicolas Dichtel Reviewed-by: David Ahern Signed-off-by: Jakub Kicinski --- net/ipv4/ping.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c index 3ee947557b88..aa9a11b20d18 100644 --- a/net/ipv4/ping.c +++ b/net/ipv4/ping.c @@ -305,6 +305,7 @@ static int ping_check_bind_addr(struct sock *sk, struct inet_sock *isk, struct net *net = sock_net(sk); if (sk->sk_family == AF_INET) { struct sockaddr_in *addr = (struct sockaddr_in *) uaddr; + u32 tb_id = RT_TABLE_LOCAL; int chk_addr_ret; if (addr_len < sizeof(*addr)) @@ -318,7 +319,8 @@ static int ping_check_bind_addr(struct sock *sk, struct inet_sock *isk, pr_debug("ping_check_bind_addr(sk=%p,addr=%pI4,port=%d)\n", sk, &addr->sin_addr.s_addr, ntohs(addr->sin_port)); - chk_addr_ret = inet_addr_type(net, addr->sin_addr.s_addr); + tb_id = l3mdev_fib_table_by_index(net, sk->sk_bound_dev_if) ? : tb_id; + chk_addr_ret = inet_addr_type_table(net, addr->sin_addr.s_addr, tb_id); if (!inet_addr_valid_or_nonlocal(net, inet_sk(sk), addr->sin_addr.s_addr, @@ -355,6 +357,14 @@ static int ping_check_bind_addr(struct sock *sk, struct inet_sock *isk, return -ENODEV; } } + + if (!dev && sk->sk_bound_dev_if) { + dev = dev_get_by_index_rcu(net, sk->sk_bound_dev_if); + if (!dev) { + rcu_read_unlock(); + return -ENODEV; + } + } has_addr = pingv6_ops.ipv6_chk_addr(net, &addr->sin6_addr, dev, scoped); rcu_read_unlock(); -- cgit v1.2.3 From e71b7f1f44d3d88c677769c85ef0171caf9fc89f Mon Sep 17 00:00:00 2001 From: Nicolas Dichtel Date: Wed, 4 May 2022 11:07:39 +0200 Subject: selftests: add ping test with ping_group_range tuned The 'ping' utility is able to manage two kind of sockets (raw or icmp), depending on the sysctl ping_group_range. By default, ping_group_range is set to '1 0', which forces ping to use an ip raw socket. Let's replay the ping tests by allowing 'ping' to use the ip icmp socket. After the previous patch, ipv4 tests results are the same with both kinds of socket. For ipv6, there are a lot a new failures (the previous patch fixes only two cases). Signed-off-by: Nicolas Dichtel Reviewed-by: David Ahern Signed-off-by: Jakub Kicinski --- tools/testing/selftests/net/fcnal-test.sh | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/tools/testing/selftests/net/fcnal-test.sh b/tools/testing/selftests/net/fcnal-test.sh index 47c4d4b4a44a..54701c8b0cd7 100755 --- a/tools/testing/selftests/net/fcnal-test.sh +++ b/tools/testing/selftests/net/fcnal-test.sh @@ -810,10 +810,16 @@ ipv4_ping() setup set_sysctl net.ipv4.raw_l3mdev_accept=1 2>/dev/null ipv4_ping_novrf + setup + set_sysctl net.ipv4.ping_group_range='0 2147483647' 2>/dev/null + ipv4_ping_novrf log_subsection "With VRF" setup "yes" ipv4_ping_vrf + setup "yes" + set_sysctl net.ipv4.ping_group_range='0 2147483647' 2>/dev/null + ipv4_ping_vrf } ################################################################################ @@ -2348,10 +2354,16 @@ ipv6_ping() log_subsection "No VRF" setup ipv6_ping_novrf + setup + set_sysctl net.ipv4.ping_group_range='0 2147483647' 2>/dev/null + ipv6_ping_novrf log_subsection "With VRF" setup "yes" ipv6_ping_vrf + setup "yes" + set_sysctl net.ipv4.ping_group_range='0 2147483647' 2>/dev/null + ipv6_ping_vrf } ################################################################################ -- cgit v1.2.3 From 85db6352fc8a158a893151baa1716463d34a20d0 Mon Sep 17 00:00:00 2001 From: Tariq Toukan Date: Wed, 4 May 2022 11:09:14 +0300 Subject: net: Fix features skip in for_each_netdev_feature() The find_next_netdev_feature() macro gets the "remaining length", not bit index. Passing "bit - 1" for the following iteration is wrong as it skips the adjacent bit. Pass "bit" instead. Fixes: 3b89ea9c5902 ("net: Fix for_each_netdev_feature on Big endian") Signed-off-by: Tariq Toukan Reviewed-by: Gal Pressman Link: https://lore.kernel.org/r/20220504080914.1918-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski --- include/linux/netdev_features.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h index 2c6b9e416225..7c2d77d75a88 100644 --- a/include/linux/netdev_features.h +++ b/include/linux/netdev_features.h @@ -169,7 +169,7 @@ enum { #define NETIF_F_HW_HSR_FWD __NETIF_F(HW_HSR_FWD) #define NETIF_F_HW_HSR_DUP __NETIF_F(HW_HSR_DUP) -/* Finds the next feature with the highest number of the range of start till 0. +/* Finds the next feature with the highest number of the range of start-1 till 0. */ static inline int find_next_netdev_feature(u64 feature, unsigned long start) { @@ -188,7 +188,7 @@ static inline int find_next_netdev_feature(u64 feature, unsigned long start) for ((bit) = find_next_netdev_feature((mask_addr), \ NETDEV_FEATURE_COUNT); \ (bit) >= 0; \ - (bit) = find_next_netdev_feature((mask_addr), (bit) - 1)) + (bit) = find_next_netdev_feature((mask_addr), (bit))) /* Features valid for ethtool to change */ /* = all defined minus driver/device-class-related */ -- cgit v1.2.3 From 4e707344e18525b4edf5c2bc2e3eb60692e8c92e Mon Sep 17 00:00:00 2001 From: Jonathan Toppins Date: Wed, 4 May 2022 14:59:08 -0400 Subject: MAINTAINERS: add missing files for bonding definition The bonding entry did not include additional include files that have been added nor did it reference the documentation. Add these references for completeness. Signed-off-by: Jonathan Toppins Link: https://lore.kernel.org/r/903ed2906b93628b38a2015664a20d2802042863.1651690748.git.jtoppins@redhat.com Signed-off-by: Jakub Kicinski --- MAINTAINERS | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index edc96cdb85e8..15037bd1ce48 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -3571,8 +3571,9 @@ M: Andy Gospodarek L: netdev@vger.kernel.org S: Supported W: http://sourceforge.net/projects/bonding/ +F: Documentation/networking/bonding.rst F: drivers/net/bonding/ -F: include/net/bonding.h +F: include/net/bond* F: include/uapi/linux/if_bonding.h BOSCH SENSORTEC BMA400 ACCELEROMETER IIO DRIVER -- cgit v1.2.3 From e1846cff2fe614d93a2f89461b5935678fd34bd9 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Thu, 5 May 2022 02:54:59 +0300 Subject: net: mscc: ocelot: mark traps with a bool instead of keeping them in a list Since the blamed commit, VCAP filters can appear on more than one list. If their action is "trap", they are chained on ocelot->traps via filter->trap_list. This is in addition to their normal placement on the VCAP block->rules list head. Therefore, when we free a VCAP filter, we must remove it from all lists it is a member of, including ocelot->traps. There are at least 2 bugs which are direct consequences of this design decision. First is the incorrect usage of list_empty(), meant to denote whether "filter" is chained into ocelot->traps via filter->trap_list. This does not do the correct thing, because list_empty() checks whether "head->next == head", but in our case, head->next == head->prev == NULL. So we dereference NULL pointers and die when we call list_del(). Second is the fact that not all places that should remove the filter from ocelot->traps do so. One example is ocelot_vcap_block_remove_filter(), which is where we have the main kfree(filter). By keeping freed filters in ocelot->traps we end up in a use-after-free in felix_update_trapping_destinations(). Attempting to fix all the buggy patterns is a whack-a-mole game which makes the driver unmaintainable. Actually this is what the previous patch version attempted to do: https://patchwork.kernel.org/project/netdevbpf/patch/20220503115728.834457-3-vladimir.oltean@nxp.com/ but it introduced another set of bugs, because there are other places in which create VCAP filters, not just ocelot_vcap_filter_create(): - ocelot_trap_add() - felix_tag_8021q_vlan_add_rx() - felix_tag_8021q_vlan_add_tx() Relying on the convention that all those code paths must call INIT_LIST_HEAD(&filter->trap_list) is not going to scale. So let's do what should have been done in the first place and keep a bool in struct ocelot_vcap_filter which denotes whether we are looking at a trapping rule or not. Iterating now happens over the main VCAP IS2 block->rules. The advantage is that we no longer risk having stale references to a freed filter, since it is only present in that list. Fixes: e42bd4ed09aa ("net: mscc: ocelot: keep traps in a list") Signed-off-by: Vladimir Oltean Signed-off-by: Jakub Kicinski --- drivers/net/dsa/ocelot/felix.c | 7 ++++++- drivers/net/ethernet/mscc/ocelot.c | 11 +++-------- drivers/net/ethernet/mscc/ocelot_flower.c | 4 +--- include/soc/mscc/ocelot_vcap.h | 2 +- 4 files changed, 11 insertions(+), 13 deletions(-) diff --git a/drivers/net/dsa/ocelot/felix.c b/drivers/net/dsa/ocelot/felix.c index 9e28219b223d..faccfb3f0158 100644 --- a/drivers/net/dsa/ocelot/felix.c +++ b/drivers/net/dsa/ocelot/felix.c @@ -403,6 +403,7 @@ static int felix_update_trapping_destinations(struct dsa_switch *ds, { struct ocelot *ocelot = ds->priv; struct felix *felix = ocelot_to_felix(ocelot); + struct ocelot_vcap_block *block_vcap_is2; struct ocelot_vcap_filter *trap; enum ocelot_mask_mode mask_mode; unsigned long port_mask; @@ -422,9 +423,13 @@ static int felix_update_trapping_destinations(struct dsa_switch *ds, /* We are sure that "cpu" was found, otherwise * dsa_tree_setup_default_cpu() would have failed earlier. */ + block_vcap_is2 = &ocelot->block[VCAP_IS2]; /* Make sure all traps are set up for that destination */ - list_for_each_entry(trap, &ocelot->traps, trap_list) { + list_for_each_entry(trap, &block_vcap_is2->rules, list) { + if (!trap->is_trap) + continue; + /* Figure out the current trapping destination */ if (using_tag_8021q) { /* Redirect to the tag_8021q CPU port. If timestamps diff --git a/drivers/net/ethernet/mscc/ocelot.c b/drivers/net/ethernet/mscc/ocelot.c index ca71b62a44dc..20ceac81a2c2 100644 --- a/drivers/net/ethernet/mscc/ocelot.c +++ b/drivers/net/ethernet/mscc/ocelot.c @@ -1622,7 +1622,7 @@ int ocelot_trap_add(struct ocelot *ocelot, int port, trap->action.mask_mode = OCELOT_MASK_MODE_PERMIT_DENY; trap->action.port_mask = 0; trap->take_ts = take_ts; - list_add_tail(&trap->trap_list, &ocelot->traps); + trap->is_trap = true; new = true; } @@ -1634,10 +1634,8 @@ int ocelot_trap_add(struct ocelot *ocelot, int port, err = ocelot_vcap_filter_replace(ocelot, trap); if (err) { trap->ingress_port_mask &= ~BIT(port); - if (!trap->ingress_port_mask) { - list_del(&trap->trap_list); + if (!trap->ingress_port_mask) kfree(trap); - } return err; } @@ -1657,11 +1655,8 @@ int ocelot_trap_del(struct ocelot *ocelot, int port, unsigned long cookie) return 0; trap->ingress_port_mask &= ~BIT(port); - if (!trap->ingress_port_mask) { - list_del(&trap->trap_list); - + if (!trap->ingress_port_mask) return ocelot_vcap_filter_del(ocelot, trap); - } return ocelot_vcap_filter_replace(ocelot, trap); } diff --git a/drivers/net/ethernet/mscc/ocelot_flower.c b/drivers/net/ethernet/mscc/ocelot_flower.c index 03b5e59d033e..a9b26b3002be 100644 --- a/drivers/net/ethernet/mscc/ocelot_flower.c +++ b/drivers/net/ethernet/mscc/ocelot_flower.c @@ -295,7 +295,7 @@ static int ocelot_flower_parse_action(struct ocelot *ocelot, int port, filter->action.cpu_copy_ena = true; filter->action.cpu_qu_num = 0; filter->type = OCELOT_VCAP_FILTER_OFFLOAD; - list_add_tail(&filter->trap_list, &ocelot->traps); + filter->is_trap = true; break; case FLOW_ACTION_POLICE: if (filter->block_id == PSFP_BLOCK_ID) { @@ -878,8 +878,6 @@ int ocelot_cls_flower_replace(struct ocelot *ocelot, int port, ret = ocelot_flower_parse(ocelot, port, ingress, f, filter); if (ret) { - if (!list_empty(&filter->trap_list)) - list_del(&filter->trap_list); kfree(filter); return ret; } diff --git a/include/soc/mscc/ocelot_vcap.h b/include/soc/mscc/ocelot_vcap.h index 7b2bf9b1fe69..de26c992f821 100644 --- a/include/soc/mscc/ocelot_vcap.h +++ b/include/soc/mscc/ocelot_vcap.h @@ -681,7 +681,6 @@ struct ocelot_vcap_id { struct ocelot_vcap_filter { struct list_head list; - struct list_head trap_list; enum ocelot_vcap_filter_type type; int block_id; @@ -695,6 +694,7 @@ struct ocelot_vcap_filter { struct ocelot_vcap_stats stats; /* For VCAP IS1 and IS2 */ bool take_ts; + bool is_trap; unsigned long ingress_port_mask; /* For VCAP ES0 */ struct ocelot_vcap_port ingress_port; -- cgit v1.2.3 From 16bbebd35629c93a8c68c6d8d28557e100bcee73 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Thu, 5 May 2022 02:55:00 +0300 Subject: net: mscc: ocelot: fix last VCAP IS1/IS2 filter persisting in hardware when deleted ocelot_vcap_filter_del() works by moving the next filters over the current one, and then deleting the last filter by calling vcap_entry_set() with a del_filter which was specially created by memsetting its memory to zeroes. vcap_entry_set() then programs this to the TCAM and action RAM via the cache registers. The problem is that vcap_entry_set() is a dispatch function which looks at del_filter->block_id. But since del_filter is zeroized memory, the block_id is 0, or otherwise said, VCAP_ES0. So practically, what we do is delete the entry at the same TCAM index from VCAP ES0 instead of IS1 or IS2. The code was not always like this. vcap_entry_set() used to simply be is2_entry_set(), and then, the logic used to work. Restore the functionality by populating the block_id of the del_filter based on the VCAP block of the filter that we're deleting. This makes vcap_entry_set() know what to do. Fixes: 1397a2eb52e2 ("net: mscc: ocelot: create TCAM skeleton from tc filter chains") Signed-off-by: Vladimir Oltean Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/mscc/ocelot_vcap.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/net/ethernet/mscc/ocelot_vcap.c b/drivers/net/ethernet/mscc/ocelot_vcap.c index c8701ac955a8..2749df593ebc 100644 --- a/drivers/net/ethernet/mscc/ocelot_vcap.c +++ b/drivers/net/ethernet/mscc/ocelot_vcap.c @@ -1250,7 +1250,11 @@ int ocelot_vcap_filter_del(struct ocelot *ocelot, struct ocelot_vcap_filter del_filter; int i, index; + /* Need to inherit the block_id so that vcap_entry_set() + * does not get confused and knows where to install it. + */ memset(&del_filter, 0, sizeof(del_filter)); + del_filter.block_id = filter->block_id; /* Gets index of the filter */ index = ocelot_vcap_block_get_filter_index(block, filter); -- cgit v1.2.3 From 6741e11880003e35802d78cc58035057934f4dab Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Thu, 5 May 2022 02:55:01 +0300 Subject: net: mscc: ocelot: fix VCAP IS2 filters matching on both lookups The VCAP IS2 TCAM is looked up twice per packet, and each filter can be configured to only match during the first, second lookup, or both, or none. The blamed commit wrote the code for making VCAP IS2 filters match only on the given lookup. But right below that code, there was another line that explicitly made the lookup a "don't care", and this is overwriting the lookup we've selected. So the code had no effect. Some of the more noticeable effects of having filters match on both lookups: - in "tc -s filter show dev swp0 ingress", we see each packet matching a VCAP IS2 filter counted twice. This throws off scripts such as tools/testing/selftests/net/forwarding/tc_actions.sh and makes them fail. - a "tc-drop" action offloaded to VCAP IS2 needs a policer as well, because once the CPU port becomes a member of the destination port mask of a packet, nothing removes it, not even a PERMIT/DENY mask mode with a port mask of 0. But VCAP IS2 rules with the POLICE_ENA bit in the action vector can only appear in the first lookup. What happens when a filter matches both lookups is that the action vector is combined, and this makes the POLICE_ENA bit ineffective, since the last lookup in which it has appeared is the second one. In other words, "tc-drop" actions do not drop packets for the CPU port, dropped packets are still seen by software unless there was an FDB entry that directed those packets to some other place different from the CPU. The last bit used to work, because in the initial commit b596229448dd ("net: mscc: ocelot: Add support for tcam"), we were writing the FIRST field of the VCAP IS2 half key with a 1, not with a "don't care". The change to "don't care" was made inadvertently by me in commit c1c3993edb7c ("net: mscc: ocelot: generalize existing code for VCAP"), which I just realized, and which needs a separate fix from this one, for "stable" kernels that lack the commit blamed below. Fixes: 226e9cd82a96 ("net: mscc: ocelot: only install TCAM entries into a specific lookup and PAG") Signed-off-by: Vladimir Oltean Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/mscc/ocelot_vcap.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/net/ethernet/mscc/ocelot_vcap.c b/drivers/net/ethernet/mscc/ocelot_vcap.c index 2749df593ebc..774cec377703 100644 --- a/drivers/net/ethernet/mscc/ocelot_vcap.c +++ b/drivers/net/ethernet/mscc/ocelot_vcap.c @@ -374,7 +374,6 @@ static void is2_entry_set(struct ocelot *ocelot, int ix, OCELOT_VCAP_BIT_0); vcap_key_set(vcap, &data, VCAP_IS2_HK_IGR_PORT_MASK, 0, ~filter->ingress_port_mask); - vcap_key_bit_set(vcap, &data, VCAP_IS2_HK_FIRST, OCELOT_VCAP_BIT_ANY); vcap_key_bit_set(vcap, &data, VCAP_IS2_HK_HOST_MATCH, OCELOT_VCAP_BIT_ANY); vcap_key_bit_set(vcap, &data, VCAP_IS2_HK_L2_MC, filter->dmac_mc); -- cgit v1.2.3 From 477d2b91623e682e9a8126ea92acb8f684969cc7 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Thu, 5 May 2022 02:55:02 +0300 Subject: net: mscc: ocelot: restrict tc-trap actions to VCAP IS2 lookup 0 Once the CPU port was added to the destination port mask of a packet, it can never be cleared, so even packets marked as dropped by the MASK_MODE of a VCAP IS2 filter will still reach it. This is why we need the OCELOT_POLICER_DISCARD to "kill dropped packets dead" and make software stop seeing them. We disallow policer rules from being put on any other chain than the one for the first lookup, but we don't do this for "drop" rules, although we should. This change is merely ascertaining that the rules dont't (completely) work and letting the user know. The blamed commit is the one that introduced the multi-chain architecture in ocelot. Prior to that, we should have always offloaded the filters to VCAP IS2 lookup 0, where they did work. Fixes: 1397a2eb52e2 ("net: mscc: ocelot: create TCAM skeleton from tc filter chains") Signed-off-by: Vladimir Oltean Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/mscc/ocelot_flower.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/mscc/ocelot_flower.c b/drivers/net/ethernet/mscc/ocelot_flower.c index a9b26b3002be..51cf241ff7d0 100644 --- a/drivers/net/ethernet/mscc/ocelot_flower.c +++ b/drivers/net/ethernet/mscc/ocelot_flower.c @@ -280,9 +280,10 @@ static int ocelot_flower_parse_action(struct ocelot *ocelot, int port, filter->type = OCELOT_VCAP_FILTER_OFFLOAD; break; case FLOW_ACTION_TRAP: - if (filter->block_id != VCAP_IS2) { + if (filter->block_id != VCAP_IS2 || + filter->lookup != 0) { NL_SET_ERR_MSG_MOD(extack, - "Trap action can only be offloaded to VCAP IS2"); + "Trap action can only be offloaded to VCAP IS2 lookup 0"); return -EOPNOTSUPP; } if (filter->goto_target != -1) { -- cgit v1.2.3 From 93a8417088ea570b5721d2b526337a2d3aed9fa3 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Thu, 5 May 2022 02:55:03 +0300 Subject: net: mscc: ocelot: avoid corrupting hardware counters when moving VCAP filters Given the following order of operations: (1) we add filter A using tc-flower (2) we send a packet that matches it (3) we read the filter's statistics to find a hit count of 1 (4) we add a second filter B with a higher preference than A, and A moves one position to the right to make room in the TCAM for it (5) we send another packet, and this matches the second filter B (6) we read the filter statistics again. When this happens, the hit count of filter A is 2 and of filter B is 1, despite a single packet having matched each filter. Furthermore, in an alternate history, reading the filter stats a second time between steps (3) and (4) makes the hit count of filter A remain at 1 after step (6), as expected. The reason why this happens has to do with the filter->stats.pkts field, which is written to hardware through the call path below: vcap_entry_set / | \ / | \ / | \ / | \ es0_entry_set is1_entry_set is2_entry_set \ | / \ | / \ | / vcap_data_set(data.counter, ...) The primary role of filter->stats.pkts is to transport the filter hit counters from the last readout all the way from vcap_entry_get() -> ocelot_vcap_filter_stats_update() -> ocelot_cls_flower_stats(). The reason why vcap_entry_set() writes it to hardware is so that the counters (saturating and having a limited bit width) are cleared after each user space readout. The writing of filter->stats.pkts to hardware during the TCAM entry movement procedure is an unintentional consequence of the code design, because the hit count isn't up to date at this point. So at step (4), when filter A is moved by ocelot_vcap_filter_add() to make room for filter B, the hardware hit count is 0 (no packet matched on it in the meantime), but filter->stats.pkts is 1, because the last readout saw the earlier packet. The movement procedure programs the old hit count back to hardware, so this creates the impression to user space that more packets have been matched than they really were. The bug can be seen when running the gact_drop_and_ok_test() from the tc_actions.sh selftest. Fix the issue by reading back the hit count to tmp->stats.pkts before migrating the VCAP filter. Sure, this is a best-effort technique, since the packets that hit the rule between vcap_entry_get() and vcap_entry_set() won't be counted, but at least it allows the counters to be reliably used for selftests where the traffic is under control. The vcap_entry_get() name is a bit unintuitive, but it only reads back the counter portion of the TCAM entry, not the entire entry. The index from which we retrieve the counter is also a bit unintuitive (i - 1 during add, i + 1 during del), but this is the way in which TCAM entry movement works. The "entry index" isn't a stored integer for a TCAM filter, instead it is dynamically computed by ocelot_vcap_block_get_filter_index() based on the entry's position in the &block->rules list. That position (as well as block->count) is automatically updated by ocelot_vcap_filter_add_to_block() on add, and by ocelot_vcap_block_remove_filter() on del. So "i" is the new filter index, and "i - 1" or "i + 1" respectively are the old addresses of that TCAM entry (we only support installing/deleting one filter at a time). Fixes: b596229448dd ("net: mscc: ocelot: Add support for tcam") Signed-off-by: Vladimir Oltean Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/mscc/ocelot_vcap.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/net/ethernet/mscc/ocelot_vcap.c b/drivers/net/ethernet/mscc/ocelot_vcap.c index 774cec377703..eeb4cc07dd16 100644 --- a/drivers/net/ethernet/mscc/ocelot_vcap.c +++ b/drivers/net/ethernet/mscc/ocelot_vcap.c @@ -1216,6 +1216,8 @@ int ocelot_vcap_filter_add(struct ocelot *ocelot, struct ocelot_vcap_filter *tmp; tmp = ocelot_vcap_block_find_filter_by_index(block, i); + /* Read back the filter's counters before moving it */ + vcap_entry_get(ocelot, i - 1, tmp); vcap_entry_set(ocelot, i, tmp); } @@ -1268,6 +1270,8 @@ int ocelot_vcap_filter_del(struct ocelot *ocelot, struct ocelot_vcap_filter *tmp; tmp = ocelot_vcap_block_find_filter_by_index(block, i); + /* Read back the filter's counters before moving it */ + vcap_entry_get(ocelot, i + 1, tmp); vcap_entry_set(ocelot, i, tmp); } -- cgit v1.2.3 From 486b9eee57ddca5c9a2d59fc41153f36002e0a00 Mon Sep 17 00:00:00 2001 From: Ivan Vecera Date: Sat, 23 Apr 2022 12:20:21 +0200 Subject: ice: Fix race during aux device (un)plugging Function ice_plug_aux_dev() assigns pf->adev field too early prior aux device initialization and on other side ice_unplug_aux_dev() starts aux device deinit and at the end assigns NULL to pf->adev. This is wrong because pf->adev should always be non-NULL only when aux device is fully initialized and ready. This wrong order causes a crash when ice_send_event_to_aux() call occurs because that function depends on non-NULL value of pf->adev and does not assume that aux device is half-initialized or half-destroyed. After order correction the race window is tiny but it is still there, as Leon mentioned and manipulation with pf->adev needs to be protected by mutex. Fix (un-)plugging functions so pf->adev field is set after aux device init and prior aux device destroy and protect pf->adev assignment by new mutex. This mutex is also held during ice_send_event_to_aux() call to ensure that aux device is valid during that call. Note that device lock used ice_send_event_to_aux() needs to be kept to avoid race with aux drv unload. Reproducer: cycle=1 while :;do echo "#### Cycle: $cycle" ip link set ens7f0 mtu 9000 ip link add bond0 type bond mode 1 miimon 100 ip link set bond0 up ifenslave bond0 ens7f0 ip link set bond0 mtu 9000 ethtool -L ens7f0 combined 1 ip link del bond0 ip link set ens7f0 mtu 1500 sleep 1 let cycle++ done In short when the device is added/removed to/from bond the aux device is unplugged/plugged. When MTU of the device is changed an event is sent to aux device asynchronously. This can race with (un)plugging operation and because pf->adev is set too early (plug) or too late (unplug) the function ice_send_event_to_aux() can touch uninitialized or destroyed fields. In the case of crash below pf->adev->dev.mutex. Crash: [ 53.372066] bond0: (slave ens7f0): making interface the new active one [ 53.378622] bond0: (slave ens7f0): Enslaving as an active interface with an u p link [ 53.386294] IPv6: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready [ 53.549104] bond0: (slave ens7f1): Enslaving as a backup interface with an up link [ 54.118906] ice 0000:ca:00.0 ens7f0: Number of in use tx queues changed inval idating tc mappings. Priority traffic classification disabled! [ 54.233374] ice 0000:ca:00.1 ens7f1: Number of in use tx queues changed inval idating tc mappings. Priority traffic classification disabled! [ 54.248204] bond0: (slave ens7f0): Releasing backup interface [ 54.253955] bond0: (slave ens7f1): making interface the new active one [ 54.274875] bond0: (slave ens7f1): Releasing backup interface [ 54.289153] bond0 (unregistering): Released all slaves [ 55.383179] MII link monitoring set to 100 ms [ 55.398696] bond0: (slave ens7f0): making interface the new active one [ 55.405241] BUG: kernel NULL pointer dereference, address: 0000000000000080 [ 55.405289] bond0: (slave ens7f0): Enslaving as an active interface with an u p link [ 55.412198] #PF: supervisor write access in kernel mode [ 55.412200] #PF: error_code(0x0002) - not-present page [ 55.412201] PGD 25d2ad067 P4D 0 [ 55.412204] Oops: 0002 [#1] PREEMPT SMP NOPTI [ 55.412207] CPU: 0 PID: 403 Comm: kworker/0:2 Kdump: loaded Tainted: G S 5.17.0-13579-g57f2d6540f03 #1 [ 55.429094] bond0: (slave ens7f1): Enslaving as a backup interface with an up link [ 55.430224] Hardware name: Dell Inc. PowerEdge R750/06V45N, BIOS 1.4.4 10/07/ 2021 [ 55.430226] Workqueue: ice ice_service_task [ice] [ 55.468169] RIP: 0010:mutex_unlock+0x10/0x20 [ 55.472439] Code: 0f b1 13 74 96 eb e0 4c 89 ee eb d8 e8 79 54 ff ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 65 48 8b 04 25 40 ef 01 00 31 d2 48 0f b1 17 75 01 c3 e9 e3 fe ff ff 0f 1f 00 0f 1f 44 00 00 48 [ 55.491186] RSP: 0018:ff4454230d7d7e28 EFLAGS: 00010246 [ 55.496413] RAX: ff1a79b208b08000 RBX: ff1a79b2182e8880 RCX: 0000000000000001 [ 55.503545] RDX: 0000000000000000 RSI: ff4454230d7d7db0 RDI: 0000000000000080 [ 55.510678] RBP: ff1a79d1c7e48b68 R08: ff4454230d7d7db0 R09: 0000000000000041 [ 55.517812] R10: 00000000000000a5 R11: 00000000000006e6 R12: ff1a79d1c7e48bc0 [ 55.524945] R13: 0000000000000000 R14: ff1a79d0ffc305c0 R15: 0000000000000000 [ 55.532076] FS: 0000000000000000(0000) GS:ff1a79d0ffc00000(0000) knlGS:0000000000000000 [ 55.540163] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 55.545908] CR2: 0000000000000080 CR3: 00000003487ae003 CR4: 0000000000771ef0 [ 55.553041] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 55.560173] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 55.567305] PKRU: 55555554 [ 55.570018] Call Trace: [ 55.572474] [ 55.574579] ice_service_task+0xaab/0xef0 [ice] [ 55.579130] process_one_work+0x1c5/0x390 [ 55.583141] ? process_one_work+0x390/0x390 [ 55.587326] worker_thread+0x30/0x360 [ 55.590994] ? process_one_work+0x390/0x390 [ 55.595180] kthread+0xe6/0x110 [ 55.598325] ? kthread_complete_and_exit+0x20/0x20 [ 55.603116] ret_from_fork+0x1f/0x30 [ 55.606698] Fixes: f9f5301e7e2d ("ice: Register auxiliary device to provide RDMA") Reviewed-by: Leon Romanovsky Signed-off-by: Ivan Vecera Reviewed-by: Dave Ertman Tested-by: Gurucharan (A Contingent worker at Intel) Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/ice/ice.h | 1 + drivers/net/ethernet/intel/ice/ice_idc.c | 25 +++++++++++++++++-------- drivers/net/ethernet/intel/ice/ice_main.c | 2 ++ 3 files changed, 20 insertions(+), 8 deletions(-) diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h index 8ed3c9ab7ff7..a895e3a8e988 100644 --- a/drivers/net/ethernet/intel/ice/ice.h +++ b/drivers/net/ethernet/intel/ice/ice.h @@ -540,6 +540,7 @@ struct ice_pf { struct mutex avail_q_mutex; /* protects access to avail_[rx|tx]qs */ struct mutex sw_mutex; /* lock for protecting VSI alloc flow */ struct mutex tc_mutex; /* lock to protect TC changes */ + struct mutex adev_mutex; /* lock to protect aux device access */ u32 msg_enable; struct ice_ptp ptp; struct tty_driver *ice_gnss_tty_driver; diff --git a/drivers/net/ethernet/intel/ice/ice_idc.c b/drivers/net/ethernet/intel/ice/ice_idc.c index 25a436d342c2..3e3b2ed4cd5d 100644 --- a/drivers/net/ethernet/intel/ice/ice_idc.c +++ b/drivers/net/ethernet/intel/ice/ice_idc.c @@ -37,14 +37,17 @@ void ice_send_event_to_aux(struct ice_pf *pf, struct iidc_event *event) if (WARN_ON_ONCE(!in_task())) return; + mutex_lock(&pf->adev_mutex); if (!pf->adev) - return; + goto finish; device_lock(&pf->adev->dev); iadrv = ice_get_auxiliary_drv(pf); if (iadrv && iadrv->event_handler) iadrv->event_handler(pf, event); device_unlock(&pf->adev->dev); +finish: + mutex_unlock(&pf->adev_mutex); } /** @@ -290,7 +293,6 @@ int ice_plug_aux_dev(struct ice_pf *pf) return -ENOMEM; adev = &iadev->adev; - pf->adev = adev; iadev->pf = pf; adev->id = pf->aux_idx; @@ -300,18 +302,20 @@ int ice_plug_aux_dev(struct ice_pf *pf) ret = auxiliary_device_init(adev); if (ret) { - pf->adev = NULL; kfree(iadev); return ret; } ret = auxiliary_device_add(adev); if (ret) { - pf->adev = NULL; auxiliary_device_uninit(adev); return ret; } + mutex_lock(&pf->adev_mutex); + pf->adev = adev; + mutex_unlock(&pf->adev_mutex); + return 0; } @@ -320,12 +324,17 @@ int ice_plug_aux_dev(struct ice_pf *pf) */ void ice_unplug_aux_dev(struct ice_pf *pf) { - if (!pf->adev) - return; + struct auxiliary_device *adev; - auxiliary_device_delete(pf->adev); - auxiliary_device_uninit(pf->adev); + mutex_lock(&pf->adev_mutex); + adev = pf->adev; pf->adev = NULL; + mutex_unlock(&pf->adev_mutex); + + if (adev) { + auxiliary_device_delete(adev); + auxiliary_device_uninit(adev); + } } /** diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c index 9a0a358a15c2..949669fed7d6 100644 --- a/drivers/net/ethernet/intel/ice/ice_main.c +++ b/drivers/net/ethernet/intel/ice/ice_main.c @@ -3769,6 +3769,7 @@ u16 ice_get_avail_rxq_count(struct ice_pf *pf) static void ice_deinit_pf(struct ice_pf *pf) { ice_service_task_stop(pf); + mutex_destroy(&pf->adev_mutex); mutex_destroy(&pf->sw_mutex); mutex_destroy(&pf->tc_mutex); mutex_destroy(&pf->avail_q_mutex); @@ -3847,6 +3848,7 @@ static int ice_init_pf(struct ice_pf *pf) mutex_init(&pf->sw_mutex); mutex_init(&pf->tc_mutex); + mutex_init(&pf->adev_mutex); INIT_HLIST_HEAD(&pf->aq_wait_list); spin_lock_init(&pf->aq_wait_lock); -- cgit v1.2.3 From 6096dae926a22e2892ef9169f582589c16d39639 Mon Sep 17 00:00:00 2001 From: Anatolii Gerasymenko Date: Thu, 28 Apr 2022 12:01:00 +0000 Subject: ice: clear stale Tx queue settings before configuring The iAVF driver uses 3 virtchnl op codes to communicate with the PF regarding the VF Tx queues: * VIRTCHNL_OP_CONFIG_VSI_QUEUES configures the hardware and firmware logic for the Tx queues * VIRTCHNL_OP_ENABLE_QUEUES configures the queue interrupts * VIRTCHNL_OP_DISABLE_QUEUES disables the queue interrupts and Tx rings. There is a bug in the iAVF driver due to the race condition between VF reset request and shutdown being executed in parallel. This leads to a break in logic and VIRTCHNL_OP_DISABLE_QUEUES is not being sent. If this occurs, the PF driver never cleans up the Tx queues. This results in leaving behind stale Tx queue settings in the hardware and firmware. The most obvious outcome is that upon the next VIRTCHNL_OP_CONFIG_VSI_QUEUES, the PF will fail to program the Tx scheduler node due to a lack of space. We need to protect ICE driver against such situation. To fix this, make sure we clear existing stale settings out when handling VIRTCHNL_OP_CONFIG_VSI_QUEUES. This ensures we remove the previous settings. Calling ice_vf_vsi_dis_single_txq should be safe as it will do nothing if the queue is not configured. The function already handles the case when the Tx queue is not currently configured and exits with a 0 return in that case. Fixes: 7ad15440acf8 ("ice: Refactor VIRTCHNL_OP_CONFIG_VSI_QUEUES handling") Signed-off-by: Jacob Keller Signed-off-by: Anatolii Gerasymenko Tested-by: Konrad Jankowski Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/ice/ice_virtchnl.c | 68 ++++++++++++++++++++------- 1 file changed, 50 insertions(+), 18 deletions(-) diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl.c b/drivers/net/ethernet/intel/ice/ice_virtchnl.c index b72606c9e6d0..2889e050a4c9 100644 --- a/drivers/net/ethernet/intel/ice/ice_virtchnl.c +++ b/drivers/net/ethernet/intel/ice/ice_virtchnl.c @@ -1307,13 +1307,52 @@ error_param: NULL, 0); } +/** + * ice_vf_vsi_dis_single_txq - disable a single Tx queue + * @vf: VF to disable queue for + * @vsi: VSI for the VF + * @q_id: VF relative (0-based) queue ID + * + * Attempt to disable the Tx queue passed in. If the Tx queue was successfully + * disabled then clear q_id bit in the enabled queues bitmap and return + * success. Otherwise return error. + */ +static int +ice_vf_vsi_dis_single_txq(struct ice_vf *vf, struct ice_vsi *vsi, u16 q_id) +{ + struct ice_txq_meta txq_meta = { 0 }; + struct ice_tx_ring *ring; + int err; + + if (!test_bit(q_id, vf->txq_ena)) + dev_dbg(ice_pf_to_dev(vsi->back), "Queue %u on VSI %u is not enabled, but stopping it anyway\n", + q_id, vsi->vsi_num); + + ring = vsi->tx_rings[q_id]; + if (!ring) + return -EINVAL; + + ice_fill_txq_meta(vsi, ring, &txq_meta); + + err = ice_vsi_stop_tx_ring(vsi, ICE_NO_RESET, vf->vf_id, ring, &txq_meta); + if (err) { + dev_err(ice_pf_to_dev(vsi->back), "Failed to stop Tx ring %d on VSI %d\n", + q_id, vsi->vsi_num); + return err; + } + + /* Clear enabled queues flag */ + clear_bit(q_id, vf->txq_ena); + + return 0; +} + /** * ice_vc_dis_qs_msg * @vf: pointer to the VF info * @msg: pointer to the msg buffer * - * called from the VF to disable all or specific - * queue(s) + * called from the VF to disable all or specific queue(s) */ static int ice_vc_dis_qs_msg(struct ice_vf *vf, u8 *msg) { @@ -1350,30 +1389,15 @@ static int ice_vc_dis_qs_msg(struct ice_vf *vf, u8 *msg) q_map = vqs->tx_queues; for_each_set_bit(vf_q_id, &q_map, ICE_MAX_RSS_QS_PER_VF) { - struct ice_tx_ring *ring = vsi->tx_rings[vf_q_id]; - struct ice_txq_meta txq_meta = { 0 }; - if (!ice_vc_isvalid_q_id(vf, vqs->vsi_id, vf_q_id)) { v_ret = VIRTCHNL_STATUS_ERR_PARAM; goto error_param; } - if (!test_bit(vf_q_id, vf->txq_ena)) - dev_dbg(ice_pf_to_dev(vsi->back), "Queue %u on VSI %u is not enabled, but stopping it anyway\n", - vf_q_id, vsi->vsi_num); - - ice_fill_txq_meta(vsi, ring, &txq_meta); - - if (ice_vsi_stop_tx_ring(vsi, ICE_NO_RESET, vf->vf_id, - ring, &txq_meta)) { - dev_err(ice_pf_to_dev(vsi->back), "Failed to stop Tx ring %d on VSI %d\n", - vf_q_id, vsi->vsi_num); + if (ice_vf_vsi_dis_single_txq(vf, vsi, vf_q_id)) { v_ret = VIRTCHNL_STATUS_ERR_PARAM; goto error_param; } - - /* Clear enabled queues flag */ - clear_bit(vf_q_id, vf->txq_ena); } } @@ -1622,6 +1646,14 @@ static int ice_vc_cfg_qs_msg(struct ice_vf *vf, u8 *msg) if (qpi->txq.ring_len > 0) { vsi->tx_rings[i]->dma = qpi->txq.dma_ring_addr; vsi->tx_rings[i]->count = qpi->txq.ring_len; + + /* Disable any existing queue first */ + if (ice_vf_vsi_dis_single_txq(vf, vsi, q_idx)) { + v_ret = VIRTCHNL_STATUS_ERR_PARAM; + goto error_param; + } + + /* Configure a queue with the requested settings */ if (ice_vsi_cfg_single_txq(vsi, vsi->tx_rings, q_idx)) { v_ret = VIRTCHNL_STATUS_ERR_PARAM; goto error_param; -- cgit v1.2.3 From a11b6c1a383ff092f432e040c20e032503785d47 Mon Sep 17 00:00:00 2001 From: Michal Michalik Date: Wed, 20 Apr 2022 14:23:02 +0200 Subject: ice: fix PTP stale Tx timestamps cleanup Read stale PTP Tx timestamps from PHY on cleanup. After running out of Tx timestamps request handlers, hardware (HW) stops reporting finished requests. Function ice_ptp_tx_tstamp_cleanup() used to only clean up stale handlers in driver and was leaving the hardware registers not read. Not reading stale PTP Tx timestamps prevents next interrupts from arriving and makes timestamping unusable. Fixes: ea9b847cda64 ("ice: enable transmit timestamps for E810 devices") Signed-off-by: Michal Michalik Reviewed-by: Jacob Keller Reviewed-by: Paul Menzel Tested-by: Gurucharan (A Contingent worker at Intel) Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/ice/ice_ptp.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/intel/ice/ice_ptp.c b/drivers/net/ethernet/intel/ice/ice_ptp.c index a1cd33273ca4..da025c204577 100644 --- a/drivers/net/ethernet/intel/ice/ice_ptp.c +++ b/drivers/net/ethernet/intel/ice/ice_ptp.c @@ -2287,6 +2287,7 @@ ice_ptp_init_tx_e810(struct ice_pf *pf, struct ice_ptp_tx *tx) /** * ice_ptp_tx_tstamp_cleanup - Cleanup old timestamp requests that got dropped + * @hw: pointer to the hw struct * @tx: PTP Tx tracker to clean up * * Loop through the Tx timestamp requests and see if any of them have been @@ -2295,7 +2296,7 @@ ice_ptp_init_tx_e810(struct ice_pf *pf, struct ice_ptp_tx *tx) * timestamp will never be captured. This might happen if the packet gets * discarded before it reaches the PHY timestamping block. */ -static void ice_ptp_tx_tstamp_cleanup(struct ice_ptp_tx *tx) +static void ice_ptp_tx_tstamp_cleanup(struct ice_hw *hw, struct ice_ptp_tx *tx) { u8 idx; @@ -2304,11 +2305,16 @@ static void ice_ptp_tx_tstamp_cleanup(struct ice_ptp_tx *tx) for_each_set_bit(idx, tx->in_use, tx->len) { struct sk_buff *skb; + u64 raw_tstamp; /* Check if this SKB has been waiting for too long */ if (time_is_after_jiffies(tx->tstamps[idx].start + 2 * HZ)) continue; + /* Read tstamp to be able to use this register again */ + ice_read_phy_tstamp(hw, tx->quad, idx + tx->quad_offset, + &raw_tstamp); + spin_lock(&tx->lock); skb = tx->tstamps[idx].skb; tx->tstamps[idx].skb = NULL; @@ -2330,7 +2336,7 @@ static void ice_ptp_periodic_work(struct kthread_work *work) ice_ptp_update_cached_phctime(pf); - ice_ptp_tx_tstamp_cleanup(&pf->ptp.port.tx); + ice_ptp_tx_tstamp_cleanup(&pf->hw, &pf->ptp.port.tx); /* Run twice a second */ kthread_queue_delayed_work(ptp->kworker, &ptp->work, -- cgit v1.2.3 From 9e6c6d17d1d6a3f1515ce399f9a011629ec79aa0 Mon Sep 17 00:00:00 2001 From: Lokesh Dhoundiyal Date: Thu, 5 May 2022 14:00:17 +1200 Subject: ipv4: drop dst in multicast routing path kmemleak reports the following when routing multicast traffic over an ipsec tunnel. Kmemleak output: unreferenced object 0x8000000044bebb00 (size 256): comm "softirq", pid 0, jiffies 4294985356 (age 126.810s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 80 00 00 00 05 13 74 80 ..............t. 80 00 00 00 04 9b bf f9 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000f83947e0>] __kmalloc+0x1e8/0x300 [<00000000b7ed8dca>] metadata_dst_alloc+0x24/0x58 [<0000000081d32c20>] __ipgre_rcv+0x100/0x2b8 [<00000000824f6cf1>] gre_rcv+0x178/0x540 [<00000000ccd4e162>] gre_rcv+0x7c/0xd8 [<00000000c024b148>] ip_protocol_deliver_rcu+0x124/0x350 [<000000006a483377>] ip_local_deliver_finish+0x54/0x68 [<00000000d9271b3a>] ip_local_deliver+0x128/0x168 [<00000000bd4968ae>] xfrm_trans_reinject+0xb8/0xf8 [<0000000071672a19>] tasklet_action_common.isra.16+0xc4/0x1b0 [<0000000062e9c336>] __do_softirq+0x1fc/0x3e0 [<00000000013d7914>] irq_exit+0xc4/0xe0 [<00000000a4d73e90>] plat_irq_dispatch+0x7c/0x108 [<000000000751eb8e>] handle_int+0x16c/0x178 [<000000001668023b>] _raw_spin_unlock_irqrestore+0x1c/0x28 The metadata dst is leaked when ip_route_input_mc() updates the dst for the skb. Commit f38a9eb1f77b ("dst: Metadata destinations") correctly handled dropping the dst in ip_route_input_slow() but missed the multicast case which is handled by ip_route_input_mc(). Drop the dst in ip_route_input_mc() avoiding the leak. Fixes: f38a9eb1f77b ("dst: Metadata destinations") Signed-off-by: Lokesh Dhoundiyal Signed-off-by: Chris Packham Reviewed-by: David Ahern Link: https://lore.kernel.org/r/20220505020017.3111846-1-chris.packham@alliedtelesis.co.nz Signed-off-by: Jakub Kicinski --- net/ipv4/route.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 98c6f3429593..57abd27e842c 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -1753,6 +1753,7 @@ static int ip_route_input_mc(struct sk_buff *skb, __be32 daddr, __be32 saddr, #endif RT_CACHE_STAT_INC(in_slow_mc); + skb_dst_drop(skb); skb_dst_set(skb, &rth->dst); return 0; } -- cgit v1.2.3 From d5076fe4049cadef1f040eda4aaa001bb5424225 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Thu, 5 May 2022 09:19:46 -0700 Subject: netlink: do not reset transport header in netlink_recvmsg() netlink_recvmsg() does not need to change transport header. If transport header was needed, it should have been reset by the producer (netlink_dump()), not the consumer(s). The following trace probably happened when multiple threads were using MSG_PEEK. BUG: KCSAN: data-race in netlink_recvmsg / netlink_recvmsg write to 0xffff88811e9f15b2 of 2 bytes by task 32012 on cpu 1: skb_reset_transport_header include/linux/skbuff.h:2760 [inline] netlink_recvmsg+0x1de/0x790 net/netlink/af_netlink.c:1978 sock_recvmsg_nosec net/socket.c:948 [inline] sock_recvmsg net/socket.c:966 [inline] __sys_recvfrom+0x204/0x2c0 net/socket.c:2097 __do_sys_recvfrom net/socket.c:2115 [inline] __se_sys_recvfrom net/socket.c:2111 [inline] __x64_sys_recvfrom+0x74/0x90 net/socket.c:2111 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae write to 0xffff88811e9f15b2 of 2 bytes by task 32005 on cpu 0: skb_reset_transport_header include/linux/skbuff.h:2760 [inline] netlink_recvmsg+0x1de/0x790 net/netlink/af_netlink.c:1978 ____sys_recvmsg+0x162/0x2f0 ___sys_recvmsg net/socket.c:2674 [inline] __sys_recvmsg+0x209/0x3f0 net/socket.c:2704 __do_sys_recvmsg net/socket.c:2714 [inline] __se_sys_recvmsg net/socket.c:2711 [inline] __x64_sys_recvmsg+0x42/0x50 net/socket.c:2711 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae value changed: 0xffff -> 0x0000 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 32005 Comm: syz-executor.4 Not tainted 5.18.0-rc1-syzkaller-00328-ge1f700ebd6be-dirty #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet Reported-by: syzbot Link: https://lore.kernel.org/r/20220505161946.2867638-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski --- net/netlink/af_netlink.c | 1 - 1 file changed, 1 deletion(-) diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c index 05a3795eac8e..73e9c0a9c187 100644 --- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -1975,7 +1975,6 @@ static int netlink_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, copied = len; } - skb_reset_transport_header(data_skb); err = skb_copy_datagram_msg(data_skb, 0, msg, copied); if (msg->msg_name) { -- cgit v1.2.3 From 1c7ab9cd98b78bef1657a5db7204d8d437e24c94 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Thu, 5 May 2022 16:31:01 -0700 Subject: net: chelsio: cxgb4: Avoid potential negative array offset Using min_t(int, ...) as a potential array index implies to the compiler that negative offsets should be allowed. This is not the case, though. Replace "int" with "unsigned int". Fixes the following warning exposed under future CONFIG_FORTIFY_SOURCE improvements: In file included from include/linux/string.h:253, from include/linux/bitmap.h:11, from include/linux/cpumask.h:12, from include/linux/smp.h:13, from include/linux/lockdep.h:14, from include/linux/rcupdate.h:29, from include/linux/rculist.h:11, from include/linux/pid.h:5, from include/linux/sched.h:14, from include/linux/delay.h:23, from drivers/net/ethernet/chelsio/cxgb4/t4_hw.c:35: drivers/net/ethernet/chelsio/cxgb4/t4_hw.c: In function 't4_get_raw_vpd_params': include/linux/fortify-string.h:46:33: warning: '__builtin_memcpy' pointer overflow between offset 29 and size [2147483648, 4294967295] [-Warray-bounds] 46 | #define __underlying_memcpy __builtin_memcpy | ^ include/linux/fortify-string.h:388:9: note: in expansion of macro '__underlying_memcpy' 388 | __underlying_##op(p, q, __fortify_size); \ | ^~~~~~~~~~~~~ include/linux/fortify-string.h:433:26: note: in expansion of macro '__fortify_memcpy_chk' 433 | #define memcpy(p, q, s) __fortify_memcpy_chk(p, q, s, \ | ^~~~~~~~~~~~~~~~~~~~ drivers/net/ethernet/chelsio/cxgb4/t4_hw.c:2796:9: note: in expansion of macro 'memcpy' 2796 | memcpy(p->id, vpd + id, min_t(int, id_len, ID_LEN)); | ^~~~~~ include/linux/fortify-string.h:46:33: warning: '__builtin_memcpy' pointer overflow between offset 0 and size [2147483648, 4294967295] [-Warray-bounds] 46 | #define __underlying_memcpy __builtin_memcpy | ^ include/linux/fortify-string.h:388:9: note: in expansion of macro '__underlying_memcpy' 388 | __underlying_##op(p, q, __fortify_size); \ | ^~~~~~~~~~~~~ include/linux/fortify-string.h:433:26: note: in expansion of macro '__fortify_memcpy_chk' 433 | #define memcpy(p, q, s) __fortify_memcpy_chk(p, q, s, \ | ^~~~~~~~~~~~~~~~~~~~ drivers/net/ethernet/chelsio/cxgb4/t4_hw.c:2798:9: note: in expansion of macro 'memcpy' 2798 | memcpy(p->sn, vpd + sn, min_t(int, sn_len, SERNUM_LEN)); | ^~~~~~ Additionally remove needless cast from u8[] to char * in last strim() call. Reported-by: kernel test robot Link: https://lore.kernel.org/lkml/202205031926.FVP7epJM-lkp@intel.com Fixes: fc9279298e3a ("cxgb4: Search VPD with pci_vpd_find_ro_info_keyword()") Fixes: 24c521f81c30 ("cxgb4: Use pci_vpd_find_id_string() to find VPD ID string") Cc: Raju Rangoju Cc: Eric Dumazet Cc: Paolo Abeni Signed-off-by: Kees Cook Link: https://lore.kernel.org/r/20220505233101.1224230-1-keescook@chromium.org Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/chelsio/cxgb4/t4_hw.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c index e7b4e3ed056c..8d719f82854a 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c +++ b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c @@ -2793,14 +2793,14 @@ int t4_get_raw_vpd_params(struct adapter *adapter, struct vpd_params *p) goto out; na = ret; - memcpy(p->id, vpd + id, min_t(int, id_len, ID_LEN)); + memcpy(p->id, vpd + id, min_t(unsigned int, id_len, ID_LEN)); strim(p->id); - memcpy(p->sn, vpd + sn, min_t(int, sn_len, SERNUM_LEN)); + memcpy(p->sn, vpd + sn, min_t(unsigned int, sn_len, SERNUM_LEN)); strim(p->sn); - memcpy(p->pn, vpd + pn, min_t(int, pn_len, PN_LEN)); + memcpy(p->pn, vpd + pn, min_t(unsigned int, pn_len, PN_LEN)); strim(p->pn); - memcpy(p->na, vpd + na, min_t(int, na_len, MACADDR_LEN)); - strim((char *)p->na); + memcpy(p->na, vpd + na, min_t(unsigned int, na_len, MACADDR_LEN)); + strim(p->na); out: vfree(vpd); -- cgit v1.2.3 From 49e6123c65dac6393b04f39ceabf79c44f66b8be Mon Sep 17 00:00:00 2001 From: Taehee Yoo Date: Wed, 4 May 2022 12:32:27 +0000 Subject: net: sfc: fix memory leak due to ptp channel It fixes memory leak in ring buffer change logic. When ring buffer size is changed(ethtool -G eth0 rx 4096), sfc driver works like below. 1. stop all channels and remove ring buffers. 2. allocates new buffer array. 3. allocates rx buffers. 4. start channels. While the above steps are working, it skips some steps if the channel doesn't have a ->copy callback function. Due to ptp channel doesn't have ->copy callback, these above steps are skipped for ptp channel. It eventually makes some problems. a. ptp channel's ring buffer size is not changed, it works only 1024(default). b. memory leak. The reason for memory leak is to use the wrong ring buffer values. There are some values, which is related to ring buffer size. a. efx->rxq_entries - This is global value of rx queue size. b. rx_queue->ptr_mask - used for access ring buffer as circular ring. - roundup_pow_of_two(efx->rxq_entries) - 1 c. rx_queue->max_fill - efx->rxq_entries - EFX_RXD_HEAD_ROOM These all values should be based on ring buffer size consistently. But ptp channel's values are not. a. efx->rxq_entries - This is global(for sfc) value, always new ring buffer size. b. rx_queue->ptr_mask - This is always 1023(default). c. rx_queue->max_fill - This is new ring buffer size - EFX_RXD_HEAD_ROOM. Let's assume we set 4096 for rx ring buffer, normal channel ptp channel efx->rxq_entries 4096 4096 rx_queue->ptr_mask 4095 1023 rx_queue->max_fill 4086 4086 sfc driver allocates rx ring buffers based on these values. When it allocates ptp channel's ring buffer, 4086 ring buffers are allocated then, these buffers are attached to the allocated array. But ptp channel's ring buffer array size is still 1024(default) and ptr_mask is still 1023 too. So, 3062 ring buffers will be overwritten to the array. This is the reason for memory leak. Test commands: ethtool -G rx 4096 while : do ip link set up ip link set down done In order to avoid this problem, it adds ->copy callback to ptp channel type. So that rx_queue->ptr_mask value will be updated correctly. Fixes: 7c236c43b838 ("sfc: Add support for IEEE-1588 PTP") Signed-off-by: Taehee Yoo Signed-off-by: David S. Miller --- drivers/net/ethernet/sfc/efx_channels.c | 7 ++++++- drivers/net/ethernet/sfc/ptp.c | 14 +++++++++++++- drivers/net/ethernet/sfc/ptp.h | 1 + 3 files changed, 20 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/sfc/efx_channels.c b/drivers/net/ethernet/sfc/efx_channels.c index 377df8b7f015..40df910aa140 100644 --- a/drivers/net/ethernet/sfc/efx_channels.c +++ b/drivers/net/ethernet/sfc/efx_channels.c @@ -867,7 +867,9 @@ static void efx_set_xdp_channels(struct efx_nic *efx) int efx_realloc_channels(struct efx_nic *efx, u32 rxq_entries, u32 txq_entries) { - struct efx_channel *other_channel[EFX_MAX_CHANNELS], *channel; + struct efx_channel *other_channel[EFX_MAX_CHANNELS], *channel, + *ptp_channel = efx_ptp_channel(efx); + struct efx_ptp_data *ptp_data = efx->ptp_data; unsigned int i, next_buffer_table = 0; u32 old_rxq_entries, old_txq_entries; int rc, rc2; @@ -938,6 +940,7 @@ int efx_realloc_channels(struct efx_nic *efx, u32 rxq_entries, u32 txq_entries) efx_set_xdp_channels(efx); out: + efx->ptp_data = NULL; /* Destroy unused channel structures */ for (i = 0; i < efx->n_channels; i++) { channel = other_channel[i]; @@ -948,6 +951,7 @@ out: } } + efx->ptp_data = ptp_data; rc2 = efx_soft_enable_interrupts(efx); if (rc2) { rc = rc ? rc : rc2; @@ -966,6 +970,7 @@ rollback: efx->txq_entries = old_txq_entries; for (i = 0; i < efx->n_channels; i++) swap(efx->channel[i], other_channel[i]); + efx_ptp_update_channel(efx, ptp_channel); goto out; } diff --git a/drivers/net/ethernet/sfc/ptp.c b/drivers/net/ethernet/sfc/ptp.c index f0ef515e2ade..4625f85acab2 100644 --- a/drivers/net/ethernet/sfc/ptp.c +++ b/drivers/net/ethernet/sfc/ptp.c @@ -45,6 +45,7 @@ #include "farch_regs.h" #include "tx.h" #include "nic.h" /* indirectly includes ptp.h */ +#include "efx_channels.h" /* Maximum number of events expected to make up a PTP event */ #define MAX_EVENT_FRAGS 3 @@ -541,6 +542,12 @@ struct efx_channel *efx_ptp_channel(struct efx_nic *efx) return efx->ptp_data ? efx->ptp_data->channel : NULL; } +void efx_ptp_update_channel(struct efx_nic *efx, struct efx_channel *channel) +{ + if (efx->ptp_data) + efx->ptp_data->channel = channel; +} + static u32 last_sync_timestamp_major(struct efx_nic *efx) { struct efx_channel *channel = efx_ptp_channel(efx); @@ -1443,6 +1450,11 @@ int efx_ptp_probe(struct efx_nic *efx, struct efx_channel *channel) int rc = 0; unsigned int pos; + if (efx->ptp_data) { + efx->ptp_data->channel = channel; + return 0; + } + ptp = kzalloc(sizeof(struct efx_ptp_data), GFP_KERNEL); efx->ptp_data = ptp; if (!efx->ptp_data) @@ -2176,7 +2188,7 @@ static const struct efx_channel_type efx_ptp_channel_type = { .pre_probe = efx_ptp_probe_channel, .post_remove = efx_ptp_remove_channel, .get_name = efx_ptp_get_channel_name, - /* no copy operation; there is no need to reallocate this channel */ + .copy = efx_copy_channel, .receive_skb = efx_ptp_rx, .want_txqs = efx_ptp_want_txqs, .keep_eventq = false, diff --git a/drivers/net/ethernet/sfc/ptp.h b/drivers/net/ethernet/sfc/ptp.h index 9855e8c9e544..7b1ef7002b3f 100644 --- a/drivers/net/ethernet/sfc/ptp.h +++ b/drivers/net/ethernet/sfc/ptp.h @@ -16,6 +16,7 @@ struct ethtool_ts_info; int efx_ptp_probe(struct efx_nic *efx, struct efx_channel *channel); void efx_ptp_defer_probe_with_channel(struct efx_nic *efx); struct efx_channel *efx_ptp_channel(struct efx_nic *efx); +void efx_ptp_update_channel(struct efx_nic *efx, struct efx_channel *channel); void efx_ptp_remove(struct efx_nic *efx); int efx_ptp_set_ts_config(struct efx_nic *efx, struct ifreq *ifr); int efx_ptp_get_ts_config(struct efx_nic *efx, struct ifreq *ifr); -- cgit v1.2.3 From cf3ab8d4a797960b4be20565abb3bcd227b18a68 Mon Sep 17 00:00:00 2001 From: Lina Wang Date: Thu, 5 May 2022 13:48:49 +0800 Subject: net: fix wrong network header length When clatd starts with ebpf offloaing, and NETIF_F_GRO_FRAGLIST is enable, several skbs are gathered in skb_shinfo(skb)->frag_list. The first skb's ipv6 header will be changed to ipv4 after bpf_skb_proto_6_to_4, network_header\transport_header\mac_header have been updated as ipv4 acts, but other skbs in frag_list didnot update anything, just ipv6 packets. udp_queue_rcv_skb will call skb_segment_list to traverse other skbs in frag_list and make sure right udp payload is delivered to user space. Unfortunately, other skbs in frag_list who are still ipv6 packets are updated like the first skb and will have wrong transport header length. e.g.before bpf_skb_proto_6_to_4,the first skb and other skbs in frag_list has the same network_header(24)& transport_header(64), after bpf_skb_proto_6_to_4, ipv6 protocol has been changed to ipv4, the first skb's network_header is 44,transport_header is 64, other skbs in frag_list didnot change.After skb_segment_list, the other skbs in frag_list has different network_header(24) and transport_header(44), so there will be 20 bytes different from original,that is difference between ipv6 header and ipv4 header. Just change transport_header to be the same with original. Actually, there are two solutions to fix it, one is traversing all skbs and changing every skb header in bpf_skb_proto_6_to_4, the other is modifying frag_list skb's header in skb_segment_list. Considering efficiency, adopt the second one--- when the first skb and other skbs in frag_list has different network_header length, restore them to make sure right udp payload is delivered to user space. Signed-off-by: Lina Wang Signed-off-by: David S. Miller --- net/core/skbuff.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 30b523fa4ad2..c90c74de90d5 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -3897,7 +3897,7 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb, unsigned int delta_len = 0; struct sk_buff *tail = NULL; struct sk_buff *nskb, *tmp; - int err; + int len_diff, err; skb_push(skb, -skb_network_offset(skb) + offset); @@ -3937,9 +3937,11 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb, skb_push(nskb, -skb_network_offset(nskb) + offset); skb_release_head_state(nskb); + len_diff = skb_network_header_len(nskb) - skb_network_header_len(skb); __copy_skb_header(nskb, skb); skb_headers_offset_update(nskb, skb_headroom(nskb) - skb_headroom(skb)); + nskb->transport_header += len_diff; skb_copy_from_linear_data_offset(skb, -tnl_hlen, nskb->data - tnl_hlen, offset + tnl_hlen); -- cgit v1.2.3 From edae34a3ed9293b5077dddf9e51a3d86c95dc76a Mon Sep 17 00:00:00 2001 From: Lina Wang Date: Thu, 5 May 2022 13:48:50 +0800 Subject: selftests net: add UDP GRO fraglist + bpf self-tests When NET_F_F_GRO_FRAGLIST is enabled and bpf_skb_change_proto is used, check if udp packets and tcp packets are successfully delivered to user space. If wrong udp packets are delivered, udpgso_bench_rx will exit with "Initial byte out of range" Signed-off-by: Maciej enczykowski Signed-off-by: Lina Wang Signed-off-by: David S. Miller --- tools/testing/selftests/net/Makefile | 3 + tools/testing/selftests/net/bpf/Makefile | 14 ++ tools/testing/selftests/net/bpf/nat6to4.c | 285 ++++++++++++++++++++++++++ tools/testing/selftests/net/udpgro_frglist.sh | 101 +++++++++ 4 files changed, 403 insertions(+) create mode 100644 tools/testing/selftests/net/bpf/Makefile create mode 100644 tools/testing/selftests/net/bpf/nat6to4.c create mode 100755 tools/testing/selftests/net/udpgro_frglist.sh diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile index 0f2ebc38d893..e1f998defd10 100644 --- a/tools/testing/selftests/net/Makefile +++ b/tools/testing/selftests/net/Makefile @@ -25,6 +25,7 @@ TEST_PROGS += bareudp.sh TEST_PROGS += amt.sh TEST_PROGS += unicast_extensions.sh TEST_PROGS += udpgro_fwd.sh +TEST_PROGS += udpgro_frglist.sh TEST_PROGS += veth.sh TEST_PROGS += ioam6.sh TEST_PROGS += gro.sh @@ -61,6 +62,8 @@ TEST_FILES := settings KSFT_KHDR_INSTALL := 1 include ../lib.mk +include bpf/Makefile + $(OUTPUT)/reuseport_bpf_numa: LDLIBS += -lnuma $(OUTPUT)/tcp_mmap: LDLIBS += -lpthread $(OUTPUT)/tcp_inq: LDLIBS += -lpthread diff --git a/tools/testing/selftests/net/bpf/Makefile b/tools/testing/selftests/net/bpf/Makefile new file mode 100644 index 000000000000..f91bf14bbee7 --- /dev/null +++ b/tools/testing/selftests/net/bpf/Makefile @@ -0,0 +1,14 @@ +# SPDX-License-Identifier: GPL-2.0 + +CLANG ?= clang +CCINCLUDE += -I../../bpf +CCINCLUDE += -I../../../../../usr/include/ + +TEST_CUSTOM_PROGS = $(OUTPUT)/bpf/nat6to4.o +all: $(TEST_CUSTOM_PROGS) + +$(OUTPUT)/%.o: %.c + $(CLANG) -O2 -target bpf -c $< $(CCINCLUDE) -o $@ + +clean: + rm -f $(TEST_CUSTOM_PROGS) diff --git a/tools/testing/selftests/net/bpf/nat6to4.c b/tools/testing/selftests/net/bpf/nat6to4.c new file mode 100644 index 000000000000..ac54c36b25fc --- /dev/null +++ b/tools/testing/selftests/net/bpf/nat6to4.c @@ -0,0 +1,285 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * This code is taken from the Android Open Source Project and the author + * (Maciej Żenczykowski) has gave permission to relicense it under the + * GPLv2. Therefore this program is free software; + * You can redistribute it and/or modify it under the terms of the GNU + * General Public License version 2 as published by the Free Software + * Foundation + + * The original headers, including the original license headers, are + * included below for completeness. + * + * Copyright (C) 2019 The Android Open Source Project + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + + +#include + +#include +#include + +#define IP_DF 0x4000 // Flag: "Don't Fragment" + +SEC("schedcls/ingress6/nat_6") +int sched_cls_ingress6_nat_6_prog(struct __sk_buff *skb) +{ + const int l2_header_size = sizeof(struct ethhdr); + void *data = (void *)(long)skb->data; + const void *data_end = (void *)(long)skb->data_end; + const struct ethhdr * const eth = data; // used iff is_ethernet + const struct ipv6hdr * const ip6 = (void *)(eth + 1); + + // Require ethernet dst mac address to be our unicast address. + if (skb->pkt_type != PACKET_HOST) + return TC_ACT_OK; + + // Must be meta-ethernet IPv6 frame + if (skb->protocol != bpf_htons(ETH_P_IPV6)) + return TC_ACT_OK; + + // Must have (ethernet and) ipv6 header + if (data + l2_header_size + sizeof(*ip6) > data_end) + return TC_ACT_OK; + + // Ethertype - if present - must be IPv6 + if (eth->h_proto != bpf_htons(ETH_P_IPV6)) + return TC_ACT_OK; + + // IP version must be 6 + if (ip6->version != 6) + return TC_ACT_OK; + // Maximum IPv6 payload length that can be translated to IPv4 + if (bpf_ntohs(ip6->payload_len) > 0xFFFF - sizeof(struct iphdr)) + return TC_ACT_OK; + switch (ip6->nexthdr) { + case IPPROTO_TCP: // For TCP & UDP the checksum neutrality of the chosen IPv6 + case IPPROTO_UDP: // address means there is no need to update their checksums. + case IPPROTO_GRE: // We do not need to bother looking at GRE/ESP headers, + case IPPROTO_ESP: // since there is never a checksum to update. + break; + default: // do not know how to handle anything else + return TC_ACT_OK; + } + + struct ethhdr eth2; // used iff is_ethernet + + eth2 = *eth; // Copy over the ethernet header (src/dst mac) + eth2.h_proto = bpf_htons(ETH_P_IP); // But replace the ethertype + + struct iphdr ip = { + .version = 4, // u4 + .ihl = sizeof(struct iphdr) / sizeof(__u32), // u4 + .tos = (ip6->priority << 4) + (ip6->flow_lbl[0] >> 4), // u8 + .tot_len = bpf_htons(bpf_ntohs(ip6->payload_len) + sizeof(struct iphdr)), // u16 + .id = 0, // u16 + .frag_off = bpf_htons(IP_DF), // u16 + .ttl = ip6->hop_limit, // u8 + .protocol = ip6->nexthdr, // u8 + .check = 0, // u16 + .saddr = 0x0201a8c0, // u32 + .daddr = 0x0101a8c0, // u32 + }; + + // Calculate the IPv4 one's complement checksum of the IPv4 header. + __wsum sum4 = 0; + + for (int i = 0; i < sizeof(ip) / sizeof(__u16); ++i) + sum4 += ((__u16 *)&ip)[i]; + + // Note that sum4 is guaranteed to be non-zero by virtue of ip.version == 4 + sum4 = (sum4 & 0xFFFF) + (sum4 >> 16); // collapse u32 into range 1 .. 0x1FFFE + sum4 = (sum4 & 0xFFFF) + (sum4 >> 16); // collapse any potential carry into u16 + ip.check = (__u16)~sum4; // sum4 cannot be zero, so this is never 0xFFFF + + // Calculate the *negative* IPv6 16-bit one's complement checksum of the IPv6 header. + __wsum sum6 = 0; + // We'll end up with a non-zero sum due to ip6->version == 6 (which has '0' bits) + for (int i = 0; i < sizeof(*ip6) / sizeof(__u16); ++i) + sum6 += ~((__u16 *)ip6)[i]; // note the bitwise negation + + // Note that there is no L4 checksum update: we are relying on the checksum neutrality + // of the ipv6 address chosen by netd's ClatdController. + + // Packet mutations begin - point of no return, but if this first modification fails + // the packet is probably still pristine, so let clatd handle it. + if (bpf_skb_change_proto(skb, bpf_htons(ETH_P_IP), 0)) + return TC_ACT_OK; + bpf_csum_update(skb, sum6); + + data = (void *)(long)skb->data; + data_end = (void *)(long)skb->data_end; + if (data + l2_header_size + sizeof(struct iphdr) > data_end) + return TC_ACT_SHOT; + + struct ethhdr *new_eth = data; + + // Copy over the updated ethernet header + *new_eth = eth2; + + // Copy over the new ipv4 header. + *(struct iphdr *)(new_eth + 1) = ip; + return bpf_redirect(skb->ifindex, BPF_F_INGRESS); +} + +SEC("schedcls/egress4/snat4") +int sched_cls_egress4_snat4_prog(struct __sk_buff *skb) +{ + const int l2_header_size = sizeof(struct ethhdr); + void *data = (void *)(long)skb->data; + const void *data_end = (void *)(long)skb->data_end; + const struct ethhdr *const eth = data; // used iff is_ethernet + const struct iphdr *const ip4 = (void *)(eth + 1); + + // Must be meta-ethernet IPv4 frame + if (skb->protocol != bpf_htons(ETH_P_IP)) + return TC_ACT_OK; + + // Must have ipv4 header + if (data + l2_header_size + sizeof(struct ipv6hdr) > data_end) + return TC_ACT_OK; + + // Ethertype - if present - must be IPv4 + if (eth->h_proto != bpf_htons(ETH_P_IP)) + return TC_ACT_OK; + + // IP version must be 4 + if (ip4->version != 4) + return TC_ACT_OK; + + // We cannot handle IP options, just standard 20 byte == 5 dword minimal IPv4 header + if (ip4->ihl != 5) + return TC_ACT_OK; + + // Maximum IPv6 payload length that can be translated to IPv4 + if (bpf_htons(ip4->tot_len) > 0xFFFF - sizeof(struct ipv6hdr)) + return TC_ACT_OK; + + // Calculate the IPv4 one's complement checksum of the IPv4 header. + __wsum sum4 = 0; + + for (int i = 0; i < sizeof(*ip4) / sizeof(__u16); ++i) + sum4 += ((__u16 *)ip4)[i]; + + // Note that sum4 is guaranteed to be non-zero by virtue of ip4->version == 4 + sum4 = (sum4 & 0xFFFF) + (sum4 >> 16); // collapse u32 into range 1 .. 0x1FFFE + sum4 = (sum4 & 0xFFFF) + (sum4 >> 16); // collapse any potential carry into u16 + // for a correct checksum we should get *a* zero, but sum4 must be positive, ie 0xFFFF + if (sum4 != 0xFFFF) + return TC_ACT_OK; + + // Minimum IPv4 total length is the size of the header + if (bpf_ntohs(ip4->tot_len) < sizeof(*ip4)) + return TC_ACT_OK; + + // We are incapable of dealing with IPv4 fragments + if (ip4->frag_off & ~bpf_htons(IP_DF)) + return TC_ACT_OK; + + switch (ip4->protocol) { + case IPPROTO_TCP: // For TCP & UDP the checksum neutrality of the chosen IPv6 + case IPPROTO_GRE: // address means there is no need to update their checksums. + case IPPROTO_ESP: // We do not need to bother looking at GRE/ESP headers, + break; // since there is never a checksum to update. + + case IPPROTO_UDP: // See above comment, but must also have UDP header... + if (data + sizeof(*ip4) + sizeof(struct udphdr) > data_end) + return TC_ACT_OK; + const struct udphdr *uh = (const struct udphdr *)(ip4 + 1); + // If IPv4/UDP checksum is 0 then fallback to clatd so it can calculate the + // checksum. Otherwise the network or more likely the NAT64 gateway might + // drop the packet because in most cases IPv6/UDP packets with a zero checksum + // are invalid. See RFC 6935. TODO: calculate checksum via bpf_csum_diff() + if (!uh->check) + return TC_ACT_OK; + break; + + default: // do not know how to handle anything else + return TC_ACT_OK; + } + struct ethhdr eth2; // used iff is_ethernet + + eth2 = *eth; // Copy over the ethernet header (src/dst mac) + eth2.h_proto = bpf_htons(ETH_P_IPV6); // But replace the ethertype + + struct ipv6hdr ip6 = { + .version = 6, // __u8:4 + .priority = ip4->tos >> 4, // __u8:4 + .flow_lbl = {(ip4->tos & 0xF) << 4, 0, 0}, // __u8[3] + .payload_len = bpf_htons(bpf_ntohs(ip4->tot_len) - 20), // __be16 + .nexthdr = ip4->protocol, // __u8 + .hop_limit = ip4->ttl, // __u8 + }; + ip6.saddr.in6_u.u6_addr32[0] = bpf_htonl(0x20010db8); + ip6.saddr.in6_u.u6_addr32[1] = 0; + ip6.saddr.in6_u.u6_addr32[2] = 0; + ip6.saddr.in6_u.u6_addr32[3] = bpf_htonl(1); + ip6.daddr.in6_u.u6_addr32[0] = bpf_htonl(0x20010db8); + ip6.daddr.in6_u.u6_addr32[1] = 0; + ip6.daddr.in6_u.u6_addr32[2] = 0; + ip6.daddr.in6_u.u6_addr32[3] = bpf_htonl(2); + + // Calculate the IPv6 16-bit one's complement checksum of the IPv6 header. + __wsum sum6 = 0; + // We'll end up with a non-zero sum due to ip6.version == 6 + for (int i = 0; i < sizeof(ip6) / sizeof(__u16); ++i) + sum6 += ((__u16 *)&ip6)[i]; + + // Packet mutations begin - point of no return, but if this first modification fails + // the packet is probably still pristine, so let clatd handle it. + if (bpf_skb_change_proto(skb, bpf_htons(ETH_P_IPV6), 0)) + return TC_ACT_OK; + + // This takes care of updating the skb->csum field for a CHECKSUM_COMPLETE packet. + // In such a case, skb->csum is a 16-bit one's complement sum of the entire payload, + // thus we need to subtract out the ipv4 header's sum, and add in the ipv6 header's sum. + // However, we've already verified the ipv4 checksum is correct and thus 0. + // Thus we only need to add the ipv6 header's sum. + // + // bpf_csum_update() always succeeds if the skb is CHECKSUM_COMPLETE and returns an error + // (-ENOTSUPP) if it isn't. So we just ignore the return code (see above for more details). + bpf_csum_update(skb, sum6); + + // bpf_skb_change_proto() invalidates all pointers - reload them. + data = (void *)(long)skb->data; + data_end = (void *)(long)skb->data_end; + + // I cannot think of any valid way for this error condition to trigger, however I do + // believe the explicit check is required to keep the in kernel ebpf verifier happy. + if (data + l2_header_size + sizeof(ip6) > data_end) + return TC_ACT_SHOT; + + struct ethhdr *new_eth = data; + + // Copy over the updated ethernet header + *new_eth = eth2; + // Copy over the new ipv4 header. + *(struct ipv6hdr *)(new_eth + 1) = ip6; + return TC_ACT_OK; +} + +char _license[] SEC("license") = ("GPL"); diff --git a/tools/testing/selftests/net/udpgro_frglist.sh b/tools/testing/selftests/net/udpgro_frglist.sh new file mode 100755 index 000000000000..807b74c8fd80 --- /dev/null +++ b/tools/testing/selftests/net/udpgro_frglist.sh @@ -0,0 +1,101 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# +# Run a series of udpgro benchmarks + +readonly PEER_NS="ns-peer-$(mktemp -u XXXXXX)" + +cleanup() { + local -r jobs="$(jobs -p)" + local -r ns="$(ip netns list|grep $PEER_NS)" + + [ -n "${jobs}" ] && kill -INT ${jobs} 2>/dev/null + [ -n "$ns" ] && ip netns del $ns 2>/dev/null +} +trap cleanup EXIT + +run_one() { + # use 'rx' as separator between sender args and receiver args + local -r all="$@" + local -r tx_args=${all%rx*} + local rx_args=${all#*rx} + + + + ip netns add "${PEER_NS}" + ip -netns "${PEER_NS}" link set lo up + ip link add type veth + ip link set dev veth0 up + ip addr add dev veth0 192.168.1.2/24 + ip addr add dev veth0 2001:db8::2/64 nodad + + ip link set dev veth1 netns "${PEER_NS}" + ip -netns "${PEER_NS}" addr add dev veth1 192.168.1.1/24 + ip -netns "${PEER_NS}" addr add dev veth1 2001:db8::1/64 nodad + ip -netns "${PEER_NS}" link set dev veth1 up + ip netns exec "${PEER_NS}" ethtool -K veth1 rx-gro-list on + + + ip -n "${PEER_NS}" link set veth1 xdp object ../bpf/xdp_dummy.o section xdp_dummy + tc -n "${PEER_NS}" qdisc add dev veth1 clsact + tc -n "${PEER_NS}" filter add dev veth1 ingress prio 4 protocol ipv6 bpf object-file ../bpf/nat6to4.o section schedcls/ingress6/nat_6 direct-action + tc -n "${PEER_NS}" filter add dev veth1 egress prio 4 protocol ip bpf object-file ../bpf/nat6to4.o section schedcls/egress4/snat4 direct-action + echo ${rx_args} + ip netns exec "${PEER_NS}" ./udpgso_bench_rx ${rx_args} -r & + + # Hack: let bg programs complete the startup + sleep 0.1 + ./udpgso_bench_tx ${tx_args} +} + +run_in_netns() { + local -r args=$@ + echo ${args} + ./in_netns.sh $0 __subprocess ${args} +} + +run_udp() { + local -r args=$@ + + echo "udp gso - over veth touching data" + run_in_netns ${args} -u -S 0 rx -4 -v + + echo "udp gso and gro - over veth touching data" + run_in_netns ${args} -S 0 rx -4 -G +} + +run_tcp() { + local -r args=$@ + + echo "tcp - over veth touching data" + run_in_netns ${args} -t rx -4 -t +} + +run_all() { + local -r core_args="-l 4" + local -r ipv4_args="${core_args} -4 -D 192.168.1.1" + local -r ipv6_args="${core_args} -6 -D 2001:db8::1" + + echo "ipv6" + run_tcp "${ipv6_args}" + run_udp "${ipv6_args}" +} + +if [ ! -f ../bpf/xdp_dummy.o ]; then + echo "Missing xdp_dummy helper. Build bpf selftest first" + exit -1 +fi + +if [ ! -f bpf/nat6to4.o ]; then + echo "Missing nat6to4 helper. Build bpfnat6to4.o selftest first" + exit -1 +fi + +if [[ $# -eq 0 ]]; then + run_all +elif [[ $1 == "__subprocess" ]]; then + shift + run_one $@ +else + run_in_netns $@ +fi -- cgit v1.2.3 From c7e34c1e263f71b2dd78a12b6b7259a89b3a1f44 Mon Sep 17 00:00:00 2001 From: Kalle Valo Date: Fri, 6 May 2022 11:42:12 +0300 Subject: mailmap: update Kalle Valo's email MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit I switched to use my kernel.org address, the old kvalo@codeaurora.org address doesn't work anymore. Signed-off-by: Kalle Valo Tested-by: Toke Høiland-Jørgensen Signed-off-by: Kalle Valo Link: https://lore.kernel.org/r/20220506084212.8952-1-kvalo@kernel.org --- .mailmap | 1 + 1 file changed, 1 insertion(+) diff --git a/.mailmap b/.mailmap index b9d358217586..2c05d7257509 100644 --- a/.mailmap +++ b/.mailmap @@ -204,6 +204,7 @@ Juha Yrjola Juha Yrjola Juha Yrjola Julien Thierry +Kalle Valo Kalyan Thota Kay Sievers Kees Cook -- cgit v1.2.3 From a59d55568d02bbbdf9c0cc15be9580180f855b4f Mon Sep 17 00:00:00 2001 From: Johannes Berg Date: Thu, 5 May 2022 23:04:21 +0200 Subject: mac80211_hwsim: fix RCU protected chanctx access We need to RCU protect the chanctx_conf access, so do that. Fixes: 585625c955b1 ("mac80211_hwsim: check TX and STA bandwidth") Signed-off-by: Johannes Berg Link: https://lore.kernel.org/r/20220505230421.fb8055c081a2.Ic6da3307c77a909bd61a0ea25dc2a4b08fe1b03f@changeid Signed-off-by: Johannes Berg --- drivers/net/wireless/mac80211_hwsim.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/net/wireless/mac80211_hwsim.c b/drivers/net/wireless/mac80211_hwsim.c index 28bfa7b7b73c..3ac3693dbecb 100644 --- a/drivers/net/wireless/mac80211_hwsim.c +++ b/drivers/net/wireless/mac80211_hwsim.c @@ -2202,11 +2202,14 @@ mac80211_hwsim_sta_rc_update(struct ieee80211_hw *hw, if (!data->use_chanctx) { confbw = data->bw; } else { - struct ieee80211_chanctx_conf *chanctx_conf = - rcu_dereference(vif->chanctx_conf); + struct ieee80211_chanctx_conf *chanctx_conf; + + rcu_read_lock(); + chanctx_conf = rcu_dereference(vif->chanctx_conf); if (!WARN_ON(!chanctx_conf)) confbw = chanctx_conf->def.width; + rcu_read_unlock(); } WARN(bw > hwsim_get_chanwidth(confbw), -- cgit v1.2.3 From 9e2db50f1ef2238fc2f71c5de1c0418b7a5b0ea2 Mon Sep 17 00:00:00 2001 From: Johannes Berg Date: Thu, 5 May 2022 23:04:22 +0200 Subject: mac80211_hwsim: call ieee80211_tx_prepare_skb under RCU protection This is needed since it might use (and pass out) pointers to e.g. keys protected by RCU. Can't really happen here as the frames aren't encrypted, but we need to still adhere to the rules. Fixes: cacfddf82baf ("mac80211_hwsim: initialize ieee80211_tx_info at hw_scan_work") Signed-off-by: Johannes Berg Link: https://lore.kernel.org/r/20220505230421.5f139f9de173.I77ae111a28f7c0e9fd1ebcee7f39dbec5c606770@changeid Signed-off-by: Johannes Berg --- drivers/net/wireless/mac80211_hwsim.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/net/wireless/mac80211_hwsim.c b/drivers/net/wireless/mac80211_hwsim.c index 3ac3693dbecb..e9ec63e0e395 100644 --- a/drivers/net/wireless/mac80211_hwsim.c +++ b/drivers/net/wireless/mac80211_hwsim.c @@ -2478,11 +2478,13 @@ static void hw_scan_work(struct work_struct *work) if (req->ie_len) skb_put_data(probe, req->ie, req->ie_len); + rcu_read_lock(); if (!ieee80211_tx_prepare_skb(hwsim->hw, hwsim->hw_scan_vif, probe, hwsim->tmp_chan->band, NULL)) { + rcu_read_unlock(); kfree_skb(probe); continue; } @@ -2490,6 +2492,7 @@ static void hw_scan_work(struct work_struct *work) local_bh_disable(); mac80211_hwsim_tx_frame(hwsim->hw, probe, hwsim->tmp_chan); + rcu_read_unlock(); local_bh_enable(); } } -- cgit v1.2.3 From f971e1887fdb3ab500c9bebf4b98f62d49a20655 Mon Sep 17 00:00:00 2001 From: Johannes Berg Date: Fri, 6 May 2022 10:21:38 +0200 Subject: nl80211: fix locking in nl80211_set_tx_bitrate_mask() This accesses the wdev's chandef etc., so cannot safely be used without holding the lock. Signed-off-by: Johannes Berg Link: https://lore.kernel.org/r/20220506102136.06b7205419e6.I2a87c05fbd8bc5e565e84d190d4cfd2e92695a90@changeid Signed-off-by: Johannes Berg --- net/wireless/nl80211.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c index aa6094c3c9b0..1a3551b6d18b 100644 --- a/net/wireless/nl80211.c +++ b/net/wireless/nl80211.c @@ -11666,18 +11666,23 @@ static int nl80211_set_tx_bitrate_mask(struct sk_buff *skb, struct cfg80211_bitrate_mask mask; struct cfg80211_registered_device *rdev = info->user_ptr[0]; struct net_device *dev = info->user_ptr[1]; + struct wireless_dev *wdev = dev->ieee80211_ptr; int err; if (!rdev->ops->set_bitrate_mask) return -EOPNOTSUPP; + wdev_lock(wdev); err = nl80211_parse_tx_bitrate_mask(info, info->attrs, NL80211_ATTR_TX_RATES, &mask, dev, true); if (err) - return err; + goto out; - return rdev_set_bitrate_mask(rdev, dev, NULL, &mask); + err = rdev_set_bitrate_mask(rdev, dev, NULL, &mask); +out: + wdev_unlock(wdev); + return err; } static int nl80211_register_mgmt(struct sk_buff *skb, struct genl_info *info) -- cgit v1.2.3 From a36e07dfe6ee71e209383ea9288cd8d1617e14f9 Mon Sep 17 00:00:00 2001 From: Gleb Fotengauer-Malinovskiy Date: Fri, 6 May 2022 17:24:54 +0000 Subject: rfkill: uapi: fix RFKILL_IOCTL_MAX_SIZE ioctl request definition The definition of RFKILL_IOCTL_MAX_SIZE introduced by commit 54f586a91532 ("rfkill: make new event layout opt-in") is unusable since it is based on RFKILL_IOC_EXT_SIZE which has not been defined. Fix that by replacing the undefined constant with the constant which is intended to be used in this definition. Fixes: 54f586a91532 ("rfkill: make new event layout opt-in") Cc: stable@vger.kernel.org # 5.11+ Signed-off-by: Gleb Fotengauer-Malinovskiy Signed-off-by: Dmitry V. Levin Link: https://lore.kernel.org/r/20220506172454.120319-1-glebfm@altlinux.org [add commit message provided later by Dmitry] Signed-off-by: Johannes Berg --- include/uapi/linux/rfkill.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/uapi/linux/rfkill.h b/include/uapi/linux/rfkill.h index 283c5a7b3f2c..db6c8588c1d0 100644 --- a/include/uapi/linux/rfkill.h +++ b/include/uapi/linux/rfkill.h @@ -184,7 +184,7 @@ struct rfkill_event_ext { #define RFKILL_IOC_NOINPUT 1 #define RFKILL_IOCTL_NOINPUT _IO(RFKILL_IOC_MAGIC, RFKILL_IOC_NOINPUT) #define RFKILL_IOC_MAX_SIZE 2 -#define RFKILL_IOCTL_MAX_SIZE _IOW(RFKILL_IOC_MAGIC, RFKILL_IOC_EXT_SIZE, __u32) +#define RFKILL_IOCTL_MAX_SIZE _IOW(RFKILL_IOC_MAGIC, RFKILL_IOC_MAX_SIZE, __u32) /* and that's all userspace gets */ -- cgit v1.2.3 From e4b1045bf9cfec6f70ac6d3783be06c3a88dcb25 Mon Sep 17 00:00:00 2001 From: Yang Yingliang Date: Fri, 6 May 2022 11:40:40 +0800 Subject: ionic: fix missing pci_release_regions() on error in ionic_probe() If ionic_map_bars() fails, pci_release_regions() need be called. Fixes: fbfb8031533c ("ionic: Add hardware init and device commands") Signed-off-by: Yang Yingliang Link: https://lore.kernel.org/r/20220506034040.2614129-1-yangyingliang@huawei.com Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/pensando/ionic/ionic_bus_pci.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/pensando/ionic/ionic_bus_pci.c b/drivers/net/ethernet/pensando/ionic/ionic_bus_pci.c index 6ffc62c41165..0a7a757494bc 100644 --- a/drivers/net/ethernet/pensando/ionic/ionic_bus_pci.c +++ b/drivers/net/ethernet/pensando/ionic/ionic_bus_pci.c @@ -256,7 +256,7 @@ static int ionic_probe(struct pci_dev *pdev, const struct pci_device_id *ent) err = ionic_map_bars(ionic); if (err) - goto err_out_pci_disable_device; + goto err_out_pci_release_regions; /* Configure the device */ err = ionic_setup(ionic); @@ -360,6 +360,7 @@ err_out_teardown: err_out_unmap_bars: ionic_unmap_bars(ionic); +err_out_pci_release_regions: pci_release_regions(pdev); err_out_pci_disable_device: pci_disable_device(pdev); -- cgit v1.2.3 From 51ca86b4c9c7c75f5630fa0dbe5f8f0bd98e3c3e Mon Sep 17 00:00:00 2001 From: Yang Yingliang Date: Fri, 6 May 2022 17:42:50 +0800 Subject: ethernet: tulip: fix missing pci_disable_device() on error in tulip_init_one() Fix the missing pci_disable_device() before return from tulip_init_one() in the error handling case. Reported-by: Hulk Robot Signed-off-by: Yang Yingliang Link: https://lore.kernel.org/r/20220506094250.3630615-1-yangyingliang@huawei.com Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/dec/tulip/tulip_core.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/dec/tulip/tulip_core.c b/drivers/net/ethernet/dec/tulip/tulip_core.c index 79df5a72877b..0040dcaab945 100644 --- a/drivers/net/ethernet/dec/tulip/tulip_core.c +++ b/drivers/net/ethernet/dec/tulip/tulip_core.c @@ -1399,8 +1399,10 @@ static int tulip_init_one(struct pci_dev *pdev, const struct pci_device_id *ent) /* alloc_etherdev ensures aligned and zeroed private structures */ dev = alloc_etherdev (sizeof (*tp)); - if (!dev) + if (!dev) { + pci_disable_device(pdev); return -ENOMEM; + } SET_NETDEV_DEV(dev, &pdev->dev); if (pci_resource_len (pdev, 0) < tulip_tbl[chip_idx].io_size) { @@ -1785,6 +1787,7 @@ err_out_free_res: err_out_free_netdev: free_netdev (dev); + pci_disable_device(pdev); return -ENODEV; } -- cgit v1.2.3 From 4bd46bb037f8e1883dbe1fc9e79896b7f885db3f Mon Sep 17 00:00:00 2001 From: Jonathan Lemon Date: Fri, 6 May 2022 15:37:39 -0700 Subject: ptp: ocp: Use DIV64_U64_ROUND_UP for rounding. The initial code used roundup() to round the starting time to a multiple of a period. This generated an error on 32-bit systems, so was replaced with DIV_ROUND_UP_ULL(). However, this truncates to 32-bits on a 64-bit system. Replace with DIV64_U64_ROUND_UP() instead. Fixes: b325af3cfab9 ("ptp: ocp: Add signal generators and update sysfs nodes") Signed-off-by: Jonathan Lemon Link: https://lore.kernel.org/r/20220506223739.1930-2-jonathan.lemon@gmail.com Signed-off-by: Jakub Kicinski --- drivers/ptp/ptp_ocp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/ptp/ptp_ocp.c b/drivers/ptp/ptp_ocp.c index 0feaa4b45317..dd45471f6780 100644 --- a/drivers/ptp/ptp_ocp.c +++ b/drivers/ptp/ptp_ocp.c @@ -1557,7 +1557,7 @@ ptp_ocp_signal_set(struct ptp_ocp *bp, int gen, struct ptp_ocp_signal *s) start_ns = ktime_set(ts.tv_sec, ts.tv_nsec) + NSEC_PER_MSEC; if (!s->start) { /* roundup() does not work on 32-bit systems */ - s->start = DIV_ROUND_UP_ULL(start_ns, s->period); + s->start = DIV64_U64_ROUND_UP(start_ns, s->period); s->start = ktime_add(s->start, s->phase); } -- cgit v1.2.3 From ee1444b5e1df4155b591d0d9b1e72853a99ea861 Mon Sep 17 00:00:00 2001 From: Jesse Brandeburg Date: Fri, 6 May 2022 18:10:38 -0700 Subject: dim: initialize all struct fields MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The W=2 build pointed out that the code wasn't initializing all the variables in the dim_cq_moder declarations with the struct initializers. The net change here is zero since these structs were already static const globals and were initialized with zeros by the compiler, but removing compiler warnings has value in and of itself. lib/dim/net_dim.c: At top level: lib/dim/net_dim.c:54:9: warning: missing initializer for field ‘comps’ of ‘const struct dim_cq_moder’ [-Wmissing-field-initializers] 54 | NET_DIM_RX_EQE_PROFILES, | ^~~~~~~~~~~~~~~~~~~~~~~ In file included from lib/dim/net_dim.c:6: ./include/linux/dim.h:45:13: note: ‘comps’ declared here 45 | u16 comps; | ^~~~~ and repeats for the tx struct, and once you fix the comps entry then the cq_period_mode field needs the same treatment. Use the commonly accepted style to indicate to the compiler that we know what we're doing, and add a comma at the end of each struct initializer to clean up the issue, and use explicit initializers for the fields we are initializing which makes the compiler happy. While here and fixing these lines, clean up the code slightly with a fix for the super long lines by removing the word "_MODERATION" from a couple defines only used in this file. Fixes: f8be17b81d44 ("lib/dim: Fix -Wunused-const-variable warnings") Signed-off-by: Jesse Brandeburg Link: https://lore.kernel.org/r/20220507011038.14568-1-jesse.brandeburg@intel.com Signed-off-by: Jakub Kicinski --- lib/dim/net_dim.c | 44 ++++++++++++++++++++++---------------------- 1 file changed, 22 insertions(+), 22 deletions(-) diff --git a/lib/dim/net_dim.c b/lib/dim/net_dim.c index 06811d866775..53f6b9c6e936 100644 --- a/lib/dim/net_dim.c +++ b/lib/dim/net_dim.c @@ -12,41 +12,41 @@ * Each profile size must be of NET_DIM_PARAMS_NUM_PROFILES */ #define NET_DIM_PARAMS_NUM_PROFILES 5 -#define NET_DIM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE 256 -#define NET_DIM_DEFAULT_TX_CQ_MODERATION_PKTS_FROM_EQE 128 +#define NET_DIM_DEFAULT_RX_CQ_PKTS_FROM_EQE 256 +#define NET_DIM_DEFAULT_TX_CQ_PKTS_FROM_EQE 128 #define NET_DIM_DEF_PROFILE_CQE 1 #define NET_DIM_DEF_PROFILE_EQE 1 #define NET_DIM_RX_EQE_PROFILES { \ - {1, NET_DIM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \ - {8, NET_DIM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \ - {64, NET_DIM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \ - {128, NET_DIM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \ - {256, NET_DIM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \ + {.usec = 1, .pkts = NET_DIM_DEFAULT_RX_CQ_PKTS_FROM_EQE,}, \ + {.usec = 8, .pkts = NET_DIM_DEFAULT_RX_CQ_PKTS_FROM_EQE,}, \ + {.usec = 64, .pkts = NET_DIM_DEFAULT_RX_CQ_PKTS_FROM_EQE,}, \ + {.usec = 128, .pkts = NET_DIM_DEFAULT_RX_CQ_PKTS_FROM_EQE,}, \ + {.usec = 256, .pkts = NET_DIM_DEFAULT_RX_CQ_PKTS_FROM_EQE,} \ } #define NET_DIM_RX_CQE_PROFILES { \ - {2, 256}, \ - {8, 128}, \ - {16, 64}, \ - {32, 64}, \ - {64, 64} \ + {.usec = 2, .pkts = 256,}, \ + {.usec = 8, .pkts = 128,}, \ + {.usec = 16, .pkts = 64,}, \ + {.usec = 32, .pkts = 64,}, \ + {.usec = 64, .pkts = 64,} \ } #define NET_DIM_TX_EQE_PROFILES { \ - {1, NET_DIM_DEFAULT_TX_CQ_MODERATION_PKTS_FROM_EQE}, \ - {8, NET_DIM_DEFAULT_TX_CQ_MODERATION_PKTS_FROM_EQE}, \ - {32, NET_DIM_DEFAULT_TX_CQ_MODERATION_PKTS_FROM_EQE}, \ - {64, NET_DIM_DEFAULT_TX_CQ_MODERATION_PKTS_FROM_EQE}, \ - {128, NET_DIM_DEFAULT_TX_CQ_MODERATION_PKTS_FROM_EQE} \ + {.usec = 1, .pkts = NET_DIM_DEFAULT_TX_CQ_PKTS_FROM_EQE,}, \ + {.usec = 8, .pkts = NET_DIM_DEFAULT_TX_CQ_PKTS_FROM_EQE,}, \ + {.usec = 32, .pkts = NET_DIM_DEFAULT_TX_CQ_PKTS_FROM_EQE,}, \ + {.usec = 64, .pkts = NET_DIM_DEFAULT_TX_CQ_PKTS_FROM_EQE,}, \ + {.usec = 128, .pkts = NET_DIM_DEFAULT_TX_CQ_PKTS_FROM_EQE,} \ } #define NET_DIM_TX_CQE_PROFILES { \ - {5, 128}, \ - {8, 64}, \ - {16, 32}, \ - {32, 32}, \ - {64, 32} \ + {.usec = 5, .pkts = 128,}, \ + {.usec = 8, .pkts = 64,}, \ + {.usec = 16, .pkts = 32,}, \ + {.usec = 32, .pkts = 32,}, \ + {.usec = 64, .pkts = 32,} \ } static const struct dim_cq_moder -- cgit v1.2.3 From 91a7cda1f4b8bdf770000a3b60640576dafe0cec Mon Sep 17 00:00:00 2001 From: Francesco Dolcini Date: Fri, 6 May 2022 08:08:15 +0200 Subject: net: phy: Fix race condition on link status change This fixes the following error caused by a race condition between phydev->adjust_link() and a MDIO transaction in the phy interrupt handler. The issue was reproduced with the ethernet FEC driver and a micrel KSZ9031 phy. [ 146.195696] fec 2188000.ethernet eth0: MDIO read timeout [ 146.201779] ------------[ cut here ]------------ [ 146.206671] WARNING: CPU: 0 PID: 571 at drivers/net/phy/phy.c:942 phy_error+0x24/0x6c [ 146.214744] Modules linked in: bnep imx_vdoa imx_sdma evbug [ 146.220640] CPU: 0 PID: 571 Comm: irq/128-2188000 Not tainted 5.18.0-rc3-00080-gd569e86915b7 #9 [ 146.229563] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree) [ 146.236257] unwind_backtrace from show_stack+0x10/0x14 [ 146.241640] show_stack from dump_stack_lvl+0x58/0x70 [ 146.246841] dump_stack_lvl from __warn+0xb4/0x24c [ 146.251772] __warn from warn_slowpath_fmt+0x5c/0xd4 [ 146.256873] warn_slowpath_fmt from phy_error+0x24/0x6c [ 146.262249] phy_error from kszphy_handle_interrupt+0x40/0x48 [ 146.268159] kszphy_handle_interrupt from irq_thread_fn+0x1c/0x78 [ 146.274417] irq_thread_fn from irq_thread+0xf0/0x1dc [ 146.279605] irq_thread from kthread+0xe4/0x104 [ 146.284267] kthread from ret_from_fork+0x14/0x28 [ 146.289164] Exception stack(0xe6fa1fb0 to 0xe6fa1ff8) [ 146.294448] 1fa0: 00000000 00000000 00000000 00000000 [ 146.302842] 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 146.311281] 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000 [ 146.318262] irq event stamp: 12325 [ 146.321780] hardirqs last enabled at (12333): [] __up_console_sem+0x50/0x60 [ 146.330013] hardirqs last disabled at (12342): [] __up_console_sem+0x3c/0x60 [ 146.338259] softirqs last enabled at (12324): [] __do_softirq+0x2c0/0x624 [ 146.346311] softirqs last disabled at (12319): [] __irq_exit_rcu+0x138/0x178 [ 146.354447] ---[ end trace 0000000000000000 ]--- With the FEC driver phydev->adjust_link() calls fec_enet_adjust_link() calls fec_stop()/fec_restart() and both these function reset and temporary disable the FEC disrupting any MII transaction that could be happening at the same time. fec_enet_adjust_link() and phy_read() can be running at the same time when we have one additional interrupt before the phy_state_machine() is able to terminate. Thread 1 (phylib WQ) | Thread 2 (phy interrupt) | | phy_interrupt() <-- PHY IRQ | handle_interrupt() | phy_read() | phy_trigger_machine() | --> schedule phylib WQ | | phy_state_machine() | phy_check_link_status() | phy_link_change() | phydev->adjust_link() | fec_enet_adjust_link() | --> FEC reset | phy_interrupt() <-- PHY IRQ | phy_read() | Fix this by acquiring the phydev lock in phy_interrupt(). Link: https://lore.kernel.org/all/20220422152612.GA510015@francesco-nb.int.toradex.com/ Fixes: c974bdbc3e77 ("net: phy: Use threaded IRQ, to allow IRQ from sleeping devices") cc: Signed-off-by: Francesco Dolcini Reviewed-by: Andrew Lunn Link: https://lore.kernel.org/r/20220506060815.327382-1-francesco.dolcini@toradex.com Signed-off-by: Jakub Kicinski --- drivers/net/phy/phy.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c index beb2b66da132..f122026c4682 100644 --- a/drivers/net/phy/phy.c +++ b/drivers/net/phy/phy.c @@ -970,8 +970,13 @@ static irqreturn_t phy_interrupt(int irq, void *phy_dat) { struct phy_device *phydev = phy_dat; struct phy_driver *drv = phydev->drv; + irqreturn_t ret; - return drv->handle_interrupt(phydev); + mutex_lock(&phydev->lock); + ret = drv->handle_interrupt(phydev); + mutex_unlock(&phydev->lock); + + return ret; } /** -- cgit v1.2.3 From 630fd4822af2374cd75c682b7665dcb367613765 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Sat, 7 May 2022 16:45:50 +0300 Subject: net: dsa: flush switchdev workqueue on bridge join error path There is a race between switchdev_bridge_port_offload() and the dsa_port_switchdev_sync_attrs() call right below it. When switchdev_bridge_port_offload() finishes, FDB entries have been replayed by the bridge, but are scheduled for deferred execution later. However dsa_port_switchdev_sync_attrs -> dsa_port_can_apply_vlan_filtering() may impose restrictions on the vlan_filtering attribute and refuse offloading. When this happens, the delayed FDB entries will dereference dp->bridge, which is a NULL pointer because we have stopped the process of offloading this bridge. Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 Workqueue: dsa_ordered dsa_slave_switchdev_event_work pc : dsa_port_bridge_host_fdb_del+0x64/0x100 lr : dsa_slave_switchdev_event_work+0x130/0x1bc Call trace: dsa_port_bridge_host_fdb_del+0x64/0x100 dsa_slave_switchdev_event_work+0x130/0x1bc process_one_work+0x294/0x670 worker_thread+0x80/0x460 ---[ end trace 0000000000000000 ]--- Error: dsa_core: Must first remove VLAN uppers having VIDs also present in bridge. Fix the bug by doing what we do on the normal bridge leave path as well, which is to wait until the deferred FDB entries complete executing, then exit. The placement of dsa_flush_workqueue() after switchdev_bridge_port_unoffload() guarantees that both the FDB additions and deletions on rollback are waited for. Fixes: d7d0d423dbaa ("net: dsa: flush switchdev workqueue when leaving the bridge") Signed-off-by: Vladimir Oltean Link: https://lore.kernel.org/r/20220507134550.1849834-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski --- net/dsa/port.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/dsa/port.c b/net/dsa/port.c index cdc56ba11f52..bdccb613285d 100644 --- a/net/dsa/port.c +++ b/net/dsa/port.c @@ -451,6 +451,7 @@ out_rollback_unoffload: switchdev_bridge_port_unoffload(brport_dev, dp, &dsa_slave_switchdev_notifier, &dsa_slave_switchdev_blocking_notifier); + dsa_flush_workqueue(); out_rollback_unbridge: dsa_broadcast(DSA_NOTIFIER_BRIDGE_LEAVE, &info); out_rollback: -- cgit v1.2.3 From 1809c30b6e5a83a1de1435fe01aaa4de4d626a7c Mon Sep 17 00:00:00 2001 From: Manuel Ullmann Date: Wed, 4 May 2022 21:30:44 +0200 Subject: net: atlantic: always deep reset on pm op, fixing up my null deref regression The impact of this regression is the same for resume that I saw on thaw: the kernel hangs and nothing except SysRq rebooting can be done. Fixes regression in commit cbe6c3a8f8f4 ("net: atlantic: invert deep par in pm functions, preventing null derefs"), where I disabled deep pm resets in suspend and resume, trying to make sense of the atl_resume_common() deep parameter in the first place. It turns out, that atlantic always has to deep reset on pm operations. Even though I expected that and tested resume, I screwed up by kexec-rebooting into an unpatched kernel, thus missing the breakage. This fixup obsoletes the deep parameter of atl_resume_common, but I leave the cleanup for the maintainers to post to mainline. Suspend and hibernation were successfully tested by the reporters. Fixes: cbe6c3a8f8f4 ("net: atlantic: invert deep par in pm functions, preventing null derefs") Link: https://lore.kernel.org/regressions/9-Ehc_xXSwdXcvZqKD5aSqsqeNj5Izco4MYEwnx5cySXVEc9-x_WC4C3kAoCqNTi-H38frroUK17iobNVnkLtW36V6VWGSQEOHXhmVMm5iQ=@protonmail.com/ Reported-by: Jordan Leppert Reported-by: Holger Hoffstaette Tested-by: Jordan Leppert Tested-by: Holger Hoffstaette CC: # 5.10+ Signed-off-by: Manuel Ullmann Link: https://lore.kernel.org/r/87bkw8dfmp.fsf@posteo.de Signed-off-by: Paolo Abeni --- drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c b/drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c index 3a529ee8c834..831833911a52 100644 --- a/drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c +++ b/drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c @@ -449,7 +449,7 @@ static int aq_pm_freeze(struct device *dev) static int aq_pm_suspend_poweroff(struct device *dev) { - return aq_suspend_common(dev, false); + return aq_suspend_common(dev, true); } static int aq_pm_thaw(struct device *dev) @@ -459,7 +459,7 @@ static int aq_pm_thaw(struct device *dev) static int aq_pm_resume_restore(struct device *dev) { - return atl_resume_common(dev, false); + return atl_resume_common(dev, true); } static const struct dev_pm_ops aq_pm_ops = { -- cgit v1.2.3 From dc5306a8c0eace6c113aded2e36ae5e15fdca4d7 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Sun, 8 May 2022 03:22:17 -0700 Subject: decnet: Use container_of() for struct dn_neigh casts Clang's structure layout randomization feature gets upset when it sees struct neighbor (which is randomized) cast to struct dn_neigh: net/decnet/dn_route.c:1123:15: error: casting from randomized structure pointer type 'struct neighbour *' to 'struct dn_neigh *' gateway = ((struct dn_neigh *)neigh)->addr; ^ Update all the open-coded casts to use container_of() to do the conversion instead of depending on strict member ordering. Reported-by: kernel test robot Link: https://lore.kernel.org/lkml/202205041247.WKBEHGS5-lkp@intel.com Cc: "David S. Miller" Cc: Jakub Kicinski Cc: Paolo Abeni Cc: Yajun Deng Cc: Zheng Yongjun Cc: Bill Wendling Cc: linux-decnet-user@lists.sourceforge.net Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook Link: https://lore.kernel.org/r/20220508102217.2647184-1-keescook@chromium.org Signed-off-by: Paolo Abeni --- net/decnet/dn_dev.c | 4 ++-- net/decnet/dn_neigh.c | 3 ++- net/decnet/dn_route.c | 4 ++-- 3 files changed, 6 insertions(+), 5 deletions(-) diff --git a/net/decnet/dn_dev.c b/net/decnet/dn_dev.c index 0ee7d4c0c955..a09ba642b5e7 100644 --- a/net/decnet/dn_dev.c +++ b/net/decnet/dn_dev.c @@ -854,7 +854,7 @@ static void dn_send_endnode_hello(struct net_device *dev, struct dn_ifaddr *ifa) memcpy(msg->neighbor, dn_hiord, ETH_ALEN); if (dn_db->router) { - struct dn_neigh *dn = (struct dn_neigh *)dn_db->router; + struct dn_neigh *dn = container_of(dn_db->router, struct dn_neigh, n); dn_dn2eth(msg->neighbor, dn->addr); } @@ -902,7 +902,7 @@ static void dn_send_router_hello(struct net_device *dev, struct dn_ifaddr *ifa) { int n; struct dn_dev *dn_db = rcu_dereference_raw(dev->dn_ptr); - struct dn_neigh *dn = (struct dn_neigh *)dn_db->router; + struct dn_neigh *dn = container_of(dn_db->router, struct dn_neigh, n); struct sk_buff *skb; size_t size; unsigned char *ptr; diff --git a/net/decnet/dn_neigh.c b/net/decnet/dn_neigh.c index 94b306f6d551..fbd98ac853ea 100644 --- a/net/decnet/dn_neigh.c +++ b/net/decnet/dn_neigh.c @@ -426,7 +426,8 @@ int dn_neigh_router_hello(struct net *net, struct sock *sk, struct sk_buff *skb) if (!dn_db->router) { dn_db->router = neigh_clone(neigh); } else { - if (msg->priority > ((struct dn_neigh *)dn_db->router)->priority) + if (msg->priority > container_of(dn_db->router, + struct dn_neigh, n)->priority) neigh_release(xchg(&dn_db->router, neigh_clone(neigh))); } } diff --git a/net/decnet/dn_route.c b/net/decnet/dn_route.c index 7e85f2a1ae25..d1d78a463a06 100644 --- a/net/decnet/dn_route.c +++ b/net/decnet/dn_route.c @@ -1120,7 +1120,7 @@ source_ok: /* Ok then, we assume its directly connected and move on */ select_source: if (neigh) - gateway = ((struct dn_neigh *)neigh)->addr; + gateway = container_of(neigh, struct dn_neigh, n)->addr; if (gateway == 0) gateway = fld.daddr; if (fld.saddr == 0) { @@ -1429,7 +1429,7 @@ static int dn_route_input_slow(struct sk_buff *skb) /* Use the default router if there is one */ neigh = neigh_clone(dn_db->router); if (neigh) { - gateway = ((struct dn_neigh *)neigh)->addr; + gateway = container_of(neigh, struct dn_neigh, n)->addr; goto make_route; } -- cgit v1.2.3 From 12a4d677b1c34717443470c1492fe520638ef39a Mon Sep 17 00:00:00 2001 From: Wan Jiabing Date: Mon, 9 May 2022 22:45:19 +0800 Subject: net: phy: micrel: Fix incorrect variable type in micrel In lanphy_read_page_reg, calling __phy_read() might return a negative error code. Use 'int' to check the error code. Fixes: 7c2dcfa295b1 ("net: phy: micrel: Add support for LAN8804 PHY") Signed-off-by: Wan Jiabing Reviewed-by: Andrew Lunn Link: https://lore.kernel.org/r/20220509144519.2343399-1-wanjiabing@vivo.com Signed-off-by: Jakub Kicinski --- drivers/net/phy/micrel.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c index 9d7dafed3931..cd9aa353b653 100644 --- a/drivers/net/phy/micrel.c +++ b/drivers/net/phy/micrel.c @@ -1743,7 +1743,7 @@ static int ksz886x_cable_test_get_status(struct phy_device *phydev, static int lanphy_read_page_reg(struct phy_device *phydev, int page, u32 addr) { - u32 data; + int data; phy_lock_mdio_bus(phydev); __phy_write(phydev, LAN_EXT_PAGE_ACCESS_CONTROL, page); @@ -2444,8 +2444,7 @@ static int lan8804_config_init(struct phy_device *phydev) static irqreturn_t lan8814_handle_interrupt(struct phy_device *phydev) { - u16 tsu_irq_status; - int irq_status; + int irq_status, tsu_irq_status; irq_status = phy_read(phydev, LAN8814_INTS); if (irq_status > 0 && (irq_status & LAN8814_INT_LINK)) -- cgit v1.2.3 From 0807ce0b010418a191e0e4009803b2d74c3245d5 Mon Sep 17 00:00:00 2001 From: Yang Yingliang Date: Tue, 10 May 2022 11:13:16 +0800 Subject: net: stmmac: fix missing pci_disable_device() on error in stmmac_pci_probe() Switch to using pcim_enable_device() to avoid missing pci_disable_device(). Reported-by: Hulk Robot Signed-off-by: Yang Yingliang Link: https://lore.kernel.org/r/20220510031316.1780409-1-yangyingliang@huawei.com Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c index fcf17d8a0494..644bb54f5f02 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c @@ -181,7 +181,7 @@ static int stmmac_pci_probe(struct pci_dev *pdev, return -ENOMEM; /* Enable pci device */ - ret = pci_enable_device(pdev); + ret = pcim_enable_device(pdev); if (ret) { dev_err(&pdev->dev, "%s: ERROR: failed to enable device\n", __func__); @@ -241,8 +241,6 @@ static void stmmac_pci_remove(struct pci_dev *pdev) pcim_iounmap_regions(pdev, BIT(i)); break; } - - pci_disable_device(pdev); } static int __maybe_unused stmmac_pci_suspend(struct device *dev) -- cgit v1.2.3 From 62e0ae0f4020250f961cf8d0103a4621be74e077 Mon Sep 17 00:00:00 2001 From: Grant Grundler Date: Mon, 9 May 2022 19:28:23 -0700 Subject: net: atlantic: fix "frag[0] not initialized" In aq_ring_rx_clean(), if buff->is_eop is not set AND buff->len < AQ_CFG_RX_HDR_SIZE, then hdr_len remains equal to buff->len and skb_add_rx_frag(xxx, *0*, ...) is not called. The loop following this code starts calling skb_add_rx_frag() starting with i=1 and thus frag[0] is never initialized. Since i is initialized to zero at the top of the primary loop, we can just reference and post-increment i instead of hardcoding the 0 when calling skb_add_rx_frag() the first time. Reported-by: Aashay Shringarpure Reported-by: Yi Chou Reported-by: Shervin Oloumi Signed-off-by: Grant Grundler Signed-off-by: David S. Miller --- drivers/net/ethernet/aquantia/atlantic/aq_ring.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c index 77e76c9efd32..440423b0e8ea 100644 --- a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c +++ b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c @@ -446,7 +446,7 @@ int aq_ring_rx_clean(struct aq_ring_s *self, ALIGN(hdr_len, sizeof(long))); if (buff->len - hdr_len > 0) { - skb_add_rx_frag(skb, 0, buff->rxdata.page, + skb_add_rx_frag(skb, i++, buff->rxdata.page, buff->rxdata.pg_off + hdr_len, buff->len - hdr_len, AQ_CFG_RX_FRAME_MAX); @@ -455,7 +455,6 @@ int aq_ring_rx_clean(struct aq_ring_s *self, if (!buff->is_eop) { buff_ = buff; - i = 1U; do { next_ = buff_->next; buff_ = &self->buff_ring[next_]; -- cgit v1.2.3 From 79784d77ebbd3ec516b7a5ce555d979fb7946202 Mon Sep 17 00:00:00 2001 From: Grant Grundler Date: Mon, 9 May 2022 19:28:24 -0700 Subject: net: atlantic: reduce scope of is_rsc_complete Don't defer handling the err case outside the loop. That's pointless. And since is_rsc_complete is only used inside this loop, declare it inside the loop to reduce it's scope. Signed-off-by: Grant Grundler Signed-off-by: David S. Miller --- drivers/net/ethernet/aquantia/atlantic/aq_ring.c | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c index 440423b0e8ea..bc1952131799 100644 --- a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c +++ b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c @@ -346,7 +346,6 @@ int aq_ring_rx_clean(struct aq_ring_s *self, int budget) { struct net_device *ndev = aq_nic_get_ndev(self->aq_nic); - bool is_rsc_completed = true; int err = 0; for (; (self->sw_head != self->hw_head) && budget; @@ -366,6 +365,8 @@ int aq_ring_rx_clean(struct aq_ring_s *self, if (!buff->is_eop) { buff_ = buff; do { + bool is_rsc_completed = true; + if (buff_->next >= self->size) { err = -EIO; goto err_exit; @@ -377,18 +378,16 @@ int aq_ring_rx_clean(struct aq_ring_s *self, next_, self->hw_head); - if (unlikely(!is_rsc_completed)) - break; + if (unlikely(!is_rsc_completed)) { + err = 0; + goto err_exit; + } buff->is_error |= buff_->is_error; buff->is_cso_err |= buff_->is_cso_err; } while (!buff_->is_eop); - if (!is_rsc_completed) { - err = 0; - goto err_exit; - } if (buff->is_error || (buff->is_lro && buff->is_cso_err)) { buff_ = buff; -- cgit v1.2.3 From 6aecbba12b5c90b26dc062af3b9de8c4b3a2f19f Mon Sep 17 00:00:00 2001 From: Grant Grundler Date: Mon, 9 May 2022 19:28:25 -0700 Subject: net: atlantic: add check for MAX_SKB_FRAGS Enforce that the CPU can not get stuck in an infinite loop. Reported-by: Aashay Shringarpure Reported-by: Yi Chou Reported-by: Shervin Oloumi Signed-off-by: Grant Grundler Signed-off-by: David S. Miller --- drivers/net/ethernet/aquantia/atlantic/aq_ring.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c index bc1952131799..8201ce7adb77 100644 --- a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c +++ b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c @@ -363,6 +363,7 @@ int aq_ring_rx_clean(struct aq_ring_s *self, continue; if (!buff->is_eop) { + unsigned int frag_cnt = 0U; buff_ = buff; do { bool is_rsc_completed = true; @@ -371,6 +372,8 @@ int aq_ring_rx_clean(struct aq_ring_s *self, err = -EIO; goto err_exit; } + + frag_cnt++; next_ = buff_->next, buff_ = &self->buff_ring[next_]; is_rsc_completed = @@ -378,7 +381,8 @@ int aq_ring_rx_clean(struct aq_ring_s *self, next_, self->hw_head); - if (unlikely(!is_rsc_completed)) { + if (unlikely(!is_rsc_completed) || + frag_cnt > MAX_SKB_FRAGS) { err = 0; goto err_exit; } -- cgit v1.2.3 From 2120b7f4d128433ad8c5f503a9584deba0684901 Mon Sep 17 00:00:00 2001 From: Grant Grundler Date: Mon, 9 May 2022 19:28:26 -0700 Subject: net: atlantic: verify hw_head_ lies within TX buffer ring Bounds check hw_head index provided by NIC to verify it lies within the TX buffer ring. Reported-by: Aashay Shringarpure Reported-by: Yi Chou Reported-by: Shervin Oloumi Signed-off-by: Grant Grundler Signed-off-by: David S. Miller --- drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c index d875ce3ec759..15ede7285fb5 100644 --- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c +++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c @@ -889,6 +889,13 @@ int hw_atl_b0_hw_ring_tx_head_update(struct aq_hw_s *self, err = -ENXIO; goto err_exit; } + + /* Validate that the new hw_head_ is reasonable. */ + if (hw_head_ >= ring->size) { + err = -ENXIO; + goto err_exit; + } + ring->hw_head = hw_head_; err = aq_hw_err_from_flags(self); -- cgit v1.2.3 From 2c50c6867c85afee6f2b3bcbc50fc9d0083d1343 Mon Sep 17 00:00:00 2001 From: Alexandra Winter Date: Tue, 10 May 2022 09:05:06 +0200 Subject: s390/ctcm: fix variable dereferenced before check Found by cppcheck and smatch. smatch complains about drivers/s390/net/ctcm_sysfs.c:43 ctcm_buffer_write() warn: variable dereferenced before check 'priv' (see line 42) Fixes: 3c09e2647b5e ("ctcm: rename READ/WRITE defines to avoid redefinitions") Reported-by: Colin Ian King Signed-off-by: Alexandra Winter Signed-off-by: David S. Miller --- drivers/s390/net/ctcm_sysfs.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/s390/net/ctcm_sysfs.c b/drivers/s390/net/ctcm_sysfs.c index ded1930a00b2..e3813a7aa5e6 100644 --- a/drivers/s390/net/ctcm_sysfs.c +++ b/drivers/s390/net/ctcm_sysfs.c @@ -39,11 +39,12 @@ static ssize_t ctcm_buffer_write(struct device *dev, struct ctcm_priv *priv = dev_get_drvdata(dev); int rc; - ndev = priv->channel[CTCM_READ]->netdev; - if (!(priv && priv->channel[CTCM_READ] && ndev)) { + if (!(priv && priv->channel[CTCM_READ] && + priv->channel[CTCM_READ]->netdev)) { CTCM_DBF_TEXT(SETUP, CTC_DBF_ERROR, "bfnondev"); return -ENODEV; } + ndev = priv->channel[CTCM_READ]->netdev; rc = kstrtouint(buf, 0, &bs1); if (rc) -- cgit v1.2.3 From 0c0b20587b9f25a2ad14db7f80ebe49bdf29920a Mon Sep 17 00:00:00 2001 From: Alexandra Winter Date: Tue, 10 May 2022 09:05:07 +0200 Subject: s390/ctcm: fix potential memory leak smatch complains about drivers/s390/net/ctcm_mpc.c:1210 ctcmpc_unpack_skb() warn: possible memory leak of 'mpcginfo' mpc_action_discontact() did not free mpcginfo. Consolidate the freeing in ctcmpc_unpack_skb(). Fixes: 293d984f0e36 ("ctcm: infrastructure for replaced ctc driver") Signed-off-by: Alexandra Winter Signed-off-by: David S. Miller --- drivers/s390/net/ctcm_mpc.c | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/drivers/s390/net/ctcm_mpc.c b/drivers/s390/net/ctcm_mpc.c index 88abfb5e8045..8ac213a55141 100644 --- a/drivers/s390/net/ctcm_mpc.c +++ b/drivers/s390/net/ctcm_mpc.c @@ -626,8 +626,6 @@ static void mpc_rcvd_sweep_resp(struct mpcg_info *mpcginfo) ctcm_clear_busy_do(dev); } - kfree(mpcginfo); - return; } @@ -1192,10 +1190,10 @@ static void ctcmpc_unpack_skb(struct channel *ch, struct sk_buff *pskb) CTCM_FUNTAIL, dev->name); priv->stats.rx_dropped++; /* mpcginfo only used for non-data transfers */ - kfree(mpcginfo); if (do_debug_data) ctcmpc_dump_skb(pskb, -8); } + kfree(mpcginfo); } done: @@ -1977,7 +1975,6 @@ static void mpc_action_rcvd_xid0(fsm_instance *fsm, int event, void *arg) } break; } - kfree(mpcginfo); CTCM_PR_DEBUG("ctcmpc:%s() %s xid2:%i xid7:%i xidt_p2:%i \n", __func__, ch->id, grp->outstanding_xid2, @@ -2038,7 +2035,6 @@ static void mpc_action_rcvd_xid7(fsm_instance *fsm, int event, void *arg) mpc_validate_xid(mpcginfo); break; } - kfree(mpcginfo); return; } -- cgit v1.2.3 From 671bb35c8e746439f0ed70815968f9a4f20a8deb Mon Sep 17 00:00:00 2001 From: Alexandra Winter Date: Tue, 10 May 2022 09:05:08 +0200 Subject: s390/lcs: fix variable dereferenced before check smatch complains about drivers/s390/net/lcs.c:1741 lcs_get_control() warn: variable dereferenced before check 'card->dev' (see line 1739) Fixes: 27eb5ac8f015 ("[PATCH] s390: lcs driver bug fixes and improvements [1/2]") Signed-off-by: Alexandra Winter Signed-off-by: David S. Miller --- drivers/s390/net/lcs.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/s390/net/lcs.c b/drivers/s390/net/lcs.c index bab9b34926c6..84c8981317b4 100644 --- a/drivers/s390/net/lcs.c +++ b/drivers/s390/net/lcs.c @@ -1736,10 +1736,11 @@ lcs_get_control(struct lcs_card *card, struct lcs_cmd *cmd) lcs_schedule_recovery(card); break; case LCS_CMD_STOPLAN: - pr_warn("Stoplan for %s initiated by LGW\n", - card->dev->name); - if (card->dev) + if (card->dev) { + pr_warn("Stoplan for %s initiated by LGW\n", + card->dev->name); netif_carrier_off(card->dev); + } break; default: LCS_DBF_TEXT(5, trace, "noLGWcmd"); -- cgit v1.2.3 From 8b796475fd7882663a870456466a4fb315cc1bd6 Mon Sep 17 00:00:00 2001 From: Paolo Abeni Date: Tue, 10 May 2022 16:57:34 +0200 Subject: net/sched: act_pedit: really ensure the skb is writable Currently pedit tries to ensure that the accessed skb offset is writable via skb_unclone(). The action potentially allows touching any skb bytes, so it may end-up modifying shared data. The above causes some sporadic MPTCP self-test failures, due to this code: tc -n $ns2 filter add dev ns2eth$i egress \ protocol ip prio 1000 \ handle 42 fw \ action pedit munge offset 148 u8 invert \ pipe csum tcp \ index 100 The above modifies a data byte outside the skb head and the skb is a cloned one, carrying a TCP output packet. This change addresses the issue by keeping track of a rough over-estimate highest skb offset accessed by the action and ensuring such offset is really writable. Note that this may cause performance regressions in some scenarios, but hopefully pedit is not in the critical path. Fixes: db2c24175d14 ("act_pedit: access skb->data safely") Acked-by: Mat Martineau Tested-by: Geliang Tang Signed-off-by: Paolo Abeni Acked-by: Jamal Hadi Salim Link: https://lore.kernel.org/r/1fcf78e6679d0a287dd61bb0f04730ce33b3255d.1652194627.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski --- include/net/tc_act/tc_pedit.h | 1 + net/sched/act_pedit.c | 26 ++++++++++++++++++++++---- 2 files changed, 23 insertions(+), 4 deletions(-) diff --git a/include/net/tc_act/tc_pedit.h b/include/net/tc_act/tc_pedit.h index 748cf87a4d7e..3e02709a1df6 100644 --- a/include/net/tc_act/tc_pedit.h +++ b/include/net/tc_act/tc_pedit.h @@ -14,6 +14,7 @@ struct tcf_pedit { struct tc_action common; unsigned char tcfp_nkeys; unsigned char tcfp_flags; + u32 tcfp_off_max_hint; struct tc_pedit_key *tcfp_keys; struct tcf_pedit_key_ex *tcfp_keys_ex; }; diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c index 31fcd279c177..0eaaf1f45de1 100644 --- a/net/sched/act_pedit.c +++ b/net/sched/act_pedit.c @@ -149,7 +149,7 @@ static int tcf_pedit_init(struct net *net, struct nlattr *nla, struct nlattr *pattr; struct tcf_pedit *p; int ret = 0, err; - int ksize; + int i, ksize; u32 index; if (!nla) { @@ -228,6 +228,18 @@ static int tcf_pedit_init(struct net *net, struct nlattr *nla, p->tcfp_nkeys = parm->nkeys; } memcpy(p->tcfp_keys, parm->keys, ksize); + p->tcfp_off_max_hint = 0; + for (i = 0; i < p->tcfp_nkeys; ++i) { + u32 cur = p->tcfp_keys[i].off; + + /* The AT option can read a single byte, we can bound the actual + * value with uchar max. + */ + cur += (0xff & p->tcfp_keys[i].offmask) >> p->tcfp_keys[i].shift; + + /* Each key touches 4 bytes starting from the computed offset */ + p->tcfp_off_max_hint = max(p->tcfp_off_max_hint, cur + 4); + } p->tcfp_flags = parm->flags; goto_ch = tcf_action_set_ctrlact(*a, parm->action, goto_ch); @@ -308,13 +320,18 @@ static int tcf_pedit_act(struct sk_buff *skb, const struct tc_action *a, struct tcf_result *res) { struct tcf_pedit *p = to_pedit(a); + u32 max_offset; int i; - if (skb_unclone(skb, GFP_ATOMIC)) - return p->tcf_action; - spin_lock(&p->tcf_lock); + max_offset = (skb_transport_header_was_set(skb) ? + skb_transport_offset(skb) : + skb_network_offset(skb)) + + p->tcfp_off_max_hint; + if (skb_ensure_writable(skb, min(skb->len, max_offset))) + goto unlock; + tcf_lastuse_update(&p->tcf_tm); if (p->tcfp_nkeys > 0) { @@ -403,6 +420,7 @@ bad: p->tcf_qstats.overlimits++; done: bstats_update(&p->tcf_bstats, skb); +unlock: spin_unlock(&p->tcf_lock); return p->tcf_action; } -- cgit v1.2.3 From 3f95a7472d14abef284d8968734fe2ae7ff4845f Mon Sep 17 00:00:00 2001 From: Xiaomeng Tong Date: Tue, 10 May 2022 13:48:46 -0700 Subject: i40e: i40e_main: fix a missing check on list iterator The bug is here: ret = i40e_add_macvlan_filter(hw, ch->seid, vdev->dev_addr, &aq_err); The list iterator 'ch' will point to a bogus position containing HEAD if the list is empty or no element is found. This case must be checked before any use of the iterator, otherwise it will lead to a invalid memory access. To fix this bug, use a new variable 'iter' as the list iterator, while use the origin variable 'ch' as a dedicated pointer to point to the found element. Cc: stable@vger.kernel.org Fixes: 1d8d80b4e4ff6 ("i40e: Add macvlan support on i40e") Signed-off-by: Xiaomeng Tong Tested-by: Gurucharan (A Contingent worker at Intel) Signed-off-by: Tony Nguyen Link: https://lore.kernel.org/r/20220510204846.2166999-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/intel/i40e/i40e_main.c | 27 ++++++++++++++------------- 1 file changed, 14 insertions(+), 13 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c index 6778df2177a1..98871f014994 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -7549,42 +7549,43 @@ static void i40e_free_macvlan_channels(struct i40e_vsi *vsi) static int i40e_fwd_ring_up(struct i40e_vsi *vsi, struct net_device *vdev, struct i40e_fwd_adapter *fwd) { + struct i40e_channel *ch = NULL, *ch_tmp, *iter; int ret = 0, num_tc = 1, i, aq_err; - struct i40e_channel *ch, *ch_tmp; struct i40e_pf *pf = vsi->back; struct i40e_hw *hw = &pf->hw; - if (list_empty(&vsi->macvlan_list)) - return -EINVAL; - /* Go through the list and find an available channel */ - list_for_each_entry_safe(ch, ch_tmp, &vsi->macvlan_list, list) { - if (!i40e_is_channel_macvlan(ch)) { - ch->fwd = fwd; + list_for_each_entry_safe(iter, ch_tmp, &vsi->macvlan_list, list) { + if (!i40e_is_channel_macvlan(iter)) { + iter->fwd = fwd; /* record configuration for macvlan interface in vdev */ for (i = 0; i < num_tc; i++) netdev_bind_sb_channel_queue(vsi->netdev, vdev, i, - ch->num_queue_pairs, - ch->base_queue); - for (i = 0; i < ch->num_queue_pairs; i++) { + iter->num_queue_pairs, + iter->base_queue); + for (i = 0; i < iter->num_queue_pairs; i++) { struct i40e_ring *tx_ring, *rx_ring; u16 pf_q; - pf_q = ch->base_queue + i; + pf_q = iter->base_queue + i; /* Get to TX ring ptr */ tx_ring = vsi->tx_rings[pf_q]; - tx_ring->ch = ch; + tx_ring->ch = iter; /* Get the RX ring ptr */ rx_ring = vsi->rx_rings[pf_q]; - rx_ring->ch = ch; + rx_ring->ch = iter; } + ch = iter; break; } } + if (!ch) + return -EINVAL; + /* Guarantee all rings are updated before we update the * MAC address filter. */ -- cgit v1.2.3 From 103a2f3255a95991252f8f13375c3a96a75011cd Mon Sep 17 00:00:00 2001 From: Itay Iellin Date: Sat, 7 May 2022 08:32:48 -0400 Subject: Bluetooth: Fix the creation of hdev->name Set a size limit of 8 bytes of the written buffer to "hdev->name" including the terminating null byte, as the size of "hdev->name" is 8 bytes. If an id value which is greater than 9999 is allocated, then the "snprintf(hdev->name, sizeof(hdev->name), "hci%d", id)" function call would lead to a truncation of the id value in decimal notation. Set an explicit maximum id parameter in the id allocation function call. The id allocation function defines the maximum allocated id value as the maximum id parameter value minus one. Therefore, HCI_MAX_ID is defined as 10000. Signed-off-by: Itay Iellin Signed-off-by: Luiz Augusto von Dentz --- include/net/bluetooth/hci_core.h | 3 +++ net/bluetooth/hci_core.c | 6 +++--- 2 files changed, 6 insertions(+), 3 deletions(-) diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h index 8abd08245326..62d7b81b1cb7 100644 --- a/include/net/bluetooth/hci_core.h +++ b/include/net/bluetooth/hci_core.h @@ -36,6 +36,9 @@ /* HCI priority */ #define HCI_PRIO_MAX 7 +/* HCI maximum id value */ +#define HCI_MAX_ID 10000 + /* HCI Core structures */ struct inquiry_data { bdaddr_t bdaddr; diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c index b4782a6c1025..45c2dd2e1590 100644 --- a/net/bluetooth/hci_core.c +++ b/net/bluetooth/hci_core.c @@ -2555,10 +2555,10 @@ int hci_register_dev(struct hci_dev *hdev) */ switch (hdev->dev_type) { case HCI_PRIMARY: - id = ida_simple_get(&hci_index_ida, 0, 0, GFP_KERNEL); + id = ida_simple_get(&hci_index_ida, 0, HCI_MAX_ID, GFP_KERNEL); break; case HCI_AMP: - id = ida_simple_get(&hci_index_ida, 1, 0, GFP_KERNEL); + id = ida_simple_get(&hci_index_ida, 1, HCI_MAX_ID, GFP_KERNEL); break; default: return -EINVAL; @@ -2567,7 +2567,7 @@ int hci_register_dev(struct hci_dev *hdev) if (id < 0) return id; - sprintf(hdev->name, "hci%d", id); + snprintf(hdev->name, sizeof(hdev->name), "hci%d", id); hdev->id = id; BT_DBG("%p name %s bus %d", hdev, hdev->name, hdev->bus); -- cgit v1.2.3 From 00832b1d1a393dfb1b9491d085e5b27e8c25d103 Mon Sep 17 00:00:00 2001 From: Yang Yingliang Date: Wed, 11 May 2022 11:08:29 +0800 Subject: net: ethernet: mediatek: ppe: fix wrong size passed to memset() 'foe_table' is a pointer, the real size of struct mtk_foe_entry should be pass to memset(). Fixes: ba37b7caf1ed ("net: ethernet: mtk_eth_soc: add support for initializing the PPE") Signed-off-by: Yang Yingliang Acked-by: Felix Fietkau Link: https://lore.kernel.org/r/20220511030829.3308094-1-yangyingliang@huawei.com Signed-off-by: Paolo Abeni --- drivers/net/ethernet/mediatek/mtk_ppe.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mediatek/mtk_ppe.c b/drivers/net/ethernet/mediatek/mtk_ppe.c index 3ad10c793308..66298e2235c9 100644 --- a/drivers/net/ethernet/mediatek/mtk_ppe.c +++ b/drivers/net/ethernet/mediatek/mtk_ppe.c @@ -395,7 +395,7 @@ static void mtk_ppe_init_foe_table(struct mtk_ppe *ppe) static const u8 skip[] = { 12, 25, 38, 51, 76, 89, 102 }; int i, k; - memset(ppe->foe_table, 0, MTK_PPE_ENTRIES * sizeof(ppe->foe_table)); + memset(ppe->foe_table, 0, MTK_PPE_ENTRIES * sizeof(*ppe->foe_table)); if (!IS_ENABLED(CONFIG_SOC_MT7621)) return; -- cgit v1.2.3 From 6b77c06655b8a749c1a3d9ebc51e9717003f7e5a Mon Sep 17 00:00:00 2001 From: Florian Fainelli Date: Tue, 10 May 2022 20:17:51 -0700 Subject: net: bcmgenet: Check for Wake-on-LAN interrupt probe deferral The interrupt controller supplying the Wake-on-LAN interrupt line maybe modular on some platforms (irq-bcm7038-l1.c) and might be probed at a later time than the GENET driver. We need to specifically check for -EPROBE_DEFER and propagate that error to ensure that we eventually fetch the interrupt descriptor. Fixes: 9deb48b53e7f ("bcmgenet: add WOL IRQ check") Fixes: 5b1f0e62941b ("net: bcmgenet: Avoid touching non-existent interrupt") Signed-off-by: Florian Fainelli Reviewed-by: Stefan Wahren Link: https://lore.kernel.org/r/20220511031752.2245566-1-f.fainelli@gmail.com Signed-off-by: Paolo Abeni --- drivers/net/ethernet/broadcom/genet/bcmgenet.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c index bf1ec8fdc2ad..e87e46c47387 100644 --- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c +++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c @@ -3999,6 +3999,10 @@ static int bcmgenet_probe(struct platform_device *pdev) goto err; } priv->wol_irq = platform_get_irq_optional(pdev, 2); + if (priv->wol_irq == -EPROBE_DEFER) { + err = priv->wol_irq; + goto err; + } priv->base = devm_platform_ioremap_resource(pdev, 0); if (IS_ERR(priv->base)) { -- cgit v1.2.3 From 810c2f0a3f86158c1e02e74947b66d811473434a Mon Sep 17 00:00:00 2001 From: Amit Cohen Date: Wed, 11 May 2022 14:57:47 +0300 Subject: mlxsw: Avoid warning during ip6gre device removal IPv6 addresses which are used for tunnels are stored in a hash table with reference counting. When a new GRE tunnel is configured, the driver is notified and configures it in hardware. Currently, any change in the tunnel is not applied in the driver. It means that if the remote address is changed, the driver is not aware of this change and the first address will be used. This behavior results in a warning [1] in scenarios such as the following: # ip link add name gre1 type ip6gre local 2000::3 remote 2000::fffe tos inherit ttl inherit # ip link set name gre1 type ip6gre local 2000::3 remote 2000::ffff ttl inherit # ip link delete gre1 The change of the address is not applied in the driver. Currently, the driver uses the remote address which is stored in the 'parms' of the overlay device. When the tunnel is removed, the new IPv6 address is used, the driver tries to release it, but as it is not aware of the change, this address is not configured and it warns about releasing non existing IPv6 address. Fix it by using the IPv6 address which is cached in the IPIP entry, this address is the last one that the driver used, so even in cases such the above, the first address will be released, without any warning. [1]: WARNING: CPU: 1 PID: 2197 at drivers/net/ethernet/mellanox/mlxsw/spectrum.c:2920 mlxsw_sp_ipv6_addr_put+0x146/0x220 [mlxsw_spectrum] ... CPU: 1 PID: 2197 Comm: ip Not tainted 5.17.0-rc8-custom-95062-gc1e5ded51a9a #84 Hardware name: Mellanox Technologies Ltd. MSN4700/VMOD0010, BIOS 5.11 07/12/2021 RIP: 0010:mlxsw_sp_ipv6_addr_put+0x146/0x220 [mlxsw_spectrum] ... Call Trace: mlxsw_sp2_ipip_rem_addr_unset_gre6+0xf1/0x120 [mlxsw_spectrum] mlxsw_sp_netdevice_ipip_ol_event+0xdb/0x640 [mlxsw_spectrum] mlxsw_sp_netdevice_event+0xc4/0x850 [mlxsw_spectrum] raw_notifier_call_chain+0x3c/0x50 call_netdevice_notifiers_info+0x2f/0x80 unregister_netdevice_many+0x311/0x6d0 rtnl_dellink+0x136/0x360 rtnetlink_rcv_msg+0x12f/0x380 netlink_rcv_skb+0x49/0xf0 netlink_unicast+0x233/0x340 netlink_sendmsg+0x202/0x440 ____sys_sendmsg+0x1f3/0x220 ___sys_sendmsg+0x70/0xb0 __sys_sendmsg+0x54/0xa0 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: e846efe2737b ("mlxsw: spectrum: Add hash table for IPv6 address mapping") Reported-by: Maksym Yaremchuk Signed-off-by: Amit Cohen Signed-off-by: Ido Schimmel Link: https://lore.kernel.org/r/20220511115747.238602-1-idosch@nvidia.com Signed-off-by: Paolo Abeni --- drivers/net/ethernet/mellanox/mlxsw/spectrum_ipip.c | 11 +++-------- 1 file changed, 3 insertions(+), 8 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_ipip.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_ipip.c index 01cf5a6a26bd..a2ee695a3f17 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_ipip.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_ipip.c @@ -568,10 +568,8 @@ static int mlxsw_sp2_ipip_rem_addr_set_gre6(struct mlxsw_sp *mlxsw_sp, struct mlxsw_sp_ipip_entry *ipip_entry) { - struct __ip6_tnl_parm parms6; - - parms6 = mlxsw_sp_ipip_netdev_parms6(ipip_entry->ol_dev); - return mlxsw_sp_ipv6_addr_kvdl_index_get(mlxsw_sp, &parms6.raddr, + return mlxsw_sp_ipv6_addr_kvdl_index_get(mlxsw_sp, + &ipip_entry->parms.daddr.addr6, &ipip_entry->dip_kvdl_index); } @@ -579,10 +577,7 @@ static void mlxsw_sp2_ipip_rem_addr_unset_gre6(struct mlxsw_sp *mlxsw_sp, const struct mlxsw_sp_ipip_entry *ipip_entry) { - struct __ip6_tnl_parm parms6; - - parms6 = mlxsw_sp_ipip_netdev_parms6(ipip_entry->ol_dev); - mlxsw_sp_ipv6_addr_put(mlxsw_sp, &parms6.raddr); + mlxsw_sp_ipv6_addr_put(mlxsw_sp, &ipip_entry->parms.daddr.addr6); } static const struct mlxsw_sp_ipip_ops mlxsw_sp2_ipip_gre6_ops = { -- cgit v1.2.3 From b7be130c5d52e5224ac7d89568737b37b4c4b785 Mon Sep 17 00:00:00 2001 From: Florian Fainelli Date: Wed, 11 May 2022 19:17:31 -0700 Subject: net: dsa: bcm_sf2: Fix Wake-on-LAN with mac_link_down() After commit 2d1f90f9ba83 ("net: dsa/bcm_sf2: fix incorrect usage of state->link") the interface suspend path would call our mac_link_down() call back which would forcibly set the link down, thus preventing Wake-on-LAN packets from reaching our management port. Fix this by looking at whether the port is enabled for Wake-on-LAN and not clearing the link status in that case to let packets go through. Fixes: 2d1f90f9ba83 ("net: dsa/bcm_sf2: fix incorrect usage of state->link") Signed-off-by: Florian Fainelli Link: https://lore.kernel.org/r/20220512021731.2494261-1-f.fainelli@gmail.com Signed-off-by: Jakub Kicinski --- drivers/net/dsa/bcm_sf2.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c index cf82b1fa9725..87e81c636339 100644 --- a/drivers/net/dsa/bcm_sf2.c +++ b/drivers/net/dsa/bcm_sf2.c @@ -809,6 +809,9 @@ static void bcm_sf2_sw_mac_link_down(struct dsa_switch *ds, int port, struct bcm_sf2_priv *priv = bcm_sf2_to_priv(ds); u32 reg, offset; + if (priv->wol_ports_mask & BIT(port)) + return; + if (port != core_readl(priv, CORE_IMP0_PRT_ID)) { if (priv->type == BCM4908_DEVICE_ID || priv->type == BCM7445_DEVICE_ID) -- cgit v1.2.3 From f3c46e41b32b6266cf60b0985c61748f53bf1c61 Mon Sep 17 00:00:00 2001 From: Guangguan Wang Date: Thu, 12 May 2022 11:08:20 +0800 Subject: net/smc: non blocking recvmsg() return -EAGAIN when no data and signal_pending Non blocking sendmsg will return -EAGAIN when any signal pending and no send space left, while non blocking recvmsg return -EINTR when signal pending and no data received. This may makes confused. As TCP returns -EAGAIN in the conditions described above. Align the behavior of smc with TCP. Fixes: 846e344eb722 ("net/smc: add receive timeout check") Signed-off-by: Guangguan Wang Reviewed-by: Tony Lu Acked-by: Karsten Graul Link: https://lore.kernel.org/r/20220512030820.73848-1-guangguan.wang@linux.alibaba.com Signed-off-by: Jakub Kicinski --- net/smc/smc_rx.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/smc/smc_rx.c b/net/smc/smc_rx.c index 51e8eb2933ff..338b9ef806e8 100644 --- a/net/smc/smc_rx.c +++ b/net/smc/smc_rx.c @@ -355,12 +355,12 @@ int smc_rx_recvmsg(struct smc_sock *smc, struct msghdr *msg, } break; } + if (!timeo) + return -EAGAIN; if (signal_pending(current)) { read_done = sock_intr_errno(timeo); break; } - if (!timeo) - return -EAGAIN; } if (!smc_rx_data_available(conn)) { -- cgit v1.2.3 From 1fa89ffbc04545b7582518e57f4b63e2a062870f Mon Sep 17 00:00:00 2001 From: Taehee Yoo Date: Thu, 12 May 2022 05:47:09 +0000 Subject: net: sfc: ef10: fix memory leak in efx_ef10_mtd_probe() In the NIC ->probe() callback, ->mtd_probe() callback is called. If NIC has 2 ports, ->probe() is called twice and ->mtd_probe() too. In the ->mtd_probe(), which is efx_ef10_mtd_probe() it allocates and initializes mtd partiion. But mtd partition for sfc is shared data. So that allocated mtd partition data from last called efx_ef10_mtd_probe() will not be used. Therefore it must be freed. But it doesn't free a not used mtd partition data in efx_ef10_mtd_probe(). kmemleak reports: unreferenced object 0xffff88811ddb0000 (size 63168): comm "systemd-udevd", pid 265, jiffies 4294681048 (age 348.586s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [] kmalloc_order_trace+0x19/0x120 [] __kmalloc+0x20e/0x250 [] efx_ef10_mtd_probe+0x11f/0x270 [sfc] [] efx_pci_probe.cold.17+0x3df/0x53d [sfc] [] local_pci_probe+0xdc/0x170 [] pci_device_probe+0x235/0x680 [] really_probe+0x1c2/0x8f0 [] __driver_probe_device+0x2ab/0x460 [] driver_probe_device+0x4a/0x120 [] __driver_attach+0x16e/0x320 [] bus_for_each_dev+0x110/0x190 [] bus_add_driver+0x39e/0x560 [] driver_register+0x18e/0x310 [] 0xffffffffc02e2055 [] do_one_initcall+0xc3/0x450 [] do_init_module+0x1b4/0x700 Acked-by: Martin Habets Fixes: 8127d661e77f ("sfc: Add support for Solarflare SFC9100 family") Signed-off-by: Taehee Yoo Link: https://lore.kernel.org/r/20220512054709.12513-1-ap420073@gmail.com Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/sfc/ef10.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c index 50d535981a35..f8edb3f1b73a 100644 --- a/drivers/net/ethernet/sfc/ef10.c +++ b/drivers/net/ethernet/sfc/ef10.c @@ -3579,6 +3579,11 @@ static int efx_ef10_mtd_probe(struct efx_nic *efx) n_parts++; } + if (!n_parts) { + kfree(parts); + return 0; + } + rc = efx_mtd_add(efx, &parts[0].common, n_parts, sizeof(*parts)); fail: if (rc) -- cgit v1.2.3 From 3740651bf7e200109dd42d5b2fb22226b26f960a Mon Sep 17 00:00:00 2001 From: Maxim Mikityanskiy Date: Thu, 12 May 2022 12:18:30 +0300 Subject: tls: Fix context leak on tls_device_down The commit cited below claims to fix a use-after-free condition after tls_device_down. Apparently, the description wasn't fully accurate. The context stayed alive, but ctx->netdev became NULL, and the offload was torn down without a proper fallback, so a bug was present, but a different kind of bug. Due to misunderstanding of the issue, the original patch dropped the refcount_dec_and_test line for the context to avoid the alleged premature deallocation. That line has to be restored, because it matches the refcount_inc_not_zero from the same function, otherwise the contexts that survived tls_device_down are leaked. This patch fixes the described issue by restoring refcount_dec_and_test. After this change, there is no leak anymore, and the fallback to software kTLS still works. Fixes: c55dcdd435aa ("net/tls: Fix use-after-free after the TLS device goes down and up") Signed-off-by: Maxim Mikityanskiy Reviewed-by: Tariq Toukan Link: https://lore.kernel.org/r/20220512091830.678684-1-maximmi@nvidia.com Signed-off-by: Jakub Kicinski --- net/tls/tls_device.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c index af875ad4a822..3919fe2c58c5 100644 --- a/net/tls/tls_device.c +++ b/net/tls/tls_device.c @@ -1347,7 +1347,10 @@ static int tls_device_down(struct net_device *netdev) /* Device contexts for RX and TX will be freed in on sk_destruct * by tls_device_free_ctx. rx_conf and tx_conf stay in TLS_HW. + * Now release the ref taken above. */ + if (refcount_dec_and_test(&ctx->refcount)) + tls_device_free_ctx(ctx); } up_write(&device_offload_lock); -- cgit v1.2.3