summaryrefslogtreecommitdiffstats
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2019-08-22net/ncsi: Fix the payload copying for the request coming from NetlinkJustin.Lee1@Dell.com1-2/+9
The request coming from Netlink should use the OEM generic handler. The standard command handler expects payload in bytes/words/dwords but the actual payload is stored in data if the request is coming from Netlink. Signed-off-by: Justin Lee <justin.lee1@dell.com> Reviewed-by: Vijay Khemka <vijaykhemka@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-22nexthops: remove redundant assignment to variable errColin Ian King1-1/+1
Variable err is initialized to a value that is never read and it is re-assigned later. The initialization is redundant and can be removed. Addresses-Coverity: ("Unused Value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-22libceph: fix PG split vs OSD (re)connect raceIlya Dryomov1-5/+4
We can't rely on ->peer_features in calc_target() because it may be called both when the OSD session is established and open and when it's not. ->peer_features is not valid unless the OSD session is open. If this happens on a PG split (pg_num increase), that could mean we don't resend a request that should have been resent, hanging the client indefinitely. In userspace this was fixed by looking at require_osd_release and get_xinfo[osd].features fields of the osdmap. However these fields belong to the OSD section of the osdmap, which the kernel doesn't decode (only the client section is decoded). Instead, let's drop this feature check. It effectively checks for luminous, so only pre-luminous OSDs would be affected in that on a PG split the kernel might resend a request that should not have been resent. Duplicates can occur in other scenarios, so both sides should already be prepared for them: see dup/replay logic on the OSD side and retry_attempt check on the client side. Cc: stable@vger.kernel.org Fixes: 7de030d6b10a ("libceph: resend on PG splits if OSD has RESEND_ON_SPLIT") Link: https://tracker.ceph.com/issues/41162 Reported-by: Jerry Lee <leisurelysw24@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Tested-by: Jerry Lee <leisurelysw24@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>
2019-08-21net: fix icmp_socket_deliver argument 2 inputLi RongQing1-1/+1
it expects a unsigned int, but got a __be32 Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: Zhang Yu <zhangyu31@baidu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-21ipv6/addrconf: allow adding multicast addr if IFA_F_MCAUTOJOIN is setHangbin Liu1-1/+2
In commit 93a714d6b53d ("multicast: Extend ip address command to enable multicast group join/leave on") we added a new flag IFA_F_MCAUTOJOIN to make user able to add multicast address on ethernet interface. This works for IPv4, but not for IPv6. See the inet6_addr_add code. static int inet6_addr_add() { ... if (cfg->ifa_flags & IFA_F_MCAUTOJOIN) { ipv6_mc_config(net->ipv6.mc_autojoin_sk, true...) } ifp = ipv6_add_addr(idev, cfg, true, extack); <- always fail with maddr if (!IS_ERR(ifp)) { ... } else if (cfg->ifa_flags & IFA_F_MCAUTOJOIN) { ipv6_mc_config(net->ipv6.mc_autojoin_sk, false...) } } But in ipv6_add_addr() it will check the address type and reject multicast address directly. So this feature is never worked for IPv6. We should not remove the multicast address check totally in ipv6_add_addr(), but could accept multicast address only when IFA_F_MCAUTOJOIN flag supplied. v2: update commit description Fixes: 93a714d6b53d ("multicast: Extend ip address command to enable multicast group join/leave on") Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-21Merge tag 'batadv-net-for-davem-20190821' of git://git.open-mesh.org/linux-mergeDavid S. Miller1-1/+1
Simon Wunderlich says: ==================== Here is a batman-adv bugfix: - fix uninit-value in batadv_netlink_get_ifindex(), by Eric Dumazet ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-21mac80211: minstrel_ht: improve rate probing for devices with static fallbackFelix Fietkau3-28/+225
On some devices that only support static rate fallback tables sending rate control probing packets can be really expensive. Probing lower rates can already hurt throughput quite a bit. What hurts even more is the fact that on mt76x0/mt76x2, single probing packets can only be forced by directing packets at a different internal hardware queue, which causes some heavy reordering and extra latency. The reordering issue is mainly problematic while pushing lots of packets to a particular station. If there is little activity, the overhead of probing is neglegible. The static fallback behavior is designed to pretty much only handle rate control algorithms that use only a very limited set of rates on which the algorithm switches up/down based on packet error rate. In order to better support that kind of hardware, this patch implements a different approach to rate probing where it switches to a slightly higher rate, waits for tx status feedback, then updates the stats and switches back to the new max throughput rate. This only triggers above a packet rate of 100 per stats interval (~50ms). For that kind of probing, the code has to reduce the set of probing rates a lot more compared to single packet probing, so it uses only one packet per MCS group which is either slightly faster, or as close as possible to the max throughput rate. This allows switching between similar rates with different numbers of streams. The algorithm assumes that the hardware will work its way lower within an MCS group in case of retransmissions, so that lower rates don't have to be probed by the high packets per second rate probing code. To further reduce the search space, it also does not probe rates with lower channel bandwidth than the max throughput rate. At the moment, these changes will only affect mt76x0/mt76x2. Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20190820095449.45255-4-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-08-21mac80211: minstrel_ht: fix default max throughput rate indexesFelix Fietkau1-6/+14
Use the first supported rate instead of 0 (which can be invalid) Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20190820095449.45255-3-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-08-21mac80211: minstrel_ht: reduce unnecessary rate probing attemptsFelix Fietkau1-0/+15
On hardware with static fallback tables (e.g. mt76x2), rate probing attempts can be very expensive. On such devices, avoid sampling rates slower than the per-group max throughput rate, based on the assumption that the fallback table will take care of probing lower rates within that group if the higher rates fail. To further reduce unnecessary probing attempts, skip duplicate attempts on rates slower than the max throughput rate. Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20190820095449.45255-2-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-08-21mac80211: minstrel_ht: fix per-group max throughput rate initializationFelix Fietkau1-1/+1
The group number needs to be multiplied by the number of rates per group to get the full rate index Fixes: 5935839ad735 ("mac80211: improve minstrel_ht rate sorting by throughput & probability") Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20190820095449.45255-1-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-08-21nl80211: Add support for EDMG channelsAlexei Avshalom Lazar5-6/+240
802.11ay specification defines Enhanced Directional Multi-Gigabit (EDMG) STA and AP which allow channel bonding of 2 channels and more. Introduce new NL attributes that are needed for enabling and configuring EDMG support. Two new attributes are used by kernel to publish driver's EDMG capabilities to the userspace: NL80211_BAND_ATTR_EDMG_CHANNELS - bitmap field that indicates the 2.16 GHz channel(s) that are supported by the driver. When this attribute is not set it means driver does not support EDMG. NL80211_BAND_ATTR_EDMG_BW_CONFIG - represent the channel bandwidth configurations supported by the driver. Additional two new attributes are used by the userspace for connect command and for AP configuration: NL80211_ATTR_WIPHY_EDMG_CHANNELS NL80211_ATTR_WIPHY_EDMG_BW_CONFIG New rate info flag - RATE_INFO_FLAGS_EDMG, can be reported from driver and used for bitrate calculation that will take into account EDMG according to the 802.11ay specification. Signed-off-by: Alexei Avshalom Lazar <ailizaro@codeaurora.org> Link: https://lore.kernel.org/r/1566138918-3823-2-git-send-email-ailizaro@codeaurora.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-08-21mac80211: fix possible NULL pointerderef in obss pd codeJohn Crispin1-1/+2
he_spr_ie_elem is dereferenced before the NULL check. fix this by moving the assignment after the check. fixes commit 697f6c507c74 ("mac80211: propagate HE operation info into bss_conf") This was reported by the static code checker. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: John Crispin <john@phrozen.org> Link: https://lore.kernel.org/r/20190813070712.25509-1-john@phrozen.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-08-21mac80211: add assoc-at supportBen Greear2-0/+5
Report timestamp for when sta becomes associated. Signed-off-by: Ben Greear <greearb@candelatech.com> Link: https://lore.kernel.org/r/20190809180001.26393-2-greearb@candelatech.com [fix ktime_get_boot_ns() to ktime_get_boottime_ns(), assoc_at type to u64] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-08-21cfg80211: Support assoc-at timer in sta-infoBen Greear1-0/+1
Report timestamp of when sta became associated. This is the boottime clock, units are nano-seconds. Signed-off-by: Ben Greear <greearb@candelatech.com> Link: https://lore.kernel.org/r/20190809180001.26393-1-greearb@candelatech.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-08-21cfg80211: apply same mandatory rate flags for 5GHz and 6GHzArend van Spriel1-0/+1
For the new 6GHz band the same rules apply for mandatory rates so add it to set_mandatory_flags_band() function. Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com> Reviewed-by: Leon Zegers <leon.zegers@broadcom.com> Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com> Link: https://lore.kernel.org/r/1564745465-21234-9-git-send-email-arend.vanspriel@broadcom.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-08-21cfg80211: ibss: use 11a mandatory rates for 6GHz band operationArend van Spriel1-5/+11
The default mandatory rates, ie. when not specified by user-space, is determined by the band. Select 11a rateset for 6GHz band. Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com> Reviewed-by: Leon Zegers <leon.zegers@broadcom.com> Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com> Link: https://lore.kernel.org/r/1564745465-21234-8-git-send-email-arend.vanspriel@broadcom.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-08-21cfg80211: use same IR permissive rules for 6GHz bandArend van Spriel1-1/+2
The function cfg80211_ir_permissive_chan() is applicable for 6GHz band as well so make sure it is handled. Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com> Reviewed-by: Leon Zegers <leon.zegers@broadcom.com> Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com> Link: https://lore.kernel.org/r/1564745465-21234-7-git-send-email-arend.vanspriel@broadcom.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-08-21cfg80211: add 6GHz in code handling array with NUM_NL80211_BANDS entriesArend van Spriel2-1/+3
In nl80211.c there is a policy for all bands in NUM_NL80211_BANDS and in trace.h there is a callback trace for multicast rates which is per band in NUM_NL80211_BANDS. Both need to be extended for the new NL80211_BAND_6GHZ. Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com> Reviewed-by: Leon Zegers <leon.zegers@broadcom.com> Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com> Link: https://lore.kernel.org/r/1564745465-21234-6-git-send-email-arend.vanspriel@broadcom.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-08-21cfg80211: extend ieee80211_operating_class_to_band() for 6GHzArend van Spriel1-0/+3
Add 6GHz operating class range as defined in 802.11ax D4.1 Annex E. Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com> Reviewed-by: Leon Zegers <leon.zegers@broadcom.com> Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com> Link: https://lore.kernel.org/r/1564745465-21234-5-git-send-email-arend.vanspriel@broadcom.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-08-21cfg80211: util: add 6GHz channel to freq conversion and vice versaArend van Spriel1-1/+9
Extend the functions ieee80211_channel_to_frequency() and ieee80211_frequency_to_channel() to support 6GHz band according specification in 802.11ax D4.1 27.3.22.2. Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com> Reviewed-by: Leon Zegers <leon.zegers@broadcom.com> Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com> Link: https://lore.kernel.org/r/1564745465-21234-4-git-send-email-arend.vanspriel@broadcom.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-08-21cfg80211: add 6GHz UNII band definitionsArend van Spriel1-2/+19
For the new 6GHz there are new UNII band definitions as listed in the FCC notice [1]. [1] https://docs.fcc.gov/public/attachments/FCC-18-147A1_Rcd.pdf Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com> Reviewed-by: Leon Zegers <leon.zegers@broadcom.com> Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com> Link: https://lore.kernel.org/r/1564745465-21234-3-git-send-email-arend.vanspriel@broadcom.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-08-21nl80211: add 6GHz band definition to enum nl80211_bandArend van Spriel1-0/+1
In the 802.11ax specification a new band is introduced, which is also proposed by FCC for unlicensed use. This band is referred to as 6GHz spanning frequency range from 5925 to 7125 MHz. Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com> Reviewed-by: Leon Zegers <leon.zegers@broadcom.com> Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com> Link: https://lore.kernel.org/r/1564745465-21234-2-git-send-email-arend.vanspriel@broadcom.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-08-21Revert "cfg80211: fix processing world regdomain when non modular"Hodaszi, Robert1-1/+1
This reverts commit 96cce12ff6e0 ("cfg80211: fix processing world regdomain when non modular"). Re-triggering a reg_process_hint with the last request on all events, can make the regulatory domain fail in case of multiple WiFi modules. On slower boards (espacially with mdev), enumeration of the WiFi modules can end up in an intersected regulatory domain, and user cannot set it with 'iw reg set' anymore. This is happening, because: - 1st module enumerates, queues up a regulatory request - request gets processed by __reg_process_hint_driver(): - checks if previous was set by CORE -> yes - checks if regulator domain changed -> yes, from '00' to e.g. 'US' -> sends request to the 'crda' - 2nd module enumerates, queues up a regulator request (which triggers the reg_todo() work) - reg_todo() -> reg_process_pending_hints() sees, that the last request is not processed yet, so it tries to process it again. __reg_process_hint driver() will run again, and: - checks if the last request's initiator was the core -> no, it was the driver (1st WiFi module) - checks, if the previous initiator was the driver -> yes - checks if the regulator domain changed -> yes, it was '00' (set by core, and crda call did not return yet), and should be changed to 'US' ------> __reg_process_hint_driver calls an intersect Besides, the reg_process_hint call with the last request is meaningless since the crda call has a timeout work. If that timeout expires, the first module's request will lost. Cc: stable@vger.kernel.org Fixes: 96cce12ff6e0 ("cfg80211: fix processing world regdomain when non modular") Signed-off-by: Robert Hodaszi <robert.hodaszi@digi.com> Link: https://lore.kernel.org/r/20190614131600.GA13897@a1-hr Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-08-21mac80211: add missing length field increment when generating Radiotap headerJohn Crispin1-0/+1
The code generating the Tx Radiotap header when using tx_status_ext was missing a field increment after setting the VHT bandwidth. Fixes: 3d07ffcaf320 ("mac80211: add struct ieee80211_tx_status support to ieee80211_add_tx_radiotap_header") Signed-off-by: John Crispin <john@phrozen.org> Link: https://lore.kernel.org/r/20190807075949.32414-4-john@phrozen.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-08-21mac80211: 80Mhz was not reported properly when using tx_status_extJohn Crispin1-1/+1
When reporting 80MHz, we need to set 4 and not 2 inside the corresponding field inside the Tx Radiotap header. Fixes: 3d07ffcaf320 ("mac80211: add struct ieee80211_tx_status support to ieee80211_add_tx_radiotap_header") Signed-off-by: John Crispin <john@phrozen.org> Link: https://lore.kernel.org/r/20190807075949.32414-3-john@phrozen.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-08-21mac80211: fix bad guard when reporting legacy ratesJohn Crispin1-7/+7
When reporting legacy rates inside the TX Radiotap header we need to split the check between "uses tx_statua_ext" and "is legacy rate". Not doing so would make the code drop into the !tx_status_ext path. Fixes: 3d07ffcaf320 ("mac80211: add struct ieee80211_tx_status support to ieee80211_add_tx_radiotap_header") Signed-off-by: John Crispin <john@phrozen.org> Link: https://lore.kernel.org/r/20190807075949.32414-2-john@phrozen.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-08-21mac80211: fix TX legacy rate reporting when tx_status_ext is usedJohn Crispin1-3/+9
The RX Radiotap header length was not calculated properly when reporting legacy rates using tx_status_ext. Fixes: 3d07ffcaf320 ("mac80211: add struct ieee80211_tx_status support to ieee80211_add_tx_radiotap_header") Signed-off-by: John Crispin <john@phrozen.org> Link: https://lore.kernel.org/r/20190807075949.32414-1-john@phrozen.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-08-21cfg80211: Fix Extended Key ID key install checksAlexander Wetzel1-9/+14
Fix two shortcomings in the Extended Key ID API: 1) Allow the userspace to install pairwise keys using keyid 1 without NL80211_KEY_NO_TX set. This allows the userspace to install and activate pairwise keys with keyid 1 in the same way as for keyid 0, simplifying the API usage for e.g. FILS and FT key installs. 2) IEEE 802.11 - 2016 restricts Extended Key ID usage to CCMP/GCMP ciphers in IEEE 802.11 - 2016 "9.4.2.25.4 RSN capabilities". Enforce that when installing a key. Cc: stable@vger.kernel.org # 5.2 Fixes: 6cdd3979a2bd ("nl80211/cfg80211: Extended Key ID support") Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de> Link: https://lore.kernel.org/r/20190805123400.51567-1-alexander@wetzel-home.de Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-08-21mac80211: fix possible sta leakJohannes Berg1-4/+5
If TDLS station addition is rejected, the sta memory is leaked. Avoid this by moving the check before the allocation. Cc: stable@vger.kernel.org Fixes: 7ed5285396c2 ("mac80211: don't initiate TDLS connection if station is not associated to AP") Link: https://lore.kernel.org/r/20190801073033.7892-1-johannes@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-08-20net: fix __ip_mc_inc_group usageLi RongQing1-2/+2
in ip_mc_inc_group, memory allocation flag, not mcast mode, is expected by __ip_mc_inc_group similar issue in __ip_mc_join_group, both mcase mode and gfp_t are needed here, so use ____ip_mc_inc_group(...) Fixes: 9fb20801dab4 ("net: Fix ip_mc_{dec,inc}_group allocation context") Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Zhang Yu <zhangyu31@baidu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-20net/ncsi: Ensure 32-bit boundary for data cksumTerry S. Duncan2-4/+7
The NCSI spec indicates that if the data does not end on a 32 bit boundary, one to three padding bytes equal to 0x00 shall be present to align the checksum field to a 32-bit boundary. Signed-off-by: Terry S. Duncan <terry.s.duncan@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-20net: dsa: enable and disable all portsVivien Didelot1-0/+11
Call the .port_enable and .port_disable functions for all ports, not only the user ports, so that drivers may optimize the power consumption of all ports after a successful setup. Unused ports are now disabled on setup. CPU and DSA ports are now enabled on setup and disabled on teardown. User ports were already enabled at slave creation and disabled at slave destruction. Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-20net: dsa: use a single switch statement for port setupVivien Didelot1-48/+39
It is currently difficult to read the different steps involved in the setup and teardown of ports in the DSA code. Keep it simple with a single switch statement for each port type: UNUSED, CPU, DSA, or USER. Also no need to call devlink_port_unregister from within dsa_port_setup as this step is inconditionally handled by dsa_port_teardown on error. Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-20net/smc: make sure EPOLLOUT is raisedJason Baron1-4/+2
Currently, we are only explicitly setting SOCK_NOSPACE on a write timeout for non-blocking sockets. Epoll() edge-trigger mode relies on SOCK_NOSPACE being set when -EAGAIN is returned to ensure that EPOLLOUT is raised. Expand the setting of SOCK_NOSPACE to non-blocking sockets as well that can use SO_SNDTIMEO to adjust their write timeout. This mirrors the behavior that Eric Dumazet introduced for tcp sockets. Signed-off-by: Jason Baron <jbaron@akamai.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Ursula Braun <ubraun@linux.ibm.com> Cc: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-20xdp: unpin xdp umem pages in error pathIvan Khoronzhuk1-1/+3
Fix mem leak caused by missed unpin routine for umem pages. Fixes: 8aef7340ae9695 ("xsk: introduce xdp_umem_page") Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-19sctp: remove net sctp.x_enable working as a global switchXin Long1-16/+12
The netns sctp feature flags shouldn't work as a global switch, which is mostly like a firewall/netfilter's job. Also, it will break asoc as it discard or accept chunks incorrectly when net sctp.x_enable is changed after the asoc is created. Since each type of chunk's processing function will check the corresp asoc's feature flag, this 'global switch' should be removed, and net sctp.x_enable will only work as the default feature flags for the future sctp sockets/endpoints. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-19sctp: add SCTP_AUTH_SUPPORTED sockoptXin Long1-0/+86
SCTP_AUTH_SUPPORTED sockopt is used to set enpoint's auth flag. With this feature, each endpoint will have its own flag for its future asoc's auth_capable, instead of netns auth flag. Note that when both ep's auth_enable is enabled, endpoint auth related data should be initialized. If asconf_enable is also set, SCTP_CID_ASCONF/SCTP_CID_ASCONF_ACK should be added into auth_chunk_list. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-19sctp: add sctp_auth_init and sctp_auth_freeXin Long2-56/+74
This patch is to factor out sctp_auth_init and sctp_auth_free functions, and sctp_auth_init will also be used in the next patch for SCTP_AUTH_SUPPORTED sockopt. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-19sctp: use ep and asoc auth_enable properlyXin Long2-33/+44
sctp has per endpoint auth flag and per asoc auth flag, and the asoc one should be checked when coming to asoc and the endpoint one should be checked when coming to endpoint. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-19sctp: add SCTP_ASCONF_SUPPORTED sockoptXin Long1-0/+82
SCTP_ASCONF_SUPPORTED sockopt is used to set enpoint's asconf flag. With this feature, each endpoint will have its own flag for its future asoc's asconf_capable, instead of netns asconf flag. Note that when both ep's asconf_enable and auth_enable are enabled, SCTP_CID_ASCONF and SCTP_CID_ASCONF_ACK should be added into auth_chunk_list. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-19sctp: check asoc peer.asconf_capable before processing asconfXin Long1-2/+4
asconf chunks should be dropped when the asoc doesn't support asconf feature. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-19sctp: not set peer.asconf_capable in sctp_association_initXin Long1-9/+0
asoc->peer.asconf_capable is to be set during handshake, and its value should be initialized to 0. net->sctp.addip_noauth will be checked in sctp_process_init when processing INIT_ACK on client and COOKIE_ECHO on server. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-19sctp: add asconf_enable in struct sctp_endpointXin Long3-20/+18
This patch is to make addip/asconf flag per endpoint, and its value is initialized by the per netns flag, net->sctp.addip_enable. It also replaces the checks of net->sctp.addip_enable with ep->asconf_enable in some places. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-19net: remove empty inet_exit_netLi RongQing1-5/+0
Pointer members of an object with static storage duration, if not explicitly initialized, will be initialized to a NULL pointer. The net namespace API checks if this pointer is not NULL before using it, it are safe to remove the function. Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller3-15/+35
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Remove IP MASQUERADING record in MAINTAINERS file, from Denis Efremov. 2) Counter arguments are swapped in ebtables, from Todd Seidelmann. 3) Missing netlink attribute validation in flow_offload extension. 4) Incorrect alignment in xt_nfacct that breaks 32-bits userspace / 64-bits kernels, from Juliana Rodrigueiro. 5) Missing include guard in nf_conntrack_h323_types.h, from Masahiro Yamada. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-19tcp: make sure EPOLLOUT wont be missedEric Dumazet1-7/+9
As Jason Baron explained in commit 790ba4566c1a ("tcp: set SOCK_NOSPACE under memory pressure"), it is crucial we properly set SOCK_NOSPACE when needed. However, Jason patch had a bug, because the 'nonblocking' status as far as sk_stream_wait_memory() is concerned is governed by MSG_DONTWAIT flag passed at sendmsg() time : long timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT); So it is very possible that tcp sendmsg() calls sk_stream_wait_memory(), and that sk_stream_wait_memory() returns -EAGAIN with SOCK_NOSPACE cleared, if sk->sk_sndtimeo has been set to a small (but not zero) value. This patch removes the 'noblock' variable since we must always set SOCK_NOSPACE if -EAGAIN is returned. It also renames the do_nonblock label since we might reach this code path even if we were in blocking mode. Fixes: 790ba4566c1a ("tcp: set SOCK_NOSPACE under memory pressure") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jason Baron <jbaron@akamai.com> Reported-by: Vladimir Rutsky <rutsky@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Jason Baron <jbaron@akamai.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-19net: flow_offload: convert block_ing_cb_list to regular list typeVlad Buslov1-6/+7
RCU list block_ing_cb_list is protected by rcu read lock in flow_block_ing_cmd() and with flow_indr_block_ing_cb_lock mutex in all functions that use it. However, flow_block_ing_cmd() needs to call blocking functions while iterating block_ing_cb_list which leads to following suspicious RCU usage warning: [ 401.510948] ============================= [ 401.510952] WARNING: suspicious RCU usage [ 401.510993] 5.3.0-rc3+ #589 Not tainted [ 401.510996] ----------------------------- [ 401.511001] include/linux/rcupdate.h:265 Illegal context switch in RCU read-side critical section! [ 401.511004] other info that might help us debug this: [ 401.511008] rcu_scheduler_active = 2, debug_locks = 1 [ 401.511012] 7 locks held by test-ecmp-add-v/7576: [ 401.511015] #0: 00000000081d71a5 (sb_writers#4){.+.+}, at: vfs_write+0x166/0x1d0 [ 401.511037] #1: 000000002bd338c3 (&of->mutex){+.+.}, at: kernfs_fop_write+0xef/0x1b0 [ 401.511051] #2: 00000000c921c634 (kn->count#317){.+.+}, at: kernfs_fop_write+0xf7/0x1b0 [ 401.511062] #3: 00000000a19cdd56 (&dev->mutex){....}, at: sriov_numvfs_store+0x6b/0x130 [ 401.511079] #4: 000000005425fa52 (pernet_ops_rwsem){++++}, at: unregister_netdevice_notifier+0x30/0x140 [ 401.511092] #5: 00000000c5822793 (rtnl_mutex){+.+.}, at: unregister_netdevice_notifier+0x35/0x140 [ 401.511101] #6: 00000000c2f3507e (rcu_read_lock){....}, at: flow_block_ing_cmd+0x5/0x130 [ 401.511115] stack backtrace: [ 401.511121] CPU: 21 PID: 7576 Comm: test-ecmp-add-v Not tainted 5.3.0-rc3+ #589 [ 401.511124] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017 [ 401.511127] Call Trace: [ 401.511138] dump_stack+0x85/0xc0 [ 401.511146] ___might_sleep+0x100/0x180 [ 401.511154] __mutex_lock+0x5b/0x960 [ 401.511162] ? find_held_lock+0x2b/0x80 [ 401.511173] ? __tcf_get_next_chain+0x1d/0xb0 [ 401.511179] ? mark_held_locks+0x49/0x70 [ 401.511194] ? __tcf_get_next_chain+0x1d/0xb0 [ 401.511198] __tcf_get_next_chain+0x1d/0xb0 [ 401.511251] ? uplink_rep_async_event+0x70/0x70 [mlx5_core] [ 401.511261] tcf_block_playback_offloads+0x39/0x160 [ 401.511276] tcf_block_setup+0x1b0/0x240 [ 401.511312] ? mlx5e_rep_indr_setup_tc_cb+0xca/0x290 [mlx5_core] [ 401.511347] ? mlx5e_rep_indr_tc_block_unbind+0x50/0x50 [mlx5_core] [ 401.511359] tc_indr_block_get_and_ing_cmd+0x11b/0x1e0 [ 401.511404] ? mlx5e_rep_indr_tc_block_unbind+0x50/0x50 [mlx5_core] [ 401.511414] flow_block_ing_cmd+0x7e/0x130 [ 401.511453] ? mlx5e_rep_indr_tc_block_unbind+0x50/0x50 [mlx5_core] [ 401.511462] __flow_indr_block_cb_unregister+0x7f/0xf0 [ 401.511502] mlx5e_nic_rep_netdevice_event+0x75/0xb0 [mlx5_core] [ 401.511513] unregister_netdevice_notifier+0xe9/0x140 [ 401.511554] mlx5e_cleanup_rep_tx+0x6f/0xe0 [mlx5_core] [ 401.511597] mlx5e_detach_netdev+0x4b/0x60 [mlx5_core] [ 401.511637] mlx5e_vport_rep_unload+0x71/0xc0 [mlx5_core] [ 401.511679] esw_offloads_disable+0x5b/0x90 [mlx5_core] [ 401.511724] mlx5_eswitch_disable.cold+0xdf/0x176 [mlx5_core] [ 401.511759] mlx5_device_disable_sriov+0xab/0xb0 [mlx5_core] [ 401.511794] mlx5_core_sriov_configure+0xaf/0xd0 [mlx5_core] [ 401.511805] sriov_numvfs_store+0xf8/0x130 [ 401.511817] kernfs_fop_write+0x122/0x1b0 [ 401.511826] vfs_write+0xdb/0x1d0 [ 401.511835] ksys_write+0x65/0xe0 [ 401.511847] do_syscall_64+0x5c/0xb0 [ 401.511857] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 401.511862] RIP: 0033:0x7fad892d30f8 [ 401.511868] Code: 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 25 96 0d 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 60 c3 0f 1f 80 00 00 00 00 48 83 ec 28 48 89 [ 401.511871] RSP: 002b:00007ffca2a9fad8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 401.511875] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fad892d30f8 [ 401.511878] RDX: 0000000000000002 RSI: 000055afeb072a90 RDI: 0000000000000001 [ 401.511881] RBP: 000055afeb072a90 R08: 00000000ffffffff R09: 000000000000000a [ 401.511884] R10: 000055afeb058710 R11: 0000000000000246 R12: 0000000000000002 [ 401.511887] R13: 00007fad893a8780 R14: 0000000000000002 R15: 00007fad893a3740 To fix the described incorrect RCU usage, convert block_ing_cb_list from RCU list to regular list and protect it with flow_indr_block_ing_cb_lock mutex in flow_block_ing_cmd(). Fixes: 1150ab0f1b33 ("flow_offload: support get multi-subsystem block") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller39-196/+429
Merge conflict of mlx5 resolved using instructions in merge commit 9566e650bf7fdf58384bb06df634f7531ca3a97e. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-19netfilter: xt_nfacct: Fix alignment mismatch in xt_nfacct_match_infoJuliana Rodrigueiro1-11/+25
When running a 64-bit kernel with a 32-bit iptables binary, the size of the xt_nfacct_match_info struct diverges. kernel: sizeof(struct xt_nfacct_match_info) : 40 iptables: sizeof(struct xt_nfacct_match_info)) : 36 Trying to append nfacct related rules results in an unhelpful message. Although it is suggested to look for more information in dmesg, nothing can be found there. # iptables -A <chain> -m nfacct --nfacct-name <acct-object> iptables: Invalid argument. Run `dmesg' for more information. This patch fixes the memory misalignment by enforcing 8-byte alignment within the struct's first revision. This solution is often used in many other uapi netfilter headers. Signed-off-by: Juliana Rodrigueiro <juliana.rodrigueiro@intra2net.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-08-19netfilter: nft_flow_offload: missing netlink attribute policyPablo Neira Ayuso1-0/+6
The netlink attribute policy for NFTA_FLOW_TABLE_NAME is missing. Fixes: a3c90f7a2323 ("netfilter: nf_tables: flow offload expression") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>