summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2018-11-07udp: add support for UDP_GRO cmsgPaolo Abeni3-0/+18
When UDP GRO is enabled, the UDP_GRO cmsg will carry the ingress datagram size. User-space can use such info to compute the original packets layout. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07udp: implement GRO for plain UDP sockets.Paolo Abeni5-28/+99
This is the RX counterpart of commit bec1f6f69736 ("udp: generate gso with UDP_SEGMENT"). When UDP_GRO is enabled, such socket is also eligible for GRO in the rx path: UDP segments directed to such socket are assembled into a larger GSO_UDP_L4 packet. The core UDP GRO support is enabled with setsockopt(UDP_GRO). Initial benchmark numbers: Before: udp rx: 1079 MB/s 769065 calls/s After: udp rx: 1466 MB/s 24877 calls/s This change introduces a side effect in respect to UDP tunnels: after a UDP tunnel creation, now the kernel performs a lookup per ingress UDP packet, while before such lookup happened only if the ingress packet carried a valid internal header csum. rfc v2 -> rfc v3: - fixed typos in macro name and comments - really enforce UDP_GRO_CNT_MAX, instead of UDP_GRO_CNT_MAX + 1 - acquire socket lock in UDP_GRO setsockopt rfc v1 -> rfc v2: - use a new option to enable UDP GRO - use static keys to protect the UDP GRO socket lookup Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07udp: implement complete book-keeping for encap_neededPaolo Abeni4-12/+34
The *encap_needed static keys are enabled by UDP tunnels and several UDP encapsulations type, but they are never turned off. This can cause unneeded overall performance degradation for systems where such features are used transiently. This patch introduces complete book-keeping for such keys, decreasing the usage at socket destruction time, if needed, and avoiding that the same socket could increase the key usage multiple times. rfc v3 -> v1: - add socket lock around udp_tunnel_encap_enable() rfc v2 -> rfc v3: - use udp_tunnel_encap_enable() in setsockopt() Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07Merge branch ↵David S. Miller22-84/+243
'vrf-allow-simultaneous-service-instances-in-default-and-other-VRFs' Mike Manning says: ==================== vrf: allow simultaneous service instances in default and other VRFs Services currently have to be VRF-aware if they are using an unbound socket. One cannot have multiple service instances running in the default and other VRFs for services that are not VRF-aware and listen on an unbound socket. This is because there is no easy way of isolating packets received in the default VRF from those arriving in other VRFs. This series provides this isolation for stream sockets subject to the existing kernel parameter net.ipv4.tcp_l3mdev_accept not being set, given that this is documented as allowing a single service instance to work across all VRF domains. Similarly, net.ipv4.udp_l3mdev_accept is checked for datagram sockets, and net.ipv4.raw_l3mdev_accept is introduced for raw sockets. The functionality applies to UDP & TCP services as well as those using raw sockets, and is for IPv4 and IPv6. Example of running ssh instances in default and blue VRF: $ /usr/sbin/sshd -D $ ip vrf exec vrf-blue /usr/sbin/sshd $ ss -ta | egrep 'State|ssh' State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 128 0.0.0.0%vrf-blue:ssh 0.0.0.0:* LISTEN 0 128 0.0.0.0:ssh 0.0.0.0:* ESTAB 0 0 192.168.122.220:ssh 192.168.122.1:50282 LISTEN 0 128 [::]%vrf-blue:ssh [::]:* LISTEN 0 128 [::]:ssh [::]:* ESTAB 0 0 [3000::2]%vrf-blue:ssh [3000::9]:45896 ESTAB 0 0 [2000::2]:ssh [2000::9]:46398 v1: - Address Paolo Abeni's comments (patch 4/5) - Fix build when CONFIG_NET_L3_MASTER_DEV not defined (patch 1/5) v2: - Address David Aherns' comments (patches 4/5 and 5/5) - Remove patches 3/5 and 5/5 from series for individual submissions - Include a sysctl for raw sockets as recommended by David Ahern - Expand series into 10 patches and provide improved descriptions v3: - Update description for patch 1/10 and remove patch 6/10 v4: - Set default to enabled for raw socket sysctl as recommended by David Ahern v5: - Address review comments from David Ahern in patches 2-5 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07ipv6: do not drop vrf udp multicast packetsDewi Morgan1-3/+5
For bound udp sockets in a vrf, also check the sdif to get the index for ingress devices enslaved to an l3mdev. Signed-off-by: Dewi Morgan <morgand@vyatta.att-mail.com> Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Tested-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07ipv6: handling of multicast packets received in VRFMike Manning1-3/+32
If the skb for multicast packets marked as enslaved to a VRF are received, then the secondary device index should be used to obtain the real device. And verify the multicast address against the enslaved rather than the l3mdev device. Signed-off-by: Dewi Morgan <morgand@vyatta.att-mail.com> Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Tested-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07ipv6: allow ping to link-local address in VRFMike Manning1-1/+1
If link-local packets are marked as enslaved to a VRF, then to allow ping to the link-local from a vrf, the error handling for IPV6_PKTINFO needs to be relaxed to also allow the pkt ipi6_ifindex to be that of a slave device to the vrf. Note that the real device also needs to be retrieved in icmp6_iif() to set the ipv6 flow oif to this for icmp echo reply handling. The recent commit 24b711edfc34 ("net/ipv6: Fix linklocal to global address with VRF") takes care of this, so the sdif does not need checking here. This fix makes ping to link-local consistent with that to global addresses, in that this can now be done from within the same VRF that the address is in. Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Tested-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07vrf: mark skb for multicast or link-local as enslaved to VRFMike Manning1-10/+9
The skb for packets that are multicast or to a link-local address are not marked as being enslaved to a VRF, if they are received on a socket bound to the VRF. This is needed for ND and it is preferable for the kernel not to have to deal with the additional use-cases if ll or mcast packets are handled as enslaved. However, this does not allow service instances listening on unbound and bound to VRF sockets to distinguish the VRF used, if packets are sent as multicast or to a link-local address. The fix is for the VRF driver to also mark these skb as being enslaved to the VRF. Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Tested-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07net: fix raw socket lookup device bind matching with VRFsDuncan Eastoe3-6/+15
When there exist a pair of raw sockets one unbound and one bound to a VRF but equal in all other respects, when a packet is received in the VRF context, __raw_v4_lookup() matches on both sockets. This results in the packet being delivered over both sockets, instead of only the raw socket bound to the VRF. The bound device checks in __raw_v4_lookup() are replaced with a call to raw_sk_bound_dev_eq() which correctly handles whether the packet should be delivered over the unbound socket in such cases. In __raw_v6_lookup() the match on the device binding of the socket is similarly updated to use raw_sk_bound_dev_eq() which matches the handling in __raw_v4_lookup(). Importantly raw_sk_bound_dev_eq() takes the raw_l3mdev_accept sysctl into account. Signed-off-by: Duncan Eastoe <deastoe@vyatta.att-mail.com> Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Tested-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07net: provide a sysctl raw_l3mdev_accept for raw socket lookup with VRFsMike Manning7-2/+68
Add a sysctl raw_l3mdev_accept to control raw socket lookup in a manner similar to use of tcp_l3mdev_accept for stream and of udp_l3mdev_accept for datagram sockets. Have this default to enabled for reasons of backwards compatibility. This is so as to specify the output device with cmsg and IP_PKTINFO, but using a socket not bound to the corresponding VRF. This allows e.g. older ping implementations to be run with specifying the device but without executing it in the VRF. If the option is disabled, packets received in a VRF context are only handled by a raw socket bound to the VRF, and correspondingly packets in the default VRF are only handled by a socket not bound to any VRF. Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Tested-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07net: ensure unbound datagram socket to be chosen when not in a VRFMike Manning5-21/+31
Ensure an unbound datagram skt is chosen when not in a VRF. The check for a device match in compute_score() for UDP must be performed when there is no device match. For this, a failure is returned when there is no device match. This ensures that bound sockets are never selected, even if there is no unbound socket. Allow IPv6 packets to be sent over a datagram skt bound to a VRF. These packets are currently blocked, as flowi6_oif was set to that of the master vrf device, and the ipi6_ifindex is that of the slave device. Allow these packets to be sent by checking the device with ipi6_ifindex has the same L3 scope as that of the bound device of the skt, which is the master vrf device. Note that this check always succeeds if the skt is unbound. Even though the right datagram skt is now selected by compute_score(), a different skt is being returned that is bound to the wrong vrf. The difference between these and stream sockets is the handling of the skt option for SO_REUSEPORT. While the handling when adding a skt for reuse correctly checks that the bound device of the skt is a match, the skts in the hashslot are already incorrect. So for the same hash, a skt for the wrong vrf may be selected for the required port. The root cause is that the skt is immediately placed into a slot when it is created, but when the skt is then bound using SO_BINDTODEVICE, it remains in the same slot. The solution is to move the skt to the correct slot by forcing a rehash. Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Tested-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07net: ensure unbound stream socket to be chosen when not in a VRFMike Manning4-16/+31
The commit a04a480d4392 ("net: Require exact match for TCP socket lookups if dif is l3mdev") only ensures that the correct socket is selected for packets in a VRF. However, there is no guarantee that the unbound socket will be selected for packets when not in a VRF. By checking for a device match in compute_score() also for the case when there is no bound device and attaching a score to this, the unbound socket is selected. And if a failure is returned when there is no device match, this ensures that bound sockets are never selected, even if there is no unbound socket. Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Tested-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07net: allow binding socket in a VRF when there's an unbound socketRobert Shearman6-22/+51
Change the inet socket lookup to avoid packets arriving on a device enslaved to an l3mdev from matching unbound sockets by removing the wildcard for non sk_bound_dev_if and instead relying on check against the secondary device index, which will be 0 when the input device is not enslaved to an l3mdev and so match against an unbound socket and not match when the input device is enslaved. Change the socket binding to take the l3mdev into account to allow an unbound socket to not conflict sockets bound to an l3mdev given the datapath isolation now guaranteed. Signed-off-by: Robert Shearman <rshearma@vyatta.att-mail.com> Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Tested-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07net: hns3: Remove set but not used variable 'reset_level'YueHaibing1-10/+3
Fixes gcc '-Wunused-but-set-variable' warning: drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c: In function 'hclge_log_and_clear_ppp_error': drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c:821:24: warning: variable 'reset_level' set but not used [-Wunused-but-set-variable] enum hnae3_reset_type reset_level = HNAE3_NONE_RESET; It never used since introduction in commit 01865a50d78f ("net: hns3: Add enable and process hw errors of TM scheduler") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07Merge branch 'nfp-more-set-actions-and-notifier-refactor'David S. Miller8-104/+252
Jakub Kicinski says: ==================== nfp: more set actions and notifier refactor This series brings updates to flower offload code. First Pieter adds support for setting TTL, ToS, Flow Label and Hop Limit fields in IPv4 and IPv6 headers. Remaining 5 patches deal with factoring out netdev notifiers from flower code. We already have two instances, and more is coming, so it's time to move to one central notifier which then feeds individual feature handlers. I start that part by cleaning up the existing notifiers. Next a central notifier is added, and used by flower offloads. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07nfp: flower: use the common netdev notifierJakub Kicinski4-55/+30
Use driver's common notifier for LAG and tunnel configuration. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07nfp: register a notifier handler in a central location for the deviceJakub Kicinski2-15/+57
Code interested in networking events registers its own notifier handlers. Create one device-wide notifier instance. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07nfp: flower: make nfp_fl_lag_changels_event() voidJakub Kicinski1-8/+5
nfp_fl_lag_changels_event() never fails, and therefore we would never return NOTIFY_BAD for NETDEV_CHANGELOWERSTATE. Make this clearer by changing nfp_fl_lag_changels_event()'s return type to void. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07nfp: flower: don't try to nack device unregister eventsJakub Kicinski1-9/+12
Returning an error from a notifier means we want to veto the change. We shouldn't veto NETDEV_UNREGISTER just because we couldn't find the tracking info for given master. I can't seem to find a way to trigger this unless we have some other bug, so it's probably not fix-worthy. While at it move the checking if the netdev really is of interest into the handling functions, like we do for other events. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07nfp: flower: remove unnecessary iteration over devicesJakub Kicinski1-7/+0
For flower tunnel offloads FW has to be informed about MAC addresses of tunnel devices. We use a netdev notifier to keep track of these addresses. Remove unnecessary loop over netdevices after notifier is registered. The intention of the loop was to catch devices which already existed on the system before nfp driver got loaded, but netdev notifier will replay NETDEV_REGISTER events. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07nfp: flower: add ipv6 set flow label and hop limit offloadPieter Jansen van Vuuren2-4/+75
Add ipv6 set flow label and hop limit action offload. Since pedit sets headers per 4 byte word, we need to ensure that setting either version, priority, payload_len or nexthdr does not get offloaded. Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07nfp: flower: add ipv4 set ttl and tos offloadPieter Jansen van Vuuren2-6/+73
Add ipv4 set ttl and tos action offload. Since pedit sets headers per 4 byte word, we need to ensure that setting either version, ihl, protocol, total length or checksum does not get offloaded. Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07Merge branch 'hns3-next'David S. Miller12-180/+544
Huazhong Tan says: ==================== hns3: provide new interfaces & bugfixes & code optimization This patchset provides some reset interfaces for RAS & RoCE, also some bugfixes and optimization related to reset. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07net: hns3: fix for cmd queue memory not freed problem during resetHuazhong Tan3-65/+91
It is not necessary to reallocate the descriptor and remap the descriptor memory in reset process, otherwise it may cause memory not freed problem. Also, this patch initializes the cmd queue's spinlocks in hclgevf_alloc_cmd_queue, and take the spinlocks when reinitializing cmd queue' registers. Fixes: fedd0c15d288 ("net: hns3: Add HNS3 VF IMP(Integrated Management Proc) cmd interface") Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07net: hns3: add error handler for hclge_reset()Huazhong Tan2-22/+120
When hclge_reset() is called, it may fail for several reasons. For example, an higher-level reset event occurs, memory allocation failure, hardware reset timeout, etc. Therefore, it is necessary to add corresponding error handling for these situations. 1. A high-level reset is required due to a high-level reset failure. 2. For memory allocation failure, a high-level reset is initiated by the timer to recover. The reason for using the timer is to prevent this new high-level reset to interrupt the reset process of other pf/vf; 3. For the case of hardware reset timeout, reschedule the reset task to wait for the hardware to complete the reset. For memory allocation failure and reset timeouts, in order to prevent an infinite number of scheduled reset tasks, the number of error recovery needs to be limited. This patch also add some reset related debug log printing. Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07net: hns3: call roce's reset notify callback when resettingHuazhong Tan1-0/+33
While doing resetting, roce should do its uninitailization part before nic's, and do its initialization part after nic's. Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07net: hns3: adjust the process of PF resetHuazhong Tan1-2/+36
When doing PF reset, the driver needs to do some preparatory work before asserting PF reset. Since when hardware is resetting, it is necessary to stop tx/rx queue, clear hardware table, etc, otherwise hardware may run into unrecoverable state if there is still IO running when the hardware is resetting. Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07net: hns3: move some reset information from hnae3_handle into ↵Huazhong Tan7-34/+28
hclge_dev/hclgevf_dev Saving reset related information in the hclge_dev/hclgevf_dev structure is more suitable than the hnae3_handle, since hardware related information is kept in these two structure. Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07net: hns3: ignore new coming low-level reset while doing high-level resetHuazhong Tan1-11/+16
When processing a higher level reset, the pending lower level reset does not have to be processed anymore, because the higher level reset is the superset of the lower level reset. Therefore, when processing an higher level reset, the request of lower level reset needs to be cleared. Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07net: hns3: use HNS3_NIC_STATE_RESETTING to indicate resettingHuazhong Tan4-0/+48
While hclge is going to reset, it will notify its client with HNAE3_DOWN_CLIENT, so this client should get into a resetting status from this moment, other operations from the stack need to be blocked as well. And when the reset is finished, the client will be notified with HNAE3_UP_CLIENT, so this is the end of the resetting status. This patch uses HNS3_NIC_STATE_RESETTING flag to implement that, and adds hns3_nic_resetting() to indicate which operation is not allowed. Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07net: hns3: enable/disable ring in the enet while doing UP/DOWNHuazhong Tan4-44/+41
While hardware gets into reset status, the firmware will not respond to driver's command request, which may cause ring not disabled problem during reset process. So this patch uses register instead of command to enable/disable the ring in the enet while doing UP/DOWN operation. Also, HNS3_RING_RX_VM_REG is previously unused, so change it to the correct meaning, and add a wrapper function for readl(). Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support") Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07net: hns3: adjust the location of clearing the table when doing resetHuazhong Tan1-10/+10
When doing a function reset, the hardware table should be cleared before the hardware reset. In current code, this clearing is done in hns3_reset_notify_uninit_enet, but it is too late, because the hardware reset is already done, hns3_reset_notify_down_enet is more suitable to do that. Fixes: bb6b94a896d4 ("net: hns3: Add reset interface implementation in client") Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07net: hns3: provide some interface & information for the clientHuazhong Tan5-0/+68
The client needs to know if the hardware is resetting when loading or unloading itself, because client may abort the loading process or wait for the reset process to finish when unloading if hardware is resetting. So this patch provides these interfaces to do it. 1. get_hw_reset_stat, the reset status of hardware. 2. ae_dev_resetting, whether reset task is scheduling. 3. ae_dev_reset_cnt, how many reset has been done. Also, the RoCE client needs some field in the hnae3_roce_private_info to save its state, and process_hw_error interface in the hnae3_client_ops to process hardware errors. Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07net: hns3: add set_default_reset_request in the hnae3_ae_opsHuazhong Tan5-1/+45
Currently, when reset_event is called because of tx timeout, it will upgrade the reset level (For PF, HNAE3_FUNC_RESET -> HNAE3_CORE_RESET -> HNAE3_GLOBAL_RESET) if the time between the new reset and last reset is within 20 secs, or restore the reset level to HNAE3_FUNC_RESET if the time between the new reset and last reset is over 20 secs. There is requirement that the caller needs to decide the reset level when triggering a reset, for example, RAS recovery. So this patch adds the set_default_reset_request to meet this requirement. Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07net: hns3: use HNS3_NIC_STATE_INITED to indicate the initialization state of ↵Huazhong Tan2-1/+18
enet Besides of module_init and module_exit, the process of reset will also uninitialize and initialize the enet client. When reset process fails with enet client uninitialized, the module_exit does not need to uninitialize the enet client, otherwise it may cause double uninitialization problem. So we need the HNS3_NIC_STATE_INITED flag to indicate whether the enet client is initialized. Also HNS3_NIC_STATE_REINITING is previously unused, so change it to HNS3_NIC_STATE_INITED. Fixes: bb6b94a896d4 ("net: hns3: Add reset interface implementation in client") Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-06Merge branch 'net-systemport-Unmap-queues-upon-DSA-unregister-event'David S. Miller3-14/+62
Florian Fainelli says: ==================== net: systemport: Unmap queues upon DSA unregister event This patch series fixes the unbinding/binding of the bcm_sf2 switch driver along with bcmsysport which monitors the switch port queues. Because the driver was not processing the DSA_PORT_UNREGISTER event, we would not be unmapping switch port/queues, which could cause incorrect decisions to be made by the HW (e.g: queue always back-pressured). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-06net: systemport: Unmap queues upon DSA unregister eventFlorian Fainelli1-6/+50
Binding and unbinding the switch driver which creates the DSA slave network devices for which we set-up inspection would lead to undesireable effects since we were not clearing the port/queue mapping to the SYSTEMPORT TX queue. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-06net: systemport: Simplify queue mapping logicFlorian Fainelli2-9/+9
The use of a bitmap speeds up the finding of the first available queue to which we could start establishing the mapping for, but we still have to loop over all slave network devices to set them up. Simplify the logic to have a single loop, and use the fact that a correctly configured ring has inspect set to true. This will make things simpler to unwind during device unregistration. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-06net: dsa: bcm_sf2: Turn on PHY to allow successful registrationFlorian Fainelli1-0/+4
We are binding to the PHY using the SF2 slave MDIO bus that we create, binding involves reading the PHY's MII_PHYSID1/2 which won't be possible if the PHY is turned off. Temporarily turn it on/off for the bus probing to succeeed. This fixes unbind/bind problems where the port connecting to that PHY would be in error since it could not connect to it. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-06Merge branch 'net-dsa-bcm_sf2-Store-rules-in-lists'David S. Miller5-315/+204
Florian Fainelli says: ==================== net: dsa: bcm_sf2: Store rules in lists This patch series changes the bcm-sf2 driver to keep a copy of the inserted rules as opposed to using the HW as a storage area for a number of reasons: - this helps us with doing duplicate rule detection in a faster way, it would have required a full rule read before - this helps with Pablo's on-going work to convert ethtool_rx_flow_spec to a more generic flow rule structure by having fewer code paths to convert to the new structure/helpers - we need to cache copies to restore them during drive resumption, because depending on the low power mode the system has entered, the switch may have lost all of its context ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-06net: systemport: Restore Broadcom tag match filters upon resumeFlorian Fainelli2-0/+13
Some of the system suspend states that we support wipe out entirely the HW contents. If we had a Wake-on-LAN filter programmed prior to going into suspend, but we did not actually wake-up from Wake-on-LAN and instead used a deeper suspend state, make sure we restore the CID number that we need to match against. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-06net: dsa: bcm_sf2: Get rid of unmarshalling functionsFlorian Fainelli1-310/+0
Now that we have migrated the CFP rule handling to a list with a software copy, the delete/get operation just returns what is on the list, no need to read from the hardware which is both slow and more error prone. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-06net: dsa: bcm_sf2: Restore CFP rules during system resumeFlorian Fainelli3-0/+41
The hardware can lose its context during system suspend, and depending on the switch generation (7445 vs. 7278), while the rules are still there, they will have their valid bit cleared (because that's the fastest way for the HW to reset things). Just make sure we re-apply them coming back from resume. The 7445 switch is an older version of the core that has some quirky RAM technology requiring a delete then re-inser to guarantee the RAM entries are properly latched. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-06net: dsa: bcm_sf2: Split rule handling from HW operationFlorian Fainelli1-35/+54
In preparation for restoring CFP rules during system wide system suspend/resume where the hardware loses its context, split the rule validation from its actual insertion as well as the rule removal from its actual hardware deletion operation. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-06net: dsa: bcm_sf2: Keep copy of inserted rulesFlorian Fainelli3-11/+137
We tried hard to use the hardware as a storage area, which made things needlessly complex in that we had to both marshall and unmarshall the ethtool_rx_flow_spec into what the CFP hardware understands but it did not require any driver level allocations, so that was nice. Keep a copy of the ethtool_rx_flow_spec rule we want to insert, and also make sure we don't have a duplicate rule already. This greatly speeds up the deletion time since we only need to clear the slice's valid bit and not perform a full read. This is a preparatory step for being able to restore rules upon system resumption where the hardware loses its context partially or entirely. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-06Merge branch 'net-More-extack-messages'David S. Miller11-25/+48
David Ahern says: ==================== net: More extack messages Add more extack messages for several link create errors (e.g., invalid number of queues, unknown link kind) and invalid metrics argument. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-06rtnetlink: Add more extack messages to rtnl_newlinkDavid Ahern1-2/+4
Add extack arg to the nla_parse_nested calls in rtnl_newlink, and add messages for unknown device type and link network namespace id. In particular, it improves the failure message when the wrong link type is used. From $ ip li add bond1 type bonding RTNETLINK answers: Operation not supported to $ ip li add bond1 type bonding Error: Unknown device type. (The module name is bonding but the link type is bond.) Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-06net: Add extack argument to ip_fib_metrics_initDavid Ahern4-11/+25
Add extack argument to ip_fib_metrics_init and add messages for invalid metrics. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-06net: Add extack argument to rtnl_create_linkDavid Ahern7-12/+19
Add extack arg to rtnl_create_link and add messages for invalid number of Tx or Rx queues. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-06ipv6: gro: do not use slow memcmp() in ipv6_gro_receive()Eric Dumazet1-3/+10
ipv6_gro_receive() compares 34 bytes using slow memcmp(), while handcoding with a couple of ipv6_addr_equal() is much faster. Before this patch, "perf top -e cycles:pp -C <cpu>" would see memcmp() using ~10% of cpu cycles on a 40Gbit NIC receiving IPv6 TCP traffic. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>