summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2018-03-22net: Move IP_ROUTER_ALERT out of lock_sock(sk)Kirill Tkhai1-3/+2
ip_ra_control() does not need sk_lock. Who are the another users of ip_ra_chain? ip_mroute_setsockopt() doesn't take sk_lock, while parallel IP_ROUTER_ALERT syscalls are synchronized by ip_ra_lock. So, we may move this command out of sk_lock. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22net: Revert "ipv4: get rid of ip_ra_lock"Kirill Tkhai1-2/+10
This reverts commit ba3f571d5dde. The commit was made after 1215e51edad1 "ipv4: fix a deadlock in ip_ra_control", and killed ip_ra_lock, which became useless after rtnl_lock() made used to destroy every raw ipv4 socket. This scales very bad, and next patch in series reverts 1215e51edad1. ip_ra_lock will be used again. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22liquidio: Added support for trusted VFIntiyaz Basha3-0/+125
When a VF is trusted, all promiscuous traffic will only be sent to that VF. In normal operation promiscuous traffic is sent to the PF. There can be only one trusted VF per PF Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22Merge branch 'rmnet-next'David S. Miller9-41/+94
Subash Abhinov Kasiviswanathan says: ==================== net: qualcomm: rmnet: Updates 2018-03-12 This series contains some minor updates for rmnet driver. Patch 1 contains fixes for sparse warnings. Patch 2 updates the copyright date to 2018. Patch 3 is a cleanup in receive path. Patch 4 has the new rmnet netlink attributes in uapi and updates the usage. Patch 5 has the implementation of the fill_info operation. v1->v2: Remove the force casts since the data type is changed to __be types as mentioned by David. v2->v3: Update copyright in files which actually had changes as mentioned by Joe. v3->v4: Add new netlink attributes for mux_id and flags instead of using the the vlan attributes as mentioned by David. The rmnet specific flags are also moved to uapi. The netlink updates are done as part of #4 and #5 has the fill_info operation. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22net: qualcomm: rmnet: Implement fill_infoSubash Abhinov Kasiviswanathan1-0/+30
This is needed to query the mux_id and flags of a rmnet device. Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22net: qualcomm: rmnet: Export mux_id and flags to netlinkSubash Abhinov Kasiviswanathan6-29/+53
Define new netlink attributes for rmnet mux_id and flags. These flags / mux_id were earlier using vlan flags / id respectively. The flag bits are also moved to uapi and are renamed with prefix RMNET_FLAG_*. Also add the rmnet policy to handle the new netlink attributes. Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22net: qualcomm: rmnet: Remove unnecessary device assignmentSubash Abhinov Kasiviswanathan1-1/+0
Device of the de-aggregated skb is correctly assigned after inspecting the mux_id, so remove the assignment in rmnet_map_deaggregate(). Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22net: qualcomm: rmnet: Update copyright year to 2018Subash Abhinov Kasiviswanathan8-8/+8
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22net: qualcomm: rmnet: Fix casting issuesSubash Abhinov Kasiviswanathan1-3/+3
Fix warnings which were reported when running with sparse (make C=1 CF=-D__CHECK_ENDIAN__) drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c:81:15: warning: cast to restricted __be16 drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c:271:37: warning: incorrect type in assignment (different base types) expected unsigned short [unsigned] [usertype] pkt_len got restricted __be16 [usertype] <noident> drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c:287:29: warning: incorrect type in assignment (different base types) expected unsigned short [unsigned] [usertype] pkt_len got restricted __be16 [usertype] <noident> drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c:310:22: warning: cast to restricted __be16 drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c:319:13: warning: cast to restricted __be16 drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c:49:18: warning: cast to restricted __be16 drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c:50:18: warning: cast to restricted __be32 drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c:74:21: warning: cast to restricted __be16 Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22gre: fix TUNNEL_SEQ bit check on sequence numberingColin Ian King2-2/+2
The current logic of flags | TUNNEL_SEQ is always non-zero and hence sequence numbers are always incremented no matter the setting of the TUNNEL_SEQ bit. Fix this by using & instead of |. Detected by CoverityScan, CID#1466039 ("Operands don't affect result") Fixes: 77a5196a804e ("gre: add sequence number for collect md mode.") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22Documentation/networking: Add net DIM documentationTal Gilboa1-0/+174
Net DIM is a generic algorithm, purposed for dynamically optimizing network devices interrupt moderation. This document describes how it works and how to use it. Signed-off-by: Tal Gilboa <talgi@mellanox.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22net: mvpp2: use correct index on array mvpp2_poolsColin Ian King1-1/+1
Array mvpp2_pools is being indexed by long_log_pool, however this looks like a cut-n-paste bug and in fact should be short_log_pool. Detected by CoverityScan, CID#1466113 ("Copy-paste error") Fixes: 576193f2d579 ("net: mvpp2: jumbo frames support") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22Merge branch 'tipc-diag'David S. Miller7-26/+281
GhantaKrishnamurthy MohanKrishna says: ==================== tipc: socket diagnostics additions for AF_TIPC The following patchsets add socket diagnostics support for AF_TIPC by using the sock diag framework. The patchset was created on top of commit id: fb66cb0. New iproute2 package is needed to use this functionality which will be sent for review in a seperate mail. The commit series improves diagnosis of tipc sockets by exporting the configuration, states and statistics of sockets. The series has been co-authored by Parthasarathy Bhuvaragan and consist of two parts: 1-2: Adaptations of existing code to support sock_diag framework. We modify existing functions to support socket diagnostics. Required information about the sockets are exported. 3: Step sk_drops during packet drop. This occurs if the packet cannot be queued due to queue length exceeding configured thresholds. The diag module is optional, and if enabled it will be loaded on demand when needed. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22tipc: step sk->sk_drops when rcv buffer is fullGhantaKrishnamurthy MohanKrishna2-2/+8
Currently when tipc is unable to queue a received message on a socket, the message is rejected back to the sender with error TIPC_ERR_OVERLOAD. However, the application on this socket has no knowledge about these discards. In this commit, we try to step the sk_drops counter when tipc is unable to queue a received message. Export sk_drops using tipc socket diagnostics. Acked-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22tipc: implement socket diagnostics for AF_TIPCGhantaKrishnamurthy MohanKrishna7-6/+238
This commit adds socket diagnostics capability for AF_TIPC in netlink family NETLINK_SOCK_DIAG in a new kernel module (diag.ko). The following are key design considerations: - config TIPC_DIAG has default y, like INET_DIAG. - only requests with flag NLM_F_DUMP is supported (dump all). - tipc_sock_diag_req message is introduced to send filter parameters. - the response attributes are of TLV, some nested. To avoid exposing data structures between diag and tipc modules and avoid code duplication, the following additions are required: - export tipc_nl_sk_walk function to reuse socket iterator. - export tipc_sk_fill_sock_diag to fill the tipc diag attributes. - create a sock_diag response message in __tipc_add_sock_diag defined in diag.c and use the above exported tipc_sk_fill_sock_diag to fill response. Acked-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22tipc: modify socket iterator for sock_diagGhantaKrishnamurthy MohanKrishna1-24/+41
The current socket iterator function tipc_nl_sk_dump, handles socket locks and calls __tipc_nl_add_sk for each socket. To reuse this logic in sock_diag implementation, we do minor modifications to make these functions generic as described below. In this commit, we add a two new functions __tipc_nl_sk_walk, __tipc_nl_add_sk_info and modify tipc_nl_sk_dump, __tipc_nl_add_sk accordingly. In __tipc_nl_sk_walk we: 1. acquire and release socket locks 2. for each socket, execute the specified callback function In __tipc_nl_add_sk we: - Move the netlink attribute insertion to __tipc_nl_add_sk_info. tipc_nl_sk_dump calls tipc_nl_sk_walk with __tipc_nl_add_sk as argument. sock_diag will use these generic functions in a later commit. There is no functional change in this commit. Acked-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22Merge branch 'mlxsw-Update-supported-firmware-version'David S. Miller3-11/+20
Ido Schimmel says: ==================== mlxsw: Update supported firmware version The first patch bumps the firmware version supported by the driver. The second patch enables a feature introduced in the new version, auto-negotiation disable. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22mlxsw: spectrum: Add support for auto-negotiation disable modeTal Bar3-9/+18
In 'auto-neg off' the device have sent AN (auto-negotiation) frames with the forced speed. Thus, fix it using an_disable_admin field in Port type and speed (PTYS) register. This field indicates if speed negotiation frames would be send by the port or not. Add the field and enable/disable it for 'auto-neg on/off', make the port to start/stop sending AN (auto-negotiation) frames. Note that for SwitchX2 the behavior doesn't change (i.e support only AN enabled with forced speed). Signed-off-by: Tal Bar <talb@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22mlxsw: spectrum: Update the supported firmware to version 13.1620.192Tal Bar1-2/+2
This new firmware contains: - Support for auto-neg disable mode Signed-off-by: Tal Bar <talb@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22Merge branch 'fix-some-bugs-for-HNS3-driver'David S. Miller12-132/+309
Peng Li says: ==================== fix some bugs for HNS3 driver This patchset fixes some bugs for HNS3 driver: [Patch 1/11 - 5/11] fix various bugs reported by hisilicon test team. [Patch 6/11 - 7/11] fix bugs about interrupt coalescing self-adaptive function. [Patch 8/11 - 11/11] fix bugs about ethtool_ops.get_link_ksettings. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22net: hns3: fix for not returning problem in get_link_ksettings when phy existsFuyun Liang1-2/+6
When phy exists, phy_ethtool_ksettings_get function is enough to get the link ksettings. If the phy exists, get_link_ksettings function can return directly after phy_ethtool_ksettings_get is called. Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22net: hns3: add querying speed and duplex support to VFFuyun Liang4-2/+37
This patch adds support for querying speed and duplex by ethtool ethX to VF. Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22net: hns3: add get_link support to VFFuyun Liang2-0/+9
This patch adds ethtool_ops.get_link support to VF. Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22net: hns3: fix for getting wrong link mode problemFuyun Liang5-89/+107
Fixed link mode is returned by hns3_get_link_ksettings. It is unreasonable. This patch fixes it by adding some related functions to get link mode from hardware. Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22net: hns3: change the time interval of int_gl calculatingFuyun Liang2-17/+30
Since we change the update rate of int_gl from every interrupt to every one hundred interrupts, the old way to get time interval by int_gl value is not accurate. This patch calculates the time interval using the jiffies value. Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22net: hns3: change GL update rateFuyun Liang2-0/+10
The interrupt coalescing self-adaptive function updates the int_gl every interrupt. The GL update rate is too faster to get a better new GL value. This patch changes the GL update rate to every one hundred interrupts. The GL update rate is defined by HNS3_INT_ADAPT_DOWN_START. Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22net: hns3: increase the max time for IMP handle commandPeng Li2-2/+2
It may need more time for IMP handle some command, such as reset. This patch enlarges the max time for cmd timeout. Driver will check the IMP result every us, it may break through the loop when get the right result. So not all command need the max time. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22net: hns3: export pci table of hclge and hclgevf to userspaceYunsheng Lin2-0/+4
There is no module that is dependent on hclge or hclgevf's symbol, but hns_enet need them to provide ops for it to run. When there is a need to auto load the hns3 driver, the auto load will fail because hclge or hclgevf is not loaded. Hns_enet has already exported the pci table, so this patch exports the pci table for hclge and hclgevf module too. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22net: hns3: fix for vlan table lost problem when resettingYunsheng Lin2-0/+29
The vlan table in hardware is clear after PF/Core/IMP/Global reset, which will cause vlan tagged packets not being received problem. This patch fixes it by restoring the vlan table after reset. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22net: hns3: fix the VF queue reset flow errorPeng Li4-6/+53
VF queue reset flow is different from PF queue reset flow. VF driver should stop VF queue first, then send message to PF and PF do the reset. PF should send a response to VF after PF complete the queue reset, VF can initialize the queue hw after get the response. This patch fixes the VF queue reset flow as the correct step. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22net: hns3: reallocate tx/rx buffer after changing mtuFuyun Liang1-14/+22
When changing the mtu, the max frame size also will be changed. The tx buffer size and the rx buffer size to be allocated are determined by max frame size. So when max frame size is changed, the tx buffer and rx buffer need to be reallocated. When the tc_num is changed, the tx buffer and rx buffer need to be reallocated too. So calling set_mtu and buffer_alloc separately is better. Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22devlink: Remove top_hierarchy arg to devlink_resource_registerDavid Ahern4-10/+10
top_hierarchy arg can be determined by comparing parent_resource_id to DEVLINK_RESOURCE_ID_PARENT_TOP so it does not need to be a separate argument. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22selftests: Add multipath tests for onlink flagDavid Ahern1-3/+95
Add multipath tests for onlink flag: one test with onlink added to both nexthops, then tests with onlink added to only 1 nexthop. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22Merge branch 'cxgb4-rdma'David S. Miller9-1/+321
Raju Rangoju says: ==================== Add support for RDMA enhancements in cxgb4 Allocates the HW-resources and provide the necessary routines for the upper layer driver (rdma/iw_cxgb4) to enable the RDMA SRQ support for Chelsio adapters. Advertise support for write with immediate work request Advertise support for write with completion v3: modified memory allocation as per Stefano's suggestion v2: fixed the patching issues and also fixed the following based on review comments of Stefano Brivio - using kvzalloc instead of vzalloc - using #define instead of enum ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22cxgb4: Support firmware rdma write completion work request.Raju Rangoju5-0/+11
If FW supports RDMA WRITE_COMPLETION functionality, then advertise that to the ULDs. This will be used by iw_cxgb4 to allow WRITE_COMPLETION work requests. Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com> Signed-off-by: Raju Rangoju <rajur@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22cxgb4: Support firmware rdma write with immediate work request.Raju Rangoju5-0/+10
If FW supports RDMA WRITE_WITH_IMMEDATE functionality, then advertise that to the ULDs. This will be used by iw_cxgb4 to allow WRITE_WITH_IMMEDIATE work requests. Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com> Signed-off-by: Raju Rangoju <rajur@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22cxgb4: Add support to query HW SRQ parametersRaju Rangoju2-0/+21
This patch adds support to query FW for the HW SRQ table start/end, and advertise that for ULDs. Signed-off-by: Raju Rangoju <rajur@chelsio.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22cxgb4: Add support to initialise/read SRQ entriesRaju Rangoju4-1/+206
- This patch adds support to initialise srq table and read srq entries Signed-off-by: Raju Rangoju <rajur@chelsio.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22cxgb4: Adds CPL support for Shared Receive QueuesRaju Rangoju2-0/+73
- Add srq table query cpl support for srq - Add cpl_abort_req_rss6 and cpl_abort_rpl_rss6 structs. - Add accessors, macros to get the SRQ IDX value. Signed-off-by: Raju Rangoju <rajur@chelsio.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22Merge branch 'r8169-small-improvements'David S. Miller1-80/+61
Heiner Kallweit says: ==================== r8169: series with smaller improvements w/o functional changes This series includes smaller improvements w/o intended functional changes. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22r8169: add helper tp_to_devHeiner Kallweit1-15/+22
In several places struct device is referenced by using &tp->pci_dev->dev. Add helper tp_to_dev() to improve code readability. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22r8169: change type of argument in rtl_disable/enable_clock_requestHeiner Kallweit1-22/+11
Changing the argument type to struct rtl8169_private * is more in line with the other functions in the driver and it allows to reduce the code size. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22r8169: change type of first argument in rtl_tx_performance_tweakHeiner Kallweit1-38/+24
Changing the type of the first argument to struct rtl8169_private * is more in line with the other functions in the driver and it allows to reduce the code size. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22r8169: simplify rtl_set_mac_addressHeiner Kallweit1-5/+4
Replace open-coded functionality with eth_mac_addr(). Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22Merge tag 'batadv-next-for-davem-20180319' of ↵David S. Miller6-37/+604
git://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== This feature/cleanup patchset includes the following patches: - avoid redundant multicast TT entries, by Linus Luessing - add netlink support for distributed arp table cache and multicast flags, by Linus Luessing (2 patches) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22rds: tcp: remove register_netdevice_notifier infrastructure.Sowmini Varadhan1-70/+23
The netns deletion path does not need to wait for all net_devices to be unregistered before dismantling rds_tcp state for the netns (we are able to dismantle this state on module unload even when all net_devices are active so there is no dependency here). This patch removes code related to netdevice notifiers and refactors all the code needed to dismantle rds_tcp state into a ->exit callback for the pernet_operations used with register_pernet_device(). Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22netns: send uevent messagesChristian Brauner1-1/+78
This patch adds a receive method to NETLINK_KOBJECT_UEVENT netlink sockets to allow sending uevent messages into the network namespace the socket belongs to. Currently non-initial network namespaces are already isolated and don't receive uevents. There are a number of cases where it is beneficial for a sufficiently privileged userspace process to send a uevent into a network namespace. One such use case would be debugging and fuzzing of a piece of software which listens and reacts to uevents. By running a copy of that software inside a network namespace, specific uevents could then be presented to it. More concretely, this would allow for easy testing of udevd/ueventd. This will also allow some piece of software to run components inside a separate network namespace and then effectively filter what that software can receive. Some examples of software that do directly listen to uevents and that we have in the past attempted to run inside a network namespace are rbd (CEPH client) or the X server. Implementation: The implementation has been kept as simple as possible from the kernel's perspective. Specifically, a simple input method uevent_net_rcv() is added to NETLINK_KOBJECT_UEVENT sockets which completely reuses existing af_netlink infrastructure and does neither add an additional netlink family nor requires any user-visible changes. For example, by using netlink_rcv_skb() we can make use of existing netlink infrastructure to report back informative error messages to userspace. Furthermore, this implementation does not introduce any overhead for existing uevent generating codepaths. The struct netns got a new uevent socket member that records the uevent socket associated with that network namespace including its position in the uevent socket list. Since we record the uevent socket for each network namespace in struct net we don't have to walk the whole uevent socket list. Instead we can directly retrieve the relevant uevent socket and send the message. At exit time we can now also trivially remove the uevent socket from the uevent socket list. This keeps the codepath very performant without introducing needless overhead and even makes older codepaths faster. Uevent sequence numbers are kept global. When a uevent message is sent to another network namespace the implementation will simply increment the global uevent sequence number and append it to the received uevent. This has the advantage that the kernel will never need to parse the received uevent message to replace any existing uevent sequence numbers. Instead it is up to the userspace process to remove any existing uevent sequence numbers in case the uevent message to be sent contains any. Security: In order for a caller to send uevent messages to a target network namespace the caller must have CAP_SYS_ADMIN in the owning user namespace of the target network namespace. Additionally, any received uevent message is verified to not exceed size UEVENT_BUFFER_SIZE. This includes the space needed to append the uevent sequence number. Testing: This patch has been tested and verified to work with the following udev implementations: 1. CentOS 6 with udevd version 147 2. Debian Sid with systemd-udevd version 237 3. Android 7.1.1 with ueventd Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22net: add uevent socket memberChristian Brauner2-11/+10
This commit adds struct uevent_sock to struct net. Since struct uevent_sock records the position of the uevent socket in the uevent socket list we can trivially remove it from the uevent socket list during cleanup. This speeds up the old removal codepath. Note, list_del() will hit __list_del_entry_valid() in its call chain which will validate that the element is a member of the list. If it isn't it will take care that the list is not modified. Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22net: Convert nf_ct_net_opsKirill Tkhai1-0/+1
These pernet_operations register and unregister sysctl. Also, there is inet_frags_exit_net() called in exit method, which has to be safe after a560002437d3 "net: Fix hlist corruptions in inet_evict_bucket()". Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22net: Convert lowpan_frags_opsKirill Tkhai1-0/+1
These pernet_operations register and unregister sysctl. Also, there is inet_frags_exit_net() called in exit method, which has to be safe after a560002437d3 "net: Fix hlist corruptions in inet_evict_bucket()". Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>