summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2016-09-13rxrpc: Add missing unlock in rxrpc_call_accept()David Howells1-3/+5
Add a missing unlock in rxrpc_call_accept() in the path taken if there's no call to wake up. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-13rxrpc: Requeue call for recvmsg if more dataDavid Howells1-0/+4
rxrpc_recvmsg() needs to make sure that the call it has just been processing gets requeued for further attention if the buffer has been filled and there's more data to be consumed. The softirq producer only queues the call and wakes the socket if it fills the first slot in the window, so userspace might end up sleeping forever otherwise, despite there being data available. This is not a problem provided the userspace buffer is big enough or it empties the buffer completely before more data comes in. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-13rxrpc: The IDLE ACK packet should use rxrpc_idle_ack_delayDavid Howells1-1/+1
The IDLE ACK packet should use the rxrpc_idle_ack_delay setting when the timer is set for it. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-13rxrpc: Add missing wakeup on Tx window rotationDavid Howells1-0/+2
We need to wake up the sender when Tx window rotation due to an incoming ACK makes space in the buffer otherwise the sender is liable to just hang endlessly. This problem isn't noticeable if the Tx phase transfers no more than will fit in a single window or the Tx window rotates fast enough that it doesn't get full. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-13rxrpc: Make sure we initialise the peer hash keyDavid Howells1-1/+1
Peer records created for incoming connections weren't getting their hash key set. This meant that incoming calls wouldn't see more than one DATA packet - which is not a problem for AFS CM calls with small request data blobs. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-13Merge branch 'mlxsw-ethtool'David S. Miller3-146/+209
Jiri Pirko says: ==================== mlxsw: ethtool enhancements Ido says: Patches 1-4 do some minor cleanup in current ethtool ops. Patch 5 replace legacy {get,set}_settings callbacks with {get,set}_link_ksettings. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-13mlxsw: spectrum: Add support for new ethtool APIIdo Schimmel2-160/+185
Remove the deprecated {get,set}_settings callbacks and instead add {get,set}_link_ksettings along with support for newly available speeds. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-13mlxsw: spectrum: Indicate support of multiple port typesIdo Schimmel1-3/+5
The device can support multiple port types, so don't return on first match. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-13mlxsw: spectrum: Report port type according to operational speedIdo Schimmel1-1/+0
In case port isn't operational we shouldn't report the port type, but instead return PORT_OTHER. This is consistent with most other drivers that return PORT_OTHER when media type can't be determined. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-13mlxsw: spectrum: Report link partner's advertised speedsIdo Schimmel2-1/+27
If autonegotiation was performed successfully, then we should report the link partner's advertised speeds instead of the operational speed of the port. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-13mlxsw: spectrum: Correctly report autonegotiationIdo Schimmel2-3/+14
Up until now the device always reported autonegotiation to be off although it was on by default. Allow the user to disable / enable autonegotiation and report its status correctly. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-13net: ethernet: apm: xgene: use new api ethtool_{get|set}_link_ksettingsPhilippe Reynes1-24/+37
The ethtool api {get|set}_settings is deprecated. We move this driver to new api {get|set}_link_ksettings. Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-13net: ethernet: apm: xgene: use phydev from struct net_devicePhilippe Reynes4-19/+18
The private structure contain a pointer to phydev, but the structure net_device already contain such pointer. So we can remove the pointer phy_dev in the private structure, and update the driver to use the one contained in struct net_device. Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-13net: ethernet: dwmac: fix non static symbol warningWei Yongjun1-1/+2
Fixes the following sparse warning: drivers/net/ethernet/stmicro/stmmac/dwmac-stm32.c:172:1: warning: symbol 'stm32_dwmac_pm_ops' was not declared. Should it be static? Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-13net: macb: fix missing unlock on error in macb_start_xmit()Wei Yongjun1-1/+1
Fix missing unlock before return from function macb_start_xmit() in the error handling case. Fixes: 007e4ba3ee13 ("net: macb: initialize checksum when using checksum offloading") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-13net: Remove NO_IRQ from powerpc-only network driversMichael Ellerman8-16/+15
We'd like to eventually remove NO_IRQ on powerpc, so remove usages of it from powerpc-only drivers. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-13tipc: fix possible memory leak in tipc_udp_enable()Wei Yongjun1-1/+2
'ub' is malloced in tipc_udp_enable() and should be freed before leaving from the error handling cases, otherwise it will cause memory leak. Fixes: ba5aa84a2d22 ("tipc: split UDP nl address parsing") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-13net: bridge: add helper to call /sbin/bridge-stpVivien Didelot1-12/+31
If /sbin/bridge-stp is available on the system, bridge tries to execute it instead of the kernel implementation when starting/stopping STP. If anything goes wrong with /sbin/bridge-stp, bridge silently falls back to kernel STP, making hard to debug userspace STP. This patch adds a br_stp_call_user helper to start/stop userspace STP and debug errors from the program: abnormal exit status is stored in the lower byte and normal exit status is stored in higher byte. Below is a simple example on a kernel with dynamic debug enabled: # ln -s /bin/false /sbin/bridge-stp # brctl stp br0 on br0: failed to start userspace STP (256) # dmesg br0: /sbin/bridge-stp exited with code 1 br0: failed to start userspace STP (256) br0: using kernel STP Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-12iwlegacy: use IS_ENABLED() instead of checking for built-in or moduleJavier Martinez Canillas1-2/+2
The IS_ENABLED() macro checks if a Kconfig symbol has been enabled either built-in or as a module, use that macro instead of open coding the same. Using the macro makes the code more readable by helping abstract away some of the Kconfig built-in and module enable details. Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-12hamradio: use IS_ENABLED() instead of checking for built-in or moduleJavier Martinez Canillas1-1/+1
The IS_ENABLED() macro checks if a Kconfig symbol has been enabled either built-in or as a module, use that macro instead of open coding the same. Using the macro makes the code more readable by helping abstract away some of the Kconfig built-in and module enable details. Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-12stmmac: use IS_ENABLED() instead of checking for built-in or moduleJavier Martinez Canillas1-1/+1
The IS_ENABLED() macro checks if a Kconfig symbol has been enabled either built-in or as a module, use that macro instead of open coding the same. Using the macro makes the code more readable by helping abstract away some of the Kconfig built-in and module enable details. Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com> Reviewed-by: Alexandre TORGUE <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-12sis900: use IS_ENABLED() instead of checking for built-in or moduleJavier Martinez Canillas2-3/+3
The IS_ENABLED() macro checks if a Kconfig symbol has been enabled either built-in or as a module, use that macro instead of open coding the same. Using the macro makes the code more readable by helping abstract away some of the Kconfig built-in and module enable details. Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com> Acked-by: Daniele Venzano <venza@brownhat.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-12sfc: use IS_ENABLED() instead of checking for built-in or moduleJavier Martinez Canillas1-2/+2
The IS_ENABLED() macro checks if a Kconfig symbol has been enabled either built-in or as a module, use that macro instead of open coding the same. Using the macro makes the code more readable by helping abstract away some of the Kconfig built-in and module enable details. Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com> Acked-by: Bert Kenward <bkenward@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-12natsemi: use IS_ENABLED() instead of checking for built-in or moduleJavier Martinez Canillas1-1/+1
The IS_ENABLED() macro checks if a Kconfig symbol has been enabled either built-in or as a module, use that macro instead of open coding the same. Using the macro makes the code more readable by helping abstract away some of the Kconfig built-in and module enable details. Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-12net: mvneta: use IS_ENABLED() instead of checking for built-in or moduleJavier Martinez Canillas1-1/+1
The IS_ENABLED() macro checks if a Kconfig symbol has been enabled either built-in or as a module, use that macro instead of open coding the same. Using the macro makes the code more readable by helping abstract away some of the Kconfig built-in and module enable details. Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-12ixgbe: use IS_ENABLED() instead of checking for built-in or moduleJavier Martinez Canillas1-2/+2
The IS_ENABLED() macro checks if a Kconfig symbol has been enabled either built-in or as a module, use that macro instead of open coding the same. Using the macro makes the code more readable by helping abstract away some of the Kconfig built-in and module enable details. Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-12i825xx: use IS_ENABLED() instead of checking for built-in or moduleJavier Martinez Canillas1-2/+2
The IS_ENABLED() macro checks if a Kconfig symbol has been enabled either built-in or as a module, use that macro instead of open coding the same. Using the macro makes the code more readable by helping abstract away some of the Kconfig built-in and module enable details. Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-12net/fsl_pq_mdio: use IS_ENABLED() instead of checking for built-in or moduleJavier Martinez Canillas1-4/+4
The IS_ENABLED() macro checks if a Kconfig symbol has been enabled either built-in or as a module, use that macro instead of open coding the same. Using the macro makes the code more readable by helping abstract away some of the Kconfig built-in and module enable details. Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-12sundance: use IS_ENABLED() instead of checking for built-in or moduleJavier Martinez Canillas1-1/+1
The IS_ENABLED() macro checks if a Kconfig symbol has been enabled either built-in or as a module, use that macro instead of open coding the same. Using the macro makes the code more readable by helping abstract away some of the Kconfig built-in and module enable details. Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-12bnx2: use IS_ENABLED() instead of checking for built-in or moduleJavier Martinez Canillas1-1/+1
The IS_ENABLED() macro checks if a Kconfig symbol has been enabled either built-in or as a module, use that macro instead of open coding the same. Using the macro makes the code more readable by helping abstract away some of the Kconfig built-in and module enable details. Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-12ethernet: amd: use IS_ENABLED() instead of checking for built-in or moduleJavier Martinez Canillas2-4/+4
The IS_ENABLED() macro checks if a Kconfig symbol has been enabled either built-in or as a module, use that macro instead of open coding the same. Using the macro makes the code more readable by helping abstract away some of the Kconfig built-in and module enable details. Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com> Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-12starfire: use IS_ENABLED() instead of checking for built-in or moduleJavier Martinez Canillas1-1/+1
The IS_ENABLED() macro checks if a Kconfig symbol has been enabled either built-in or as a module, use that macro instead of open coding the same. Using the macro makes the code more readable by helping abstract away some of the Kconfig built-in and module enable details. Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-123c59x: use IS_ENABLED() instead of checking for built-in or moduleJavier Martinez Canillas1-1/+1
The IS_ENABLED() macro checks if a Kconfig symbol has been enabled either built-in or as a module, use that macro instead of open coding the same. Using the macro makes the code more readable by helping abstract away some of the Kconfig built-in and module enable details. Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller473-2493/+3789
Conflicts: drivers/net/ethernet/mediatek/mtk_eth_soc.c drivers/net/ethernet/qlogic/qed/qed_dcbx.c drivers/net/phy/Kconfig All conflicts were cases of overlapping commits. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds92-610/+875
Pull networking fixes from David Miller: "Mostly small sets of driver fixes scattered all over the place. 1) Mediatek driver fixes from Sean Wang. Forward port not written correctly during TX map, missed handling of EPROBE_DEFER, and mistaken use of put_page() instead of skb_free_frag(). 2) Fix socket double-free in KCM code, from WANG Cong. 3) QED driver fixes from Sudarsana Reddy Kalluru, including a fix for using the dcbx buffers before initializing them. 4) Mellanox Switch driver fixes from Jiri Pirko, including a fix for double fib removals and an error handling fix in mlxsw_sp_module_init(). 5) Fix kernel panic when enabling LLDP in i40e driver, from Dave Ertman. 6) Fix padding of TSO packets in thunderx driver, from Sunil Goutham. 7) TCP's rcv_wup not initialized properly when using fastopen, from Neal Cardwell. 8) Don't use uninitialized flow keys in flow dissector, from Gao Feng. 9) Use after free in l2tp module unload, from Sabrina Dubroca. 10) Fix interrupt registry ordering issues in smsc911x driver, from Jeremy Linton. 11) Fix crashes in bonding having to do with enslaving and rx_handler, from Mahesh Bandewar. 12) AF_UNIX deadlock fixes from Linus. 13) In mlx5 driver, don't read skb->xmit_mode after it might have been freed from the TX reclaim path. From Tariq Toukan. 14) Fix a bug from 2015 in TCP Yeah where the congestion window does not increase, from Artem Germanov. 15) Don't pad frames on receive in NFP driver, from Jakub Kicinski. 16) Fix chunk fragmenting in SCTP wrt. GSO, from Marcelo Ricardo Leitner. 17) Fix deletion of VRF routes, from Mark Tomlinson. 18) Fix device refcount leak when DAD fails in ipv6, from Wei Yongjun" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (101 commits) net/mlx4_en: Fix panic on xmit while port is down net/mlx4_en: Fixes for DCBX net/mlx4_en: Fix the return value of mlx4_en_dcbnl_set_state() net/mlx4_en: Fix the return value of mlx4_en_dcbnl_set_all() net: ethernet: renesas: sh_eth: add POST registers for rz drivers: net: phy: mdio-xgene: Add hardware dependency dwc_eth_qos: do not register semi-initialized device sctp: identify chunks that need to be fragmented at IP level mlxsw: spectrum: Set port type before setting its address mlxsw: spectrum_router: Fix error path in mlxsw_sp_router_init nfp: don't pad frames on receive nfp: drop support for old firmware ABIs nfp: remove linux/version.h includes tcp: cwnd does not increase in TCP YeAH net/mlx5e: Fix parsing of vlan packets when updating lro header net/mlx5e: Fix global PFC counters replication net/mlx5e: Prevent casting overflow net/mlx5e: Move an_disable_cap bit to a new position net/mlx5e: Fix xmit_more counter race issue tcp: fastopen: avoid negative sk_forward_alloc ...
2016-09-11Linux 4.8-rc6v4.8-rc6Linus Torvalds1-1/+1
2016-09-11Merge branch 'mlx4-fixes'David S. Miller5-59/+50
Tariq Toukan says: ==================== mlx4 fixes This patchset contains several bug fixes from the team to the mlx4 Eth driver. Series generated against net commit: c2f57fb97da5 "drivers: net: phy: mdio-xgene: Add hardware dependency" v2: * excluded some cleanup patches. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-11net/mlx4_en: Fix panic on xmit while port is downMoshe Shemesh1-5/+7
When port is down, tx drop counter update is not needed. Updating the counter in this case can cause a kernel panic as when the port is down, ring can be NULL. Fixes: 63a664b7e92b ("net/mlx4_en: fix tx_dropped bug") Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-11net/mlx4_en: Fixes for DCBXTariq Toukan4-43/+28
This patch adds a capability check before enabling DCBX. In addition, it re-organizes the relevant data structures, and fixes a typo in a define. Fixes: af7d51852631 ("net/mlx4_en: Add DCB PFC support through CEE netlink commands") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-11net/mlx4_en: Fix the return value of mlx4_en_dcbnl_set_state()Kamal Heib1-1/+4
mlx4_en_dcbnl_set_state() returns u8, the return value from mlx4_en_setup_tc() could be negative in case of failure, so fix that. Fixes: af7d51852631 ("net/mlx4_en: Add DCB PFC support through CEE netlink commands") Signed-off-by: Kamal Heib <kamalh@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-11net/mlx4_en: Fix the return value of mlx4_en_dcbnl_set_all()Kamal Heib1-10/+11
mlx4_en_dcbnl_set_all() returns u8, so return value can't be negative in case of failure. Fixes: af7d51852631 ("net/mlx4_en: Add DCB PFC support through CEE netlink commands") Signed-off-by: Kamal Heib <kamalh@mellanox.com> Signed-off-by: Rana Shahout <ranas@mellanox.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-11net: dsa: bcm_sf2: Get VLAN_PORT_MASK from b53_deviceFlorian Fainelli2-3/+1
While migrating the bcm_sf2 driver to use b53_common, we left a small piece untouched where we kept our local copy of the per-port port_vlan_ctl bitmask value. This value is now maintained by b53_device so we need to use it instead of our local (and now stale) copy of it. Fixes: f458995b9ad8 ("net: dsa: bcm_sf2: Utilize core B53 driver when possible") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-11nvme: make NVME_RDMA depend on BLOCKLinus Torvalds1-1/+1
Commit aa71987472a9 ("nvme: fabrics drivers don't need the nvme-pci driver") removed the dependency on BLK_DEV_NVME, but the cdoe does depend on the block layer (which used to be an implicit dependency through BLK_DEV_NVME). Otherwise you get various errors from the kbuild test robot random config testing when that happens to hit a configuration with BLOCK device support disabled. Cc: Christoph Hellwig <hch@lst.de> Cc: Jay Freyensee <james_p_freyensee@linux.intel.com> Cc: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-11Merge tag 'staging-4.8-rc6' of ↵Linus Torvalds6-8/+19
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging Pull IIO fixes from Greg KH: "Here are a few small IIO fixes for 4.8-rc6. Nothing major, full details are in the shortlog, all of these have been in linux-next with no reported issues" * tag 'staging-4.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: iio:core: fix IIO_VAL_FRACTIONAL sign handling iio: ensure ret is initialized to zero before entering do loop iio: accel: kxsd9: Fix scaling bug iio: accel: bmc150: reset chip at init time iio: fix pressure data output unit in hid-sensor-attributes tools:iio:iio_generic_buffer: fix trigger-less mode
2016-09-11Merge tag 'usb-4.8-rc6' of ↵Linus Torvalds8-8/+39
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB fixes from Greg KH: "Here are some small USB gadget, phy, and xhci fixes for 4.8-rc6. All of these resolve minor issues that have been reported, and all have been in linux-next with no reported issues" * tag 'usb-4.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: usb: chipidea: udc: fix NULL ptr dereference in isr_setup_status_phase xhci: fix null pointer dereference in stop command timeout function usb: dwc3: pci: fix build warning on !PM_SLEEP usb: gadget: prevent potenial null pointer dereference on skb->len usb: renesas_usbhs: fix clearing the {BRDY,BEMP}STS condition usb: phy: phy-generic: Check clk_prepare_enable() error usb: gadget: udc: renesas-usb3: clear VBOUT bit in DRD_CON Revert "usb: dwc3: gadget: always decrement by 1"
2016-09-10Merge branch 'vrf-tx-hook'David S. Miller19-364/+315
David Ahern says: ==================== net: Convert vrf to tx hook The motivation for this series is that ICMP Unreachable - Fragmentation Needed packets are not handled properly for VRFs. Specifically, the FIB lookup in __ip_rt_update_pmtu fails so no nexthop exception is created with the reduced MTU. As a result connections stall if packets larger than the smallest MTU in the path are generated. While investigating that problem I also noticed that the MSS for all connections in a VRF is based on the VRF device's MTU and not the route the packets ultimately go through. VRF currently uses a dst to direct packets to the device. The first FIB lookup returns this dst and then the lookup in the VRF driver gets the actual output route. A side effect of this design is that the VRF dst is cached on sockets and then used for calculations like the MSS. This series fixes this problem by removing the hook in the FIB lookups that returns the dst pointing to the VRF device to the VRF and always doing the actual FIB lookup. This allows the real dst to be used throughout the stack (for example the MSS). Packets are diverted to the VRF device on Tx using an l3mdev hook in the output path similar to to what is done for Rx. The end result is a simpler implementation for VRF with fewer intrusions into the network stack and symmetrical packet handling for Rx and Tx paths. Comparison of netperf performance for a build without l3mdev (best case performance), the old vrf driver and the VRF driver from this series. Data are collected using VMs with virtio + vhost. The netperf client runs in the VM and netserver runs in the host. 1-byte RR tests are done as these packets exaggerate the performance hit due to the extra lookups done for l3mdev and VRF. Command: netperf -cC -H ${ip} -l 60 -t {TCP,UDP}_RR [-J red] TCP_RR UDP_RR IPv4 IPv6 IPv4 IPv6 no l3mdev 29,996 30,601 31,638 24,336 vrf old 27,417 27,626 29,159 24,801 vrf new 28,036 28,372 30,110 24,857 l3mdev, no vrf 29,534 30,465 30,670 24,346 * Transactions per second as reported by netperf * netperf modified to take a bind-to-device argument -- the -J red option 1. 'no l3mdev' == NET_L3_MASTER_DEV is unset so code is compiled out 2. 'vrf old' == data for existing implementation 3. 'vrf new' == data with this series 4. 'l3mdev, no vrf' == NET_L3_MASTER_DEV is enabled but traffic is not going through a VRF About the series - patch 1 adds the flow update (changing oif or iif to L3 master device and setting the flag to skip the oif check) to ipv4 and ipv6 paths just before hitting the rules. This catches all code paths in a single spot. - patch 2 adds the Tx hook to push the packet to the l3mdev if relevant - patch 3 adds some checks so the vrf device can act as a vrf-local loopback. These changes were not needed before since the vrf dst was returned from the lookup. - patches 4 and 5 flip the ipv4 and ipv6 stacks to the tx hook leaving the route lookup to be the real one. The dst flip happens at the beginning of the L3 output path so the VRFs can have device based features such as netfilter, tc and tcpdump. - patches 6-11 remove no longer needed l3mdev code v2 - properly handle IPv6 link scope addresses - keep the device xmit path and associated dst which is switched in by the l3_out hook. packets still need to go through the xmit path in case the user puts a qdisc on the vrf device and to allow tc rules. version 1 short circuited the tx handling and only covered netfilter and tcpdump. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-10net: flow: Remove FLOWI_FLAG_L3MDEV_SRC flagDavid Ahern2-5/+3
No longer used Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-10net: l3mdev: remove get_rtable methodDavid Ahern2-42/+0
No longer used Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-10net: l3mdev: Remove l3mdev_fib_oifDavid Ahern1-29/+0
No longer used Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-10net: ipv6: Remove l3mdev_get_saddr6David Ahern4-84/+1
No longer needed Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>