summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2013-01-26can: rename LED trigger name on netdev renamesKurt Van Dijck3-0/+52
The LED trigger name for CAN devices is based on the initial CAN device name, but does never change. The LED trigger name is not guaranteed to be unique in case of hotplugging CAN devices. This patch tries to address this problem by modifying the LED trigger name according to the CAN device name when the latter changes. v1 - Kurt Van Dijck v2 - Fabio Baltieri - remove rename blocking if trigger is bound - use led-subsystem function for the actual rename (still WiP) - call init/exit functions from dev.c v3 - Kurt Van Dijck - safe operation for non-candev based devices (vcan, slcan) based on earlier patch v4 - Kurt Van Dijck - trivial patch mistakes fixed Signed-off-by: Kurt Van Dijck <kurt.van.dijck@eia.be> Signed-off-by: Fabio Baltieri <fabio.baltieri@gmail.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2013-01-26can: export a safe netdev_priv wrapper for candevKurt Van Dijck2-0/+16
In net_device notifier calls, it was impossible to determine if a CAN device is based on candev in a safe way. This patch adds such test in order to access candev storage from within those notifiers. Signed-off-by: Kurt Van Dijck <kurt.van.dijck@eia.be> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Fabio Baltieri <fabio.baltieri@gmail.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2013-01-26can: add tx/rx LED trigger supportFabio Baltieri5-0/+149
This patch implements the functions to add two LED triggers, named <ifname>-tx and <ifname>-rx, to a canbus device driver. Triggers are called from specific handlers by each CAN device driver and can be disabled altogether with a Kconfig option. The implementation keeps the LED on when the interface is UP and blinks the LED on network activity at a configurable rate. This only supports can-dev based drivers, as it uses some support field in the can_priv structure. Supported drivers should call devm_can_led_init() and can_led_event() as needed. Cleanup is handled automatically by devres, so no *_exit function is needed. Supported events are: - CAN_LED_EVENT_OPEN: turn on tx/rx LEDs - CAN_LED_EVENT_STOP: turn off tx/rx LEDs - CAN_LED_EVENT_TX: trigger tx LED blink - CAN_LED_EVENT_RX: trigger tx LED blink Cc: Wolfgang Grandegger <wg@grandegger.com> Cc: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Fabio Baltieri <fabio.baltieri@gmail.com> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2013-01-26can: usb_8dev: Add support for USB2CAN interface from 8 devicesBernd Krumboeck3-0/+1029
Add device driver for USB2CAN interface from "8 devices" (http://www.8devices.com). changes since v10: * small cleanups changes since v9: * fixed syslog messages * fixed crc error number * increased MAX_RX_URBS and MAX_TX_URBS changes since v8: * remove all sysfs files changes since v7: * add sysfs documentation * fix minor styling issue * fixed can state for passive mode * changed handling for crc errors changes since v6: * changed some variable types to big endian equivalent * small cleanups changes since v5: * unlock mutex on error changes since v4: * removed FSF address * renamed struct usb_8dev * removed unused variable free_slots * replaced some _to_cpu functions with pointer equivalent * fix return value for usb_8dev_set_mode * handle can errors with separate function * fix overrun error handling * rewrite error handling for usb_8dev_start_xmit * fix urb submit in usb_8dev_start * various small fixes Acked-by: Wolfgang Grandegger <wg@grandegger.com> Tested-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Bernd Krumboeck <krumboeck@universalnet.at> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2013-01-26can: sja1000: correct indention of Kconfig help textMarc Kleine-Budde1-6/+6
This patch changes the indention of the Kconfig help text to the default <tab> + 2 <space>. Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2013-01-26can: Kconfig: switch on all CAN protocolls by defaultMarc Kleine-Budde1-3/+3
This patch enables all basic CAN protocol by default. Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2013-01-26can: Kconfig: convert 'depends on CAN_DEV' into 'if CAN_DEV...endif' blockMarc Kleine-Budde7-15/+18
This patch adds an 'if CAN_DEV...endif' Block around the CAN driver symbols in drivers/net/can/Kconfig. So the 'depends on CAN' dependencies can be removed. Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2013-01-26can: Kconfig: convert 'depends on CAN' into 'if CAN...endif' blockMarc Kleine-Budde2-8/+4
This patch adds an 'if CAN...endif' Block around all CAN symbols in net/can/Kconfig. So the 'depends on CAN' dependencies can be removed. Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2013-01-23ixgbevf: Fix link speed message to support 100MbpsGreg Rose1-3/+16
The X540 can link at 100Mbps - fix the link speed indicator message to show that value. Signed-off-by: Greg Rose <gregory.v.rose@intel.com> Tested-by: Sibai Li <sibai.li@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-01-23ixgbe: Limit number of reported VFs to device specific valueDonald Dutile1-0/+1
ixgbe claims it supports 64 VFs in its SRIOV capability structure, but the driver only supports 63. Adjust it so sysfs sriov configuration checking will check with the proper totalvf value. Signed-off-by: Donald Dutile <ddutile@redhat.com> Acked-by: Greg Rose <gregory.v.rose@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Tested-by: Sibai Li <sibai.li@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-01-23ixgbe: Implement PCI SR-IOV sysfs callback operationGreg Rose4-6/+122
Implement callbacks in the driver for the new PCI bus driver interface that allows the user to enable/disable SR-IOV VFs in a device via the sysfs interface. Signed-off-by: Greg Rose <gregory.v.rose@intel.com> CC: Don Dutile <ddutile@redhat.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Tested-by: Sibai Li <sibai.li@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-01-23ixgbe: Modularize SR-IOV enablement codeGreg Rose1-40/+54
In preparation for enable/disable of SR-IOV via the PCI sysfs interface move some core SR-IOV enablement code that would be common to module parameter usage or callback from the PCI bus driver to a separate function so that it can be used by either method. Signed-off-by: Greg Rose <gregory.v.rose@intel.com> CC: Don Dutile <ddutile@redhat.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Tested-by: Sibai Li <Sibai.li@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-01-23ixgbe: Make mailbox ops initialization unconditionalGreg Rose3-11/+10
There is no actual dependency on initialization of the mailbox ops on whether SR-IOV is enabled or not and it doesn't hurt to go ahead and initialize ops unconditionally. Move the initialization into the device probe so that the mailbox ops are initialized at the time we have the board info necessary to do it. Signed-off-by: Greg Rose <gregory.v.rose@intel.com> CC: Don Dutile <ddutile@redhat.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Tested-by: Sibai Li <Sibai.li@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-01-23ixgbe: only compile ixgbe_debugfs.o when enabledJacob Keller2-6/+2
This patch modifies ixgbe_debugfs.c and the Makefile for the ixgbe driver to only compile the file when the config is enabled. This means we can remove the #ifdef inside the ixgbe_debugfs.c file. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-01-23ixgbe: Inline Rx PTP descriptor handlingAlexander Duyck2-19/+23
This change is meant to inline the Rx PTP descriptor handling. The main motivation is to avoid unnecessary jumps into function calls that we then immediately exit because we are not performing timestamps. The net result of this change is that ixgbe_ptp_rx_tstamp drops from .5% CPU utilization in my performance runs to 0%, and the only value tested is the Rx descriptor which should already be warm in the cache if not stored in a register. Cc: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: Jacob Keller <Jacob.e.keller@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-01-23ixgbe: Fix overwriting of rx_mtrl in ixgbe_ptp_hwtstamp_ioctlJacob Keller1-2/+2
This patch corrects a bug introduced by commit f3444d8b. The rxmtrl value for the UDP port to timestamp on was moved above the switch statement, but was overwritten to 0 if the ioctl selected one of the V1 filters. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-01-23ixgbe: add warning when scheduling resetJacob Keller1-0/+2
This patch adds warnings when a reset of the adapter is scheduled so that the user can see log of why the reset occurred. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-01-23ixgbe: Add ptp work item to poll for the Tx timestampJacob Keller3-34/+65
This patch copies the igb implementation of Tx timestamps, which uses a work item to poll for the Tx timestamp. In addition it adds a timeout value of 15 seconds, after which it will stop polling. This is necessary due to an issue with the descriptor being marked done before the Tx timestamp event has occurred. These two events don't correlate, so using the done bit on the descriptor as indication that the timestamp must already have been taken leads to potentially dropped Tx timestamps (especially under heavy packet load) Reported-by: Matthew Vick <matthew.vick@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-01-23ixgbe: Use watchdog check in favor of BPF for detecting latched timestampJacob Keller3-87/+51
This patch removes ixgbe_ptp_match, and the corresponding packet filtering from ixgbe driver. This code was previously causing some issues within the hotpath of the driver. However the code also provided a check against possible frozen Rx timestamp due to dropped packets when the Rx ring is full. This patch provides a replacement solution based on the watchdog. To this end, whenever a packet consumes the Rx timestamp it stores the jiffy value in the rx_ring structure. Watchdog updates its own jiffy timer whenever there is no valid timestamp in the registers. If watchdog detects a valid timestamp in the registers, (meaning that no Rx packet has consumed it yet) it will check which time is most recent, the last time in the watchdog, or any time in the rx_rings. If the most recent "event" was more than 5seconds ago, it will flush the Rx timestamp and print a warning message to the syslog. Reported-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-01-23ixgbe: Update ptp_overflow check comment and jiffiesJacob Keller1-7/+7
This patch fixes the comment on ptp_overflow_check to match up with what is currently used as the parameters. Also change the jiffies check to use time_is_after_jiffies macro. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-01-23ixgbe: add missing supported filters to get_ts_infoJacob Keller1-0/+8
This patch updates the filters for ethtool's get_ts_info to return support for all filters which can be supported by upscaling to ptp_v2_event. The intent behind this change is due to reasoning that we do in fact support the filters. (hwtstamp_ioctl returns success after setting the filter to the upscaled version). In this way we can remain consistent over which filters are supported via the get_ts_info ioctl and which filters are in practice actually supported. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-01-23ixgbe: ethtool ixgbe_diag_test cleanupJacob Keller1-12/+26
This patch cleans up the ethtool diagnostics test by ensuring that the tests work properly regardless of what state the adapter was in. The SRIOV VF check is done at the beginning, forgoing the link test. The if_running -> dev_close is moved before the link test, as well as a call to enable the Tx laser. This ensures that the link test will return valid results even when adapter was previously down. Also, a call to disable the Tx laser is added if the device was down before the start. This ensures consistent behavior of the Tx laser before and after the diagnostic checks. The end result is consistent behavior regardless of device state. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-01-23Merge branch 'testing' of ↵David S. Miller10-176/+185
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== 1) Add a statistic counter for invalid output states and remove a superfluous state valid check, from Li RongQing. 2) Probe for asynchronous block ciphers instead of synchronous block ciphers to make the asynchronous variants available even if no synchronous block ciphers are found, from Jussi Kivilinna. 3) Make rfc3686 asynchronous block cipher and make use of the new asynchronous variant, from Jussi Kivilinna. 4) Replace some rwlocks by rcu, from Cong Wang. 5) Remove some unused defines. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-23bnx2x: SR-IOV version compatibility bugfixAriel Elior3-4/+5
When posting a message on the bulletin board, the PF calculates crc over the message and places the result in the message. When the VF samples the Bulletin Board it copies the message aside and validates this crc. The length of the message is crucial here and must be the same in both parties. Since the PF is running in the Hypervisor and the VF is running in a Vm, they can possibly be of different versions. As the Bulletin Board is designed to grow forward in future versions, in the VF the length must not be the size of the message structure but instead it should be a field in the message itself. Signed-off-by: Ariel Elior <ariele@broadcom.com> Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-23bnx2x: Fix compilation with stop-on-errorYuval Mintz2-2/+2
Commit 823e1d9 caused bnx2x to fail once BNX2X_STOP_ON_ERROR is set. Fixes compilation by moving function declarations between header files. Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by: Ariel Elior <ariele@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-23cnic, bnx2x: Add CNIC_DRV_STATE_HANDLES_IRQ to ethdev->drv_stateMichael Chan3-8/+11
In INTA mode, cnic and bnx2x share the same IRQ. During chip reset, for example, cnic will stop servicing IRQs after it has shutdown the cnic hardware resources. However, the shared IRQ is still active as bnx2x needs to finish the reset. There is a window when bnx2x does not know that cnic is no longer handling IRQ and things don't always work properly. Add a flag to tell bnx2x that cnic is handling IRQ. The flag is set before the first cnic IRQ is expected and cleared when no more cnic IRQs are expected, so there should be no race conditions. Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by: Ariel Elior <ariele@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-23bnx2x: correct memory release schemeYuval Mintz2-4/+3
Fix an incorrect SR-IOV memory release which was committed in 1ab4434. Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by: Ariel Elior <ariele@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-23bnx2x: Remove many sparse warningsYuval Mintz9-159/+195
Remove most of the sparse warnings in the bnx2x compilation (i.e., thus resulting when compiling with `C=2 CF=-D__CHECK_ENDIAN__'). Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by: Ariel Elior <ariele@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-23bnx2x: Modify unload conditionsYuval Mintz1-3/+11
Don't unload the bnx2x driver if its in a recovery process, or if the previous load have failed. Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by: Ariel Elior <ariele@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-23bnx2x: Correct memory preparation and releaseDmitry Kravkov2-44/+16
Since commit 15192a8cf there have been a memory leak upon rmmod of the bnx2x driver. This corrects the memory leak and corrects the zeroing of internal memories upon driver load. Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com> Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by: Ariel Elior <ariele@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-23bnx2x: Add missing VFs reference in macrosYuval Mintz1-1/+3
Add missing 57712_VF and 57800_VF to CHIP_IS_E2 and CHIP_IS_E3 macros (missing from commit 8395be5). Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by: Ariel Elior <ariele@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-23bnx2x: Add additional debug informationYuval Mintz4-6/+38
Add/Revise several debug prints in the bnx2x driver - on regular flows as well as error flows. Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by: Ariel Elior <ariele@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-23bnx2x: correct usleep_range usageYuval Mintz4-10/+10
Change the incorrect usage of `usleep_range(1000, 1000)' into `usleep_range(1000, 2000)'. Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by: Ariel Elior <ariele@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-23bnx2x: reorganization and beautificationYuval Mintz5-139/+122
Slightly changes the bnx2x code without `true' functional changes. Changes include: 1. Gathering macros into a single macro when combination is used multiple times. 2. Exporting parts of functions into their own functions. 3. Return values after if-else instead of only on the else condition (where current flow would simply return same value later in the code) 4. Removing some unnecessary code (either dead-code or incorrect conditions) Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by: Ariel Elior <ariele@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-23bnx2x: Semantic renovationYuval Mintz9-105/+86
Mostly corrects white spaces, indentations, and comments. Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by: Ariel Elior <ariele@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-23gianfar: Restore promisc mode on gfar_init_mac()Claudiu Manoil1-0/+4
Reactivate promiscuous mode in H/W upon gfar_init_mac(), if the net dev requires it (IFF_PROMISC flag set). This way the promisc mode is preserved accross device reset conditions like tx timeout, device restore, a.s.o. Signed-off-by: Voncken C Acksys <cedric.voncken@acksys.fr> Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-23Merge branch 'soreuseport'David S. Miller29-70/+213
Tom Herbert says: ==================== This series implements so_reuseport (SO_REUSEPORT socket option) for TCP and UDP.  For TCP, so_reuseport allows multiple listener sockets to be bound to the same port.  In the case of UDP, so_reuseport allows multiple sockets to bind to the same port.  To prevent port hijacking all sockets bound to the same port using so_reuseport must have the same uid.  Received packets are distributed to multiple sockets bound to the same port using a 4-tuple hash. The motivating case for so_resuseport in TCP would be something like a web server binding to port 80 running with multiple threads, where each thread might have it's own listener socket.  This could be done as an alternative to other models: 1) have one listener thread which dispatches completed connections to workers. 2) accept on a single listener socket from multiple threads.  In case #1 the listener thread can easily become the bottleneck with high connection turn-over rate. In case #2, the proportion of connections accepted per thread tends to be uneven under high connection load (assuming simple event loop: while (1) { accept(); process() }, wakeup does not promote fairness among the sockets.  We have seen the  disproportion to be as high as 3:1 ratio between thread accepting most connections and the one accepting the fewest.  With so_reusport the distribution is uniform. The TCP implementation has a problem in that the request sockets for a listener are attached to a listener socket.  If a SYN is received, a listener socket is chosen and request structure is created (SYN-RECV state).  If the subsequent ack in 3WHS does not match the same port by so_reusport, the connection state is not found (reset) and the request structure is orphaned.  This scenario would occur when the number of listener sockets bound to a port changes (new ones are added, or old ones closed).  We are looking for a solution to this, maybe allow multiple sockets to share the same request table... The motivating case for so_reuseport in UDP would be something like a DNS server.  An alternative would be to recv on the same socket from multiple threads.  As in the case of TCP, the load across these threads tends to be disproportionate and we also see a lot of contection on the socket lock.  Note that SO_REUSEADDR already allows multiple UDP sockets to bind to the same port, however there is no provision to prevent hijacking and nothing to distribute packets across all the sockets sharing the same bound port.  This patch does not change the semantics of SO_REUSEADDR, but provides usable functionality of it for unicast. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-23soreuseport: UDP/IPv6 implementationTom Herbert1-3/+27
Motivation for soreuseport would be something like a DNS server.  An alternative would be to recv on the same socket from multiple threads. As in the case of TCP, the load across these threads tends to be disproportionate and we also see a lot of contection on the socket lock. Note that SO_REUSEADDR already allows multiple UDP sockets to bind to the same port, however there is no provision to prevent hijacking and nothing to distribute packets across all the sockets sharing the same bound port.  This patch does not change the semantics of SO_REUSEADDR, but provides usable functionality of it for unicast. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-23soreuseport: TCP/IPv6 implementationTom Herbert5-10/+38
Motivation for soreuseport would be something like a web server binding to port 80 running with multiple threads, where each thread might have it's own listener socket. This could be done as an alternative to other models: 1) have one listener thread which dispatches completed connections to workers. 2) accept on a single listener socket from multiple threads. In case #1 the listener thread can easily become the bottleneck with high connection turn-over rate. In case #2, the proportion of connections accepted per thread tends to be uneven under high connection load (assuming simple event loop: while (1) { accept(); process() }, wakeup does not promote fairness among the sockets. We have seen the disproportion to be as high as 3:1 ratio between thread accepting most connections and the one accepting the fewest. With so_reusport the distribution is uniform. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-23soreuseport: UDP/IPv4 implementationTom Herbert1-18/+43
Allow multiple UDP sockets to bind to the same port. Motivation soreuseport would be something like a DNS server.  An alternative would be to recv on the same socket from multiple threads. As in the case of TCP, the load across these threads tends to be disproportionate and we also see a lot of contection on the socketlock. Note that SO_REUSEADDR already allows multiple UDP sockets to bind to the same port, however there is no provision to prevent hijacking and nothing to distribute packets across all the sockets sharing the same bound port.  This patch does not change the semantics of SO_REUSEADDR, but provides usable functionality of it for unicast. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-23soreuseport: TCP/IPv4 implementationTom Herbert5-21/+73
Allow multiple listener sockets to bind to the same port. Motivation for soresuseport would be something like a web server binding to port 80 running with multiple threads, where each thread might have it's own listener socket. This could be done as an alternative to other models: 1) have one listener thread which dispatches completed connections to workers. 2) accept on a single listener socket from multiple threads. In case #1 the listener thread can easily become the bottleneck with high connection turn-over rate. In case #2, the proportion of connections accepted per thread tends to be uneven under high connection load (assuming simple event loop: while (1) { accept(); process() }, wakeup does not promote fairness among the sockets. We have seen the disproportion to be as high as 3:1 ratio between thread accepting most connections and the one accepting the fewest. With so_reusport the distribution is uniform. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-23soreuseport: infrastructureTom Herbert18-18/+32
Definitions and macros for implementing soreusport. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-23xen-netback: allow changing the MAC address of the interfaceMatt Wilson1-0/+2
Sometimes it is useful to be able to change the MAC address of the interface for netback devices. For example, when using ebtables it may be useful to be able to distinguish traffic from different interfaces without depending on the interface name. Reported-by: Nikita Borzykh <sample.n@gmail.com> Reported-by: Paul Harvey <stockingpaul@hotmail.com> Cc: netdev@vger.kernel.org Cc: xen-devel@lists.xen.org Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Matt Wilson <msw@amazon.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-23netfilter: nf_conntrack: fix compilation if sysctl are disabledPablo Neira Ayuso1-2/+9
In (f94161c netfilter: nf_conntrack: move initialization out of pernet operations), some ifdefs were missing for sysctl dependent code. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-23netfilter: nf_conntrack: refactor l4proto support for netnsGao feng8-106/+204
Move the code that register/unregister l4proto to the module_init/exit context. Given that we have to modify some interfaces to accomodate these changes, it is a good time to use shorter function names for this using the nf_ct_* prefix instead of nf_conntrack_*, that is: nf_ct_l4proto_register nf_ct_l4proto_pernet_register nf_ct_l4proto_unregister nf_ct_l4proto_pernet_unregister We same many line breaks with it. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-23netfilter: nf_conntrack: refactor l3proto support for netnsGao feng4-41/+49
Move the code that register/unregister l3proto to the module_init/exit context. Given that we have to modify some interfaces to accomodate these changes, it is a good time to use shorter function names for this using the nf_ct_* prefix instead of nf_conntrack_*, that is: nf_ct_l3proto_register nf_ct_l3proto_pernet_register nf_ct_l3proto_unregister nf_ct_l3proto_pernet_unregister We same many line breaks with it. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-23netfilter: nf_ct_proto: move initialization out of pernet_operationsGao feng3-20/+34
Move the global initial codes to the module_init/exit context. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-23netfilter: nf_ct_labels: move initialization out of pernet_operationsGao feng3-20/+16
Move the global initial codes to the module_init/exit context. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-23netfilter: nf_ct_helper: move initialization out of pernet_operationsGao feng3-34/+41
Move the global initial codes to the module_init/exit context. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-23netfilter: nf_ct_timeout: move initialization out of pernet_operationsGao feng3-27/+19
Move the global initial codes to the module_init/exit context. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>