summaryrefslogtreecommitdiffstats
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2012-11-09ipip: advertise tunnel param via rtnlNicolas Dichtel1-1/+56
It is usefull for daemons that monitor link event to have the full parameters of these interfaces when a rtnl message is sent. It allows also to dump them via rtnetlink. It is based on what is done for GRE tunnels. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-09ipip: add GSO supportEric Dumazet1-0/+12
In commit 6b78f16e4b (gre: add GSO support) we added GSO support to GRE tunnels. This patch does the same for IPIP tunnels. Performance of single TCP flow over an IPIP tunnel is increased by 40% Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Maciej Żenczykowski <maze@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-08vlan: set sysfs device_type to 'vlan'Doug Goldstein1-0/+6
Sets the sysfs device_type to 'vlan' for udev. This makes it easier for applications that query network information via udev to identify vlans instead of using strrchr(). Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-08ipv6: remove rt6i_peer_genid from rt6_info and its handlerLi RongQing1-16/+2
6431cbc25f(Create a mechanism for upward inetpeer propagation into routes) introduces these codes, but this mechanism is never enabled since rt6i_peer_genid always is zero whether it is not assigned or assigned by rt6_peer_genid(). After 5943634fc5 (ipv4: Maintain redirect and PMTU info in struct rtable again), the ipv4 related codes of this mechanism has been removed, I think we maybe able to remove them now. Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-07Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-mergeDavid S. Miller21-54/+1680
Included changes: - minimal fixes to the packet layout to avoid the __packed attribute when not needed - new packet type called UNICAST_4ADDR: in this packet it is possible to find both source and destination node (in the classic UNICAST header only the destination field exists). - a new feature: Distributed ARP Table (D.A.T.). It aims to reduce ARP lookups latency by means of a simil-DHT approach.
2012-11-07ndisc: fix a typo in a comment in ndisc_recv_na()Nicolas Dichtel1-1/+1
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-07packet: tx_ring: allow the user to choose tx data offsetPaul Chavent2-1/+46
The tx data offset of packet mmap tx ring used to be : (TPACKET2_HDRLEN - sizeof(struct sockaddr_ll)) The problem is that, with SOCK_RAW socket, the payload (14 bytes after the beginning of the user data) is misaligned. This patch allows to let the user gives an offset for it's tx data if he desires. Set sock option PACKET_TX_HAS_OFF to 1, then specify in each frame of your tx ring tp_net for SOCK_DGRAM, or tp_mac for SOCK_RAW. Signed-off-by: Paul Chavent <paul.chavent@onera.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-07batman-adv: enable fast client detection using unicast_4addr packetsAntonio Quartulli1-2/+10
The "early client detection mechanism" can be extended to find new clients by means of unicast_4addr packets. The unicast_4addr packet contains as well as the broadcast packet (which is currently used in this mechanism) the address of the originating node and can therefore be used to install new entries in the Global Translation Table Signed-off-by: Antonio Quartulli <ordex@autistici.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2012-11-07batman-adv: Add get_ethtool_stats() support for DATMartin Hundebøll5-3/+65
Added additional counters for D.A.T. Signed-off-by: Martin Hundebøll <martin@hundeboll.net> Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-11-07batman-adv: Distributed ARP Table - add runtime switchAntonio Quartulli4-0/+31
This patch adds a runtime switch that enables the user to turn the DAT feature on or off at runtime Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-11-07batman-adv: Distributed ARP Table - add compile optionAntonio Quartulli5-4/+94
This patch makes it possible to decide whether to include DAT within the batman-adv binary or not. It is extremely useful when the user wants to reduce the size of the resulting module by cutting off any not needed feature. Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-11-07batman-adv: Distributed ARP Table - add snooping functions for ARP messagesAntonio Quartulli6-2/+325
In case of an ARP message going in or out the soft_iface, it is intercepted and a special action is performed. In particular the DHT helper functions previously implemented are used to store all the ARP entries belonging to the network in order to provide a fast and unicast lookup instead of the classic broadcast flooding mechanism. Each node stores the entries it is responsible for (following the DHT rules) in its soft_iface ARP table. This makes it possible to reuse the kernel data structures and functions for ARP management. Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-11-07batman-adv: Distributed ARP Table - add ARP parsing functionsAntonio Quartulli4-3/+213
ARP messages are now parsed to make it possible to trigger special actions depending on their types (snooping). Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-11-07batman-adv: Distributed ARP Table - implement local storageAntonio Quartulli6-0/+346
Since batman-adv cannot inter-operate with the host ARP table, this patch introduces a batman-adv private storage for ARP entries exchanged within DAT. This storage will represent the node local cache in the DAT protocol. Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-11-07batman-adv: Distributed ARP Table - create DHT helper functionsAntonio Quartulli9-4/+378
Add all the relevant functions in order to manage a Distributed Hash Table over the B.A.T.M.A.N.-adv network. It will later be used to store several ARP entries and implement DAT (Distributed ARP Table) Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-11-07batman-adv: Distributed ARP Table - add a new debug log levelAntonio Quartulli1-1/+3
A new log level has been added to concentrate messages regarding DAT: ARP snooping, requests, response and DHT related messages. The new log level is named BATADV_DBG_DAT Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-11-07batman-adv: add UNICAST_4ADDR packet typeAntonio Quartulli6-26/+194
The current unicast packet type does not contain the orig source address. This patches add a new unicast packet (called UNICAST_4ADDR) which provides two new fields: the originator source address and the subtype (the type of the data contained in the packet payload). The former is useful to identify the node which injected the packet into the network and the latter is useful to avoid creating new unicast packet types in the future: a macro defining a new subtype will be enough. Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-11-07batman-adv: Mark correctly aligned headers not as __packedSven Eckelmann1-6/+15
Headers which are already perfectly aligned and create a 4 byte boundary non-ethernet header payload can have the __packed attribute removed. The __packed attribute doesn't change the appeareance of the packet for these headers because no extra padding is necessary to align the data members. The compiler will also create slightly faster code for loads of multi-byte members. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-11-07batman-adv: Reserve extra bytes in skb for better alignmentSven Eckelmann4-19/+22
The ethernet header is 14 bytes long. Therefore, the data after it is not 4 byte aligned and may cause problems on systems without unaligned data access. Reserving NET_IP_ALIGN more byes can fix the misalignment of the ethernet header. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-11-06htb: fix two bugsEric Dumazet1-12/+5
Commit 56b765b79e9 (htb: improved accuracy at high rates) introduced two bugs : 1) one bstats_update() was inadvertently removed from htb_dequeue_tree(), breaking statistics/rate estimation. 2) Missing qdisc_put_rtab() calls in htb_change_class(), leaking kernel memory, now struct htb_class no longer retains pointers to qdisc_rate_table structs. Since only rate is used, dont use qdisc_get_rtab() calls copying data we ignore anyway. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Vimalkumar <j.vimal@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-04bridge: Avoid 'statement with no effect' compiler warningsLee Jones1-4/+4
Instead of issuing (0) statements when !CONFIG_SYSFS which will cause 'warning: ', we'll use inline statements instead. This will effectively do the same thing, but suppress any unnecessary warnings. Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: bridge@lists.linux-foundation.org Cc: netdev@vger.kernel.org Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-03net: fix bridge notify hook to manage flags correctlyJohn Fastabend1-8/+18
The bridge notify hook rtnl_bridge_notify() was not handling the case where the master flags was set or with both flags set. First flags are not being passed correctly and second the logic to parse them is broken. This patch passes the original flags value and fixes the logic. Reported-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-03htb: improved accuracy at high ratesVimalkumar1-38/+90
Current HTB (and TBF) uses rate table computed by the "tc" userspace program, which has the following issue: The rate table has 256 entries to map packet lengths to token (time units). With TSO sized packets, the 256 entry granularity leads to loss/gain of rate, making the token bucket inaccurate. Thus, instead of relying on rate table, this patch explicitly computes the time and accounts for packet transmission times with nanosecond granularity. This greatly improves accuracy of HTB with a wide range of packet sizes. Example: tc qdisc add dev $dev root handle 1: \ htb default 1 tc class add dev $dev classid 1:1 parent 1: \ rate 5Gbit mtu 64k Here is an example of inaccuracy: $ iperf -c host -t 10 -i 1 With old htb: eth4: 34.76 Mb/s In 5827.98 Mb/s Out - 65836.0 p/s In 481273.0 p/s Out [SUM] 9.0-10.0 sec 669 MBytes 5.61 Gbits/sec [SUM] 0.0-10.0 sec 6.50 GBytes 5.58 Gbits/sec With new htb: eth4: 28.36 Mb/s In 5208.06 Mb/s Out - 53704.0 p/s In 430076.0 p/s Out [SUM] 9.0-10.0 sec 594 MBytes 4.98 Gbits/sec [SUM] 0.0-10.0 sec 5.80 GBytes 4.98 Gbits/sec The bits per second on the wire is still 5200Mb/s with new HTB because qdisc accounts for packet length using skb->len, which is smaller than total bytes on the wire if GSO is used. But that is for another patch regardless of how time is accounted. Many thanks to Eric Dumazet for review and feedback. Signed-off-by: Vimalkumar <j.vimal@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-03ipv6: introduce ip6_rt_put()Amerigo Wang10-26/+23
As suggested by Eric, we could introduce a helper function for ipv6 too, to avoid checking if rt is NULL before dst_release(). Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-03sctp: Clean up type-punning in sctp_cmd_t unionNeil Horman2-25/+23
Lots of points in the sctp_cmd_interpreter function treat the sctp_cmd_t arg as a void pointer, even though they are written as various other types. Theres no need for this as doing so just leads to possible type-punning issues that could cause crashes, and if we remain type-consistent we can actually just remove the void * member of the union entirely. Change Notes: v2) * Dropped chunk that modified SCTP_NULL to create a marker pattern should anyone try to use a SCTP_NULL() assigned sctp_arg_t, Assigning to .zero provides the same effect and should be faster, per Vlad Y. v3) * Reverted part of V2, opting to use memset instead of .zero, so that the entire union is initalized thus avoiding the i164 speculative load problems previously encountered, per Dave M.. Also rewrote SCTP_[NO]FORCE so as to use common infrastructure a little more Signed-off-by: Neil Horman <nhorman@tuxdriver.com CC: Vlad Yasevich <vyasevich@gmail.com> CC: "David S. Miller" <davem@davemloft.net> CC: linux-sctp@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-03ipv6: remove a useless NULL checkAmerigo Wang1-1/+1
In dev_forward_change(), it is useless to check if idev->dev is NULL, it is always non-NULL here. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-03pktgen: clean up ktime_t helpersDaniel Borkmann1-28/+13
Some years ago, the ktime_t helper functions ktime_now() and ktime_lt() have been introduced. Instead of defining them inside pktgen.c, they should either use ktime_t library functions or, if not available, they should be defined in ktime.h, so that also others can benefit from them. ktime_compare() is introduced with a similar notion as in timespec_compare(). Signed-off-by: Daniel Borkmann <daniel.borkmann@tik.ee.ethz.ch> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: Thomas Gleixner <tglx@linutronix.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-03tcp: better retrans tracking for defer-acceptEric Dumazet10-31/+48
For passive TCP connections using TCP_DEFER_ACCEPT facility, we incorrectly increment req->retrans each time timeout triggers while no SYNACK is sent. SYNACK are not sent for TCP_DEFER_ACCEPT that were established (for which we received the ACK from client). Only the last SYNACK is sent so that we can receive again an ACK from client, to move the req into accept queue. We plan to change this later to avoid the useless retransmit (and potential problem as this SYNACK could be lost) TCP_INFO later gives wrong information to user, claiming imaginary retransmits. Decouple req->retrans field into two independent fields : num_retrans : number of retransmit num_timeout : number of timeouts num_timeout is the counter that is incremented at each timeout, regardless of actual SYNACK being sent or not, and used to compute the exponential timeout. Introduce inet_rtx_syn_ack() helper to increment num_retrans only if ->rtx_syn_ack() succeeded. Use inet_rtx_syn_ack() from tcp_check_req() to increment num_retrans when we re-send a SYNACK in answer to a (retransmitted) SYN. Prior to this patch, we were not counting these retransmits. Change tcp_v[46]_rtx_synack() to increment TCP_MIB_RETRANSSEGS only if a synack packet was successfully queued. Reported-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Julian Anastasov <ja@ssi.bg> Cc: Vijay Subramanian <subramanian.vijay@gmail.com> Cc: Elliott Hughes <enh@google.com> Cc: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-02net: Fix continued iteration in rtnl_bridge_getlink()Ben Hutchings1-16/+7
Commit e5a55a898720096f43bc24938f8875c0a1b34cd7 ('net: create generic bridge ops') broke the handling of a non-zero starting index in rtnl_bridge_getlink() (based on the old br_dump_ifinfo()). When the starting index is non-zero, we need to increment the current index for each entry that we are skipping. Also, we need to check the index before both cases, since we may previously have stopped iteration between getting information about a device from its master and from itself. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Tested-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-02ipv6/multipath: remove flag NLM_F_EXCL after the first nexthopNicolas Dichtel1-0/+6
fib6_add_rt2node() will reject the nexthop if this flag is set, so we perform the check only for the first nexthop. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-02eth: Rename and properly align br_reserved_address arrayBen Hutchings1-1/+1
Since this array is no longer part of the bridge driver, it should have an 'eth' prefix not 'br'. We also assume that either it's 16-bit-aligned or the architecture has efficient unaligned access. Ensure the first of these is true by explicitly aligning it. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-02eth: Make is_link_local() consistent with other address testsBen Hutchings2-2/+2
Function name should include '_ether_addr'. Return type should be bool. Parameter name should be 'addr' not 'dest' (also matching kernel-doc). Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-02bridge: Use is_link_local() in store_group_addr()Ben Hutchings1-8/+3
Parse the string into an array of bytes rather than ints, so we can use is_link_local() rather than reimplementing it. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-02skb: api to report errors for zero copy skbsMichael S. Tsirkin1-0/+20
Orphaning frags for zero copy skbs needs to allocate data in atomic context so is has a chance to fail. If it does we currently discard the skb which is safe, but we don't report anything to the caller, so it can not recover by e.g. disabling zero copy. Add an API to free skb reporting such errors: this is used by tun in case orphaning frags fails. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-02skb: report completion status for zero copy skbsMichael S. Tsirkin1-2/+2
Even if skb is marked for zero copy, net core might still decide to copy it later which is somewhat slower than a copy in user context: besides copying the data we need to pin/unpin the pages. Add a parameter reporting such cases through zero copy callback: if this happens a lot, device can take this into account and switch to copying in user context. This patch updates all users but ignores the passed value for now: it will be used by follow-up patches. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-01vlan: use IS_ENABLED()Amerigo Wang2-4/+4
#if defined(CONFIG_FOO) || defined(CONFIG_FOO_MODULE) can be replaced by #if IS_ENABLED(CONFIG_FOO) Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-01ipv6: use IS_ENABLED()Amerigo Wang15-49/+46
#if defined(CONFIG_FOO) || defined(CONFIG_FOO_MODULE) can be replaced by #if IS_ENABLED(CONFIG_FOO) Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-01rtnl/ipv4: use netconf msg to advertise rp_filter statusNicolas Dichtel1-0/+24
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-01sk-filter: Add ability to get socket filter program (v2)Pavel Emelyanov2-0/+136
The SO_ATTACH_FILTER option is set only. I propose to add the get ability by using SO_ATTACH_FILTER in getsockopt. To be less irritating to eyes the SO_GET_FILTER alias to it is declared. This ability is required by checkpoint-restore project to be able to save full state of a socket. There are two issues with getting filter back. First, kernel modifies the sock_filter->code on filter load, thus in order to return the filter element back to user we have to decode it into user-visible constants. Fortunately the modification in question is interconvertible. Second, the BPF_S_ALU_DIV_K code modifies the command argument k to speed up the run-time division by doing kernel_k = reciprocal(user_k). Bad news is that different user_k may result in same kernel_k, so we can't get the original user_k back. Good news is that we don't have to do it. What we need to is calculate a user2_k so, that reciprocal(user2_k) == reciprocal(user_k) == kernel_k i.e. if it's re-loaded back the compiled again value will be exactly the same as it was. That said, the user2_k can be calculated like this user2_k = reciprocal(kernel_k) with an exception, that if kernel_k == 0, then user2_k == 1. The optlen argument is treated like this -- when zero, kernel returns the amount of sock_fprog elements in filter, otherwise it should be large enough for the sock_fprog array. changes since v1: * Declared SO_GET_FILTER in all arch headers * Added decode of vlan-tag codes Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-31Merge branch 'master' of ↵David S. Miller4-18/+3
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next Jeff Kirsher says: ==================== This series contains updates to ixgbe, ixgbevf, igbvf, igb and networking core (bridge). Most notably is the addition of support for local link multicast addresses in SR-IOV mode to the networking core. Also note, the ixgbe patch "ixgbe: Add support for pipeline reset" and "ixgbe: Fix return value from macvlan filter function" is revised based on community feedback. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-31net: filter: add vlan tag accessEric Dumazet1-0/+9
BPF filters lack ability to access skb->vlan_tci This patch adds two new ancillary accessors : SKF_AD_VLAN_TAG (44) mapped to vlan_tx_tag_get(skb) SKF_AD_VLAN_TAG_PRESENT (48) mapped to vlan_tx_tag_present(skb) This allows libpcap/tcpdump to use a kernel filter instead of having to fallback to accept all packets, then filter them in user space. Signed-off-by: Eric Dumazet <edumazet@google.com> Suggested-by: Ani Sinha <ani@aristanetworks.com> Suggested-by: Daniel Borkmann <danborkmann@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-31tcp: make tcp_clear_md5_list staticstephen hemminger1-1/+1
Trivial. Only used in one file. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-31Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-mergeDavid S. Miller16-247/+238
included changes: - some code cleanups and minor fixes (3 of them were reported by Coverity) - 'struct hard_iface' re-shaping to improve multi-protocol support - ECTP packets silent drop - transfer the WIFI flag on clients in case of roaming
2012-10-31net/ipv4/ipconfig: add device address to a KERN_INFO messageClaudio Fontana1-2/+4
adds a "hwaddr" to the "IP-Config: Complete" KERN_INFO message with the dev_addr of the device selected for auto configuration. Signed-off-by: Claudio Fontana <claudio.fontana@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-31ixgbe: add setlink, getlink support to ixgbe and ixgbevfJohn Fastabend1-0/+50
This adds support for the net device ops to manage the embedded hardware bridge on ixgbe devices. With this patch the bridge mode can be toggled between VEB and VEPA to support stacking macvlan devices or using the embedded switch without any SW component in 802.1Qbg/br environments. Additionally, this adds source address pruning to the ixgbevf driver to prune any frames sent back from a reflective relay on the switch. This is required because the existing hardware does not support this. Without it frames get pushed into the stack with its own src mac which is invalid per 802.1Qbg VEPA definition. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-31net: set and query VEB/VEPA bridge mode via PF_BRIDGEJohn Fastabend3-7/+84
Hardware switches may support enabling and disabling the loopback switch which puts the device in a VEPA mode defined in the IEEE 802.1Qbg specification. In this mode frames are not switched in the hardware but sent directly to the switch. SR-IOV capable NICs will likely support this mode I am aware of at least two such devices. Also I am told (but don't have any of this hardware available) that there are devices that only support VEPA modes. In these cases it is important at a minimum to be able to query these attributes. This patch adds an additional IFLA_BRIDGE_MODE attribute that can be set and dumped via the PF_BRIDGE:{SET|GET}LINK operations. Also anticipating bridge attributes that may be common for both embedded bridges and software bridges this adds a flags attribute IFLA_BRIDGE_FLAGS currently used to determine if the command or event is being generated to/from an embedded bridge or software bridge. Finally, the event generation is pulled out of the bridge module and into rtnetlink proper. For example using the macvlan driver in VEPA mode on top of an embedded switch requires putting the embedded switch into a VEPA mode to get the expected results. -------- -------- | VEPA | | VEPA | <-- macvlan vepa edge relays -------- -------- | | | | ------------------ | VEPA | <-- embedded switch in NIC ------------------ | | ------------------- | external switch | <-- shiny new physical ------------------- switch with VEPA support A packet sent from the macvlan VEPA at the top could be loopbacked on the embedded switch and never seen by the external switch. So in order for this to work the embedded switch needs to be set in the VEPA state via the above described commands. By making these attributes nested in IFLA_AF_SPEC we allow future extensions to be made as needed. CC: Lennert Buytenhek <buytenh@wantstofly.org> CC: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-31net: create generic bridge opsJohn Fastabend4-60/+98
The PF_BRIDGE:RTM_{GET|SET}LINK nlmsg family and type are currently embedded in the ./net/bridge module. This prohibits them from being used by other bridging devices. One example of this being hardware that has embedded bridging components. In order to use these nlmsg types more generically this patch adds two net_device_ops hooks. One to set link bridge attributes and another to dump the current bride attributes. ndo_bridge_setlink() ndo_bridge_getlink() CC: Lennert Buytenhek <buytenh@wantstofly.org> CC: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-29net, ixgbe: handle link local multicast addresses in SR-IOV modeJohn Fastabend4-18/+3
In SR-IOV mode the PF driver acts as the uplink port and is used to send control packets e.g. lldpad, stp, etc. eth0.1 eth0.2 eth0 VF VF PF | | | <-- stand-in for uplink | | | -------------------------- | Embedded Switch | -------------------------- | MAC <-- uplink But the embedded switch is setup to forward multicast addresses to all interfaces both VFs and PF and onto the physical link. This results in reserved MAC addresses used by control protocols to be forwarded over the switch onto the VF. In the LLDP case the PF sends an LLDPDU and it is currently being forwarded to all the VFs who then see the PF as a peer. This is incorrect. This patch adds the multicast addresses to the RAR table in the hardware to prevent this behavior. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Tested-by: Sibai Li <sibai.li@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2012-10-29batman-adv: add kernel-doc for enum batadv_dbg_levelAntonio Quartulli1-4/+11
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-10-29batman-adv: pass the WIFI flag from the local to global entryAntonio Quartulli2-9/+28
in case of client roaming a new global entry is added while a corresponding local one is still present. In this case the node can safely pass the WIFI flag from the local to the global entry. This change is required to let the AP-isolation correctly working in case of roaming: if a generic WIFI client C roams from node A to B, A adds a global entry for C without adding any WIFI flag. The latter will be set only later, once A has received C's advertisement from B. In this time period the AP-Isolation (if enabled) would not correctly work since C is not marked as WIFI, so allowing it to communicate with other WIFI clients. Signed-off-by: Antonio Quartulli <ordex@autistici.org>