summaryrefslogtreecommitdiffstats
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2015-10-03tcp: move qlen/young out of struct listen_sockEric Dumazet3-8/+8
qlen_inc & young_inc were protected by listener lock, while qlen_dec & young_dec were atomic fields. Everything needs to be atomic for upcoming lockless listener. Also move qlen/young in request_sock_queue as we'll get rid of struct listen_sock eventually. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-03tcp: add a spinlock to protect struct request_sock_queueEric Dumazet2-14/+8
struct request_sock_queue fields are currently protected by the listener 'lock' (not a real spinlock) We need to add a private spinlock instead, so that softirq handlers creating children do not have to worry with backlog notion that the listener 'lock' carries. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller14-74/+70
Conflicts: net/dsa/slave.c net/dsa/slave.c simply had overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds10-42/+76
Pull networking fixes from David Miller: 1) Fix regression in SKB partial checksum handling, from Pravin B Shalar. 2) Fix VLAN inside of VXLAN handling in i40e driver, from Jesse Brandeburg. 3) Cure softlockups during accept() in SCTP, from Karl Heiss. 4) MSG_PEEK should return multiple SKBs worth of data in AF_UNIX, from Aaron Conole. 5) IPV6 erroneously ignores output interface specifier in lookup key for route lookups, fix from David Ahern. 6) In Marvell DSA driver, forward unknown frames to CPU port, from Andrew Lunn. 7) Mission flow flag initializations in some code paths, from David Ahern. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: net: Initialize flow flags in input path net: dsa: fix preparation of a port STP update testptp: Silence compiler warnings on ppc64 net/mlx4: Handle return codes in mlx4_qp_attach_common dsa: mv88e6xxx: Enable forwarding for unknown to the CPU port skbuff: Fix skb checksum partial check. net: ipv6: Add RT6_LOOKUP_F_IFACE flag if oif is set net sysfs: Print link speed as signed integer bna: fix error handling af_unix: return data from multiple SKBs on recv() with MSG_PEEK flag af_unix: Convert the unix_sk macro to an inline function for type safety net: sctp: Don't use 64 kilobyte lookup table for four elements l2tp: protect tunnel->del_work by ref_count net/ibm/emac: bump version numbers for correct work with ethtool sctp: Prevent soft lockup when sctp_accept() is called during a timeout event sctp: Whitespace fix i40e/i40evf: check for stopped admin queue i40e: fix VLAN inside VXLAN r8169: fix handling rtl_readphy result net: hisilicon: fix handling platform_get_irq result
2015-10-01bridge: vlan: don't pass flags when creating context onlyNikolay Aleksandrov1-1/+1
We should not pass the original flags when creating a context vlan only because they may contain some flags that change behaviour in the bridge. The new global context should be with minimal set of flags, so pass 0 and let br_vlan_add() set the master flag only. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-01bridge: vlan: fix possible null ptr derefs on port init and deinitNikolay Aleksandrov2-7/+12
When a new port is being added we need to make vlgrp available after rhashtable has been initialized and when removing a port we need to flush the vlans and free the resources after we're sure noone can use the port, i.e. after it's removed from the port list and synchronize_rcu is executed. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-01bridge: vlan: move pvid inside net_bridge_vlan_groupNikolay Aleksandrov5-118/+75
One obvious way to converge more code (which was also used by the previous vlan code) is to move pvid inside net_bridge_vlan_group. This allows us to simplify some and remove other port-specific functions. Also gives us the ability to simply pass the vlan group and use all of the contained information. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-01bridge: vlan: fix possible null vlgrp deref while registering new portNikolay Aleksandrov1-1/+3
While a new port is being initialized the rx_handler gets set, but the vlans get initialized later in br_add_if() and in that window if we receive a frame with a link-local address we can try to dereference p->vlgrp in: br_handle_frame() -> br_handle_local_finish() -> br_should_learn() Fix this by checking vlgrp before using it. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-01bridge: vlan: adjust rhashtable initial size and hash locks sizeNikolay Aleksandrov1-0/+2
As Stephen pointed out the default initial size is more than we need, so let's start small (4 elements, thus nelem_hint = 3). Also limit the hash locks to the number of CPUs as we don't need any write-side scaling and this looks like the minimum. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-01Merge tag 'for-linus' of ↵Linus Torvalds5-35/+2
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull rdma fixes from Doug Ledford: - Fixes for mlx5 related issues - Fixes for ipoib multicast handling * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: IB/ipoib: increase the max mcast backlog queue IB/ipoib: Make sendonly multicast joins create the mcast group IB/ipoib: Expire sendonly multicast joins IB/mlx5: Remove pa_lkey usages IB/mlx5: Remove support for IB_DEVICE_LOCAL_DMA_LKEY IB/iser: Add module parameter for always register memory xprtrdma: Replace global lkey with lkey local to PD
2015-09-29net: Initialize flow flags in input pathDavid Ahern2-0/+2
The fib_table_lookup tracepoint found 2 places where the flowi4_flags is not initialized. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller33-561/+491
Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following pull request contains Netfilter/IPVS updates for net-next containing 90 patches from Eric Biederman. The main goal of this batch is to avoid recurrent lookups for the netns pointer, that happens over and over again in our Netfilter/IPVS code. The idea consists of passing netns pointer from the hook state to the relevant functions and objects where this may be needed. You can find more information on the IPVS updates from Simon Horman's commit merge message: c3456026adc0 ("Merge tag 'ipvs2-for-v4.4' of https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next"). Exceptionally, this time, I'm not posting the patches again on netdev, Eric already Cc'ed this mailing list in the original submission. If you need me to make, just let me know. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-29net: dsa: fix preparation of a port STP updateVivien Didelot1-3/+8
Because of the default 0 value of ret in dsa_slave_port_attr_set, a driver may return -EOPNOTSUPP from the commit phase of a STP state, which triggers a WARN() from switchdev. This happened on a 6185 switch which does not support hardware bridging. Fixes: 3563606258cf ("switchdev: convert STP update to switchdev attr set") Reported-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-29net: dsa: fix preparation of a port STP updateVivien Didelot1-3/+8
Because of the default 0 value of ret in dsa_slave_port_attr_set, a driver may return -EOPNOTSUPP from the commit phase of a STP state, which triggers a WARN() from switchdev. This happened on a 6185 switch which does not support hardware bridging. Reported-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Acked-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-29net: Add support for filtering neigh dump by master deviceDavid Ahern1-1/+31
Add support for filtering neighbor dumps by master device by adding the NDA_MASTER attribute to the dump request. A new netlink flag, NLM_F_DUMP_FILTERED, is added to indicate the kernel supports the request and output is filtered as requested. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-29tcp: fix tcp_v6_md5_do_lookup prototypeEric Dumazet1-1/+1
tcp_v6_md5_do_lookup() now takes a const socket, even if CONFIG_TCP_MD5SIG is not set. Fixes: b83e3deb974c ("tcp: md5: constify tcp_md5_do_lookup() socket argument") From: Eric Dumazet <edumazet@google.com> Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-29net: switchdev: abstract object in add/del opsVivien Didelot4-100/+77
Similar to the notifier_call callback of a notifier_block, change the function signature of switchdev add and del operations to: int switchdev_port_obj_add/del(struct net_device *dev, enum switchdev_obj_id id, void *obj); This allows the caller to pass a specific switchdev_obj_* structure instead of the generic switchdev_obj one. Drivers implementation of these operations and switchdev have been changed accordingly. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-29net: switchdev: pass callback to dump operationVivien Didelot2-35/+36
Similar to the notifier_call callback of a notifier_block, change the function signature of switchdev dump operation to: int switchdev_port_obj_dump(struct net_device *dev, enum switchdev_obj_id id, void *obj, int (*cb)(void *obj)); This allows the caller to pass and expect back a specific switchdev_obj_* structure instead of the generic switchdev_obj one. Drivers implementation of dump operation can now expect this specific structure and call the callback with it. Drivers have been changed accordingly. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-29net: switchdev: remove dev from switchdev_obj cbVivien Didelot2-6/+4
The net_device associated to a dump operation does not have to be passed to the callback. switchdev stores it in a superset struct, if needed. Also some drivers (such as DSA drivers) may not have easy access to it. This will simplify pushing the callback function down to the drivers. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-29net: switchdev: move dev in switchdev_fdb_dumpVivien Didelot1-1/+3
The FDB dump callback requires the related net_device so move it to the struct switchdev_fdb_dump superset instead of using a callback param. With this done, it'll be simpler to change the dump function signature. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-29net: switchdev: remove dev in port_vlan_dump_putVivien Didelot1-6/+5
The static switchdev_port_vlan_dump_put function does not need the net_device parameter, so remove it. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-29net: Replace calls to vrf_dev_get_rthDavid Ahern1-5/+3
Replace calls to vrf_dev_get_rth with l3mdev_get_rtable. The check on the flow flags is handled in the l3mdev operation. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-29net: Replace vrf_dev_table and friendsDavid Ahern2-6/+5
Replace calls to vrf_dev_table and friends with l3mdev_fib_table and kin. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-29net: Replace vrf_master_ifindex{, _rcu} with l3mdev equivalentsDavid Ahern6-22/+20
Replace calls to vrf_master_ifindex_rcu and vrf_master_ifindex with either l3mdev_master_ifindex_rcu or l3mdev_master_ifindex. The pattern: oif = vrf_master_ifindex(dev) ? : dev->ifindex; is replaced with oif = l3mdev_fib_oif(dev); And remove the now unused vrf macros. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-29net: Introduce L3 Master device abstractionDavid Ahern5-0/+111
L3 master devices allow users of the abstraction to influence FIB lookups for enslaved devices. Current API provides a means for the master device to return a specific FIB table for an enslaved device, to return an rtable/custom dst and influence the OIF used for fib lookups. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-29net: Rename IFF_VRF_MASTER to IFF_L3MDEV_MASTERDavid Ahern3-3/+3
Rename IFF_VRF_MASTER to IFF_L3MDEV_MASTER and update the name of the netif_is_vrf and netif_index_is_vrf macros. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-29tcp: prepare fastopen code for upcoming listener changesEric Dumazet7-37/+29
While auditing TCP stack for upcoming 'lockless' listener changes, I found I had to change fastopen_init_queue() to properly init the object before publishing it. Otherwise an other cpu could try to lock the spinlock before it gets properly initialized. Instead of adding appropriate barriers, just remove dynamic memory allocations : - Structure is 28 bytes on 64bit arches. Using additional 8 bytes for holding a pointer seems overkill. - Two listeners can share same cache line and performance would suffer. If we really want to save few bytes, we would instead dynamically allocate whole struct request_sock_queue in the future. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-29tcp: constify tcp_syn_flood_action() socket argumentEric Dumazet1-4/+5
tcp_syn_flood_action() will soon be called with unlocked socket. In order to avoid SYN flood warning being emitted multiple times, use xchg(). Extend max_qlen_log and synflood_warned fields in struct listen_sock to u32 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-29tcp: constify tcp_v{4|6}_route_req() sock argumentEric Dumazet2-2/+4
These functions do not change the listener socket. Goal is to make sure tcp_conn_request() is not messing with listener in a racy way. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-29tcp: cookie_init_sequence() cleanupsEric Dumazet2-9/+2
Some common IPv4/IPv6 code can be factorized. Also constify cookie_init_sequence() socket argument. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-29tcp/dccp: constify syn_recv_sock() method sock argumentEric Dumazet5-7/+10
We'll soon no longer hold listener socket lock, these functions do not modify the socket in any way. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-29tcp: constify tcp_create_openreq_child() socket argumentEric Dumazet1-1/+3
This method does not touch the listener socket. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-29dccp: constify dccp_create_openreq_child() sock argumentEric Dumazet2-2/+2
socket no longer needs to be read/write Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-29inet: constify __inet_inherit_port() sock argumentEric Dumazet1-1/+1
socket is not touched, make it const. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-29inet: constify inet_csk_route_child_sock() socket argumentEric Dumazet1-1/+1
The socket points to the (shared) listener. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-29dccp: use inet6_csk_route_req() helperEric Dumazet3-20/+12
Before changing dccp_v6_request_recv_sock() sock argument to const, we need to get rid of security_sk_classify_flow(), and it seems doable by reusing inet6_csk_route_req() helper. We need to add a proto parameter to inet6_csk_route_req(), not assume it is TCP. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-29tcp: remove tcp_rcv_state_process() tcp_hdr argumentEric Dumazet4-5/+5
Factorize code to get tcp header from skb. It makes no sense to duplicate code in callers. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-29tcp: remove unused len argument from tcp_rcv_state_process()Eric Dumazet4-7/+6
Once we realize tcp_rcv_synsent_state_process() does not use its 'len' argument and we get rid of it, then it becomes clear this argument is no longer used in tcp_rcv_state_process() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-29tcp/dccp: constify send_synack and send_reset socket argumentEric Dumazet6-12/+12
None of these functions need to change the socket, make it const. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-29skbuff: Fix skb checksum partial check.Pravin B Shelar1-4/+5
Earlier patch 6ae459bda tried to detect void ckecksum partial skb by comparing pull length to checksum offset. But it does not work for all cases since checksum-offset depends on updates to skb->data. Following patch fixes it by validating checksum start offset after skb-data pointer is updated. Negative value of checksum offset start means there is no need to checksum. Fixes: 6ae459bda ("skbuff: Fix skb checksum flag on skb pull") Reported-by: Andrew Vagin <avagin@odin.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-29net: Remove martian_source_keep_err goto labelDavid Ahern1-4/+2
err is initialized to -EINVAL when it is declared. It is not reset until fib_lookup which is well after the 3 users of the martian_source jump. So resetting err to -EINVAL at martian_source label is not needed. Removing that line obviates the need for the martian_source_keep_err label so delete it. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-29net: Swap ordering of tests in ip_route_input_mcAlexander Duyck1-3/+2
This patch just swaps the ordering of one of the conditional tests in ip_route_input_mc. Specifically it swaps the testing for the source address to see if it is loopback, and the test to see if we allow a loopback source address. The reason for swapping these two tests is because it is much faster to test if an address is loopback than it is to dereference several pointers to get at the net structure to see if the use of loopback is allowed. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-29net/ipv4: Pass proto as u8 instead of u16 in ip_check_mc_rcuAlexander Duyck1-1/+1
This patch updates ip_check_mc_rcu so that protocol is passed as a u8 instead of a u16. The motivation is just to avoid any unneeded type transitions since some systems will require an instruction to zero extend a u8 field to a u16. Also it makes it a bit more readable as to the fact that protocol is a u8 so there are no byte ordering changes needed to pass it. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-29net: ipv6: Add RT6_LOOKUP_F_IFACE flag if oif is setDavid Ahern1-1/+2
Wolfgang reported that IPv6 stack is ignoring oif in output route lookups: With ipv6, ip -6 route get always returns the specific route. $ ip -6 r 2001:db8:e2::1 dev enp2s0 proto kernel metric 256 2001:db8:e2::/64 dev enp2s0 metric 1024 2001:db8:e3::1 dev enp3s0 proto kernel metric 256 2001:db8:e3::/64 dev enp3s0 metric 1024 fe80::/64 dev enp3s0 proto kernel metric 256 default via 2001:db8:e3::255 dev enp3s0 metric 1024 $ ip -6 r get 2001:db8:e2::100 2001:db8:e2::100 from :: dev enp2s0 src 2001:db8:e3::1 metric 0 cache $ ip -6 r get 2001:db8:e2::100 oif enp3s0 2001:db8:e2::100 from :: dev enp2s0 src 2001:db8:e3::1 metric 0 cache The stack does consider the oif but a mismatch in rt6_device_match is not considered fatal because RT6_LOOKUP_F_IFACE is not set in the flags. Cc: Wolfgang Nothdurft <netdev@linux-dude.de> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-29netpoll: Drop budget parameter from NAPI polling call hierarchyAlexander Duyck1-12/+11
For some reason we were carrying the budget value around between the various calls to napi->poll. If for example one of the drivers called had a bug in which it returned a non-zero value for work this could result in the budget value becoming negative. Rather than carry around a value of budget that is 0 or less we can instead just loop through and pass 0 to each napi->poll call. If any driver returns a value for work done that is non-zero then we can report that driver and continue rather than allowing a bad actor to make the budget value negative and pass that negative value to napi->poll. Note, the only actual change here is that instead of letting budget become negative we are keeping it at 0 regardless of the value returned for work since it should not be possible for the polling routine to do any actual work with a budget of 0. So if the polling routine returns a non-0 value we are just reporting it and continuing with a budget of 0 rather than letting that work value be subtracted from the budget of 0. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-29net sysfs: Print link speed as signed integerAlexander Stein1-2/+1
Otherwise 4294967295 (MBit/s) (-1) will be printed when there is no link. Documentation/ABI/testing/sysfs-class-net does not state if this shall be signed or unsigned. Also remove the now unused variable fmt_udec. Signed-off-by: Alexander Stein <alexander.stein@systec-electronic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-29af_unix: return data from multiple SKBs on recv() with MSG_PEEK flagAaron Conole1-1/+14
AF_UNIX sockets now return multiple skbs from recv() when MSG_PEEK flag is set. This is referenced in kernel bugzilla #12323 @ https://bugzilla.kernel.org/show_bug.cgi?id=12323 As described both in the BZ and lkml thread @ http://lkml.org/lkml/2008/1/8/444 calling recv() with MSG_PEEK on an AF_UNIX socket only reads a single skb, where the desired effect is to return as much skb data has been queued, until hitting the recv buffer size (whichever comes first). The modified MSG_PEEK path will now move to the next skb in the tree and jump to the again: label, rather than following the natural loop structure. This requires duplicating some of the loop head actions. This was tested using the python socketpair python code attached to the bugzilla issue. Signed-off-by: Aaron Conole <aconole@bytheb.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-29bridge: vlan: add per-vlan struct and move to rhashtablesNikolay Aleksandrov8-460/+731
This patch changes the bridge vlan implementation to use rhashtables instead of bitmaps. The main motivation behind this change is that we need extensible per-vlan structures (both per-port and global) so more advanced features can be introduced and the vlan support can be extended. I've tried to break this up but the moment net_port_vlans is changed and the whole API goes away, thus this is a larger patch. A few short goals of this patch are: - Extensible per-vlan structs stored in rhashtables and a sorted list - Keep user-visible behaviour (compressed vlans etc) - Keep fastpath ingress/egress logic the same (optimizations to come later) Here's a brief list of some of the new features we'd like to introduce: - per-vlan counters - vlan ingress/egress mapping - per-vlan igmp configuration - vlan priorities - avoid fdb entries replication (e.g. local fdb scaling issues) The structure is kept single for both global and per-port entries so to avoid code duplication where possible and also because we'll soon introduce "port0 / aka bridge as port" which should simplify things further (thanks to Vlad for the suggestion!). Now we have per-vlan global rhashtable (bridge-wide) and per-vlan port rhashtable, if an entry is added to a port it'll get a pointer to its global context so it can be quickly accessed later. There's also a sorted vlan list which is used for stable walks and some user-visible behaviour such as the vlan ranges, also for error paths. VLANs are stored in a "vlan group" which currently contains the rhashtable, sorted vlan list and the number of "real" vlan entries. A good side-effect of this change is that it resembles how hw keeps per-vlan data. One important note after this change is that if a VLAN is being looked up in the bridge's rhashtable for filtering purposes (or to check if it's an existing usable entry, not just a global context) then the new helper br_vlan_should_use() needs to be used if the vlan is found. In case the lookup is done only with a port's vlan group, then this check can be skipped. Things tested so far: - basic vlan ingress/egress - pvids - untagged vlans - undef CONFIG_BRIDGE_VLAN_FILTERING - adding/deleting vlans in different scenarios (with/without global ctx, while transmitting traffic, in ranges etc) - loading/removing the module while having/adding/deleting vlans - extracting bridge vlan information (user ABI), compressed requests - adding/deleting fdbs on vlans - bridge mac change, promisc mode - default pvid change - kmemleak ON during the whole time Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-29bridge: Pass net into br_validate_ipv4 and br_validate_ipv6Eric W. Biederman2-16/+14
The network namespace is easiliy available in state->net so use it. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-09-29ipv6: Pass struct net into ip6_route_me_harderEric W. Biederman5-7/+6
Don't make ip6_route_me_harder guess which network namespace it is routing in, pass the network namespace in. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>