path: root/net
Age | Commit message | Author | Files | Lines
2015-08-03 | bridge: mdb: fix vlan_enabled access when vlans are not configured | Nikolay Aleksandrov | 1 | -2/+2
Instead of trying to access br->vlan_enabled directly, use the provided helper br_vlan_enabled(). Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-03 | act_bpf: properly support late binding of bpf action to a classifier | Daniel Borkmann | 1 | -24/+27
Since the introduction of the BPF action in d23b8ad8ab23 ("tc: add BPF based action"), late binding was not working as expected. I.e. setting the action part for a classifier only via 'bpf index <num>', where <num> is the index of an existing action, is being rejected by the kernel due to other missing parameters.

It doesn't make sense to require these parameters such as BPF opcodes etc, as they are not going to be used anyway: in this case, they're just allocated/parsed and then freed again w/o doing anything meaningful.

Instead, parse and verify the remaining parameters *after* the test on tcf_hash_check(), when we really know that we're dealing with creation of a new action or replacement of an existing one and where late binding is thus irrelevant.

After patch, test case is now working:

  FOO="1,6 0 0 4294967295,"
  tc actions add action bpf bytecode "$FOO"
  tc filter add dev foo parent 1: bpf bytecode "$FOO" flowid 1:1 action bpf index 1
  tc actions show action bpf
    action order 0: bpf bytecode '1,6 0 0 4294967295' default-action pipe
    index 1 ref 2 bind 1
  tc filter show dev foo
    filter protocol all pref 49152 bpf
    filter protocol all pref 49152 bpf handle 0x1 flowid 1:1 bytecode '1,6 0 0 4294967295'
    action order 1: bpf bytecode '1,6 0 0 4294967295' default-action pipe
    index 1 ref 2 bind 1

Late binding of a BPF action can be useful for preloading maps (e.g. before they hit traffic) in case of eBPF programs, or to share a single eBPF action with multiple classifiers.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
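The reordering described in this message can be sketched in plain user-space C. This is only an illustration under stated assumptions, not the kernel code: `struct action`, `lookup_and_ref()` and `action_init()` are hypothetical stand-ins, with `lookup_and_ref()` playing the role the message gives to tcf_hash_check(). Parameters are validated only after the lookup has shown that the request is not a late binding.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_ACTIONS 8

/* Hypothetical, simplified action record; not the kernel's struct. */
struct action {
    bool in_use;
    int refcnt;
    int bindcnt;
    const char *bytecode;
};

static struct action actions[MAX_ACTIONS];

/* Look up an existing action by index and, if found, take a reference
 * (plus a bind reference when a classifier binds to it). */
static struct action *lookup_and_ref(int index, bool bind)
{
    if (index < 0 || index >= MAX_ACTIONS || !actions[index].in_use)
        return NULL;
    actions[index].refcnt++;
    if (bind)
        actions[index].bindcnt++;
    return &actions[index];
}

/* Returns 0 on success, -1 on error. */
int action_init(int index, const char *bytecode, bool bind)
{
    struct action *a;

    if (index < 0 || index >= MAX_ACTIONS)
        return -1;

    a = lookup_and_ref(index, bind);
    if (a && bind)
        return 0;               /* late binding: nothing else to parse */

    /* Creation or replacement: only now are the parameters required. */
    if (bytecode == NULL) {
        if (a)
            a->refcnt--;        /* undo the reference taken by the lookup */
        return -1;
    }

    if (!a) {
        a = &actions[index];
        a->in_use = true;
        a->refcnt = 1;
        a->bindcnt = bind ? 1 : 0;
    } else {
        a->refcnt--;            /* a replace must not leak a reference */
    }
    a->bytecode = bytecode;
    return 0;
}
```

Creating action 1 and then binding to it with no bytecode leaves the record at ref 2, bind 1, which matches the counters shown in the test case output above.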
2015-08-03 | bridge: mdb: add/del entry on all vlans if vlan_filter is enabled and vid is 0 | Satish Ashok | 1 | -8/+60
Before this patch, when a vid was not specified, the entry was added with vid 0, which is useless when vlan_filtering is enabled. This patch adds the entry on all configured vlans when vlan filtering is enabled and, respectively, deletes it from all of them if the entry vid is 0. This is also closer to the way fdb works with regard to vid 0 and vlan filtering.

Example:

Setup:
  $ bridge vlan add vid 256 dev eth4
  $ bridge vlan add vid 1024 dev eth4
  $ bridge vlan add vid 64 dev eth3
  $ bridge vlan add vid 128 dev eth3
  $ bridge vlan
  port    vlan ids
  eth3     1 PVID Egress Untagged
           64
           128
  eth4     1 PVID Egress Untagged
           256
           1024
  $ echo 1 > /sys/class/net/br0/bridge/vlan_filtering

Before:
  $ bridge mdb add dev br0 port eth3 grp 239.0.0.1
  $ bridge mdb
  dev br0 port eth3 grp 239.0.0.1 temp

After:
  $ bridge mdb add dev br0 port eth3 grp 239.0.0.1
  $ bridge mdb
  dev br0 port eth3 grp 239.0.0.1 temp vid 1
  dev br0 port eth3 grp 239.0.0.1 temp vid 128
  dev br0 port eth3 grp 239.0.0.1 temp vid 64

Signed-off-by: Satish Ashok <sashok@cumulusnetworks.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
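The vid 0 behaviour described above amounts to a fan-out over the port's configured vlans. A minimal user-space sketch follows; `struct port`, `struct mdb_entry`, `mdb_add_one()` and `mdb_add()` are hypothetical names chosen for illustration and do not reflect the bridge code's real data structures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ENTRIES 16

/* Hypothetical multicast database entry: group address plus vlan id. */
struct mdb_entry {
    uint32_t group;
    uint16_t vid;
};

/* Hypothetical port: its configured vlans and its mdb entries. */
struct port {
    const uint16_t *vlans;
    size_t n_vlans;
    struct mdb_entry mdb[MAX_ENTRIES];
    size_t n_mdb;
};

static int mdb_add_one(struct port *p, uint32_t group, uint16_t vid)
{
    if (p->n_mdb == MAX_ENTRIES)
        return -1;
    p->mdb[p->n_mdb].group = group;
    p->mdb[p->n_mdb].vid = vid;
    p->n_mdb++;
    return 0;
}

/* With filtering on and vid 0, add one entry per configured vlan;
 * otherwise add exactly the entry that was requested. */
int mdb_add(struct port *p, bool vlan_filtering, uint32_t group, uint16_t vid)
{
    size_t i;

    if (vlan_filtering && vid == 0) {
        for (i = 0; i < p->n_vlans; i++)
            if (mdb_add_one(p, group, p->vlans[i]) != 0)
                return -1;
        return 0;
    }
    return mdb_add_one(p, group, vid);
}
```

Deletion would follow the same shape, removing the group from every configured vlan when the requested vid is 0.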
2015-08-03 | bridge: Don't segment multiple tagged packets on bridge device | Toshiaki Makita | 1 | -0/+1
Bridge devices don't need to segment multiple tagged packets since their ports can segment them. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-02 | ebpf: add skb->hash to offset map for usage in {cls, act}_bpf or filters | Daniel Borkmann | 1 | -0/+7
Add skb->hash to the __sk_buff offset map, so it can be accessed from an eBPF program. We currently already do this for classic BPF filters, but not yet on eBPF. It might be useful as a demuxer in combination with helpers like bpf_clone_redirect(), toy example:

  __section("cls-lb") int ingress_main(struct __sk_buff *skb)
  {
    unsigned int which = 3 + (skb->hash & 7);
    /* bpf_skb_store_bytes(skb, ...); */
    /* bpf_l{3,4}_csum_replace(skb, ...); */
    bpf_clone_redirect(skb, which, 0);
    return -1;
  }

I was thinking whether to add skb_get_hash(), but then concluded the raw skb->hash seems fine in this case: we can directly access the hash w/o extra eBPF helper function call, it's filled out by many NICs on ingress, and in case the entropy level would not be sufficient, people can still implement their own specific sw fallback hash mix anyway.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
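The demux arithmetic in the toy example can be checked outside the kernel. The sketch below is plain C rather than an eBPF program: `pick_ifindex()` mirrors the `3 + (skb->hash & 7)` expression, and `mix32()` is one possible software fallback mix (the 32-bit finalizer from MurmurHash3, chosen here purely as an illustration; the message does not prescribe any particular mix).

```c
#include <stdint.h>

/* The low three bits of the hash select one of eight consecutive
 * interface indexes, 3 through 10. */
uint32_t pick_ifindex(uint32_t hash)
{
    return 3 + (hash & 7);
}

/* Illustrative fallback mix: MurmurHash3's 32-bit finalizer. It spreads
 * entropy from the high bits into the low bits used by pick_ifindex(). */
uint32_t mix32(uint32_t h)
{
    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}
```

A caller that does not trust the low bits of the raw hash could use `pick_ifindex(mix32(hash))` instead of `pick_ifindex(hash)`; the result stays within the same range either way.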
2015-07-31 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | David S. Miller | 47 | -242/+427
Conflicts:
  arch/s390/net/bpf_jit_comp.c
  drivers/net/ethernet/ti/netcp_ethss.c
  net/bridge/br_multicast.c
  net/ipv4/ip_fragment.c

All four conflicts were cases of simple overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-31 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | Linus Torvalds | 43 | -228/+402
Pull networking fixes from David Miller:

 1) Must teardown SR-IOV before unregistering netdev in igb driver, from Alex Williamson.
 2) Fix ipv6 route unreachable crash in IPVS, from Alex Gartrell.
 3) Default route selection in ipv4 should take the prefix length, table ID, and TOS into account, from Julian Anastasov.
 4) sch_plug must have a reset method in order to purge all buffered packets when the qdisc is reset, likewise for sch_choke, from WANG Cong.
 5) Fix deadlock and races in slave_changelink/br_setport in bridging. From Nikolay Aleksandrov.
 6) mlx4 bug fixes (wrong index in port event propagation to VFs, overzealous BUG_ON assertion, etc.) from Ido Shamay, Jack Morgenstein, and Or Gerlitz.
 7) Turn off klog message about SCTP userspace interface compat that makes no sense at all, from Daniel Borkmann.
 8) Fix unbounded restarts of inet frag eviction process, causing NMI watchdog soft lockup messages, from Florian Westphal.
 9) Suspend/resume fixes for r8152 from Hayes Wang.
 10) Fix busy loop when MSG_WAITALL|MSG_PEEK is used in TCP recv, from Sabrina Dubroca.
 11) Fix performance regression when removing a lot of routes from the ipv4 routing tables, from Alexander Duyck.
 12) Fix device leak in AF_PACKET, from Lars Westerhoff.
 13) AF_PACKET also has a header length comparison bug due to signedness, from Alexander Drozdov.
 14) Fix bug in EBPF tail call generation on x86, from Daniel Borkmann.
 15) Memory leaks, TSO stats, watchdog timeout and other fixes to thunderx driver from Sunil Goutham and Thanneeru Srinivasulu.
 16) act_bpf can leak memory when replacing programs, from Daniel Borkmann.
 17) WOL packet fixes in gianfar driver, from Claudiu Manoil.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (79 commits)
  stmmac: fix missing MODULE_LICENSE in stmmac_platform
  gianfar: Enable device wakeup when appropriate
  gianfar: Fix suspend/resume for wol magic packet
  gianfar: Fix warning when CONFIG_PM off
  act_pedit: check binding before calling tcf_hash_release()
  net: sk_clone_lock() should only do get_net() if the parent is not a kernel socket
  net: sched: fix refcount imbalance in actions
  r8152: reset device when tx timeout
  r8152: add pre_reset and post_reset
  qlcnic: Fix corruption while copying
  act_bpf: fix memory leaks when replacing bpf programs
  net: thunderx: Fix for crash while BGX teardown
  net: thunderx: Add PCI driver shutdown routine
  net: thunderx: Fix crash when changing rss with mutliple traffic flows
  net: thunderx: Set watchdog timeout value
  net: thunderx: Wakeup TXQ only if CQE_TX are processed
  net: thunderx: Suppress alloc_pages() failure warnings
  net: thunderx: Fix TSO packet statistic
  net: thunderx: Fix memory leak when changing queue count
  net: thunderx: Fix RQ_DROP miscalculation
  ...
2015-07-31 | ipv6: Disable flowlabel state ranges by default | Tom Herbert | 1 | -1/+1
Per RFC6437 stateful flow labels (e.g. labels set by flow label manager) cannot "disturb" nodes taking part in stateless flow labels. While the ranges only reduce the flow label entropy by one bit, it is conceivable that this might bias the algorithm on some routers causing a load imbalance. For best results on the Internet we really need the full 20 bits. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-31 | ipv6: Implement different admin modes for automatic flow labels | Tom Herbert | 4 | -5/+11
Change the meaning of net.ipv6.auto_flowlabels to provide a mode for automatic flow labels generation. There are four modes:

  0: flow labels are disabled
  1: flow labels are enabled, sockets can opt-out
  2: flow labels are allowed, sockets can opt-in
  3: flow labels are enabled and enforced, no opt-out for sockets

np->autoflowlabel is initialized according to the sysctl value.

Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
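One way to read the four modes is as a pair of decisions: what a new socket's setting defaults to, and whether the socket's setting is consulted at all. The sketch below encodes that reading in user-space C. The enum and function names are hypothetical, and the mapping is an interpretation of the mode descriptions above rather than a copy of the kernel logic.

```c
#include <stdbool.h>

/* Hypothetical names for the four sysctl values listed above. */
enum ex_auto_flowlabel_mode {
    EX_FLOWLABEL_OFF    = 0,  /* disabled */
    EX_FLOWLABEL_OPTOUT = 1,  /* enabled, sockets can opt out */
    EX_FLOWLABEL_OPTIN  = 2,  /* allowed, sockets can opt in */
    EX_FLOWLABEL_FORCED = 3   /* enabled and enforced */
};

/* Initial per-socket setting derived from the sysctl value. */
bool ex_socket_default(enum ex_auto_flowlabel_mode mode)
{
    return mode == EX_FLOWLABEL_OPTOUT || mode == EX_FLOWLABEL_FORCED;
}

/* Whether a packet gets an automatic flow label, given the current
 * mode and the socket's (possibly changed) setting. */
bool ex_use_auto_flowlabel(enum ex_auto_flowlabel_mode mode, bool sock_setting)
{
    switch (mode) {
    case EX_FLOWLABEL_OFF:
        return false;           /* the socket cannot turn it on */
    case EX_FLOWLABEL_FORCED:
        return true;            /* the socket cannot turn it off */
    default:
        return sock_setting;    /* opt-out and opt-in modes honor the socket */
    }
}
```

Under this reading, modes 1 and 2 differ only in the default a socket starts from, while modes 0 and 3 ignore the socket's choice entirely.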
2015-07-31 | ipv6: Call skb_get_hash_flowi6 to get skb->hash in ip6_make_flowlabel | Tom Herbert | 3 | -5/+6
We can't call skb_get_hash here since the packet is not yet complete enough for flow_dissector to parse. Create the hash based on flowi6 instead. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-31 | net: Add functions to get skb->hash based on flow structures | Tom Herbert | 1 | -4/+54
Add skb_get_hash_flowi6 and skb_get_hash_flowi4 which derive an sk_buff hash from flowi6 and flowi4 structures respectively. These functions can be called when creating a packet in the output path where the new sk_buff does not yet contain a fully formed packet that is parsable by flow dissector. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-31 | net: dsa: Add netconsole support | Florian Fainelli | 2 | -0/+71
Add support for using DSA slave network devices with netconsole, which requires us to allocate and free custom netpoll instances and invoke the parent network device poll controller callback. In order for netconsole to work, we need to construct the DSA tag, but not queue the skb for transmission on the master network device xmit function. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-31 | net: dsa: Refactor transmit path to eliminate duplication | Florian Fainelli | 6 | -49/+33
All tagging protocols do the same thing: increment device statistics, make room for the tag to be inserted, create the tag, invoke the parent network device transmit function. In order to prepare for adding netpoll support, which requires the tag creation, but not using the parent network device transmit function, do a little refactoring which eliminates duplication between the 4 tagging protocols supported. We need to return an sk_buff pointer back to the caller because the tag-specific transmit function may have to reallocate the original skb (e.g. tag_trailer.c) and this is the one we should be transmitting, not the original sk_buff we were passed. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-31 | br2684: Remove unnecessary formatting macros b1 and bs | Joe Perches | 1 | -6/+3
Use vsprintf extension %pI4 instead. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-31 | act_pedit: check binding before calling tcf_hash_release() | WANG Cong | 1 | -3/+2
When we share an action within a filter, the bind refcnt should increase, therefore we should not call tcf_hash_release(). Fixes: 1a29321ed045 ("net_sched: act: Dont increment refcnt on replace") Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Cong Wang <cwang@twopensource.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-31 | af_mpls: fix undefined reference to ip6_route_output | Roopa Prabhu | 1 | -8/+33
Undefined reference to ip6_route_output and ip_route_output was reported with CONFIG_INET=n and CONFIG_IPV6=n. This patch uses ipv6_stub_impl.ipv6_dst_lookup instead of ip6_route_output, and wraps the affected code under IS_ENABLED(CONFIG_INET) and IS_ENABLED(CONFIG_IPV6). Reported-by: kbuild test robot <fengguang.wu@intel.com> Reported-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-31 | ipv6: change ipv6_stub_impl.ipv6_dst_lookup to take net argument | Roopa Prabhu | 4 | -11/+21
This patch adds a net argument to ipv6_stub_impl.ipv6_dst_lookup for use cases where sk is not available (like mpls). sk appears to be needed to get the namespace 'net' and is optional otherwise. This patch series changes ipv6_stub_impl.ipv6_dst_lookup to take a net argument. sk remains optional. All callers of ipv6_stub_impl.ipv6_dst_lookup have been modified to pass net. I have modified them to use the 'net' already available in the scope of the call. I can change them to sock_net(sk) to avoid any unintended change in behaviour if the sock namespace is different. From code inspection, they don't seem to be. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-31 | bpf: add helpers to access tunnel metadata | Alexei Starovoitov | 2 | -6/+106
Introduce helpers to let eBPF programs attached to TC manipulate tunnel metadata:

  bpf_skb_[gs]et_tunnel_key(skb, key, size, flags)
    skb: pointer to skb
    key: pointer to 'struct bpf_tunnel_key'
    size: size of 'struct bpf_tunnel_key'
    flags: room for future extensions

First eBPF program that uses these helpers will allocate per_cpu metadata_dst structures that will be used on TX. On RX metadata_dst is allocated by tunnel driver.

Typical usage for TX:

  struct bpf_tunnel_key tkey;
  ... populate tkey ...
  bpf_skb_set_tunnel_key(skb, &tkey, sizeof(tkey), 0);
  bpf_clone_redirect(skb, vxlan_dev_ifindex, 0);

RX:

  struct bpf_tunnel_key tkey = {};
  bpf_skb_get_tunnel_key(skb, &tkey, sizeof(tkey), 0);
  ... lookup or redirect based on tkey ...

'struct bpf_tunnel_key' will be extended in the future by adding elements to the end and the 'size' argument will indicate which fields are populated, thereby keeping backwards compatibility. The 'flags' argument may be used as well when the 'size' is not enough or to indicate completely different layout of bpf_tunnel_key.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
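The size-based extension scheme in the last paragraph can be illustrated with a small user-space sketch. Everything here is hypothetical: `struct ex_tunnel_key_v1` and `struct ex_tunnel_key_v2` stand for an older and a newer layout of a key structure that only grows at the end, and `ex_get_tunnel_key()` stands for a provider that knows the newer layout but still serves callers compiled against the older one.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical older layout known to an old caller. */
struct ex_tunnel_key_v1 {
    uint32_t tunnel_id;
    uint32_t remote_ipv4;
};

/* Hypothetical newer layout: same prefix, one field appended. */
struct ex_tunnel_key_v2 {
    uint32_t tunnel_id;
    uint32_t remote_ipv4;
    uint32_t extra;
};

/* Copy out as many bytes as the caller's layout has. The size argument
 * tells the provider which fields the caller knows about; sizes that
 * match no known layout are rejected. Returns 0 or -1. */
int ex_get_tunnel_key(const struct ex_tunnel_key_v2 *src, void *dst, uint32_t size)
{
    if (size != sizeof(struct ex_tunnel_key_v1) &&
        size != sizeof(struct ex_tunnel_key_v2))
        return -1;
    memcpy(dst, src, size);
    return 0;
}
```

Because new fields are only appended, the older layout is a byte-for-byte prefix of the newer one, so an old caller passing `sizeof(struct ex_tunnel_key_v1)` keeps working unchanged.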
2015-07-30 | tipc: clean up link creation | Jon Paul Maloy | 4 | -121/+86
We simplify the link creation function tipc_link_create() and the way the link struct is connected to the node struct. In particular, we remove the duplicate initialization of some fields which are anyway set in tipc_link_reset(). Tested-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-30 | tipc: use temporary, non-protected skb queue for bundle reception | Jon Paul Maloy | 1 | -15/+19
Currently, when we extract small messages from a message bundle, or when many messages have accumulated in the link arrival queue, those messages are added one by one to the lock protected link input queue. This may increase contention with the reader of that queue, in the function tipc_sk_rcv(). This commit introduces a temporary, unprotected input queue in tipc_link_rcv() for such cases. Only when the arrival queue has been emptied, and the function is ready to return, does it splice the whole temporary queue into the real input queue. Tested-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
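The batching idea in this message can be sketched with a plain linked queue. This is a user-space illustration only: `struct ex_queue`, `ex_enqueue()`, `ex_splice_tail()` and `ex_receive_bundle()` are hypothetical names, and the `lock_acquisitions` counter merely stands in for taking the input queue's lock.

```c
#include <stddef.h>

struct ex_msg {
    int id;
    struct ex_msg *next;
};

struct ex_queue {
    struct ex_msg *head;
    struct ex_msg *tail;
    unsigned int lock_acquisitions;  /* stand-in for real locking */
};

/* Append to a queue that only the current function can see: no lock. */
static void ex_enqueue(struct ex_queue *q, struct ex_msg *m)
{
    m->next = NULL;
    if (q->tail)
        q->tail->next = m;
    else
        q->head = m;
    q->tail = m;
}

/* Move every message from tmp to the end of q in one locked step. */
void ex_splice_tail(struct ex_queue *q, struct ex_queue *tmp)
{
    if (!tmp->head)
        return;
    q->lock_acquisitions++;          /* one acquisition for the whole batch */
    if (q->tail)
        q->tail->next = tmp->head;
    else
        q->head = tmp->head;
    q->tail = tmp->tail;
    tmp->head = tmp->tail = NULL;
}

/* Collect n messages on a temporary queue, then splice them over. */
void ex_receive_bundle(struct ex_queue *inputq, struct ex_msg *msgs, size_t n)
{
    struct ex_queue tmpq = { NULL, NULL, 0 };
    size_t i;

    for (i = 0; i < n; i++)
        ex_enqueue(&tmpq, &msgs[i]);
    ex_splice_tail(inputq, &tmpq);
}
```

However many messages the bundle holds, the shared queue is touched once, which is the contention reduction the message is after.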
2015-07-30 | tipc: remove implicit message delivery in node_unlock() | Jon Paul Maloy | 4 | -63/+10
After the most recent changes, all access calls to a link which may entail addition of messages to the link's input queue are followed by an explicit call to tipc_sk_rcv(), using a reference to the correct queue. This means that the potentially hazardous implicit delivery, using tipc_node_unlock() in combination with a binary flag and a cached queue pointer, now has become redundant. This commit removes this implicit delivery mechanism both for regular data messages and for binding table update messages. Tested-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-30 | tipc: make resetting of links non-atomic | Jon Paul Maloy | 3 | -70/+127
In order to facilitate future improvements to the locking structure, we want to make resetting and establishing of links non-atomic. I.e., the functions tipc_node_link_up() and tipc_node_link_down() should be called from outside the node lock context, and grab/release the node lock themselves. This requires that we can freeze the link state from the moment it is set to RESETTING or PEER_RESET in one lock context until it is set to RESET or ESTABLISHING in a later context. The recently introduced link FSM makes this possible, so we are now ready to introduce the above change. This commit implements this. Tested-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-30 | tipc: move received discovery data evaluation inside node.c | Jon Paul Maloy | 3 | -123/+127
The node lock is currently grabbed and released in the function tipc_disc_rcv() in the file discover.c. As a preparation for the next commits, we need to move this node lock handling, along with the code area it is covering, to node.c. This commit introduces this change. Tested-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-30 | tipc: merge link->exec_mode and link->state into one FSM | Jon Paul Maloy | 3 | -180/+226
Until now, we have been handling link failover and synchronization by using an additional link state variable, "exec_mode". This variable is not independent of the link FSM state, which risks inconsistencies and also clutters the code.

The conditions are now in place to define a new link FSM that covers all existing use cases, including failover and synchronization, and eliminate the "exec_mode" field altogether. The FSM must also support non-atomic resetting of links, which will be introduced later.

The new link FSM has 7 states and 8 events. Only events leading to a state change are shown as edges.

[ASCII state diagram of the link FSM. States: RESET, ESTABLISHING, ESTABLISHED, SYNCHING, RESETTING, PEER_RESET, FAILINGOVER. Events: RESET_EVT, ESTABLISH_EVT, FAILURE_EVT, PEER_RESET_EVT, SYNCH_BEGIN_EVT, SYNCH_END_EVT, FAILOVER_BEGIN_EVT, FAILOVER_END_EVT.]

These changes are fully backwards compatible.

Tested-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-30 | tipc: move protocol message sending away from link FSM | Jon Paul Maloy | 3 | -21/+33
The implementation of the link FSM currently takes decisions about and sends out link protocol messages. This is unnecessary, since such actions are not the result of any link state change, and are even decided based on non-FSM state information ("silent_intv_cnt"). We now move the sending of unicast link protocol messages to the function tipc_link_timeout(), and the initial broadcast synchronization message to tipc_node_link_up(). The latter is done because a link instance should not need to know whether it is the first or second link to a destination. Such information is now restricted to and handled by the link aggregation layer in node.c. Tested-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-30 | tipc: move link synch and failover to link aggregation level | Jon Paul Maloy | 5 | -508/+342
Link failover and synchronization have until now been handled by the links themselves, forcing them to have knowledge about and to access parallel links in order to make the two algorithms work correctly. In this commit, we move the control part of this functionality to the link aggregation level in node.c, which is the right location for this. As a result, the two algorithms become easier to follow, and the link implementation becomes simpler. Tested-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-30 | tipc: extend node FSM | Jon Paul Maloy | 2 | -11/+92
In the next commit, we will move link synch/failover orchestration to the link aggregation level. In order to do this, we first need to extend the node FSM with two more states, NODE_SYNCHING and NODE_FAILINGOVER, plus four new events to enter and leave those states. This commit introduces this change, without yet making use of it.

The node FSM now looks as follows:

[ASCII state diagram of the node FSM. States: SELF_DOWN_PEER_DOWN, SELF_UP_PEER_COMING, PEER_UP_SELF_COMING, SELF_UP_PEER_UP, SELF_DOWN_PEER_LEAVING, PEER_DOWN_SELF_LEAVING, NODE_FAILINGOVER, NODE_SYNCHING. Events: SELF_UP_EVT, SELF_DOWN_EVT, PEER_UP_EVT, PEER_DOWN_EVT, FAILOVER_BEGIN_EVT, FAILOVER_END_EVT, SYNCH_BEGIN_EVT, SYNCH_END_EVT.]

Tested-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-30 | tipc: reverse call order for link_reset()->node_link_down() | Jon Paul Maloy | 2 | -14/+22
In many cases the call order when a link is reset goes as follows:

  tipc_node_xx()->tipc_link_reset()->tipc_node_link_down()

This is not the right order if we want the node to be in control, so in this commit we change the order to:

  tipc_node_xx()->tipc_node_link_down()->tipc_link_reset()

The fact that tipc_link_reset() now is called from only one location with a well-defined state will also facilitate later simplifications of tipc_link_reset() and the link FSM.

Tested-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-30 | tipc: move all link_reset() calls to link aggregation level | Jon Paul Maloy | 5 | -69/+104
In line with our effort to let the node level have full control over its links, we want to move all link reset calls from link.c to node.c. Some of the calls can be moved by simply moving the calling function, when this is the right thing to do. For the remaining calls we use the now established technique of returning a TIPC_LINK_DOWN_EVT flag from tipc_link_rcv(), whereafter we perform the reset call when the call returns. This change serves as a preparation for the coming commits. Tested-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
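The "return a flag, let the caller act" technique mentioned above can be sketched as follows. This is a user-space illustration with hypothetical names (`ex_link_rcv()`, `ex_node_rcv()`, `EX_LINK_DOWN_EVT`, `EX_LINK_UP_EVT`); it only shows the division of labour between the link level, which reports, and the node level, which changes state.

```c
#include <stdbool.h>

/* Hypothetical event flags returned by the link-level receive function. */
#define EX_LINK_UP_EVT   (1 << 0)
#define EX_LINK_DOWN_EVT (1 << 1)

struct ex_link {
    bool up;
};

struct ex_node {
    struct ex_link link;
    int working_links;
};

/* Link level: inspect the input and report what happened. It does not
 * reset or activate anything itself. */
int ex_link_rcv(const struct ex_link *l, bool peer_alive)
{
    if (l->up && !peer_alive)
        return EX_LINK_DOWN_EVT;
    if (!l->up && peer_alive)
        return EX_LINK_UP_EVT;
    return 0;
}

/* Node level: the only place where link state is actually changed. */
void ex_node_rcv(struct ex_node *n, bool peer_alive)
{
    int rc = ex_link_rcv(&n->link, peer_alive);

    if (rc & EX_LINK_DOWN_EVT) {
        n->link.up = false;         /* the reset happens here, in the node */
        n->working_links--;
    }
    if (rc & EX_LINK_UP_EVT) {
        n->link.up = true;
        n->working_links++;
    }
}
```

Keeping the state change in the caller means the link code needs no knowledge of the node's bookkeeping, which is the control inversion the series is working toward.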
2015-07-30 | tipc: eliminate function tipc_link_activate() | Jon Paul Maloy | 3 | -16/+8
The function tipc_link_activate() is redundant, since it mostly performs settings that have already been done in a preceding tipc_link_reset(). There are three exceptions to this:

- The actual state change to TIPC_LINK_WORKING. This should anyway be done in the FSM, and not in a separate function.
- Registration of the link with the bearer. This should be done by the node, since we don't want the link to have any knowledge about its specific bearer.
- Call to tipc_node_link_up() for user access registration. With the new role distribution between link aggregation and link level this becomes the wrong call order; tipc_node_link_up() should instead be called directly as a result of a TIPC_LINK_UP event, hence by the node itself.

This commit implements those changes.

Tested-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-30 | Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next | David S. Miller | 23 | -214/+461
Johan Hedberg says:

====================
pull request: bluetooth-next 2015-07-30

Here's a set of Bluetooth & 802.15.4 patches intended for the 4.3 kernel.

 - Cleanups & fixes to mac802154
 - Refactoring of Intel Bluetooth HCI driver
 - Various coding style fixes to Bluetooth HCI drivers
 - Support for Intel Lightning Peak Bluetooth devices
 - Generic class code in interface descriptor in btusb to match more HW
 - Refactoring of Bluetooth HS code together with a new config option
 - Support for BCM4330B1 Broadcom UART controller

Let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-30 | net: sk_clone_lock() should only do get_net() if the parent is not a kernel socket | Sowmini Varadhan | 1 | -1/+2
The newsk returned by sk_clone_lock should hold a get_net() reference if, and only if, the parent is not a kernel socket (making this similar to sk_alloc()). E.g., for the SYN_RECV path, tcp_v4_syn_recv_sock->..inet_csk_clone_lock sets up the syn_recv newsk from sk_clone_lock. When the parent (listen) socket is a kernel socket (defined in sk_alloc() as having sk_net_refcnt == 0), then the newsk should also have a 0 sk_net_refcnt and should not hold a get_net() reference. Fixes: 26abe14379f8 ("net: Modify sk_alloc to not reference count the netns of kernel sockets.") Acked-by: Eric Dumazet <edumazet@google.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
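The rule stated in this message, that a clone takes a namespace reference only when its parent holds one, can be modelled in a few lines of user-space C. The types and the function `ex_sk_clone()` are hypothetical simplifications; the `net_refcnt` flag plays the role the message gives to sk_net_refcnt.

```c
#include <stdbool.h>

/* Hypothetical network namespace with a plain reference counter. */
struct ex_net {
    int refcnt;
};

/* Hypothetical socket: net_refcnt is false for kernel sockets, which
 * do not pin their namespace. */
struct ex_sock {
    struct ex_net *net;
    bool net_refcnt;
};

/* Clone a socket. The copy inherits net_refcnt from the parent, and
 * takes a namespace reference only when that flag is set. */
void ex_sk_clone(const struct ex_sock *parent, struct ex_sock *newsk)
{
    *newsk = *parent;
    if (newsk->net_refcnt)
        newsk->net->refcnt++;   /* stand-in for get_net() */
}
```

Cloning from a kernel socket therefore leaves the namespace count untouched, so releasing the clone later does not drop a reference that was never taken.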
2015-07-30 | net/ipv6: add sysctl option accept_ra_min_hop_limit | Hangbin Liu | 2 | -9/+17
Commit 6fd99094de2b ("ipv6: Don't reduce hop limit for an interface") stopped accepting a hop limit from an RA if it is smaller than the current hop limit, for security reasons. But this behavior breaks the RFC definition.

RFC 4861, 6.3.4. Processing Received Router Advertisements

  A Router Advertisement field (e.g., Cur Hop Limit, Reachable Time, and Retrans Timer) may contain a value denoting that it is unspecified. In such cases, the parameter should be ignored and the host should continue using whatever value it is already using.

  If the received Cur Hop Limit value is non-zero, the host SHOULD set its CurHopLimit variable to the received value.

So add sysctl option accept_ra_min_hop_limit to let the user choose the minimum hop limit value they can accept from an RA, and set the default to 1 to meet the RFC standard.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
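The acceptance rule that this option introduces can be written as a small pure function. The sketch below is an interpretation of the message, not the kernel implementation: `ex_update_hop_limit()` and its parameter names are hypothetical, a received value of 0 is treated as "unspecified" per the RFC text quoted above, and values below the configured minimum are ignored.

```c
#include <stdint.h>

/* Return the hop limit the host should use after processing an RA.
 *   cur           - hop limit currently in use
 *   ra_hop_limit  - Cur Hop Limit field from the received RA
 *   min_hop_limit - configured minimum acceptable value */
uint8_t ex_update_hop_limit(uint8_t cur, uint8_t ra_hop_limit, int min_hop_limit)
{
    if (ra_hop_limit == 0)
        return cur;             /* unspecified: keep the current value */
    if (ra_hop_limit < min_hop_limit)
        return cur;             /* below the configured minimum: ignore */
    return ra_hop_limit;
}
```

With the default minimum of 1, every non-zero value is accepted, which is the RFC behaviour; raising the minimum restores protection against RAs that try to push the hop limit very low.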
2015-07-30 | net: sched: fix refcount imbalance in actions | Daniel Borkmann | 1 | -5/+6
Since commit 55334a5db5cd ("net_sched: act: refuse to remove bound action outside"), we end up with a wrong reference count for a tc action.

Test case 1:

  FOO="1,6 0 0 4294967295,"
  BAR="1,6 0 0 4294967294,"
  tc filter add dev foo parent 1: bpf bytecode "$FOO" flowid 1:1 \
     action bpf bytecode "$FOO"
  tc actions show action bpf
    action order 0: bpf bytecode '1,6 0 0 4294967295' default-action pipe
    index 1 ref 1 bind 1
  tc actions replace action bpf bytecode "$BAR" index 1
  tc actions show action bpf
    action order 0: bpf bytecode '1,6 0 0 4294967294' default-action pipe
    index 1 ref 2 bind 1
  tc actions replace action bpf bytecode "$FOO" index 1
  tc actions show action bpf
    action order 0: bpf bytecode '1,6 0 0 4294967295' default-action pipe
    index 1 ref 3 bind 1

Test case 2:

  FOO="1,6 0 0 4294967295,"
  tc filter add dev foo parent 1: bpf bytecode "$FOO" flowid 1:1 action ok
  tc actions show action gact
    action order 0: gact action pass
    random type none pass val 0
    index 1 ref 1 bind 1
  tc actions add action drop index 1
    RTNETLINK answers: File exists
    [...]
  tc actions show action gact
    action order 0: gact action pass
    random type none pass val 0
    index 1 ref 2 bind 1
  tc actions add action drop index 1
    RTNETLINK answers: File exists
    [...]
  tc actions show action gact
    action order 0: gact action pass
    random type none pass val 0
    index 1 ref 3 bind 1

What happens is that in tcf_hash_check(), we check tcf_common for a given index and increase tcfc_refcnt and conditionally tcfc_bindcnt when we've found an existing action. Now there are the following cases:

  1) We do a late binding of an action. In that case, we leave the tcfc_refcnt/tcfc_bindcnt increased and are done with the ->init() handler. This is correctly handled.

  2) We replace the given action, or we try to add one without replacing and find out that the action at a specific index already exists (thus, we go out with error in that case).

In case of 2), we have to undo the reference count increase from tcf_hash_check() in the tcf_hash_release() function. Currently, we fail to do so because of the 'tcfc_bindcnt > 0' check which bails out early with an -EPERM error.

Now, while commit 55334a5db5cd prevents 'tc actions del action ...' on an already classifier-bound action to drop the reference count (which could then become negative, wrap around etc), this restriction only accounts for invocations outside a specific action's ->init() handler.

One possible solution would be to add a flag so that we trigger the -EPERM only in situations where it is indeed relevant.

After the patch, above test cases have correct reference count again.

Fixes: 55334a5db5cd ("net_sched: act: refuse to remove bound action outside") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net>
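The imbalance and the flag-based remedy described above can be reproduced with two counters. This user-space sketch uses hypothetical names (`ex_hash_check_hit()`, `ex_hash_release()`, `ex_replace()`) and a `strict` flag to stand for the proposed flag; it shows the idea of the fix under those assumptions and is not the kernel code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical action with the two counters the message talks about. */
struct ex_action {
    int refcnt;
    int bindcnt;
};

/* An existing action was found by index: take the references. */
void ex_hash_check_hit(struct ex_action *a, bool bind)
{
    a->refcnt++;
    if (bind)
        a->bindcnt++;
}

/* Drop references. Only a strict caller (a deletion request coming from
 * outside an ->init() handler) is refused while the action is bound. */
int ex_hash_release(struct ex_action *a, bool bind, bool strict)
{
    if (bind)
        a->bindcnt--;
    else if (strict && a->bindcnt > 0)
        return -EPERM;
    a->refcnt--;
    return 0;
}

/* Replace path inside ->init(): the lookup takes a reference, and the
 * non-strict release gives it back even though the action is bound. */
int ex_replace(struct ex_action *a)
{
    ex_hash_check_hit(a, false);
    /* ... swap in the new parameters here ... */
    return ex_hash_release(a, false, false);
}
```

Without the flag, the release in `ex_replace()` would return -EPERM before decrementing, and each replace would leave `refcnt` one higher, which is the ref 1, ref 2, ref 3 progression seen in test case 1.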
2015-07-30 | Bluetooth: 6lowpan: Fix possible race | Alexander Aring | 1 | -10/+13
This patch fixes a possible race after calling register_netdev. After calling register_netdev, netdev_ops callbacks could use the uninitialized private data of lowpan_dev. By moving the initialization of this data before register_netdev, we can be sure that initialized private data is used after register_netdev. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-07-30 | mac802154: Fix memory corruption with global deferred transmit state. | Lennert Buytenhek | 3 | -21/+12
When transmitting a packet via a mac802154 driver that can sleep in its transmit function, mac802154 defers the call to the driver's transmit function to a per-device workqueue. However, mac802154 uses a single global work_struct for this, which means that if you have more than one registered mac802154 interface in the system, and you transmit on more than one of them at the same time, you'll very easily cause memory corruption. This patch moves the deferred transmit processing state from global variables to struct ieee802154_local, and this seems to fix the memory corruption issue. Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org> Acked-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
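A rough userspace sketch of the idea of keeping deferred transmit state per device: the work item is embedded in the per-device structure, and the worker recovers its owner from the work pointer. `struct work`, `struct local_dev`, `queue_tx()` and `run_work()` are hypothetical simplifications, not the kernel's work_struct API or struct ieee802154_local.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for a deferred work item. */
struct work {
    void (*fn)(struct work *);
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of_sketch(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical per-device state: each device owns its pending packet and
 * its own work item, so concurrent transmits do not share state. */
struct local_dev {
    int pending_pkt;
    int sent_pkt;
    struct work tx_work;
};

/* Worker: finds the owning device from the work item and "transmits". */
static void tx_worker(struct work *w)
{
    struct local_dev *local = container_of_sketch(w, struct local_dev, tx_work);

    local->sent_pkt = local->pending_pkt;
}

/* Defer a transmit on one device. */
static void queue_tx(struct local_dev *local, int pkt)
{
    local->pending_pkt = pkt;
    local->tx_work.fn = tx_worker;
}

/* Stand-in for the workqueue running a queued item later. */
static void run_work(struct work *w)
{
    assert(w != NULL && w->fn != NULL);
    w->fn(w);
}
```

Queuing on two devices and running the work items in either order leaves each device with its own packet. With a single shared work item and shared pending state, the second `queue_tx()` would overwrite the first.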
2015-07-30Bluetooth: cmtp: Do not use list_for_each_safe when not neededChristophe JAILLET1-4/+4
There is no need to use the safe version of list_for_each here. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-07-30Bluetooth: Move create/accept phy link completed callback to amp.cArron Wang2-51/+51
To avoid amp module hooks from hci_event.c Signed-off-by: Arron Wang <arron.wang@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-07-30Bluetooth: Move amp assoc read/write completed callback to amp.cArron Wang2-63/+77
To avoid amp module hooks from hci_event.c Signed-off-by: Arron Wang <arron.wang@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-07-30Bluetooth: Move get info completed callback to a2mp.cArron Wang2-5/+17
To avoid a2mp module hooks in hci_event.c, and because sending the getinfo response is only required by the a2mp module, move this callback to a2mp.c. Signed-off-by: Arron Wang <arron.wang@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-07-30Bluetooth: Move high speed specific event under BT_HS optionArron Wang1-20/+24
Signed-off-by: Arron Wang <arron.wang@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-07-30Bluetooth: Add BT_HS config optionArron Wang4-1/+40
Move the A2MP module under the BT_HS config option so that users can select only the features they need. a2mp_discover_amp() and a2mp_channel_create() are the a2mp module entry points for master and slave. They are invoked dynamically, depending on userspace or remote requests, so their implementation is made dependent on the BT_HS config option. Signed-off-by: Arron Wang <arron.wang@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-07-29act_bpf: fix memory leaks when replacing bpf programsDaniel Borkmann1-18/+35
We currently trigger multiple memory leaks when replacing bpf actions, besides others: comm "tc", pid 1909, jiffies 4294851310 (age 1602.796s) hex dump (first 32 bytes): 01 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ................ 18 b0 98 6d 00 88 ff ff 00 00 00 00 00 00 00 00 ...m............ backtrace: [<ffffffff817e623e>] kmemleak_alloc+0x4e/0xb0 [<ffffffff8120a22d>] __vmalloc_node_range+0x1bd/0x2c0 [<ffffffff8120a37a>] __vmalloc+0x4a/0x50 [<ffffffff811a8d0a>] bpf_prog_alloc+0x3a/0xa0 [<ffffffff816c0684>] bpf_prog_create+0x44/0xa0 [<ffffffffa09ba4eb>] tcf_bpf_init+0x28b/0x3c0 [act_bpf] [<ffffffff816d7001>] tcf_action_init_1+0x191/0x1b0 [<ffffffff816d70a2>] tcf_action_init+0x82/0xf0 [<ffffffff816d4d12>] tcf_exts_validate+0xb2/0xc0 [<ffffffffa09b5838>] cls_bpf_modify_existing+0x98/0x340 [cls_bpf] [<ffffffffa09b5cd6>] cls_bpf_change+0x1a6/0x274 [cls_bpf] [<ffffffff816d56e5>] tc_ctl_tfilter+0x335/0x910 [<ffffffff816b9145>] rtnetlink_rcv_msg+0x95/0x240 [<ffffffff816df34f>] netlink_rcv_skb+0xaf/0xc0 [<ffffffff816b909e>] rtnetlink_rcv+0x2e/0x40 [<ffffffff816deaaf>] netlink_unicast+0xef/0x1b0 Issue is that the old content from tcf_bpf is allocated and needs to be released when we replace it. We seem to do that since the beginning of act_bpf on the filter and insns, later on the name as well. Example test case, after patch: # FOO="1,6 0 0 4294967295," # BAR="1,6 0 0 4294967294," # tc actions add action bpf bytecode "$FOO" index 2 # tc actions show action bpf action order 0: bpf bytecode '1,6 0 0 4294967295' default-action pipe index 2 ref 1 bind 0 # tc actions replace action bpf bytecode "$BAR" index 2 # tc actions show action bpf action order 0: bpf bytecode '1,6 0 0 4294967294' default-action pipe index 2 ref 1 bind 0 # tc actions replace action bpf bytecode "$FOO" index 2 # tc actions show action bpf action order 0: bpf bytecode '1,6 0 0 4294967295' default-action pipe index 2 ref 1 bind 0 # tc actions del action bpf index 2 [...] 
# echo "scan" > /sys/kernel/debug/kmemleak # cat /sys/kernel/debug/kmemleak | grep "comm \"tc\"" | wc -l 0 Fixes: d23b8ad8ab23 ("tc: add BPF based action") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-29openvswitch: Re-add CONFIG_OPENVSWITCH_VXLANThomas Graf5-200/+235
This re-adds the config option CONFIG_OPENVSWITCH_VXLAN to avoid a hard dependency of OVS on VXLAN. It moves the VXLAN config compat code to vport-vxlan.c and allows compilation as a module. Fixes: 614732eaa12d ("openvswitch: Use regular VXLAN net_device device") Fixes: 2661371ace96 ("openvswitch: fix compilation when vxlan is a module") Cc: Pravin B Shelar <pshelar@nicira.com> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-29ipv6: flush nd cache on IFF_NOARP changeEric Dumazet1-0/+6
This patch is the IPv6 equivalent of commit 6c8b4e3ff81b ("arp: flush arp cache on IFF_NOARP change") Without it, we keep buggy neighbours in the cache, with destination MAC address equal to our own MAC address. Tested: tcpdump -i eth0 -s 0 ip6 -n -e & ip link set dev eth0 arp off ping6 remote // sends buggy frames ip link set dev eth0 arp on ping6 remote // should work once kernel is patched Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Mario Fanelli <mariofanelli@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-29net: pktgen: Remove unused 'allocated_skbs' fieldBogdan Hamciuc1-2/+0
Field pktgen_dev.allocated_skbs had been written to, but never read from. The number of allocated skbs can be deduced anyway, from the total number of sent packets and the 'clone_skb' param. Signed-off-by: Bogdan Hamciuc <bogdan.hamciuc@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-29net: pktgen: Observe needed_headroom of the deviceBogdan Hamciuc1-1/+2
Allocate enough space so as not to force the outgoing net device to do skb_realloc_headroom(). Signed-off-by: Bogdan Hamciuc <bogdan.hamciuc@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-29lwtunnel: Make lwtun_encaps[] staticThomas Graf1-1/+1
Any external user should use the registration API instead of accessing this directly. Cc: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-29net: Set sk_txhash from a random numberTom Herbert4-6/+6
This patch creates sk_set_txhash and eliminates the protocol-specific inet_set_txhash and ip6_set_txhash. sk_set_txhash simply sets a random number instead of performing flow dissection. sk_set_txhash is also allowed to be called multiple times for the same socket; we'll need this when redoing the hash for negative routing advice. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
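As a loose illustration of setting a transmit hash from a random number rather than from packet headers, here is a userspace sketch. `struct fake_sock` and `fake_set_txhash()` are hypothetical, `rand()` stands in for whatever random source the kernel uses, and reserving zero to mean "no hash set" is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the socket field holding the transmit hash. */
struct fake_sock {
    uint32_t txhash;
};

/* Pick a fresh random hash. May be called repeatedly on the same socket,
 * for example to move a flow onto a different path. Zero is avoided here
 * on the assumption that it means "no hash set". */
static void fake_set_txhash(struct fake_sock *sk)
{
    uint32_t h;

    assert(sk != NULL);
    h = (uint32_t)rand();
    sk->txhash = h ? h : 1;
}
```

Because no flow dissection is involved, the same helper works for any protocol, which matches the removal of the per-protocol variants described above.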
2015-07-29tipc: fix bug in broadcast synch message create functionJon Maloy1-0/+3
In commit d999297c3dbbe7fdd832f7fa4ec84301e170b3e6 ("tipc: reduce locking scope during packet reception") we introduced a new function tipc_build_bcast_sync_msg(), which carries initial synchronization data between two nodes at first contact and at re-contact. In this function, we failed to add the synchronization data, with the effect that the broadcast link endpoints will fail to synchronize correctly at re-contact between a running and a restarted node. All other cases work as intended. With this commit, we fix this bug. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>