summaryrefslogtreecommitdiffstats
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2016-07-04Merge tag 'batadv-next-for-davem-20160704' of ↵David S. Miller41-1217/+3716
git://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== This feature patchset includes the following changes: - Cleanup work by Markus Pargmann and Sven Eckelmann (six patches) - Initial Netlink support by Matthias Schiffer (two patches) - Throughput Meter implementation by Antonio Quartulli, a kernel-space traffic generator to estimate link speeds. This feature is useful on low-end WiFi APs where running iperf or netperf from userspace gives wrong results due to heavy userspace/kernelspace overhead. (two patches) - API clean-up work by Antonio Quartulli (one patch) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-04mlxsw: spectrum: Add couple of lower device helper functionsJiri Pirko1-0/+46
Add functions that iterate over lower devices and find port device. As a dependency add netdev_for_each_all_lower_dev and netdev_for_each_all_lower_dev_rcu macro with netdev_all_lower_get_next and netdev_all_lower_get_next_rcu shelpers. Also, add functions to return mlxsw struct according to lower device found and mlxsw_port struct with a reference to lower device. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-04bpf: add bpf_get_hash_recalc helperDaniel Borkmann1-0/+19
If skb_clear_hash() was invoked due to mangling of relevant headers and BPF program needs skb->hash later on, we can add a helper to trigger hash recalculation via bpf_get_hash_recalc(). The helper will return the newly retrieved hash directly, but later access can also be done via skb context again through skb->hash directly (inline) without needing to call the helper once more. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-04net: pktgen: support injecting packets for qdisc testingJohn Fastabend1-2/+40
Add another xmit_mode to pktgen to allow testing xmit functionality of qdiscs. The new mode "queue_xmit" injects packets at __dev_queue_xmit() so that qdisc is called. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-04net sched actions: skbedit convert to use more modern nla_put_xxxJamal Hadi Salim1-8/+4
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-04net sched actions: skbedit add support for mod-ing skb pkt_typeJamal Hadi Salim1-1/+17
Extremely useful for setting packet type to host so i dont have to modify the dst mac address using pedit (which requires that i know the mac address) Example usage: tc filter add dev eth0 parent ffff: protocol ip pref 9 u32 \ match ip src 5.5.5.5/32 \ flowid 1:5 action skbedit ptype host This will tag all packets incoming from 5.5.5.5 with type PACKET_HOST Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-04net: simplify and make pkt_type_ok() available for other usersJamal Hadi Salim1-8/+1
Suggested-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-04batman-adv: split routing API data structure in subobjectsAntonio Quartulli12-124/+150
The routing API data structure contains several function pointers that can easily be grouped together based on the component they work with. Split the API in subobjects in order to improve definition readability. At the same time, remove the "bat_" prefix from the API object and its fields names. These are batman-adv private structs and there is no need to always prepend such prefix, which only makes function invocations much much longer. Signed-off-by: Antonio Quartulli <a@unstable.cc> Reviewed-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2016-07-04batman-adv: throughput meter implementationAntonio Quartulli12-10/+1978
The throughput meter module is a simple, kernel-space replacement for throughtput measurements tool like iperf and netperf. It is intended to approximate TCP behaviour. It is invoked through batctl: the protocol is connection oriented, with cumulative acknowledgment and a dynamic-size sliding window. The test *can* be interrupted by batctl. A receiver side timeout avoids unlimited waitings for sender packets: after one second of inactivity, the receiver abort the ongoing test. Based on a prototype from Edo Monticelli <montik@autistici.org> Signed-off-by: Antonio Quartulli <antonio.quartulli@open-mesh.com> Signed-off-by: Sven Eckelmann <sven.eckelmann@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2016-07-04batman-adv: return netdev status in the TX pathAntonio Quartulli5-44/+68
Return the proper netdev TX status along the TX path so that the tp_meter can understand when the queue is full and should stop sending packets. Signed-off-by: Antonio Quartulli <antonio.quartulli@open-mesh.com> Signed-off-by: Sven Eckelmann <sven.eckelmann@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2016-07-04batman-adv: add netlink command to query generic mesh information filesMatthias Schiffer1-0/+137
BATADV_CMD_GET_MESH_INFO is used to query basic information about a batman-adv softif (name, index and MAC address for both the softif and the primary hardif; routing algorithm; batman-adv version). Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Signed-off-by: Andrew Lunn <andrew@lunn.ch> [sven.eckelmann@open-mesh.com: Reduce the number of changes to BATADV_CMD_GET_MESH_INFO, add missing kerneldoc, add policy for attributes] Signed-off-by: Sven Eckelmann <sven.eckelmann@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2016-07-04batman-adv: add generic netlink family for batman-advMatthias Schiffer4-0/+87
debugfs is currently severely broken virtually everywhere in the kernel where files are dynamically added and removed (see http://lkml.iu.edu/hypermail/linux/kernel/1506.1/02196.html for some details). In addition to that, debugfs is not namespace-aware. Instead of adding new debugfs entries, the whole infrastructure should be moved to netlink. This will fix the long standing problem of large buffers for debug tables and hard to parse text files. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Signed-off-by: Andrew Lunn <andrew@lunn.ch> [sven.eckelmann@open-mesh.com: Strip down patch to only add genl family, add missing kerneldoc] Signed-off-by: Sven Eckelmann <sven.eckelmann@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2016-07-02net/devlink: Add E-Switch mode controlOr Gerlitz1-0/+87
Add the commands to set and show the mode of SRIOV E-Switch, two modes are supported: * legacy: operating in the "old" L2 based mode (DMAC --> VF vport) * switchdev: the E-Switch is referred to as whitebox switch configured using standard tools such as tc, bridge, openvswitch etc. To allow working with the tools, for each VF, a VF representor netdevice is created by the E-Switch manager vendor device driver instance (e.g PF). Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01Merge tag 'batadv-next-for-davem-20160701' of ↵David S. Miller25-182/+694
git://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== This feature patchset includes the following changes: - two patches with minimal clean up work by Antonio Quartulli and Simon Wunderlich - eight patches of B.A.T.M.A.N. V, API and documentation clean up work, by Antonio Quartulli and Marek Lindner - Andrew Lunn fixed the skb priority adoption when forwarding fragmented packets (two patches) - Multicast optimization support is now enabled for bridges which comes with some protocol updates, by Linus Luessing ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01RDS: Do not send a pong to an incoming ping with 0 src portSowmini Varadhan1-0/+4
RDS ping messages are sent with a non-zero src port to a zero dst port, so that the rds pong messages can be sent back to the originators src port. However if a confused/malicious sender sends a ping with a 0 src port, we'd have an infinite ping-pong loop. To avoid this, the receiver should ignore ping messages with a 0 src port. Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01RDS: TCP: Simplify reconnect to avoid duelling reconnnect attemptsSowmini Varadhan2-3/+6
When reconnecting, the peer with the smaller IP address will initiate the reconnect, to avoid needless duelling SYN issues. Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01RDS: TCP: Hooks to set up a single connection pathSowmini Varadhan9-17/+20
This patch adds ->conn_path_connect callbacks in the rds_transport that are used to set up a single connection path. Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01RDS: TCP: make receive path use the rds_conn_pathSowmini Varadhan9-22/+26
The ->sk_user_data contains a pointer to the rds_conn_path for the socket. Use this consistently in the rds_tcp_data_ready callbacks to get the rds_conn_path for rds_recv_incoming. Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01RDS: TCP: make ->sk_user_data point to a rds_conn_pathSowmini Varadhan6-40/+41
The socket callbacks should all operate on a struct rds_conn_path, in preparation for a MP capable RDS-TCP. Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01RDS: TCP: Refactor connection destruction to handle multiple pathsSowmini Varadhan1-7/+39
A single rds_connection may have multiple rds_conn_paths that have to be carefully and correctly destroyed, for both rmmod and netns-delete cases. For both cases, we extract a single rds_tcp_connection for each conn into a temporary list, and then invoke rds_conn_destroy() which iteratively dismantles every path in the rds_connection. For the netns deletion case, we additionally have to make sure that we do not leave a socket in TIME_WAIT state, as this will hold up the netns deletion. Thus we call rds_tcp_conn_paths_destroy() to reset state quickly. Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01RDS: TCP: Make rds_tcp_connection track the rds_conn_pathSowmini Varadhan5-42/+48
The struct rds_tcp_connection is the transport-specific private data structure that tracks TCP information per rds_conn_path. Modify this structure to have a back-pointer to the rds_conn_path for which it is the ->cp_transport_data. Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01RDS: TCP: Remove dead logic around c_passive in rds-tcpSowmini Varadhan1-6/+1
The c_passive bit is only intended for the IB transport and will never be encountered in rds-tcp, so remove the dead logic that predicates on this bit. Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01RDS: Rework path specific indirectionsSowmini Varadhan12-40/+29
Refactor code to avoid separate indirections for single-path and multipath transports. All transports (both single and mp-capable) will get a pointer to the rds_conn_path, and can trivially derive the rds_connection from the ->cp_conn. Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01cgroup: bpf: Add bpf_skb_in_cgroup_protoMartin KaFai Lau1-0/+38
Adds a bpf helper, bpf_skb_in_cgroup, to decide if a skb->sk belongs to a descendant of a cgroup2. It is similar to the feature added in netfilter: commit c38c4597e4bf ("netfilter: implement xt_cgroup cgroup2 path match") The user is expected to populate a BPF_MAP_TYPE_CGROUP_ARRAY which will be used by the bpf_skb_in_cgroup. Modifications to the bpf verifier is to ensure BPF_MAP_TYPE_CGROUP_ARRAY and bpf_skb_in_cgroup() are always used together. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Alexei Starovoitov <ast@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Tejun Heo <tj@kernel.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01bpf: refactor bpf_prog_get and type check into helperDaniel Borkmann5-36/+5
Since bpf_prog_get() and program type check is used in a couple of places, refactor this into a small helper function that we can make use of. Since the non RO prog->aux part is not used in performance critical paths and a program destruction via RCU is rather very unlikley when doing the put, we shouldn't have an issue just doing the bpf_prog_get() + prog->type != type check, but actually not taking the ref at all (due to being in fdget() / fdput() section of the bpf fd) is even cleaner and makes the diff smaller as well, so just go for that. Callsites are changed to make use of the new helper where possible. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01net: introduce NETDEV_CHANGE_TX_QUEUE_LENJason Wang2-5/+26
This patch introduces a new event - NETDEV_CHANGE_TX_QUEUE_LEN, this will be triggered when tx_queue_len. It could be used by net device who want to do some processing at that time. An example is tun who may want to resize tx array when tx_queue_len is changed. Cc: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01net/sched/sch_hfsc.c: anchor virtual curve at proper vt in hfsc_change_fsc()Michal Soltys1-1/+1
cl->cl_vt alone is relative only to the current backlog period, while the curve operates on cumulative virtual time. This patch adds missing cl->cl_vtoff. Signed-off-by: Michal Soltys <soltys@ziu.info> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01net/sched/sch_hfsc.c: go passive after vt updateMichal Soltys1-16/+15
When a class is going passive, it should update its cl_vt first to be consistent with the last dequeue operation. Otherwise its cl_vt will be one packet behind and parent's cvtmax might not be updated as well. One possible side effect is if some class goes passive and subsequently goes active /without/ its parent going passive - with cl_vt lagging one packet behind - comparison made in init_vf() will be affected (same period). Signed-off-by: Michal Soltys <soltys@ziu.info> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01net/sched/sch_hfsc.c: remove leftover dlist and droplistMichal Soltys1-8/+0
This is update to: commit a09ceb0e08140a ("sched: remove qdisc->drop") That commit removed qdisc->drop, but left alone dlist and droplist that no longer serve any meaningful purpose. Signed-off-by: Michal Soltys <soltys@ziu.info> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01net/sched/sch_hfsc.c: add unlikely() in qdisc_peek_len()Michal Soltys1-1/+1
The condition can only succeed on wrong configurations. Signed-off-by: Michal Soltys <soltys@ziu.info> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01net/sched/sch_hfsc.c: handle corner cases where head may change invalidating ↵Michal Soltys1-1/+10
calculated deadline Realtime scheduling implemented in HFSC uses head of the queue to make the decision about which packet to schedule next. But in case of any head drop, the deadline calculated for the previous head is not necessarily correct for the next head (unless both packets have the same length). Thanks to peek() function used during dequeue - which internally is a dequeue operation - hfsc is almost safe from this issue, as peek() dequeues and isolates the head storing it temporarily until the real dequeue happens. But there is one exception: if after the class activation a drop happens before the first dequeue operation, there's never a chance to do the peek(). Adding peek() call in enqueue - if this is the first packet in a new backlog period AND the scheduler has realtime curve defined - fixes that one corner case. The 1st hfsc_dequeue() will use that peeked packet, similarly as every subsequent hfsc_dequeue() call uses packet peeked by the previous call. Signed-off-by: Michal Soltys <soltys@ziu.info> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01tcp: md5: use kmalloc() backed scratch areasEric Dumazet3-30/+40
Some arches have virtually mapped kernel stacks, or will soon have. tcp_md5_hash_header() uses an automatic variable to copy tcp header before mangling th->check and calling crypto function, which might be problematic on such arches. David says that using percpu storage is also problematic on non SMP builds. Just use kmalloc() to allocate scratch areas. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-30fib_rules: Added NLM_F_EXCL support to fib_nl_newruleMateusz Bajorski1-0/+49
When adding rule with NLM_F_EXCL flag then check if the same rule exist. If yes then exit with -EEXIST. This is already implemented in iproute2: if (cmd == RTM_NEWRULE) { req.n.nlmsg_flags |= NLM_F_CREATE|NLM_F_EXCL; req.r.rtm_type = RTN_UNICAST; } Tested ipv4 and ipv6 with net-next linux on qemu x86 expected behavior after patch: localhost ~ # ip rule 0: from all lookup local 32766: from all lookup main 32767: from all lookup default localhost ~ # ip rule add from 10.46.177.97 lookup 104 pref 1005 localhost ~ # ip rule add from 10.46.177.97 lookup 104 pref 1005 RTNETLINK answers: File exists localhost ~ # ip rule 0: from all lookup local 1005: from 10.46.177.97 lookup 104 32766: from all lookup main 32767: from all lookup default There was already topic regarding this but I don't see any changes merged and problem still occurs. https://lkml.kernel.org/r/1135778809.5944.7.camel+%28%29+localhost+%21+localdomain Signed-off-by: Mateusz Bajorski <mateusz.bajorski@nokia.com> Acked-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-30tcp: add an ability to dump and restore window parametersAndrey Vagin1-0/+57
We found that sometimes a restored tcp socket doesn't work. A reason of this bug is incorrect window parameters and in this case tcp_acceptable_seq() returns tcp_wnd_end(tp) instead of tp->snd_nxt. The other side drops packets with this seq, because seq is less than tp->rcv_nxt ( tcp_sequence() ). Data from a send queue is sent only if there is enough space in a window, so when we restore unacked data, we need to expand a window to fit this data. This was in a first version of this patch: "tcp: extend window to fit all restored unacked data in a send queue" Then Alexey recommended me to restore window parameters instead of adjusted them according with data in a sent queue. This sounds resonable. rcv_wnd has to be restored, because it was reported to another side and the offered window is never shrunk. One of reasons why we need to restore snd_wnd was described above. Cc: Pavel Emelyanov <xemul@parallels.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-30net: bridge: add support for IGMP/MLD stats and export them via netlinkNikolay Aleksandrov8-46/+363
This patch adds stats support for the currently used IGMP/MLD types by the bridge. The stats are per-port (plus one stat per-bridge) and per-direction (RX/TX). The stats are exported via netlink via the new linkxstats API (RTM_GETSTATS). In order to minimize the performance impact, a new option is used to enable/disable the stats - multicast_stats_enabled, similar to the recent vlan stats. Also in order to avoid multiple IGMP/MLD type lookups and checks, we make use of the current "igmp" member of the bridge private skb->cb region to record the type on Rx (both host-generated and external packets pass by multicast_rcv()). We can do that since the igmp member was used as a boolean and all the valid IGMP/MLD types are positive values. The normal bridge fast-path is not affected at all, the only affected paths are the flooding ones and since we make use of the IGMP/MLD type, we can quickly determine if the packet should be counted using cache-hot data (cb's igmp member). We add counters for: * IGMP Queries * IGMP Leaves * IGMP v1/v2/v3 reports * MLD Queries * MLD Leaves * MLD v1/v2 reports These are invaluable when monitoring or debugging complex multicast setups with bridges. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-30net: rtnetlink: add support for the IFLA_STATS_LINK_XSTATS_SLAVE attributeNikolay Aleksandrov2-5/+104
This patch adds support for the IFLA_STATS_LINK_XSTATS_SLAVE attribute which allows to export per-slave statistics if the master device supports the linkxstats callback. The attribute is passed down to the linkxstats callback and it is up to the callback user to use it (an example has been added to the only current user - the bridge). This allows us to query only specific slaves of master devices like bridge ports and export only what we're interested in instead of having to dump all ports and searching only for a single one. This will be used to export per-port IGMP/MLD stats and also per-port vlan stats in the future, possibly other statistics as well. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-30bpf: add bpf_skb_change_type helperDaniel Borkmann1-0/+24
This work adds a helper for changing skb->pkt_type in a controlled way. We only allow a subset of possible values and can extend that in future should other use cases come up. Doing this as a helper has the advantage that errors can be handeled gracefully and thus helper kept extensible. It's a write counterpart to pkt_type member we can already read from struct __sk_buff context. Major use case is to change incoming skbs to PACKET_HOST in a programmatic way instead of having to recirculate via redirect(..., BPF_F_INGRESS), for example. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-30bpf: add bpf_skb_change_proto helperDaniel Borkmann1-0/+200
This patch adds a minimal helper for doing the groundwork of changing the skb->protocol in a controlled way. Currently supported is v4 to v6 and vice versa transitions, which allows f.e. for a minimal, static nat64 implementation where applications in containers that still require IPv4 can be transparently operated in an IPv6-only environment. For example, host facing veth of the container can transparently do the transitions in a programmatic way with the help of clsact qdisc and cls_bpf. Idea is to separate concerns for keeping complexity of the helper lower, which means that the programs utilize bpf_skb_change_proto(), bpf_skb_store_bytes() and bpf_lX_csum_replace() to get the job done, instead of doing everything in a single helper (and thus partially duplicating helper functionality). Also, bpf_skb_change_proto() shouldn't need to deal with raw packet data as this is done by other helpers. bpf_skb_proto_6_to_4() and bpf_skb_proto_4_to_6() unclone the skb to operate on a private one, push or pop additionally required header space and migrate the gso/gro meta data from the shared info. We do mark the gso type as dodgy so that headers are checked and segs recalculated by the gso/gro engine. The gso_size target is adapted as well. The flags argument added is currently reserved and can be used for future extensions. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-30bpf: don't use raw processor id in generic helperDaniel Borkmann1-1/+9
Use smp_processor_id() for the generic helper bpf_get_smp_processor_id() instead of the raw variant. This allows for preemption checks when we have DEBUG_PREEMPT, and otherwise uses the raw variant anyway. We only need to keep the raw variant for socket filters, but we can reuse the helper that is already there from cBPF side. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller65-355/+465
Several cases of overlapping changes, except the packet scheduler conflicts which deal with the addition of the free list parameter to qdisc_enqueue(). Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-30batman-adv: Fix bat_(iv|v) function declaration headerSven Eckelmann9-36/+84
The bat_algo.h had some functions declared which were not part of the bat_algo.c file. These are instead stored in bat_v.c and bat_iv_ogm.c. The declaration should therefore be also in bat_v.h and bat_iv_ogm,h to make them easier to find. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2016-06-30batman-adv: Add debugfs table for mcast flagsLinus Lüssing3-0/+130
This patch adds a debugfs table with originators and their according multicast flags to help users figure out why multicast optimizations might be enabled or disabled for them. Tested-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2016-06-30batman-adv: Consolidate logging related functionsSven Eckelmann23-282/+364
There are several places in batman-adv which provide logging related functions. These should be grouped together in the log.* files to make them easier to find. Reported-by: Markus Pargmann <mpa@pengutronix.de> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2016-06-30batman-adv: Adding logging of mcast flag changesLinus Lüssing4-3/+142
With this patch changes relevant to a node's own multicast flags are printed to the 'mcast' log level. Tested-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2016-06-30batman-adv: move bat_algo functions into a separate fileSven Eckelmann11-108/+160
The bat_algo functionality in main.c is mostly unrelated to the rest of the content. It still takes up a large portion of this source file (~15%, 103 lines). Moving it to a separate file makes it better visible as a main component of the batman-adv implementation and hides it less in the other helper functions in main.c. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2016-06-30batman-adv: Add multicast optimization support for bridged setupsLinus Lüssing3-19/+177
With this patch we are finally able to support multicast optimizations in bridged setups, too. So far, if a bridge was added on top of a soft-interface (e.g. bat0) the batman-adv multicast optimizations needed to be disabled to avoid packetloss. Current Linux bridge implementations and API can now provide us with the so far missing information about interested but "remote" multicast receivers behind bridge ports. The Linux bridge performs the detection of remote participants interested in multicast packets with its own and mature so called IGMP and MLD snooping code and stores that in its database. With the new API provided by the bridge batman-adv can now simply hook into this database. We then reliably announce the gathered multicast listeners to other nodes through the batman-adv translation table. Additionally, the Linux bridge provides us with the information about whether an IGMP/MLD querier exists. If there is none then we need to disable multicast optimizations as we cannot learn about multicast listeners on external, bridged-in host then. Tested-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2016-06-30batman-adv: split tvlv into a separate fileMarkus Pargmann13-625/+700
The tvlv functionality in main.c is mostly unrelated to the rest of the content. It still takes up a large portion of this source file (~45%, 588 lines). Moving it to a separate file makes it better visible as a main component of the batman-adv implementation and hides it less in the other helper functions in main.c Signed-off-by: Markus Pargmann <mpa@pengutronix.de> [sven@narfation.org: fix conflicts with current version, fix includes, rewrote commit message] Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2016-06-30batman-adv: Always flood IGMP/MLD reportsLinus Lüssing2-14/+75
With this patch IGMP or MLD reports are always flooded. This is necessary for the upcoming bridge integration to function without multicast packet loss. With the report handling so far bridges might miss interested multicast listeners, leading to wrongly excluding ports from multicast packet forwarding. Currently we are treating IGMP/MLD reports, the messages bridges use to learn about interested multicast listeners, just as any other multicast packet: We try to send them to nodes matching its multicast destination. Unfortunately, the destination address of reports of the older IGMPv2/MLDv1 protocol families do not strictly adhere to their own protocol: More precisely, the interested receiver, an IGMPv2 or MLDv1 querier, itself usually does not listen to the multicast destination address of any reports. Therefore with this patch we are simply excluding IGMP/MLD reports from the multicast forwarding code path and keep flooding them. By that any bridge receives them and can properly learn about listeners. To avoid compatibility issues with older nodes not yet implementing this report handling, we need to force them to flood reports: We do this by bumping the multicast TVLV version to 2, effectively disabling their multicast optimization. Tested-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2016-06-30batman-adv: Keep includes ordered by filenameSven Eckelmann7-11/+11
It is easier to detect if a include is already there for a used functionality when the includes are ordered. Using an alphabetic order together with the grouping in commit 1e2c2a4fe4a5 ("batman-adv: Add required includes to all files") makes includes better manageable. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2016-06-30batman-adv: Include frame priority in fragment headerAndrew Lunn3-2/+16
Unfragmented frames which traverse a node have their skb->priority set by looking at the IP ToS byte, or the 802.1p header. However for fragments this is not possible, only one of the fragments will contain the headers. Instead, place the priority into the fragment header and on receiving a fragment, use this information to set the skb->priority for when the fragment is forwarded. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>