path: root/net
Age | Commit message | Author | Files | Lines
2016-03-02 | RDS: IB: Support Fastreg MR (FRMR) memory registration mode | Avinash Repaka | 6 | -5/+422
Fastreg MR (FRMR) is another method for registering memory with the HCA. Some of the newer HCAs support only fastreg MR mode, so we need to add support for it to keep RDS functional on them. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Avinash Repaka <avinash.repaka@oracle.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02 | RDS: IB: allocate extra space on queues for FRMR support | santosh.shilimkar@oracle.com | 2 | -4/+16
Fastreg MR (FRMR) memory registration and invalidation make use of the work request and completion queues. This patch allocates extra queue space for these operations. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02 | RDS: IB: add Fastreg MR (FRMR) detection support | santosh.shilimkar@oracle.com | 3 | -0/+15
Discover Fast Memory Registration support using the IB device capability IB_DEVICE_MEM_MGT_EXTENSIONS. A given HCA might support just FRMR, just FMR, or both FMR and FRWR. If both MR types are supported, FMR is used by default. The default MR is still kept as FMR, contrary to what everyone else is following. The default will be changed to FRMR once RDS performance with FRMR is comparable to FMR. Work on that is in progress. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02 | RDS: IB: add mr reused stats | santosh.shilimkar@oracle.com | 3 | -1/+10
Add MR reuse statistics to RDS IB transport. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02 | RDS: IB: handle the RDMA CM time wait event | santosh.shilimkar@oracle.com | 1 | -0/+8
Drop the RDS connection on RDMA_CM_EVENT_TIMEWAIT_EXIT so that it can reconnect and resume. While testing fastreg, this error happened in a couple of tests but went unnoticed. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02 | RDS: IB: add connection info to ibmr | santosh.shilimkar@oracle.com | 1 | -8/+9
Preparatory patch for FRMR support. From the connection info, we can retrieve the cm_id, which contains the QP handle needed for work request posting. We also need to drop the RDS connection on QP error states, where the connection handle becomes useful. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02 | RDS: IB: move FMR code to its own file | santosh.shilimkar@oracle.com | 3 | -106/+134
No functional change. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02 | RDS: IB: create struct rds_ib_fmr | santosh.shilimkar@oracle.com | 3 | -13/+29
Keep FMR-related fields in their own struct. The Fastreg MR structure will be added to the union. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02 | RDS: IB: Re-organise ibmr code | santosh.shilimkar@oracle.com | 6 | -347/+422
No functional changes. This is in preparation for adding fastreg memory registration support. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02 | RDS: IB: Remove the RDS_IB_SEND_OP dependency | santosh.shilimkar@oracle.com | 3 | -20/+29
This helps to combine asynchronous fastreg MR completion handler with send completion handler. No functional change. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02 | RDS: Add support for SO_TIMESTAMP for incoming messages | santosh.shilimkar@oracle.com | 3 | -2/+45
SO_TIMESTAMP generates a time stamp for each incoming RDS message. A user app can enable it with setsockopt() using SO_TIMESTAMP at the SOL_SOCKET level. CMSG data of cmsg type SO_TIMESTAMP contains the time stamp in struct timeval format. Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
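A minimal user-space sketch of decoding the control message described in this commit. It assumes ancillary data arrives as `(level, type, data)` tuples, the shape Python's `socket.recvmsg()` returns, and that `struct timeval` is two native `long` values. The helper name `extract_timestamp` is hypothetical, and the fallback value 29 for `SO_TIMESTAMP` is an assumption based on the usual Linux definition.

```python
import socket
import struct

# Assumption: fall back to 29 (the common Linux value) when the socket
# module on this platform does not export SO_TIMESTAMP.
SO_TIMESTAMP = getattr(socket, "SO_TIMESTAMP", 29)

# struct timeval modelled as two native longs: tv_sec, tv_usec (assumed layout).
TIMEVAL = struct.Struct("@ll")


def extract_timestamp(ancdata):
    """Return (seconds, microseconds) from the first SO_TIMESTAMP cmsg, or None."""
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == SO_TIMESTAMP:
            if len(data) >= TIMEVAL.size:
                return TIMEVAL.unpack(data[:TIMEVAL.size])
    return None
```

With a real socket, the option would first be enabled through `setsockopt()` at the `SOL_SOCKET` level, and the ancillary list returned by `recvmsg()` would be passed to the helper.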
2016-03-02 | RDS: Drop stale iWARP RDMA transport | santosh.shilimkar@oracle.com | 13 | -4611/+6
RDS iWarp support code has become stale and untestable. As indicated earlier, I am dropping support for it. If new iWarp users show up in the future, we can adapt the RDS IB transport for the special RDMA READ sink case. iWarp needs an MR for the RDMA READ sink. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02 | batman-adv: clarify CFG80211 dependency | Arnd Bergmann | 1 | -1/+1
The driver calls cfg80211_get_station, which may be part of a module, so we must not enable BATMAN_ADV_BATMAN_V if BATMAN_ADV=y and CFG80211=m: net/built-in.o: In function `batadv_v_elp_get_throughput': (text+0x5c62c): undefined reference to `cfg80211_get_station' This clarifies the dependency to cover all combinations. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: c833484e5f38 ("batman-adv: ELP - compute the metric based on the estimated throughput") Acked-by: Antonio Quartulli <a@unstable.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 | Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge | David S. Miller | 22 | -36/+2173
Antonio Quartulli says: ==================== batman-adv 20160229 this is our (hopefully) latest batch of patches intended for net-next. With this patchset we finally introduce B.A.T.M.A.N. V: the latest version of our routing protocol. Technical documentation describing the protocol in more detail can be found in our wiki[1][2][3][4]. For what concerns this pull request, you can find the high level description right below. [1] https://www.open-mesh.org/projects/batman-adv/wiki/BATMAN_V [2] https://www.open-mesh.org/projects/batman-adv/wiki/OGMv2 [3] https://www.open-mesh.org/projects/batman-adv/wiki/ELP [4] https://www.open-mesh.org/projects/batman-adv/wiki/BATMAN_V_Tests ... With this patchset we finally introduce our new routing protocol: B.A.T.M.A.N. V. Its implementation started quite some years ago, but due to the big changes being introduced it took a while to be discussed, designed, worked, re-worked, tested and debugged (well, we're never done with the latest). The entire operation has basically been a team work involving all the core contributors together with other people interested in the project. The new protocol is divided into two main subcomponents, called respectively ELP and OGMv2. The former is in charge of dealing with the neighbour discovery and link quality estimation, while the latter implements the algorithm that spreads the metrics around the network and computes optimal paths. The biggest change introduced with B.A.T.M.A.N. V is the new metric: the protocol won't rely on packet loss anymore, but it will use the estimated throughput extracted directly from the wifi driver (when available) by querying cfg80211. Batman-adv will also send some unicast probing packets when an interface is not used for payload traffic to make sure that such values are current. The new protocol can be compiled-in or not like other features we have and when selected will pull in CFG80211 as dependency for the reason described above. 
Thanks to the big work done in the past by Marek Lindner, batman-adv can easily deal with several protocol implementations, so compiling in this new version does not exclude the older one. This means that the user is offered the option to choose the protocol when creating the mesh interface (the default is the old one, to keep backward compatibility). Along with the protocol, some sysfs knobs are introduced to fine-tune some of its behaviours, but users are recommended to keep the default values unless they know what they are doing. The last patch is about advertising our own patchwork platform (thanks to Sven Eckelmann for having set that up!) in the MAINTAINERS file. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 | net: pktgen: use reset to set mac header | Zhang Shengju | 1 | -2/+2
Since offset is zero, it's not necessary to use set function. Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 | sch_mqprio: Fix build with older gcc. | David S. Miller | 1 | -1/+1
CC [M] net/sched/sch_mqprio.o
net/sched/sch_mqprio.c: In function 'mqprio_init':
net/sched/sch_mqprio.c:145: error: unknown field 'tc' specified in initializer
net/sched/sch_mqprio.c:145: warning: missing braces around initializer
net/sched/sch_mqprio.c:145: warning: (near initialization for 'tc.<anonymous>')
make[2]: *** [net/sched/sch_mqprio.o] Error 1
make[1]: *** [net/sched] Error 2
make: *** [net] Error 2
Several people reported this; surround the unnamed union member initialization with braces to fix. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 | net: remove skb_sender_cpu_clear() | WANG Cong | 8 | -14/+0
After commit 52bd2d62ce67 ("net: better skb->sender_cpu and skb->napi_id cohabitation") skb_sender_cpu_clear() becomes empty and can be removed. Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 | net: ipv6/l3mdev: Move host route on saved address if necessary | David Ahern | 1 | -0/+26
Commit f1705ec197e70 allows IPv6 addresses to be retained on a link down. The address can have a cached host route which can point to the wrong FIB table if the L3 enslavement is changed (e.g., route can point to local table instead of VRF table if device is added to an L3 domain). On link up check the table of the cached host route against the FIB table associated with the device and correct if needed. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 | net: sctp: Convert log timestamps to be y2038 safe | Deepa Dinamani | 1 | -5/+5
SCTP probe log timestamps use struct timespec, which is not y2038 safe. Use struct timespec64, which is y2038 safe, instead. Use monotonic time instead of real time, as only time differences are logged. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Neil Horman <nhorman@tuxdriver.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: linux-sctp@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
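To illustrate why a 32-bit seconds field is not y2038 safe, here is a small sketch that truncates a seconds count to signed 32-bit and signed 64-bit widths. It assumes the problematic case is a `tv_sec` that is only 32 bits wide (as `long` is on 32-bit architectures); the helper names are hypothetical and this is not kernel code.

```python
import ctypes
from datetime import datetime, timezone

INT32_MAX = 2**31 - 1


def as_int32(seconds):
    """Truncate a seconds count the way a signed 32-bit tv_sec would."""
    return ctypes.c_int32(seconds).value


def as_int64(seconds):
    """Truncate a seconds count the way a signed 64-bit tv_sec would."""
    return ctypes.c_int64(seconds).value


# Last instant representable in a signed 32-bit seconds counter.
last_safe = datetime.fromtimestamp(INT32_MAX, tz=timezone.utc)
```

One second past `last_safe`, `as_int32` wraps to a large negative value, while `as_int64` keeps counting.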
2016-03-01 | net: ipv4: tcp_probe: Replace timespec with timespec64 | Deepa Dinamani | 1 | -4/+4
TCP probe log timestamps use struct timespec which is not y2038 safe. Even though timespec might be good enough here as it is used to represent delta time, the plan is to get rid of all uses of timespec in the kernel. Replace with struct timespec64 which is y2038 safe. Prints still use unsigned long format and type. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 | net: ipv4: Convert IP network timestamps to be y2038 safe | Deepa Dinamani | 3 | -12/+33
ICMP timestamp messages and IP source route options require timestamps to be in milliseconds modulo 24 hours from midnight UT format. Add inet_current_timestamp() function to support this. The function returns the required timestamp in network byte order. Timestamp calculation is also changed to call ktime_get_real_ts64() which uses struct timespec64. struct timespec64 is y2038 safe. Previously it called getnstimeofday() which uses struct timespec. struct timespec is not y2038 safe. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: James Morris <jmorris@namei.org> Cc: Patrick McHardy <kaber@trash.net> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
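The timestamp format described in this commit (milliseconds modulo 24 hours since midnight UT, in network byte order) can be sketched as follows. This is an illustration of the arithmetic only, not the kernel's `inet_current_timestamp()`; the function name `current_timestamp` and its two-argument form (seconds and nanoseconds, as in a timespec64) are assumptions made for the example.

```python
import struct

MSECS_PER_DAY = 24 * 60 * 60 * 1000


def current_timestamp(tv_sec, tv_nsec):
    """Milliseconds since midnight UT, packed as a big-endian 32-bit value."""
    # Seconds within the current day, converted to milliseconds.
    msecs = (tv_sec % 86400) * 1000
    # Add the sub-second part, truncated to whole milliseconds.
    msecs += tv_nsec // 1_000_000
    return struct.pack("!I", msecs)
```

Because the value is reduced modulo one day before packing, it always fits in 32 bits, even when the seconds count itself is wider than 32 bits.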
2016-03-01 | Support to encoding decoding skb prio on IFE action | Jamal Hadi Salim | 3 | -0/+82
Example usage:
Set the skb priority using skbedit, then allow it to be encoded:

sudo tc qdisc add dev $ETH root handle 1: prio
sudo tc filter add dev $ETH parent 1: protocol ip prio 10 \
  u32 match ip protocol 1 0xff flowid 1:2 \
  action skbedit prio 17 \
  action ife encode \
  allow prio \
  dst 02:15:15:15:15:15

Note: You don't need the skbedit action if you are already encoding the skb priority earlier. A zero skb priority will not be sent.

Alternatively, hard code a static priority of decimal 33 (unlike skbedit), then a mark of 0x12, every time the filter matches:

sudo $TC filter add dev $ETH parent 1: protocol ip prio 10 \
  u32 match ip protocol 1 0xff flowid 1:2 \
  action ife encode \
  type 0xDEAD \
  use prio 33 \
  use mark 0x12 \
  dst 02:15:15:15:15:15

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 | Support to encoding decoding skb mark on IFE action | Jamal Hadi Salim | 3 | -0/+85
Example usage:
Set the skb mark using skbedit, then allow it to be encoded:

sudo tc qdisc add dev $ETH root handle 1: prio
sudo tc filter add dev $ETH parent 1: protocol ip prio 10 \
  u32 match ip protocol 1 0xff flowid 1:2 \
  action skbedit mark 17 \
  action ife encode \
  allow mark \
  dst 02:15:15:15:15:15

Note: You don't need the skbedit action if you are already encoding the skb mark earlier. A zero skb mark, when seen, will not be encoded.

Alternatively, hard code a static mark of 0x12 every time the filter matches:

sudo $TC filter add dev $ETH parent 1: protocol ip prio 10 \
  u32 match ip protocol 1 0xff flowid 1:2 \
  action ife encode \
  type 0xDEAD \
  use mark 0x12 \
  dst 02:15:15:15:15:15

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 | introduce IFE action | Jamal Hadi Salim | 3 | -0/+883
This action allows a sending side to encapsulate arbitrary metadata, which is decapsulated by the receiving end. The sender runs in encoding mode and the receiver in decode mode. Both sender and receiver must specify the same ethertype. At some point we hope to have a registered ethertype and we'll then provide a default so the user doesn't have to specify it. For now we require the user to specify it.

Let's show example usage where we encode icmp from a sender towards a receiver with an skbmark of 17; both sender and receiver use ethertype of 0xdead to interop.

YYYY: Let's start with Receiver-side policy config:
xxx: add an ingress qdisc
sudo tc qdisc add dev $ETH ingress

xxx: any packets with ethertype 0xdead will be subjected to ife decoding
xxx: we then restart the classification so we can match on icmp at prio 3
sudo $TC filter add dev $ETH parent ffff: prio 2 protocol 0xdead \
  u32 match u32 0 0 flowid 1:1 \
  action ife decode reclassify

xxx: on restarting the classification from above if it was an icmp
xxx: packet, then match it here and continue to the next rule at prio 4
xxx: which will match based on skb mark of 17
sudo tc filter add dev $ETH parent ffff: prio 3 protocol ip \
  u32 match ip protocol 1 0xff flowid 1:1 \
  action continue

xxx: match on skbmark of 0x11 (decimal 17) and accept
sudo tc filter add dev $ETH parent ffff: prio 4 protocol ip \
  handle 0x11 fw flowid 1:1 \
  action ok

xxx: Let's show the decoding policy
sudo tc -s filter ls dev $ETH parent ffff: protocol 0xdead
xxx:
filter pref 2 u32
filter pref 2 u32 fh 800: ht divisor 1
filter pref 2 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:1 (rule hit 0 success 0)
  match 00000000/00000000 at 0 (success 0 )
    action order 1: ife decode action reclassify
     index 1 ref 1 bind 1 installed 14 sec used 14 sec
     type: 0x0
     Metadata: allow mark allow hash allow prio allow qmap
    Action statistics:
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0
xxx:
Observe that the above lists all metadata it can decode. Typically these submodules will already be compiled into a monolithic kernel or loaded as modules.

YYYY: Let's show the sender side now ..

xxx: Add an egress qdisc on the sender netdev
sudo tc qdisc add dev $ETH root handle 1: prio
xxx:
xxx: Match all icmp packets to 192.168.122.237/24, then
xxx: tag the packet with skb mark of decimal 17, then
xxx: Encode it with:
xxx:    ethertype 0xdead
xxx:    add skb->mark to whitelist of metadata to send
xxx:    rewrite target dst MAC address to 02:15:15:15:15:15
xxx:
sudo $TC filter add dev $ETH parent 1: protocol ip prio 10 u32 \
  match ip dst 192.168.122.237/24 \
  match ip protocol 1 0xff \
  flowid 1:2 \
  action skbedit mark 17 \
  action ife encode \
  type 0xDEAD \
  allow mark \
  dst 02:15:15:15:15:15

xxx: Let's show the encoding policy
sudo tc -s filter ls dev $ETH parent 1: protocol ip
xxx:
filter pref 10 u32
filter pref 10 u32 fh 800: ht divisor 1
filter pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:2 (rule hit 0 success 0)
  match c0a87aed/ffffffff at 16 (success 0 )
  match 00010000/00ff0000 at 8 (success 0 )
    action order 1: skbedit mark 17
     index 6 ref 1 bind 1
    Action statistics:
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0

    action order 2: ife encode action pipe
     index 3 ref 1 bind 1
     dst MAC: 02:15:15:15:15:15 type: 0xDEAD
     Metadata: allow mark
    Action statistics:
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0

xxx: test by sending ping from sender to destination

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 | Merge tag 'mac80211-next-for-davem-2016-02-26' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next | David S. Miller | 40 | -687/+1096
Johannes Berg says:
====================
Here's another round of updates for -next:
* big A-MSDU RX performance improvement (avoid linearize of paged RX)
* rfkill changes: cleanups, documentation, platform properties
* basic PBSS support in cfg80211
* MU-MIMO action frame processing support
* BlockAck reordering & duplicate detection offload support
* various cleanups & little fixes
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 | bridge: mcast: add support for more router port information dumping | Nikolay Aleksandrov | 1 | -2/+14
Allow more multicast router port information to be dumped, such as timer and type attributes. For that purpose we need to extend the MDBA_ROUTER_PORT attribute, similar to how it was done for the mdb entries recently. The new format is thus:
[MDBA_ROUTER_PORT] = { <- nested attribute
    u32 ifindex <- router port ifindex for user-space compatibility
    [MDBA_ROUTER_PATTR attributes]
}
This way it remains compatible with older users (they'll simply retrieve the u32 at the beginning) and new users can parse the remaining attributes. It would also allow future extensions to be added to the router port without breaking compatibility. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
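The compatibility argument in this commit can be sketched with a small parser for the attribute payload: a leading u32 ifindex followed by ordinary netlink attributes. The sketch assumes the standard netlink attribute layout (16-bit length including the 4-byte header, 16-bit type, payload padded to 4 bytes, host byte order). The function names and the attribute type numbers used in the example are hypothetical.

```python
import struct

NLA_HDR = struct.Struct("=HH")  # nla_len, nla_type in host byte order
NLA_ALIGNTO = 4


def nla_align(length):
    """Round a length up to the netlink attribute alignment."""
    return (length + NLA_ALIGNTO - 1) & ~(NLA_ALIGNTO - 1)


def build_attr(attr_type, data):
    """Encode one attribute: header, payload, then padding to 4 bytes."""
    nla_len = NLA_HDR.size + len(data)
    padding = b"\0" * (nla_align(nla_len) - nla_len)
    return NLA_HDR.pack(nla_len, attr_type) + data + padding


def parse_router_port(payload):
    """Split a router port payload into (ifindex, {attr_type: bytes})."""
    # Older readers stop after this u32; it is always first.
    (ifindex,) = struct.unpack_from("=I", payload, 0)
    attrs = {}
    offset = 4
    while offset + NLA_HDR.size <= len(payload):
        nla_len, nla_type = NLA_HDR.unpack_from(payload, offset)
        if nla_len < NLA_HDR.size or offset + nla_len > len(payload):
            break  # malformed attribute, stop parsing
        attrs[nla_type] = payload[offset + NLA_HDR.size:offset + nla_len]
        offset += nla_align(nla_len)
    return ifindex, attrs
```

A payload that carries only the u32 parses to an empty attribute dictionary, which is how an old-format message would look to this reader.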
2016-03-01 | bridge: mcast: add support for temporary port router | Nikolay Aleksandrov | 1 | -2/+19
Add support for a temporary router port which doesn't depend only on the incoming query. It can be refreshed if set to the same value, which is a no-op for the rest. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 | bridge: mcast: do nothing if port's multicast_router is set to the same val | Nikolay Aleksandrov | 1 | -1/+4
This is needed for the upcoming temporary port router. There's no point to go through the logic if the value is the same. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 | bridge: mcast: use names for the different multicast_router types | Nikolay Aleksandrov | 1 | -28/+33
Using raw values makes the code difficult to extend and to understand. Give them names and do explicit per-option manipulation in br_multicast_set_port_router. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 | net: dsa: support VLAN filtering switchdev attr | Vivien Didelot | 1 | -0/+21
When a user explicitly requests VLAN filtering with something like: # echo 1 > /sys/class/net/<bridge>/bridge/vlan_filtering Switchdev propagates a SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING port attribute. Add support for it in the DSA layer with a new port_vlan_filtering function to let drivers toggle 802.1Q filtering on user demand. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 | Introduce devlink infrastructure | Jiri Pirko | 3 | -0/+746
Introduce devlink infrastructure for drivers to register and expose to userspace via generic Netlink interface. There are two basic objects defined:
devlink - one instance for every "parent device", for example switch ASIC
devlink port - one instance for every physical port of the device.
This initial portion implements basic get/dump of objects to userspace. Also, port splitter and port type setting is implemented. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 | net: sched: cls_u32 add bit to specify software only rules | John Fastabend | 1 | -10/+27
In the initial implementation the only way to stop a rule from being inserted into the hardware table was via the device feature flag. However, this doesn't work well on an end host system where packets are expected to hit both the hardware and software datapaths.

For example, we can imagine a rule that will match an IP address and increment a field. If we install this rule in both hardware and software we may increment the field twice. To date we have only added support for the drop action, so we have been able to ignore these cases. But as we extend the action support we will hit this example plus more such cases. Arguably these are not even corner cases; in many working systems these cases will be common.

To avoid forcing the driver to always abort (i.e. the above example), this patch adds a flag to add a rule in software only. A careful user can use this flag to build software and hardware datapaths that work together. One example we have found particularly useful is to use hardware resources to set the skb->mark on the skb when the match may be expensive to run in software but a mark lookup in a hash table is cheap. The idea here is that hardware can do in one lookup what the u32 classifier may need to traverse multiple lists and hash tables to compute. The flag is only passed down on inserts. On deletion, to avoid stale references in hardware, we always try to remove a rule if it exists.

The flags field is part of the classifier specific options. Although it is tempting to lift this into the generic structure, doing this proves difficult due to how the tc netlink attributes are implemented along with how the dump/change routines are called. There is also precedent for putting seemingly generic pieces in the specific classifier options, such as TCA_U32_POLICE, TCA_U32_ACT, etc. So although not ideal, I've left FLAGS in the u32 options as well, as it simplifies the code greatly and user space has already learned how to manage these bits a la the 'tc' tool.

Another thing: when trying to update a rule, we require the flags to be unchanged. This is to force user space, software u32 and the hardware u32 to keep in sync. Thanks to Simon Horman for catching this case. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
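The three rules stated in this commit (the flag is honoured on insert, updates must keep the flags unchanged, deletes always attempt hardware removal) can be modelled in a few lines. This is a toy model, not the cls_u32 code: the class names, the `SKIP_HW` bit value, and the `should_offload` helper are all hypothetical, chosen only to make the behaviour concrete.

```python
SKIP_HW = 1 << 0  # hypothetical bit for the "software only" flag


class Device:
    """Stand-in for a netdev with an optional hardware rule table."""

    def __init__(self, offload_capable):
        self.offload_capable = offload_capable
        self.hw_rules = set()


def should_offload(dev, flags):
    """Offload only if the device can and the rule is not software only."""
    return dev.offload_capable and not (flags & SKIP_HW)


class U32Table:
    def __init__(self, dev):
        self.dev = dev
        self.rules = {}  # handle -> flags

    def insert(self, handle, flags=0):
        self.rules[handle] = flags
        if should_offload(self.dev, flags):
            self.dev.hw_rules.add(handle)

    def update(self, handle, flags):
        # Flags must match what was given at insert time.
        if self.rules[handle] != flags:
            raise ValueError("flags must not change on update")

    def delete(self, handle):
        self.rules.pop(handle, None)
        # Always attempt hardware removal to avoid stale references.
        self.dev.hw_rules.discard(handle)
```

In this model a rule inserted with `SKIP_HW` lives only in the software table, so an action such as "increment a field" would run once rather than twice.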
2016-03-01 | net: sched: consolidate offload decision in cls_u32 | John Fastabend | 1 | -4/+4
The offload decision was originally very basic and tied to whether the dev implemented the appropriate ndo op hook. The next step is to allow the user to more flexibly define whether any particular rule should be offloaded or not. In order to have this logic in one function, lift the current check into a helper routine tc_should_offload(). Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 | ovs: propagate per dp max headroom to all vports | Paolo Abeni | 3 | -1/+53
This patch implements bookkeeping support to compute the maximum headroom for all the devices in each datapath. When said value changes, the underlying devs are notified via the ndo_set_rx_headroom method. This also increases the internal vports xmit performance. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
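The bookkeeping described here can be sketched as a datapath that tracks the maximum headroom over its ports and notifies every port when that maximum changes. This is a simplified model under assumptions, not the Open vSwitch code: the `Vport` and `Datapath` classes and their method names are hypothetical, and `set_rx_headroom` merely stands in for the `ndo_set_rx_headroom` notification.

```python
class Vport:
    def __init__(self, name, needed_headroom):
        self.name = name
        self.needed_headroom = needed_headroom
        self.rx_headroom = 0  # last value pushed to this device

    def set_rx_headroom(self, value):
        # Stand-in for notifying the underlying device.
        self.rx_headroom = value


class Datapath:
    def __init__(self):
        self.vports = []
        self.max_headroom = 0

    def _set_max(self, value):
        if value == self.max_headroom:
            return
        self.max_headroom = value
        for vport in self.vports:
            vport.set_rx_headroom(value)

    def add_vport(self, vport):
        self.vports.append(vport)
        if vport.needed_headroom > self.max_headroom:
            self._set_max(vport.needed_headroom)
        else:
            # Maximum unchanged: only the newcomer needs to be told.
            vport.set_rx_headroom(self.max_headroom)

    def del_vport(self, vport):
        self.vports.remove(vport)
        if vport.needed_headroom == self.max_headroom:
            remaining = (v.needed_headroom for v in self.vports)
            self._set_max(max(remaining, default=0))
```

The maximum is only recomputed when the departing port was the one that defined it, which keeps the common add and remove paths cheap.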
2016-03-01 | bridge: notify enslaved devices of headroom changes | Paolo Abeni | 1 | -2/+35
On bridge needed_headroom changes, the enslaved devices are notified via the ndo_set_rx_headroom method. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-29 | sch_dsmark: update backlog as well | WANG Cong | 1 | -0/+3
Similarly, we need to update backlog too when we update qlen. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-29 | sch_htb: update backlog as well | WANG Cong | 1 | -1/+4
We saw qlen!=0 but backlog==0 on our production machine: qdisc htb 1: dev eth0 root refcnt 2 r2q 10 default 1 direct_packets_stat 0 ver 3.17 Sent 172680457356 bytes 222469449 pkt (dropped 0, overlimits 123575834 requeues 0) backlog 0b 72p requeues 0 The problem is we only count qlen for HTB qdisc but not backlog. We need to update backlog too when we update qlen, so that we can at least know the average packet length. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-29 | net_sched: update hierarchical backlog too | WANG Cong | 19 | -47/+84
When the bottom qdisc decides to, for example, drop some packet, it calls qdisc_tree_decrease_qlen() to update the queue length for all its ancestors. We need to update the backlog too, to keep the stats on the root qdisc accurate. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
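A toy model of the accounting issue: each qdisc keeps a packet count (`qlen`) and a byte count (`backlog`), and a drop at the bottom must adjust both on every ancestor. The `Qdisc` class and the `enqueue`, `tree_reduce` and `drop` helpers are hypothetical names for illustration; this is not the kernel implementation.

```python
class Qdisc:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.qlen = 0     # packets queued at or below this qdisc
        self.backlog = 0  # bytes queued at or below this qdisc


def enqueue(leaf, pkt_len):
    """Account one packet on the leaf and on every ancestor."""
    node = leaf
    while node is not None:
        node.qlen += 1
        node.backlog += pkt_len
        node = node.parent


def tree_reduce(qdisc, n_pkts, n_bytes):
    """Propagate a reduction inside `qdisc` to all of its ancestors."""
    node = qdisc.parent
    while node is not None:
        node.qlen -= n_pkts
        node.backlog -= n_bytes  # the part that was previously missed
        node = node.parent


def drop(leaf, pkt_len):
    """The leaf drops one packet and tells its ancestors."""
    leaf.qlen -= 1
    leaf.backlog -= pkt_len
    tree_reduce(leaf, 1, pkt_len)
```

If `tree_reduce` adjusted only `qlen`, the root would end up with a byte backlog that no longer matches its packet count, which is the kind of inconsistent statistic this commit sets out to prevent.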
2016-02-29 | net_sched: introduce qdisc_replace() helper | WANG Cong | 12 | -78/+12
Remove nearly duplicated code and prepare for the following patch. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-29 | batman-adv: Start new development cycle | Simon Wunderlich | 1 | -1/+1
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Antonio Quartulli <a@unstable.cc>
2016-02-29 | batman-adv: B.A.T.M.A.N. V - implement bat_neigh_print API | Linus Luessing | 1 | -0/+55
Lists all neighbours detected by the Echo Locating Protocol (ELP) and their throughput metric. Initially Developed by Linus during a 6 months trainee study period in Ascom (Switzerland) AG. Signed-off-by: Linus Luessing <linus.luessing@web.de> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
2016-02-29 | batman-adv: B.A.T.M.A.N. V - implement bat_orig_print API | Antonio Quartulli | 1 | -0/+105
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2016-02-29 | batman-adv: B.A.T.M.A.N. V - implement neighbor comparison API calls | Antonio Quartulli | 1 | -0/+38
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2016-02-29 | batman-adv: ELP - send unicast ELP packets for throughput sampling | Antonio Quartulli | 2 | -0/+72
In case of an unused wireless link, the mac80211 throughput estimation won't get updated further. Consequently, the reported throughput metric will become obsolete. With this patch unicast sampling is introduced by periodically sending unicast ELP packets to each neighbor on idle WiFi links. These sampling packets will fill an entire frame, so that the measurement is as reliable as possible. Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2016-02-29 | batman-adv: ELP - compute the metric based on the estimated throughput | Antonio Quartulli | 7 | -2/+164
In case of wireless interface retrieve the throughput by querying cfg80211. To perform this call a separate work must be scheduled because the function may sleep and this is not allowed within an RCU protected context (RCU in this case is used to iterate over all the neighbours). Use ethtool to retrieve information about an Ethernet link like HALF/FULL_DUPLEX and advertised bandwidth (e.g. 100/10Mbps). The metric is updated each time a new ELP packet is sent, this way it is possible to timely react to a metric variation which can imply (for example) a neighbour disconnection. Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2016-02-29 | batman-adv: keep track of when unicast packets are sent | Antonio Quartulli | 10 | -34/+75
To enable ELP to send probing packets over wireless links only if needed, batman-adv must keep track of the last time it sent a unicast packet towards every neighbour. For this purpose two main changes are introduced: 1) a new member of the elp_neigh_node structure stores the last time a unicast packet was sent towards this neighbour; 2) a wrapper function for sending unicast packets is implemented. This function will simply update the member described in point 1) and then forward the packet to the real sending routine. Point 2) implies that any code path leading to a unicast send now has to use the new wrapper. Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
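The two changes listed in this commit (a per-neighbour "last unicast sent" member and a wrapper that updates it before calling the real send routine) can be sketched as below. The names `Neighbor`, `send_unicast` and `needs_probe`, the explicit `now` argument and the `idle_threshold` parameter are assumptions made for the illustration; the commit does not specify how the idle check is performed.

```python
import time


class Neighbor:
    def __init__(self, addr):
        self.addr = addr
        self.last_unicast_tx = None  # monotonic time of the last unicast send


def send_unicast(neigh, packet, real_send, now=None):
    """Wrapper: record the send time, then hand off to the real routine."""
    neigh.last_unicast_tx = time.monotonic() if now is None else now
    return real_send(neigh.addr, packet)


def needs_probe(neigh, now, idle_threshold):
    """True if no unicast traffic was sent within `idle_threshold` seconds."""
    if neigh.last_unicast_tx is None:
        return True
    return now - neigh.last_unicast_tx >= idle_threshold
```

Because every unicast send goes through the wrapper, a prober only has to consult `last_unicast_tx` to decide whether a link has carried recent traffic.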
2016-02-29 | batman-adv: add throughput override attribute to hard_ifaces | Antonio Quartulli | 5 | -2/+86
This attribute is exported to user space to disable the link throughput auto-detection by setting a fixed value. The throughput override value is used when batman-adv is computing the link throughput towards a neighbour. If the value is set to 0 then batman-adv will try to detect the throughput by itself. Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
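The override rule stated here is simple enough to express directly: a non-zero override wins, and zero means "detect automatically". The function name `link_throughput` and the `detect` callback are hypothetical, and the units of the values are left unspecified, as in the commit message.

```python
def link_throughput(override, detect):
    """Return the fixed override if set, otherwise the auto-detected value."""
    if override != 0:
        return override
    # An override of 0 means auto-detection is enabled.
    return detect()
```

Passing detection in as a callback keeps the potentially expensive query from running at all when a fixed value is configured.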
2016-02-29 | batman-adv: OGMv2 - implement originators logic | Antonio Quartulli | 5 | -41/+566
Add the support for recognising new originators in the network and rebroadcast their OGMs. Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2016-02-29 | batman-adv: OGMv2 - add basic infrastructure | Antonio Quartulli | 9 | -3/+427
This is the initial implementation of the new OGM protocol (version 2). It has been designed to work on top of the newly added ELP. In the previous version the OGM protocol was used both to measure link qualities and to flood the network with the metric information. In this version the protocol is in charge of the latter task only, leaving the former to ELP. This decouples the interval used by neighbor discovery from OGM broadcasting, which proved costly in dense networks and had to be relaxed, leading to a less responsive routing protocol. Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2016-02-29 | batman-adv: ELP - adding sysfs parameter for elp interval | Linus Luessing | 1 | -0/+7
This parameter can be set individually on each interface and allows the configuration of the elp interval for the link quality measurements during runtime. Usually it is desirable to set it to a higher (= slower) value on interfaces which have a more static characteristic (e.g. wired interfaces) or very dense neighbourhoods to reduce overhead. Developed by Linus during a 6 months trainee study period in Ascom (Switzerland) AG. Signed-off-by: Linus Luessing <linus.luessing@web.de> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> [antonio@open-mesh.com: respin on top of the latest master] Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>