summaryrefslogtreecommitdiffstats
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2015-04-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller31-543/+623
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next A final pull request, I know it's very late but this time I think it's worth a bit of rush. The following patchset contains Netfilter/nf_tables updates for net-next, more specifically concatenation support and dynamic stateful expression instantiation. This also comes with a couple of small patches. One to fix the ebtables.h userspace header and another to get rid of an obsolete example file in tree that describes a nf_tables expression. This time, I decided to paste the original descriptions. This will result in a rather large commit description, but I think these bytes to keep. Patrick McHardy says: ==================== netfilter: nf_tables: concatenation support The following patches add support for concatenations, which allow multi dimensional exact matches in O(1). The basic idea is to split the data registers, currently consisting of 4 registers of 16 bytes each, into smaller units, 16 registers of 4 bytes each, and making sure each register store always leaves the full 32 bit in a well defined state, meaning smaller stores will zero the remaining bits. Based on that, we can load multiple adjacent registers with different values, thereby building a concatenated bigger value, and use that value for set lookups. Sets are changed to use variable sized extensions for their key and data values, removing the fixed limit of 16 bytes while saving memory if less space is needed. As a side effect, these patches will allow some nice optimizations in the future, like using jhash2 in nft_hash, removing the masking in nft_cmp_fast, optimized data comparison using 32 bit word size etc. These are not done so far however. The patches are split up as follows: * the first five patches add length validation to register loads and stores to make sure we stay within bounds and prepare the validation functions for the new addressing mode * the next patches prepare for changing to 32 bit addressing by introducing a struct nft_regs, which holds the verdict register as well as the data registers. The verdict members are moved to a new struct nft_verdict to allow to pull struct nft_data out of the stack. * the next patches contain preparatory conversions of expressions and sets to use 32 bit addressing * the next patch introduces so far unused register conversion helpers for parsing and dumping register numbers over netlink * following is the real conversion to 32 bit addressing, consisting of replacing struct nft_data in struct nft_regs by an array of u32s and actually translating and validating the new register numbers. * the final two patches add support for variable sized data items and variable sized keys / data in set elements The patches have been verified to work correctly with nft binaries using both old and new addressing. ==================== Patrick McHardy says: ==================== netfilter: nf_tables: dynamic stateful expression instantiation The following patches are the grand finale of my nf_tables set work, using all the building blocks put in place by the previous patches to support something like iptables hashlimit, but a lot more powerful. Sets are extended to allow attaching expressions to set elements. The dynset expression dynamically instantiates these expressions based on a template when creating new set elements and evaluates them for all new or updated set members. In combination with concatenations this effectively creates state tables for arbitrary combinations of keys, using the existing expression types to maintain that state. Regular set GC takes care of purging expired states. We currently support two different stateful expressions, counter and limit. Using limit as a template we can express the functionality of hashlimit, but completely unrestricted in the combination of keys. Using counter we can perform accounting for arbitrary flows. The following examples from patch 5/5 show some possibilities. Userspace syntax is still WIP, especially the listing of state tables will most likely be seperated from normal set listings and use a more structured format: 1. Limit the rate of new SSH connections per host, similar to iptables hashlimit: flow ip saddr timeout 60s \ limit 10/second \ accept 2. Account network traffic between each set of /24 networks: flow ip saddr & 255.255.255.0 . ip daddr & 255.255.255.0 \ counter 3. Account traffic to each host per user: flow skuid . ip daddr \ counter 4. Account traffic for each combination of source address and TCP flags: flow ip saddr . tcp flags \ counter The resulting set content after a Xmas-scan look like this: { 192.168.122.1 . fin | psh | urg : counter packets 1001 bytes 40040, 192.168.122.1 . ack : counter packets 74 bytes 3848, 192.168.122.1 . psh | ack : counter packets 35 bytes 3144 } In the future the "expressions attached to elements" will be extended to also support user created non-stateful expressions to allow to efficiently select beween a set of parameter sets, f.i. a set of log statements with different prefixes based on the interface, which currently require one rule each. This will most likely have to wait until the next kernel version though. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller9-17/+50
The dwmac-socfpga.c conflict was a case of a bug fix overlapping changes in net-next to handle an error pointer differently. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-13Merge branch 'for-davem' of ↵David S. Miller10-111/+196
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Al Viro says: ==================== netdev-related stuff in vfs.git There are several commits sitting in vfs.git that probably ought to go in via net-next.git. First of all, there's merge with vfs.git#iocb - that's Christoph's aio rework, which has triggered conflicts with the ->sendmsg() and ->recvmsg() patches a while ago. It's not so much Christoph's stuff that ought to be in net-next, as (pretty simple) conflict resolution on merge. The next chunk is switch to {compat_,}import_iovec/import_single_range - new safer primitives for initializing iov_iter. The primitives themselves come from vfs/git#iov_iter (and they are used quite a lot in vfs part of queue), conversion of net/socket.c syscalls belongs in net-next, IMO. Next there's afs and rxrpc stuff from dhowells. And then there's sanitizing kernel_sendmsg et.al. + missing inlined helper for "how much data is left in msg->msg_iter" - this stuff is used in e.g. cifs stuff, but it belongs in net-next. That pile is pullable from git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git for-davem I'll post the individual patches in there in followups; could you take a look and tell if everything in there is OK with you? ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-13tcp/dccp: get rid of central timewait timerEric Dumazet10-288/+60
Using a timer wheel for timewait sockets was nice ~15 years ago when memory was expensive and machines had a single processor. This does not scale, code is ugly and source of huge latencies (Typically 30 ms have been seen, cpus spinning on death_lock spinlock.) We can afford to use an extra 64 bytes per timewait sock and spread timewait load to all cpus to have better behavior. Tested: On following test, /proc/sys/net/ipv4/tcp_tw_recycle is set to 1 on the target (lpaa24) Before patch : lpaa23:~# ./super_netperf 200 -H lpaa24 -t TCP_CC -l 60 -- -p0,0 419594 lpaa23:~# ./super_netperf 200 -H lpaa24 -t TCP_CC -l 60 -- -p0,0 437171 While test is running, we can observe 25 or even 33 ms latencies. lpaa24:~# ping -c 1000 -i 0.02 -qn lpaa23 ... 1000 packets transmitted, 1000 received, 0% packet loss, time 20601ms rtt min/avg/max/mdev = 0.020/0.217/25.771/1.535 ms, pipe 2 lpaa24:~# ping -c 1000 -i 0.02 -qn lpaa23 ... 1000 packets transmitted, 1000 received, 0% packet loss, time 20702ms rtt min/avg/max/mdev = 0.019/0.183/33.761/1.441 ms, pipe 2 After patch : About 90% increase of throughput : lpaa23:~# ./super_netperf 200 -H lpaa24 -t TCP_CC -l 60 -- -p0,0 810442 lpaa23:~# ./super_netperf 200 -H lpaa24 -t TCP_CC -l 60 -- -p0,0 800992 And latencies are kept to minimal values during this load, even if network utilization is 90% higher : lpaa24:~# ping -c 1000 -i 0.02 -qn lpaa23 ... 1000 packets transmitted, 1000 received, 0% packet loss, time 19991ms rtt min/avg/max/mdev = 0.023/0.064/0.360/0.042 ms Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-13netfilter: Fix format string of nfnetlink_log proc fileRichard Weinberger1-1/+1
The printed values are all of type unsigned integer, therefore use %u instead of %d. Otherwise an user can face negative values. Signed-off-by: Richard Weinberger <richard@nod.at> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-13netfilter: Fix format string of nfnetlink_queue proc fileRichard Weinberger1-1/+1
The printed values are all of type unsigned integer, therefore use %u instead of %d. Otherwise an user can face negative values. Fixes: $ cat /proc/net/netfilter/nfnetlink_queue 0 29508 278 2 65531 0 2004213241 -2129885586 1 1 -27747 0 2 65531 0 0 0 1 2 -27748 0 2 65531 0 0 0 1 Signed-off-by: Richard Weinberger <richard@nod.at> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-13netfilter: Fix portid typesRichard Weinberger2-6/+5
The netlink portid is an unsigned integer, use this type also in netfilter. Signed-off-by: Richard Weinberger <richard@nod.at> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-13nfc: Fix portid type in urelease_workRichard Weinberger1-1/+1
portid is an unsigned integer. Fix urelease_work to match all other portid user in the kernel. Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-13netfilter: nf_tables: get rid of the expression example codePablo Neira Ayuso1-94/+0
There's an example net/netfilter/nft_expr_template.c example file in tree that got out of sync along time, remove it. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Patrick McHardy <kaber@trash.net>
2015-04-13netfilter: nft_dynset: dynamic stateful expression instantiationPatrick McHardy1-4/+50
Support instantiating stateful expressions based on a template that are associated with dynamically created set entries. The expressions are evaluated when adding or updating the set element. This allows to maintain per flow state using the existing set infrastructure and expression types, with arbitrary definitions of a flow. Usage is currently restricted to anonymous sets, meaning only a single binding can exist, since the desired semantics of multiple independant bindings haven't been defined so far. Examples (userspace syntax is still WIP): 1. Limit the rate of new SSH connections per host, similar to iptables hashlimit: flow ip saddr timeout 60s \ limit 10/second \ accept 2. Account network traffic between each set of /24 networks: flow ip saddr & 255.255.255.0 . ip daddr & 255.255.255.0 \ counter 3. Account traffic to each host per user: flow skuid . ip daddr \ counter 4. Account traffic for each combination of source address and TCP flags: flow ip saddr . tcp flags \ counter The resulting set content after a Xmas-scan look like this: { 192.168.122.1 . fin | psh | urg : counter packets 1001 bytes 40040, 192.168.122.1 . ack : counter packets 74 bytes 3848, 192.168.122.1 . psh | ack : counter packets 35 bytes 3144 } Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-13netfilter: nf_tables: add flag to indicate set contains expressionsPatrick McHardy2-2/+9
Add a set flag to indicate that the set is used as a state table and contains expressions for evaluation. This operation is mutually exclusive with the mapping operation, so sets specifying both are rejected. The lookup expression also rejects binding to state tables since it only deals with loopup and map operations. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-13netfilter: nf_tables: mark stateful expressionsPatrick McHardy2-0/+2
Add a flag to mark stateful expressions. This is used for dynamic expression instanstiation to limit the usable expressions. Strictly speaking only the dynset expression can not be used in order to avoid recursion, but since dynamically instantiating non-stateful expressions will simply create an identical copy, which behaves no differently than the original, this limits to expressions where it actually makes sense to dynamically instantiate them. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-13netfilter: nf_tables: prepare for expressions associated to set elementsPatrick McHardy1-0/+9
Preparation to attach expressions to set elements: add a set extension type to hold an expression and dump the expression information with the set element. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-13netfilter: nf_tables: add helper functions for expression handlingPatrick McHardy1-5/+51
Add helper functions for initializing, cloning, dumping and destroying a single expression that is not part of a rule. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-13tcp: fix bogus RTT for CC when retransmissions are ackedKenneth Klette Jonassen1-6/+4
Since retransmitted segments are not used for RTT estimation, previously SACKed segments present in the rtx queue are used. This estimation can be several times larger than the actual RTT. When a cumulative ack covers both previously SACKed and retransmitted segments, CC may thus get a bogus RTT. Such segments previously had an RTT estimation in tcp_sacktag_one(), so it seems reasonable to not reuse them in tcp_clean_rtx_queue() at all. Afaik, this has had no effect on SRTT/RTO because of Karn's check. Signed-off-by: Kenneth Klette Jonassen <kennetkl@ifi.uio.no> Acked-by: Neal Cardwell <ncardwell@google.com> Tested-by: Neal Cardwell <ncardwell@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-13net: use jump label patching for ingress qdisc in __netif_receive_skb_coreDaniel Borkmann2-7/+33
Even if we make use of classifier and actions from the egress path, we're going into handle_ing() executing additional code on a per-packet cost for ingress qdisc, just to realize that nothing is attached on ingress. Instead, this can just be blinded out as a no-op entirely with the use of a static key. On input fast-path, we already make use of static keys in various places, e.g. skb time stamping, in RPS, etc. It makes sense to not waste time when we're assured that no ingress qdisc is attached anywhere. Enabling/disabling of that code path is being done via two helpers, namely net_{inc,dec}_ingress_queue(), that are being invoked under RTNL mutex when a ingress qdisc is being either initialized or destructed. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-13netfilter: nf_tables: variable sized set element keys / dataPatrick McHardy3-18/+16
This patch changes sets to support variable sized set element keys / data up to 64 bytes each by using variable sized set extensions. This allows to use concatenations with bigger data items suchs as IPv6 addresses. As a side effect, small keys/data now don't require the full 16 bytes of struct nft_data anymore but just the space they need. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-13netfilter: nf_tables: support variable sized data in nft_data_init()Patrick McHardy4-16/+28
Add a size argument to nft_data_init() and pass in the available space. This will be used by the following patches to support variable sized set element data. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-13netfilter: nf_tables: switch registers to 32 bit addressingPatrick McHardy14-39/+75
Switch the nf_tables registers from 128 bit addressing to 32 bit addressing to support so called concatenations, where multiple values can be concatenated over multiple registers for O(1) exact matches of multiple dimensions using sets. The old register values are mapped to areas of 128 bits for compatibility. When dumping register numbers, values are expressed using the old values if they refer to the beginning of a 128 bit area for compatibility. To support concatenations, register loads of less than a full 32 bit value need to be padded. This mainly affects the payload and exthdr expressions, which both unconditionally zero the last word before copying the data. Userspace fully passes the testsuite using both old and new register addressing. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-13netfilter: nf_tables: add register parsing/dumping helpersPatrick McHardy14-53/+65
Add helper functions to parse and dump register values in netlink attributes. These helpers will later be changed to take care of translation between the old 128 bit and the new 32 bit register numbers. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-13netfilter: nf_tables: convert sets to u32 data pointersPatrick McHardy2-8/+6
Simple conversion to use u32 pointers to the beginning of the data area to keep follow up patches smaller. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-13netfilter: nf_tables: kill nft_data_cmp()Patrick McHardy3-8/+7
Only needlessly complicates things due to requiring specific argument types. Use memcmp directly. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-13netfilter: nf_tables: convert expressions to u32 register pointersPatrick McHardy8-67/+63
Simple conversion to use u32 pointers to the beginning of the registers to keep follow up patches smaller. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-13netfilter: nf_tables: use struct nft_verdict within struct nft_dataPatrick McHardy1-17/+21
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-13netfilter: nf_tables: get rid of NFT_REG_VERDICT usagePatrick McHardy27-120/+115
Replace the array of registers passed to expressions by a struct nft_regs, containing the verdict as a seperate member, which aliases to the NFT_REG_VERDICT register. This is needed to seperate the verdict from the data registers completely, so their size can be changed. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-13netfilter: nf_tables: introduce nft_validate_register_load()Patrick McHardy10-38/+61
Change nft_validate_input_register() to not only validate the input register number, but also the length of the load, and rename it to nft_validate_register_load() to reflect that change. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-13netfilter: nf_tables: kill nft_validate_output_register()Patrick McHardy10-76/+14
All users of nft_validate_register_store() first invoke nft_validate_output_register(). There is in fact no use for using it on its own, so simplify the code by folding the functionality into nft_validate_register_store() and kill it. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-13netfilter: nft_lookup: use nft_validate_register_store() to validate typesPatrick McHardy2-8/+8
In preparation of validating the length of a register store, use nft_validate_register_store() in nft_lookup instead of open coding the validation. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-13netfilter: nf_tables: rename nft_validate_data_load()Patrick McHardy9-27/+29
The existing name is ambiguous, data is loaded as well when we read from a register. Rename to nft_validate_register_store() for clarity and consistency with the upcoming patch to introduce its counterpart, nft_validate_register_load(). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-13netfilter: nf_tables: validate len in nft_validate_data_load()Patrick McHardy9-45/+96
For values spanning multiple registers, we need to validate that enough space is available from the destination register onwards. Add a len argument to nft_validate_data_load() and consolidate the existing length validations in preparation of that. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-12Merge branch 'master' of ↵David S. Miller1-6/+26
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== Intel Wired LAN Driver Updates 2015-04-11 This series contains updates to iflink, ixgbe and ixgbevf. The entire set of changes come from Vlad Zolotarov to ultimately add the ethtool ops to VF driver to allow querying the RSS indirection table and RSS random key. Currently we support only 82599 and x540 devices. On those devices, VFs share the RSS redirection table and hash key with a PF. Letting the VF query this information may introduce some security risks, therefore this feature will be disabled by default. The new netdev op allows a system administrator to change the default behaviour with "ip link set" command. The relevant iproute2 patch has already been sent and awaits for this series upstream. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-12fou: implement FOU_CMD_GETWANG Cong1-0/+109
Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-12fou: add network namespace supportWANG Cong1-39/+67
Also convert the spinlock to a mutex. Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-12fou: always use be16 for portWANG Cong1-3/+3
udp_config.local_udp_port is be16. And iproute2 passes network order for FOU_ATTR_PORT. This doesn't fix any bug, just for consistency. Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-12fou: exit early when parsing config failsWANG Cong1-1/+4
Not a big deal, just for corretness. Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-12fou: avoid calling udp_del_offload() twiceWANG Cong1-2/+2
This fixes the following harmless warning: ./ip/ip fou del port 7777 [ 122.907516] udp_del_offload: didn't find offload for port 7777 Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-12tcp: do not cache align timewait socketsEric Dumazet1-2/+1
With recent adoption of skc_cookie in struct sock_common, struct tcp_timewait_sock size increased from 192 to 200 bytes on 64bit arches. SLAB rounds then to 256 bytes. It is time to drop SLAB_HWCACHE_ALIGN constraint for twsk_slab. This saves about 12 MB of memory on typical configuration reaching 262144 timewait sockets, and has no noticeable impact on performance. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-12Merge tag 'mac80211-next-for-davem-2015-04-10' of ↵David S. Miller25-383/+1113
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== There isn't much left, but we have * new mac80211 internal software queue to allow drivers to have shorter hardware queues and pull on-demand * use rhashtable for mac80211 station table * minstrel rate control debug improvements and some refactoring * fix noisy message about TX power reduction * fix continuous message printing and activity if CRDA doesn't respond * fix VHT-related capabilities with "iw connect" or "iwconfig ..." * fix Kconfig for cfg80211 wireless extensions compatibility ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-11new helper: msg_data_left()Al Viro4-17/+16
convert open-coded instances Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-11Merge remote-tracking branch 'dh/afs' into for-davemAl Viro4-29/+148
2015-04-11get rid of the size argument of sock_sendmsg()Al Viro2-14/+15
it's equal to iov_iter_count(&msg->msg_iter) in all cases Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-10if_link: Add an additional parameter to ifla_vf_info for RSS queryingVlad Zolotarov1-6/+26
Add configuration setting for drivers to allow/block an RSS Redirection Table and a Hash Key querying for discrete VFs. On some devices VF share the mentioned above information with PF and querying it may adduce a theoretical security risk. We want to let a system administrator to decide if he/she wants to take this risk or not. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-04-10rtnetlink: Mark name argument of rtnl_create_link() constThomas Graf1-1/+1
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-09Merge branch 'for-upstream' of ↵David S. Miller3-151/+258
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2015-04-09 We've had enough new patches during the past week (especially from Marcel) that it'd be good to still get these queued for 4.1. The majority of the changes are from Marcel with lots of cleanup & refactoring patches for the HCI UART driver. Marcel also split out some Broadcom & Intel vendor specific functionality into two new btintel & btbcm modules. In addition to the HCI driver changes there's the completion of our local OOB data interface for pairing, added support for requesting remote LE features when connecting, as well as a couple of minor fixes for mac802154. Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-09tcp: md5: fix a typo in tcp_v4_md5_lookup()Eric Dumazet1-2/+2
Lookup key for tcp_md5_do_lookup() has to be taken from addr_sk, not sk (which can be the listener) Fixes: fd3a154a00fb ("tcp: md5: get rid of tcp_v[46]_reqsk_md5_lookup()") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-09net: Pass VLAN ID to rtnl_fdb_notify.Hubert Sokolowski1-10/+10
When an FDB entry is added or deleted the information about VLAN is not passed to listening applications like 'bridge monitor fdb'. With this patch VLAN ID is passed if it was set in the original netlink message. Also remove an unused bdev variable. Signed-off-by: Hubert Sokolowski <hubert.sokolowski@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-09tcp: tcp_make_synack() should clear skb->tstampEric Dumazet1-0/+2
I noticed tcpdump was giving funky timestamps for locally generated SYNACK messages on loopback interface. 11:42:46.938990 IP 127.0.0.1.48245 > 127.0.0.2.23850: S 945476042:945476042(0) win 43690 <mss 65495,nop,nop,sackOK,nop,wscale 7> 20:28:58.502209 IP 127.0.0.2.23850 > 127.0.0.1.48245: S 3160535375:3160535375(0) ack 945476043 win 43690 <mss 65495,nop,nop,sackOK,nop,wscale 7> This is because we need to clear skb->tstamp before entering lower stack, otherwise net_timestamp_check() does not set skb->tstamp. Fixes: 7faee5c0d514 ("tcp: remove TCP_SKB_CB(skb)->when") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-09udptunnels: Call handle_offloads after inserting vlan tag.Jesse Gross1-4/+4
handle_offloads() calls skb_reset_inner_headers() to store the layer pointers to the encapsulated packet. However, we currently push the vlag tag (if there is one) onto the packet afterwards. This changes the MAC header for the encapsulated packet but it is not reflected in skb->inner_mac_header, which breaks GSO and drivers which attempt to use this for encapsulation offloads. Fixes: 1eaa8178 ("vxlan: Add tx-vlan offload support.") Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller18-173/+747
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for your net-next tree. They are: * nf_tables set timeout infrastructure from Patrick Mchardy. 1) Add support for set timeout support. 2) Add support for set element timeouts using the new set extension infrastructure. 4) Add garbage collection helper functions to get rid of stale elements. Elements are accumulated in a batch that are asynchronously released via RCU when the batch is full. 5) Add garbage collection synchronization helpers. This introduces a new element busy bit to address concurrent access from the netlink API and the garbage collector. 5) Add timeout support for the nft_hash set implementation. The garbage collector peridically checks for stale elements from the workqueue. * iptables/nftables cgroup fixes: 6) Ignore non full-socket objects from the input path, otherwise cgroup match may crash, from Daniel Borkmann. 7) Fix cgroup in nf_tables. 8) Save some cycles from xt_socket by skipping packet header parsing when skb->sk is already set because of early demux. Also from Daniel. * br_netfilter updates from Florian Westphal. 9) Save frag_max_size and restore it from the forward path too. 10) Use a per-cpu area to restore the original source MAC address when traffic is DNAT'ed. 11) Add helper functions to access physical devices. 12) Use these new physdev helper function from xt_physdev. 13) Add another nf_bridge_info_get() helper function to fetch the br_netfilter state information. 14) Annotate original layer 2 protocol number in nf_bridge info, instead of using kludgy flags. 15) Also annotate the pkttype mangling when the packet travels back and forth from the IP to the bridge layer, instead of using a flag. * More nf_tables set enhancement from Patrick: 16) Fix possible usage of set variant that doesn't support timeouts. 17) Avoid spurious "set is full" errors from Netlink API when there are pending stale elements scheduled to be released. 18) Restrict loop checks to set maps. 19) Add support for dynamic set updates from the packet path. 20) Add support to store optional user data (eg. comments) per set element. BTW, I have also pulled net-next into nf-next to anticipate the conflict resolution between your okfn() signature changes and Florian's br_netfilter updates. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-09Merge branch 'master' of ↵David S. Miller1-0/+5
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2015-04-09 1) Prohibit the use/abuse of the xfrm netlink interface on 32/64 bit compatibility tasks. We need a full compat layer before we can allow this. From Fan Du. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>