summaryrefslogtreecommitdiffstats
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2017-12-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller1-2/+3
Daniel Borkmann says: ==================== pull-request: bpf-next 2017-12-28 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Fix incorrect state pruning related to recognition of zero initialized stack slots, where stacksafe exploration would mistakenly return a positive pruning verdict too early ignoring other slots, from Gianluca. 2) Various BPF to BPF calls related follow-up fixes. Fix an off-by-one in maximum call depth check, and rework maximum stack depth tracking logic to fix a bypass of the total stack size check reported by Jann. Also fix a bug in arm64 JIT where prog->jited_len was uninitialized. Addition of various test cases to BPF selftests, from Alexei. 3) Addition of a BPF selftest to test_verifier that is related to BPF to BPF calls which demonstrates a late caller stack size increase and thus out of bounds access. Fixed above in 2). Test case from Jann. 4) Addition of correlating BPF helper calls, BPF to BPF calls as well as BPF maps to bpftool xlated dump in order to allow for better BPF program introspection and debugging, from Daniel. 5) Fixing several bugs in BPF to BPF calls kallsyms handling in order to get it actually to work for subprogs, from Daniel. 6) Extending sparc64 JIT support for BPF to BPF calls and fix a couple of build errors for libbpf on sparc64, from David. 7) Allow narrower context access for BPF dev cgroup typed programs in order to adapt to LLVM code generation. Also adjust memlock rlimit in the test_dev_cgroup BPF selftest, from Yonghong. 8) Add netdevsim Kconfig entry to BPF selftests since test_offload.py relies on netdevsim device being available, from Jakub. 9) Reduce scope of xdp_do_generic_redirect_map() to being static, from Xiongwei. 10) Minor cleanups and spelling fixes in BPF verifier, from Colin. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-27tcp: do not allocate linear memory for zerocopy skbsWillem de Bruijn1-4/+7
Zerocopy payload is now always stored in frags, and space for headers is reversed, so this memory is unused. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-27tcp: place all zerocopy payload in fragsWillem de Bruijn1-4/+5
This avoids an unnecessary copy of 1-2KB and improves tso_fragment, which has to fall back to tcp_fragment if skb->len != skb_data_len. It also avoids a surprising inconsistency in notifications: Zerocopy packets sent over loopback have their frags copied, so set SO_EE_CODE_ZEROCOPY_COPIED in the notification. But this currently does not happen for small packets, because when all data fits in the linear fragment, data is not copied in skb_orphan_frags_rx. Reported-by: Tom Deseyn <tom.deseyn@gmail.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-27tcp: push full zerocopy packetsWillem de Bruijn1-1/+3
Skbs that reach MAX_SKB_FRAGS cannot be extended further. Do the same for zerocopy frags as non-zerocopy frags and set the PSH bit. This improves GRO assembly. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-27skbuff: in skb_segment, call zerocopy functions once per nskbWillem de Bruijn1-5/+9
This is a net-next follow-up to commit 268b79067942 ("skbuff: orphan frags before zerocopy clone"), which fixed a bug in net, but added a call to skb_zerocopy_clone at each frag to do so. When segmenting skbs with user frags, either the user frags must be replaced with private copies and uarg released, or the uarg must have its refcount increased for each new skb. skb_orphan_frags does the first, except for cases that can handle reference counting. skb_zerocopy_clone then does the second. Call these once per nskb, instead of once per frag. That is, in the common case. With a frag list, also refresh when the origin skb (frag_skb) changes. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-27rds: tcp: cleanup if kmem_cache_alloc fails in rds_tcp_conn_alloc()Sowmini Varadhan1-20/+26
If kmem_cache_alloc() fails in the middle of the for() loop, cleanup anything that might have been allocated so far. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-27rds: tcp: initialize t_tcp_detached to falseSowmini Varadhan1-0/+1
Commit f10b4cff98c6 ("rds: tcp: atomically purge entries from rds_tcp_conn_list during netns delete") adds the field t_tcp_detached, but this needs to be initialized explicitly to false. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-27rds; Reset rs->rs_bound_addr in rds_add_bound() failure pathSowmini Varadhan1-0/+1
If the rds_sock is not added to the bind_hash_table, we must reset rs_bound_addr so that rds_remove_bound will not trip on this rds_sock. rds_add_bound() does a rds_sock_put() in this failure path, so failing to reset rs_bound_addr will result in a socket refcount bug, and will trigger a WARN_ON with the stack shown below when the application subsequently tries to close the PF_RDS socket. WARNING: CPU: 20 PID: 19499 at net/rds/af_rds.c:496 \ rds_sock_destruct+0x15/0x30 [rds] : __sk_destruct+0x21/0x190 rds_remove_bound.part.13+0xb6/0x140 [rds] rds_release+0x71/0x120 [rds] sock_release+0x1a/0x70 sock_close+0xe/0x20 __fput+0xd5/0x210 task_work_run+0x82/0xa0 do_exit+0x2ce/0xb30 ? syscall_trace_enter+0x1cc/0x2b0 do_group_exit+0x39/0xa0 SyS_exit_group+0x10/0x10 do_syscall_64+0x61/0x1a0 Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-27l2tp: add peer_offset parameterLorenzo Bianconi4-8/+37
Introduce peer_offset parameter in order to add the capability to specify two different values for payload offset on tx/rx side. If just offset is provided by userspace use it for rx side as well in order to maintain compatibility with older l2tp versions Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-27l2tp: fix missing print session offset infoHangbin Liu1-0/+2
Report offset parameter in L2TP_CMD_SESSION_GET command if it has been configured by userspace Fixes: 309795f4bec ("l2tp: Add netlink control API for L2TP") Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-27Merge branch 'master' of ↵David S. Miller10-176/+283
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2017-12-22 1) Separate ESP handling from segmentation for GRO packets. This unifies the IPsec GSO and non GSO codepath. 2) Add asynchronous callbacks for xfrm on layer 2. This adds the necessary infrastructure to core networking. 3) Allow to use the layer2 IPsec GSO codepath for software crypto, all infrastructure is there now. 4) Also allow IPsec GSO with software crypto for local sockets. 5) Don't require synchronous crypto fallback on IPsec offloading, it is not needed anymore. 6) Check for xdo_dev_state_free and only call it if implemented. From Shannon Nelson. 7) Check for the required add and delete functions when a driver registers xdo_dev_ops. From Shannon Nelson. 8) Define xfrmdev_ops only with offload config. From Shannon Nelson. 9) Update the xfrm stats documentation. From Shannon Nelson. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-26net: erspan: remove md NULL checkWilliam Tu2-9/+0
The 'md' is allocated from 'tun_dst = ip_tun_rx_dst' and since we've checked 'tun_dst', 'md' will never be NULL. The patch removes it at both ipv4 and ipv6 erspan. Fixes: afb4c97d90e6 ("ip6_gre: fix potential memory leak in ip6erspan_rcv") Fixes: 50670b6ee9bc ("ip_gre: fix potential memory leak in erspan_rcv") Cc: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-26tcp: md5: Handle RCU dereference of md5sig_infoMat Martineau1-1/+1
Dereference tp->md5sig_info in tcp_v4_destroy_sock() the same way it is done in the adjacent call to tcp_clear_md5_list(). Resolves this sparse warning: net/ipv4/tcp_ipv4.c:1914:17: warning: incorrect type in argument 1 (different address spaces) net/ipv4/tcp_ipv4.c:1914:17: expected struct callback_head *head net/ipv4/tcp_ipv4.c:1914:17: got struct callback_head [noderef] <asn:4>*<noident> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Acked-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-26ipv6: Reinject IPv6 packets if IPsec policy matches after SNATTobias Brunner1-0/+8
If SNAT modifies the source address the resulting packet might match an IPsec policy, reinject the packet if that's the case. The exact same thing is already done for IPv4. Signed-off-by: Tobias Brunner <tobias@strongswan.org> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller22-132/+225
Lots of overlapping changes. Also on the net-next side the XDP state management is handled more in the generic layers so undo the 'net' nfp fix which isn't applicable in net-next. Include a necessary change by Jakub Kicinski, with log message: ==================== cls_bpf no longer takes care of offload tracking. Make sure netdevsim performs necessary checks. This fixes a warning caused by TC trying to remove a filter it has not added. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21flow_dissector: Parse batman-adv unicast headersSven Eckelmann1-0/+57
The batman-adv unicast packets contain a full layer 2 frame in encapsulated form. The flow dissector must therefore be able to parse the batman-adv unicast header to reach the layer 2+3 information. +--------------------+ | ip(v6)hdr | +--------------------+ | inner ethhdr | +--------------------+ | batadv unicast hdr | +--------------------+ | outer ethhdr | +--------------------+ The obtained information from the upper layer can then be used by RPS to schedule the processing on separate cores. This allows better distribution of multiple flows from the same neighbor to different cores. Signed-off-by: Sven Eckelmann <sven.eckelmann@openmesh.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21batman-adv: Convert packet.h to uapi headerSven Eckelmann25-670/+24
The header file is used by different userspace programs to inject packets or to decode sniffed packets. It should therefore be available to them as userspace header. Also other components in the kernel (like the flow dissector) require access to the packet definitions to be able to decode ETH_P_BATMAN ethernet packets. Signed-off-by: Sven Eckelmann <sven.eckelmann@openmesh.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21batman-adv: Remove kernel fixed width types in packet.hSven Eckelmann1-107/+107
The uapi headers use the __u8/__u16/... version of the fixed width types instead of u8/u16/... The use of the latter must be avoided before packet.h is copied to include/uapi/linux/. Signed-off-by: Sven Eckelmann <sven.eckelmann@openmesh.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21batman-adv: Remove usage of BIT(x) in packet.hSven Eckelmann1-12/+11
The BIT(x) macro is no longer available for uapi headers because it is defined outside of it (linux/bitops.h). The use of it must therefore be avoided and replaced by an appropriate other representation. Signed-off-by: Sven Eckelmann <sven.eckelmann@openmesh.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21batman-adv: Let packet.h include its headers directlySven Eckelmann2-2/+2
The headers used by packet.h should also be included by it directly. main.h is currently dealing with it in batman-adv, but this will no longer work when this header is moved to include/uapi/linux/. Signed-off-by: Sven Eckelmann <sven.eckelmann@openmesh.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21skbuff: skb_copy_ubufs must release uarg even without user fragsWillem de Bruijn1-1/+2
skb_copy_ubufs creates a private copy of frags[] to release its hold on user frags, then calls uarg->callback to notify the owner. Call uarg->callback even when no frags exist. This edge case can happen when zerocopy_sg_from_iter finds enough room in skb_headlen to copy all the data. Fixes: 3ece782693c4 ("sock: skb_copy_ubufs support for compound pages") Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21skbuff: orphan frags before zerocopy cloneWillem de Bruijn1-2/+2
Call skb_zerocopy_clone after skb_orphan_frags, to avoid duplicate calls to skb_uarg(skb)->callback for the same data. skb_zerocopy_clone associates skb_shinfo(skb)->uarg from frag_skb with each segment. This is only safe for uargs that do refcounting, which is those that pass skb_orphan_frags without dropping their shared frags. For others, skb_orphan_frags drops the user frags and sets the uarg to NULL, after which sock_zerocopy_clone has no effect. Qemu hangs were reported due to duplicate vhost_net_zerocopy_callback calls for the same data causing the vhost_net_ubuf_ref_>refcount to drop below zero. Link: http://lkml.kernel.org/r/<CAF=yD-LWyCD4Y0aJ9O0e_CHLR+3JOeKicRRTEVCPxgw4XOcqGQ@mail.gmail.com> Fixes: 1f8b977ab32d ("sock: enable MSG_ZEROCOPY") Reported-by: Andreas Hartmann <andihartmann@01019freenet.de> Reported-by: David Hill <dhill@redhat.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21net: reevalulate autoflowlabel setting after sysctl settingShaohua Li3-3/+11
sysctl.ip6.auto_flowlabels is default 1. In our hosts, we set it to 2. If sockopt doesn't set autoflowlabel, outcome packets from the hosts are supposed to not include flowlabel. This is true for normal packet, but not for reset packet. The reason is ipv6_pinfo.autoflowlabel is set in sock creation. Later if we change sysctl.ip6.auto_flowlabels, the ipv6_pinfo.autoflowlabel isn't changed, so the sock will keep the old behavior in terms of auto flowlabel. Reset packet is suffering from this problem, because reset packet is sent from a special control socket, which is created at boot time. Since sysctl.ipv6.auto_flowlabels is 1 by default, the control socket will always have its ipv6_pinfo.autoflowlabel set, even after user set sysctl.ipv6.auto_flowlabels to 1, so reset packset will always have flowlabel. Normal sock created before sysctl setting suffers from the same issue. We can't even turn off autoflowlabel unless we kill all socks in the hosts. To fix this, if IPV6_AUTOFLOWLABEL sockopt is used, we use the autoflowlabel setting from user, otherwise we always call ip6_default_np_autolabel() which has the new settings of sysctl. Note, this changes behavior a little bit. Before commit 42240901f7c4 (ipv6: Implement different admin modes for automatic flow labels), the autoflowlabel behavior of a sock isn't sticky, eg, if sysctl changes, existing connection will change autoflowlabel behavior. After that commit, autoflowlabel behavior is sticky in the whole life of the sock. With this patch, the behavior isn't sticky again. Cc: Martin KaFai Lau <kafai@fb.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Tom Herbert <tom@quantonium.net> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21openvswitch: Fix pop_vlan action for double tagged framesEric Garver1-3/+12
skb_vlan_pop() expects skb->protocol to be a valid TPID for double tagged frames. So set skb->protocol to the TPID and let skb_vlan_pop() shift the true ethertype into position for us. Fixes: 5108bbaddc37 ("openvswitch: add processing of L3 packets") Signed-off-by: Eric Garver <e@erig.me> Reviewed-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21net: sch: sch_drr: add extack supportAlexander Aring1-5/+14
This patch adds extack support for the drr qdisc implementation by adding NL_SET_ERR_MSG in validation of user input. Also it serves to illustrate a use case of how the infrastructure ops api changes are to be used by individual qdiscs. Cc: David Ahern <dsahern@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Alexander Aring <aring@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21net: sch: sch_cbs: add extack supportAlexander Aring1-7/+16
This patch adds extack support for the cbs qdisc implementation by adding NL_SET_ERR_MSG in validation of user input. Also it serves to illustrate a use case of how the infrastructure ops api changes are to be used by individual qdiscs. Cc: David Ahern <dsahern@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Alexander Aring <aring@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21net: sch: sch_cbq: add extack supportAlexander Aring1-12/+34
This patch adds extack support for the cbq qdisc implementation by adding NL_SET_ERR_MSG in validation of user input. Also it serves to illustrate a use case of how the infrastructure ops api changes are to be used by individual qdiscs. Cc: David Ahern <dsahern@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Alexander Aring <aring@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21net: sch: api: add extack support in qdisc_create_dfltAlexander Aring16-38/+52
This patch adds extack support for the function qdisc_create_dflt which is a common used function in the tc subsystem. Callers which are interested in the receiving error can assign extack to get a more detailed information why qdisc_create_dflt failed. The function qdisc_create_dflt will also call an init callback which can fail by any per-qdisc specific handling. Cc: David Ahern <dsahern@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Alexander Aring <aring@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21net: sch: api: add extack support in qdisc_allocAlexander Aring2-3/+5
This patch adds extack support for the function qdisc_alloc which is a common used function in the tc subsystem. Callers which are interested in the receiving error can assign extack to get a more detailed information why qdisc_alloc failed. Cc: David Ahern <dsahern@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Alexander Aring <aring@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21net: sch: api: add extack support in tcf_block_getAlexander Aring14-23/+32
This patch adds extack support for the function tcf_block_get which is a common used function in the tc subsystem. Callers which are interested in the receiving error can assign extack to get a more detailed information why tcf_block_get failed. Cc: David Ahern <dsahern@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Alexander Aring <aring@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21net: sch: api: add extack support in qdisc_get_rtabAlexander Aring5-11/+21
This patch adds extack support for the function qdisc_get_rtab which is a common used function in the tc subsystem. Callers which are interested in the receiving error can assign extack to get a more detailed information why qdisc_get_rtab failed. Cc: David Ahern <dsahern@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Alexander Aring <aring@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21net: sched: sch: add extack for graft callbackAlexander Aring16-16/+21
This patch adds extack support for graft callback to prepare per-qdisc specific changes for extack. Cc: David Ahern <dsahern@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Alexander Aring <aring@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21net: sched: sch: add extack for block callbackAlexander Aring15-17/+31
This patch adds extack support for block callback to prepare per-qdisc specific changes for extack. Cc: David Ahern <dsahern@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Alexander Aring <aring@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21net: sched: sch: add extack to change classAlexander Aring10-10/+16
This patch adds extack support for class change callback api. This prepares to handle extack support inside each specific class implementation. Cc: David Ahern <dsahern@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Alexander Aring <aring@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21net: sched: sch: add extack for change qdisc opsAlexander Aring18-39/+50
This patch adds extack support for change callback for qdisc ops structtur to prepare per-qdisc specific changes for extack. Cc: David Ahern <dsahern@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Alexander Aring <aring@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21net: sched: sch: add extack for init callbackAlexander Aring30-36/+72
This patch adds extack support for init callback to prepare per-qdisc specific changes for extack. Cc: David Ahern <dsahern@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Alexander Aring <aring@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21net: sched: sch_api: handle generic qdisc errorsAlexander Aring1-43/+105
This patch adds extack support for generic qdisc handling. The extack will be set deeper to each called function which is not part of netdev core api. Cc: David Ahern <dsahern@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Alexander Aring <aring@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21net: sched: fix coding style issuesAlexander Aring6-13/+14
This patch fix checkpatch issues for upcomming patches according to the sched api file. It changes mostly how to check on null pointer. Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Alexander Aring <aring@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21ipv6: Honor specified parameters in fibmatch lookupIdo Schimmel1-8/+11
Currently, parameters such as oif and source address are not taken into account during fibmatch lookup. Example (IPv4 for reference) before patch: $ ip -4 route show 192.0.2.0/24 dev dummy0 proto kernel scope link src 192.0.2.1 198.51.100.0/24 dev dummy1 proto kernel scope link src 198.51.100.1 $ ip -6 route show 2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium 2001:db8:2::/64 dev dummy1 proto kernel metric 256 pref medium fe80::/64 dev dummy0 proto kernel metric 256 pref medium fe80::/64 dev dummy1 proto kernel metric 256 pref medium $ ip -4 route get fibmatch 192.0.2.2 oif dummy0 192.0.2.0/24 dev dummy0 proto kernel scope link src 192.0.2.1 $ ip -4 route get fibmatch 192.0.2.2 oif dummy1 RTNETLINK answers: No route to host $ ip -6 route get fibmatch 2001:db8:1::2 oif dummy0 2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium $ ip -6 route get fibmatch 2001:db8:1::2 oif dummy1 2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium After: $ ip -6 route get fibmatch 2001:db8:1::2 oif dummy0 2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium $ ip -6 route get fibmatch 2001:db8:1::2 oif dummy1 RTNETLINK answers: Network is unreachable The problem stems from the fact that the necessary route lookup flags are not set based on these parameters. Instead of duplicating the same logic for fibmatch, we can simply resolve the original route from its copy and dump it instead. Fixes: 18c3a61c4264 ("net: ipv6: RTM_GETROUTE: return matched fib result when requested") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21xfrm: check for xdo_dev_ops add and deleteShannon Nelson1-13/+18
This adds a check for the required add and delete functions up front at registration time to be sure both are defined. Since both the features check and the registration check are looking at the same things, break out the check for both to call. Lastly, for some reason the feature check was setting xfrmdev_ops to NULL if the NETIF_F_HW_ESP bit was missing, which would probably surprise the driver later if the driver turned its NETIF_F_HW_ESP bit back on. We shouldn't be messing with the driver's callback list, so we stop doing that with this patch. Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-12-20ipv4: Fix use-after-free when flushing FIB tablesIdo Schimmel1-2/+7
Since commit 0ddcf43d5d4a ("ipv4: FIB Local/MAIN table collapse") the local table uses the same trie allocated for the main table when custom rules are not in use. When a net namespace is dismantled, the main table is flushed and freed (via an RCU callback) before the local table. In case the callback is invoked before the local table is iterated, a use-after-free can occur. Fix this by iterating over the FIB tables in reverse order, so that the main table is always freed after the local table. v3: Reworded comment according to Alex's suggestion. v2: Add a comment to make the fix more explicit per Dave's and Alex's feedback. Fixes: 0ddcf43d5d4a ("ipv4: FIB Local/MAIN table collapse") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Fengguang Wu <fengguang.wu@intel.com> Acked-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-20tipc: remove joining group member from congested listJon Maloy1-4/+2
When we receive a JOIN message from a peer member, the message may contain an advertised window value ADV_IDLE that permits removing the member in question from the tipc_group::congested list. However, since the removal has been made conditional on that the advertised window is *not* ADV_IDLE, we miss this case. This has the effect that a sender sometimes may enter a state of permanent, false, broadcast congestion. We fix this by unconditinally removing the member from the congested list before calling tipc_member_update(), which might potentially sort it into the list again. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-20Merge tag 'batadv-next-for-davem-20171220' of ↵David S. Miller60-1343/+2833
git://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== This feature/cleanup patchset includes the following patches: - bump version strings, by Simon Wunderlich - de-inline hash functions to save memory footprint, by Denys Vlasenko - Add License information to various files, by Sven Eckelmann (3 patches) - Change batman_adv.h from ISC to MIT, by Sven Eckelmann - Improve various includes, by Sven Eckelmann (5 patches) - Lots of kernel-doc work by Sven Eckelmann (8 patches) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-20net: tracepoint: using sock_set_state tracepoint to trace SCTP state transitionYafang Shao3-9/+9
With changes in inet_ files, SCTP state transitions are traced with inet_sock_set_state tracepoint. As SCTP state names, i.e. SCTP_SS_CLOSED, SCTP_SS_ESTABLISHED, have the same value with TCP state names. So the output info still print the TCP state names, that makes the code easy. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-20net: tracepoint: using sock_set_state tracepoint to trace DCCP state transitionYafang Shao1-1/+1
With changes in inet_ files, DCCP state transitions are traced with inet_sock_set_state tracepoint. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-20net: sock: replace sk_state_load with inet_sk_state_load and remove ↵Yafang Shao5-6/+6
sk_state_store sk_state_load is only used by AF_INET/AF_INET6, so rename it to inet_sk_state_load and move it into inet_sock.h. sk_state_store is removed as it is not used any more. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-20net: tracepoint: replace tcp_set_state tracepoint with inet_sock_set_state ↵Yafang Shao4-9/+19
tracepoint As sk_state is a common field for struct sock, so the state transition tracepoint should not be a TCP specific feature. Currently it traces all AF_INET state transition, so I rename this tracepoint to inet_sock_set_state tracepoint with some minor changes and move it into trace/events/sock.h. We dont need to create a file named trace/events/inet_sock.h for this one single tracepoint. Two helpers are introduced to trace sk_state transition - void inet_sk_state_store(struct sock *sk, int newstate); - void inet_sk_set_state(struct sock *sk, int state); As trace header should not be included in other header files, so they are defined in sock.c. The protocol such as SCTP maybe compiled as a ko, hence export inet_sk_set_state(). Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-20ip6_gre: fix potential memory leak in ip6erspan_rcvHaishuang Yan1-1/+3
If md is NULL, tun_dst must be freed, otherwise it will cause memory leak. Fixes: ef7baf5e083c ("ip6_gre: add ip6 erspan collect_md mode") Cc: William Tu <u9012063@gmail.com> Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-20ip_gre: fix potential memory leak in erspan_rcvHaishuang Yan1-1/+3
If md is NULL, tun_dst must be freed, otherwise it will cause memory leak. Fixes: 1a66a836da6 ("gre: add collect_md mode to ERSPAN tunnel") Cc: William Tu <u9012063@gmail.com> Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-20ip6_gre: fix error path when ip6erspan_rcv failedHaishuang Yan1-1/+2
Same as ipv4 code, when ip6erspan_rcv call return PACKET_REJECT, we should call icmpv6_send to send icmp unreachable message in error path. Fixes: 5a963eb61b7c ("ip6_gre: Add ERSPAN native tunnel support") Acked-by: William Tu <u9012063@gmail.com> Cc: William Tu <u9012063@gmail.com> Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>