Age | Commit message (Collapse) | Author | Files | Lines |
|
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for your net-next tree:
1) Support for matching on ipsec policy already set in the route, from
Florian Westphal.
2) Split set destruction into deactivate and destroy phase to make it
fit better into the transaction infrastructure, also from Florian.
This includes a patch to warn on imbalance when setting the new
activate and deactivate interfaces.
3) Release transaction list from the workqueue to remove expensive
synchronize_rcu() from configuration plane path. This speeds up
configuration plane quite a bit. From Florian Westphal.
4) Add new xfrm/ipsec extension, this new extension allows you to match
for ipsec tunnel keys such as source and destination address, spi and
reqid. From Máté Eckl and Florian Westphal.
5) Add secmark support, this includes connsecmark too, patches
from Christian Gottsche.
6) Allow to specify remaining bytes in xt_quota, from Chenbo Feng.
One follow up patch to calm a clang warning for this one, from
Nathan Chancellor.
7) Flush conntrack entries based on layer 3 family, from Kristian Evensen.
8) New revision for cgroups2 to shrink the path field.
9) Get rid of obsolete need_conntrack(), as a result from recent
demodularization works.
10) Use WARN_ON instead of BUG_ON, from Florian Westphal.
11) Unused exported symbol in nf_nat_ipv4_fn(), from Florian.
12) Remove superfluous check for timeout netlink parser and dump
functions in layer 4 conntrack helpers.
13) Unnecessary redundant rcu read side locks in NAT redirect,
from Taehee Yoo.
14) Pass nf_hook_state structure to error handlers, patch from
Florian Westphal.
15) Remove ->new() interface from layer 4 protocol trackers. Place
them in the ->packet() interface. From Florian.
16) Place conntrack ->error() handling in the ->packet() interface.
Patches from Florian Westphal.
17) Remove unused parameter in the pernet initialization path,
also from Florian.
18) Remove additional parameter to specify layer 3 protocol when
looking up for protocol tracker. From Florian.
19) Shrink array of layer 4 protocol trackers, from Florian.
20) Check for linear skb only once from the ALG NAT mangling
codebase, from Taehee Yoo.
21) Use rhashtable_walk_enter() instead of deprecated
rhashtable_walk_init(), also from Taehee.
22) No need to flush all conntracks when only one single address
is gone, from Tan Hu.
23) Remove redundant check for NAT flags in flowtable code, from
Taehee Yoo.
24) Use rhashtable_lookup() instead of rhashtable_lookup_fast()
from netfilter codebase, since rcu read lock side is already
assumed in this path.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Update inet_netconf_dump_devconf, inet6_netconf_dump_devconf, and
mpls_netconf_dump_devconf for strict data checking. If the flag is set,
the dump request is expected to have an netconfmsg struct as the header.
The struct only has the family member and no attributes can be appended.
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Christian Brauner <christian@brauner.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Update ip6addrlbl_dump for strict data checking. If the flag is set,
the dump request is expected to have an ifaddrlblmsg struct as the
header. All elements of the struct are expected to be 0 and no
attributes can be appended.
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Christian Brauner <christian@brauner.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add helper to check netlink message for route dumps. If the strict flag
is set the dump request is expected to have an rtmsg struct as the header.
All elements of the struct are expected to be 0 with the exception of
rtm_flags (which is used by both ipv4 and ipv6 dumps) and no attributes
can be appended. rtm_flags can only have RTM_F_CLONED and RTM_F_PREFIX
set.
Update inet_dump_fib, inet6_dump_fib, mpls_dump_routes, ipmr_rtm_dumproute,
and ip6mr_rtm_dumproute to call this helper if strict data checking is
enabled.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Update inet6_dump_ifinfo for strict data checking. If the flag is
set, the dump request is expected to have an ifinfomsg struct as
the header. All elements of the struct are expected to be 0 and no
attributes can be appended.
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Christian Brauner <christian@brauner.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Update inet6_dump_addr for strict data checking. If the flag is set, the
dump request is expected to have an ifaddrmsg struct as the header
potentially followed by one or more attributes. Any data passed in the
header or as an attribute is taken as a request to influence the data
returned. Only values suppored by the dump handler are allowed to be
non-0 or set in the request. At the moment only the IFA_TARGET_NETNSID
attribute is supported. Follow on patches can add support for other fields
(e.g., honor ifa_index and only return data for the given device index).
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Christian Brauner <christian@brauner.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull the inet6_fill_args arg up to in6_dump_addrs and move netnsid
into it.
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Christian Brauner <christian@brauner.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Make sure extack is passed to nlmsg_parse where easy to do so.
Most of these are dump handlers and leveraging the extack in
the netlink_callback.
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Christian Brauner <christian@brauner.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
In rawv6_send_hdrinc(), in order to avoid an extra dst_hold(), we
directly assign the dst to skb and set passed in dst to NULL to avoid
double free.
However, in error case, we free skb and then do stats update with the
dst pointer passed in. This causes use-after-free on the dst.
Fix it by taking rcu read lock right before dst could get released to
make sure dst does not get freed until the stats update is done.
Note: we don't have this issue in ipv4 cause dst is not used for stats
update in v4.
Syzkaller reported following crash:
BUG: KASAN: use-after-free in rawv6_send_hdrinc net/ipv6/raw.c:692 [inline]
BUG: KASAN: use-after-free in rawv6_sendmsg+0x4421/0x4630 net/ipv6/raw.c:921
Read of size 8 at addr ffff8801d95ba730 by task syz-executor0/32088
CPU: 1 PID: 32088 Comm: syz-executor0 Not tainted 4.19.0-rc2+ #93
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1c4/0x2b4 lib/dump_stack.c:113
print_address_description.cold.8+0x9/0x1ff mm/kasan/report.c:256
kasan_report_error mm/kasan/report.c:354 [inline]
kasan_report.cold.9+0x242/0x309 mm/kasan/report.c:412
__asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
rawv6_send_hdrinc net/ipv6/raw.c:692 [inline]
rawv6_sendmsg+0x4421/0x4630 net/ipv6/raw.c:921
inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798
sock_sendmsg_nosec net/socket.c:621 [inline]
sock_sendmsg+0xd5/0x120 net/socket.c:631
___sys_sendmsg+0x7fd/0x930 net/socket.c:2114
__sys_sendmsg+0x11d/0x280 net/socket.c:2152
__do_sys_sendmsg net/socket.c:2161 [inline]
__se_sys_sendmsg net/socket.c:2159 [inline]
__x64_sys_sendmsg+0x78/0xb0 net/socket.c:2159
do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x457099
Code: fd b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f83756edc78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f83756ee6d4 RCX: 0000000000457099
RDX: 0000000000000000 RSI: 0000000020003840 RDI: 0000000000000004
RBP: 00000000009300a0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 00000000004d4b30 R14: 00000000004c90b1 R15: 0000000000000000
Allocated by task 32088:
save_stack+0x43/0xd0 mm/kasan/kasan.c:448
set_track mm/kasan/kasan.c:460 [inline]
kasan_kmalloc+0xc7/0xe0 mm/kasan/kasan.c:553
kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
kmem_cache_alloc+0x12e/0x730 mm/slab.c:3554
dst_alloc+0xbb/0x1d0 net/core/dst.c:105
ip6_dst_alloc+0x35/0xa0 net/ipv6/route.c:353
ip6_rt_cache_alloc+0x247/0x7b0 net/ipv6/route.c:1186
ip6_pol_route+0x8f8/0xd90 net/ipv6/route.c:1895
ip6_pol_route_output+0x54/0x70 net/ipv6/route.c:2093
fib6_rule_lookup+0x277/0x860 net/ipv6/fib6_rules.c:122
ip6_route_output_flags+0x2c5/0x350 net/ipv6/route.c:2121
ip6_route_output include/net/ip6_route.h:88 [inline]
ip6_dst_lookup_tail+0xe27/0x1d60 net/ipv6/ip6_output.c:951
ip6_dst_lookup_flow+0xc8/0x270 net/ipv6/ip6_output.c:1079
rawv6_sendmsg+0x12d9/0x4630 net/ipv6/raw.c:905
inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798
sock_sendmsg_nosec net/socket.c:621 [inline]
sock_sendmsg+0xd5/0x120 net/socket.c:631
___sys_sendmsg+0x7fd/0x930 net/socket.c:2114
__sys_sendmsg+0x11d/0x280 net/socket.c:2152
__do_sys_sendmsg net/socket.c:2161 [inline]
__se_sys_sendmsg net/socket.c:2159 [inline]
__x64_sys_sendmsg+0x78/0xb0 net/socket.c:2159
do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Freed by task 5356:
save_stack+0x43/0xd0 mm/kasan/kasan.c:448
set_track mm/kasan/kasan.c:460 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/kasan.c:521
kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
__cache_free mm/slab.c:3498 [inline]
kmem_cache_free+0x83/0x290 mm/slab.c:3756
dst_destroy+0x267/0x3c0 net/core/dst.c:141
dst_destroy_rcu+0x16/0x19 net/core/dst.c:154
__rcu_reclaim kernel/rcu/rcu.h:236 [inline]
rcu_do_batch kernel/rcu/tree.c:2576 [inline]
invoke_rcu_callbacks kernel/rcu/tree.c:2880 [inline]
__rcu_process_callbacks kernel/rcu/tree.c:2847 [inline]
rcu_process_callbacks+0xf23/0x2670 kernel/rcu/tree.c:2864
__do_softirq+0x30b/0xad8 kernel/softirq.c:292
Fixes: 1789a640f556 ("raw: avoid two atomics in xmit")
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In case ip_fib_metrics_init() returns an error, we better
rewrite rt->fib6_metrics with &dst_default_metrics so that
we do not crash later in ip_fib_metrics_put()
Fixes: 767a2217533f ("net: common metrics init helper for FIB entries")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Avoid the socket lookup cost in udp_gro_receive if no socket has a
udp tunnel callback configured.
udp_sk(sk)->gro_receive requires a registration with
setup_udp_tunnel_sock, which enables the static key.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Move the refcounting and potential free of dst metrics associated
for ipv4 and ipv6 to a common helper.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
ipv4 and ipv6 both use refcounted metrics if FIB entries have metrics set.
Move the common initialization code to a helper and use for both protocols.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Move the refcounting and potential free of dst metrics associated
with a fib entry to a helper and use it in both ipv4 and ipv6.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Consolidate initialization of ipv4 and ipv6 metrics when fib entries
are created into a single helper, ip_fib_metrics_init, that handles
the call to ip_metrics_convert.
If no metrics are defined for the fib entry, then the metrics is set
to dst_default_metrics.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Minor conflict in net/core/rtnetlink.c, David Ahern's bug fix in 'net'
overlapped the renaming of a netlink attribute in net-next.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The code to obtain the correct table for the incoming interface was
missing for IPv6. This has been added along with the table creation
notification to fib rules for the RTNL_FAMILY_IP6MR address family.
Signed-off-by: Patrick Ruddy <pruddy@vyatta.att-mail.com>
Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
(allows for better compiler optimization)
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
(allows for better compiler optimization)
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
(allows for better compiler optimization)
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
(the parameter in question is mark)
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
memset
(allows for better compiler optimization)
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
(allows for better compiler optimization)
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2018-10-01
1) Make xfrmi_get_link_net() static to silence a sparse warning.
From Wei Yongjun.
2) Remove a unused esph pointer definition in esp_input().
From Haishuang Yan.
3) Allow the NIC driver to quietly refuse xfrm offload
in case it does not support it, the SA is created
without offload in this case.
From Shannon Nelson.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:
====================
pull request (net): ipsec 2018-10-01
1) Validate address prefix lengths in the xfrm selector,
otherwise we may hit undefined behaviour in the
address matching functions if the prefix is too
big for the given address family.
2) Fix skb leak on local message size errors.
From Thadeu Lima de Souza Cascardo.
3) We currently reset the transport header back to the network
header after a transport mode transformation is applied. This
leads to an incorrect transport header when multiple transport
mode transformations are applied. Reset the transport header
only after all transformations are already applied to fix this.
From Sowmini Varadhan.
4) We only support one offloaded xfrm, so reset crypto_done after
the first transformation in xfrm_input(). Otherwise we may call
the wrong input method for subsequent transformations.
From Sowmini Varadhan.
5) Fix NULL pointer dereference when skb_dst_force clears the dst_entry.
skb_dst_force does not really force a dst refcount anymore, it might
clear it instead. xfrm code did not expect this, add a check to not
dereference skb_dst() if it was cleared by skb_dst_force.
6) Validate xfrm template mode, otherwise we can get a stack-out-of-bounds
read in xfrm_state_find. From Sean Tranchetti.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
deleted on device
We configured iptables as below, which only allowed incoming data on
established connections:
iptables -t mangle -A PREROUTING -m state --state ESTABLISHED -j ACCEPT
iptables -t mangle -P PREROUTING DROP
When deleting a secondary address, current masquerade implements would
flush all conntracks on this device. All the established connections on
primary address also be deleted, then subsequent incoming data on the
connections would be dropped wrongly because it was identified as NEW
connection.
So when an address was delete, it should only flush connections related
with the address.
Signed-off-by: Tan Hu <tan.hu@zte.com.cn>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
(the parameters in question are mark and flow_flags)
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
(the parameters in question are mark and flow_flags)
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The change to move metrics from the dst to rt6_info moved the call
to ip6_convert_metrics from ip6_route_add to ip6_route_info_create. In
doing so it makes the call in ip6_route_info_append redundant and
actually leaks the metrics installed as part of the ip6_route_info_create.
Remove the now unnecessary call.
Fixes: d4ead6b34b67f ("net/ipv6: move metrics from dst to rt6_info")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Version bump conflict in batman-adv, take what's in net-next.
iavf conflict, adjustment of netdev_ops in net-next conflicting
with poll controller method removal in net.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, ip[6]frag_high_thresh sysctl values in new namespaces are
hard-limited to those of the root/init ns.
There are at least two use cases when it would be desirable to
set the high_thresh values higher in a child namespace vs the global hard
limit:
- a security/ddos protection policy may lower the thresholds in the
root/init ns but allow for a special exception in a child namespace
- testing: a test running in a namespace may want to set these
thresholds higher in its namespace than what is in the root/init ns
The new behavior:
# ip netns add testns
# ip netns exec testns bash
# sysctl -w net.ipv4.ipfrag_high_thresh=9000000
net.ipv4.ipfrag_high_thresh = 9000000
# sysctl net.ipv4.ipfrag_high_thresh
net.ipv4.ipfrag_high_thresh = 9000000
# sysctl -w net.ipv6.ip6frag_high_thresh=9000000
net.ipv6.ip6frag_high_thresh = 9000000
# sysctl net.ipv6.ip6frag_high_thresh
net.ipv6.ip6frag_high_thresh = 9000000
The old behavior:
# ip netns add testns
# ip netns exec testns bash
# sysctl -w net.ipv4.ipfrag_high_thresh=9000000
net.ipv4.ipfrag_high_thresh = 9000000
# sysctl net.ipv4.ipfrag_high_thresh
net.ipv4.ipfrag_high_thresh = 4194304
# sysctl -w net.ipv6.ip6frag_high_thresh=9000000
net.ipv6.ip6frag_high_thresh = 9000000
# sysctl net.ipv6.ip6frag_high_thresh
net.ipv6.ip6frag_high_thresh = 4194304
Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This is similar to how ipv4 now behaves:
commit 0ff89efb5246 ("ip: fail fast on IP defrag errors").
Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The backend handling for /proc/net/if_inet6 in addrconf.c doesn't properly
handle starting/stopping the iteration. The problem is that at some point
during the iteration, an overflow is detected and the process is
subsequently stopped. The item being shown via seq_printf() when the
overflow occurs is not actually shown, though. When start() is
subsequently called to resume iterating, it returns the next item, and
thus the item that was being processed when the overflow occurred never
gets printed.
Alter the meaning of the private data member "offset". Currently, when it
is not 0 (which only happens at the very beginning), "offset" represents
the next hlist item to be printed. After this change, "offset" always
represents the current item.
This is also consistent with the private data member "bucket", which
represents the current bucket, and also the use of "pos" as defined in
seq_file.txt:
The pos passed to start() will always be either zero, or the most
recent pos used in the previous session.
Signed-off-by: Jeff Barnhill <0xeffeff@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
kfree_skb has taken the null pointer into account. hence it is safe
to remove the redundant null pointer check before kfree_skb.
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
the ip6 tunnel xmit ndo assumes that the processed skb always
contains an ip[v6] header, but syzbot has found a way to send
frames that fall short of this assumption, leading to the following splat:
BUG: KMSAN: uninit-value in ip6ip6_tnl_xmit net/ipv6/ip6_tunnel.c:1307
[inline]
BUG: KMSAN: uninit-value in ip6_tnl_start_xmit+0x7d2/0x1ef0
net/ipv6/ip6_tunnel.c:1390
CPU: 0 PID: 4504 Comm: syz-executor558 Not tainted 4.16.0+ #87
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:17 [inline]
dump_stack+0x185/0x1d0 lib/dump_stack.c:53
kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
__msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:683
ip6ip6_tnl_xmit net/ipv6/ip6_tunnel.c:1307 [inline]
ip6_tnl_start_xmit+0x7d2/0x1ef0 net/ipv6/ip6_tunnel.c:1390
__netdev_start_xmit include/linux/netdevice.h:4066 [inline]
netdev_start_xmit include/linux/netdevice.h:4075 [inline]
xmit_one net/core/dev.c:3026 [inline]
dev_hard_start_xmit+0x5f1/0xc70 net/core/dev.c:3042
__dev_queue_xmit+0x27ee/0x3520 net/core/dev.c:3557
dev_queue_xmit+0x4b/0x60 net/core/dev.c:3590
packet_snd net/packet/af_packet.c:2944 [inline]
packet_sendmsg+0x7c70/0x8a30 net/packet/af_packet.c:2969
sock_sendmsg_nosec net/socket.c:630 [inline]
sock_sendmsg net/socket.c:640 [inline]
___sys_sendmsg+0xec0/0x1310 net/socket.c:2046
__sys_sendmmsg+0x42d/0x800 net/socket.c:2136
SYSC_sendmmsg+0xc4/0x110 net/socket.c:2167
SyS_sendmmsg+0x63/0x90 net/socket.c:2162
do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
RIP: 0033:0x441819
RSP: 002b:00007ffe58ee8268 EFLAGS: 00000213 ORIG_RAX: 0000000000000133
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000441819
RDX: 0000000000000002 RSI: 0000000020000100 RDI: 0000000000000003
RBP: 00000000006cd018 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000213 R12: 0000000000402510
R13: 00000000004025a0 R14: 0000000000000000 R15: 0000000000000000
Uninit was created at:
kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline]
kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:188
kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:314
kmsan_slab_alloc+0x11/0x20 mm/kmsan/kmsan.c:321
slab_post_alloc_hook mm/slab.h:445 [inline]
slab_alloc_node mm/slub.c:2737 [inline]
__kmalloc_node_track_caller+0xaed/0x11c0 mm/slub.c:4369
__kmalloc_reserve net/core/skbuff.c:138 [inline]
__alloc_skb+0x2cf/0x9f0 net/core/skbuff.c:206
alloc_skb include/linux/skbuff.h:984 [inline]
alloc_skb_with_frags+0x1d4/0xb20 net/core/skbuff.c:5234
sock_alloc_send_pskb+0xb56/0x1190 net/core/sock.c:2085
packet_alloc_skb net/packet/af_packet.c:2803 [inline]
packet_snd net/packet/af_packet.c:2894 [inline]
packet_sendmsg+0x6454/0x8a30 net/packet/af_packet.c:2969
sock_sendmsg_nosec net/socket.c:630 [inline]
sock_sendmsg net/socket.c:640 [inline]
___sys_sendmsg+0xec0/0x1310 net/socket.c:2046
__sys_sendmmsg+0x42d/0x800 net/socket.c:2136
SYSC_sendmmsg+0xc4/0x110 net/socket.c:2167
SyS_sendmmsg+0x63/0x90 net/socket.c:2162
do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
This change addresses the issue adding the needed check before
accessing the inner header.
The ipv4 side of the issue is apparently there since the ipv4 over ipv6
initial support, and the ipv6 side predates git history.
Fixes: c4d3efafcc93 ("[IPV6] IP6TUNNEL: Add support to IPv4 over IPv6 tunnel.")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot+3fde91d4d394747d6db4@syzkaller.appspotmail.com
Tested-by: Alexander Potapenko <glider@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There is no way currently for an IPv6 client connect using a loopback
address in a VRF, whereas for IPv4 the loopback address can be added:
$ sudo ip addr add dev vrfred 127.0.0.1/8
$ sudo ip -6 addr add ::1/128 dev vrfred
RTNETLINK answers: Cannot assign requested address
So allow ::1 to be configured on an L3 master device. In order for
this to be usable ip_route_output_flags needs to not consider ::1 to
be a link scope address (since oif == l3mdev and so it would be
dropped), and ipv6_rcv needs to consider the l3mdev to be a loopback
device so that it doesn't drop the packets.
Signed-off-by: Robert Shearman <rshearma@vyatta.att-mail.com>
Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When dst->_metrics and f6i->fib6_metrics share the same memory, both
take reference count on the dst_metrics structure. However, when dst is
destroyed, ip6_dst_destroy() only invokes dst_destroy_metrics_generic()
which does not take care of READONLY metrics and does not release refcnt.
This causes memory leak.
Similar to ipv4 logic, the fix is to properly release refcnt and free
the memory space pointed by dst->_metrics if refcnt becomes 0.
Fixes: 93531c674315 ("net/ipv6: separate handling of FIB entries from dst based routes")
Reported-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This reverts commit e70a3aad44cc8b24986687ffc98c4a4f6ecf25ea.
This change causes use-after-free on dst->_metrics.
The crash trace looks like this:
[ 97.763269] BUG: KASAN: use-after-free in ip6_mtu+0x116/0x140
[ 97.769038] Read of size 4 at addr ffff881781d2cf84 by task svw_NetThreadEv/8801
[ 97.777954] CPU: 76 PID: 8801 Comm: svw_NetThreadEv Not tainted 4.15.0-smp-DEV #11
[ 97.777956] Hardware name: Default string Default string/Indus_QC_02, BIOS 5.46.4 03/29/2018
[ 97.777957] Call Trace:
[ 97.777971] [<ffffffff895709db>] dump_stack+0x4d/0x72
[ 97.777985] [<ffffffff881651df>] print_address_description+0x6f/0x260
[ 97.777997] [<ffffffff88165747>] kasan_report+0x257/0x370
[ 97.778001] [<ffffffff894488e6>] ? ip6_mtu+0x116/0x140
[ 97.778004] [<ffffffff881658b9>] __asan_report_load4_noabort+0x19/0x20
[ 97.778008] [<ffffffff894488e6>] ip6_mtu+0x116/0x140
[ 97.778013] [<ffffffff892bb91e>] tcp_current_mss+0x12e/0x280
[ 97.778016] [<ffffffff892bb7f0>] ? tcp_mtu_to_mss+0x2d0/0x2d0
[ 97.778022] [<ffffffff887b45b8>] ? depot_save_stack+0x138/0x4a0
[ 97.778037] [<ffffffff87c38985>] ? __mmdrop+0x145/0x1f0
[ 97.778040] [<ffffffff881643b1>] ? save_stack+0xb1/0xd0
[ 97.778046] [<ffffffff89264c82>] tcp_send_mss+0x22/0x220
[ 97.778059] [<ffffffff89273a49>] tcp_sendmsg_locked+0x4f9/0x39f0
[ 97.778062] [<ffffffff881642b4>] ? kasan_check_write+0x14/0x20
[ 97.778066] [<ffffffff89273550>] ? tcp_sendpage+0x60/0x60
[ 97.778070] [<ffffffff881cb359>] ? rw_copy_check_uvector+0x69/0x280
[ 97.778075] [<ffffffff8873c65f>] ? import_iovec+0x9f/0x430
[ 97.778078] [<ffffffff88164be7>] ? kasan_slab_free+0x87/0xc0
[ 97.778082] [<ffffffff8873c5c0>] ? memzero_page+0x140/0x140
[ 97.778085] [<ffffffff881642b4>] ? kasan_check_write+0x14/0x20
[ 97.778088] [<ffffffff89276f6c>] tcp_sendmsg+0x2c/0x50
[ 97.778092] [<ffffffff89276f6c>] ? tcp_sendmsg+0x2c/0x50
[ 97.778098] [<ffffffff89352d43>] inet_sendmsg+0x103/0x480
[ 97.778102] [<ffffffff89352c40>] ? inet_gso_segment+0x15b0/0x15b0
[ 97.778105] [<ffffffff890294da>] sock_sendmsg+0xba/0xf0
[ 97.778108] [<ffffffff8902ab6a>] ___sys_sendmsg+0x6ca/0x8e0
[ 97.778113] [<ffffffff87dccac1>] ? hrtimer_try_to_cancel+0x71/0x3b0
[ 97.778116] [<ffffffff8902a4a0>] ? copy_msghdr_from_user+0x3d0/0x3d0
[ 97.778119] [<ffffffff881646d1>] ? memset+0x31/0x40
[ 97.778123] [<ffffffff87a0cff5>] ? schedule_hrtimeout_range_clock+0x165/0x380
[ 97.778127] [<ffffffff87a0ce90>] ? hrtimer_nanosleep_restart+0x250/0x250
[ 97.778130] [<ffffffff87dcc700>] ? __hrtimer_init+0x180/0x180
[ 97.778133] [<ffffffff87dd1f82>] ? ktime_get_ts64+0x172/0x200
[ 97.778137] [<ffffffff8822b8ec>] ? __fget_light+0x8c/0x2f0
[ 97.778141] [<ffffffff8902d5c6>] __sys_sendmsg+0xe6/0x190
[ 97.778144] [<ffffffff8902d5c6>] ? __sys_sendmsg+0xe6/0x190
[ 97.778147] [<ffffffff8902d4e0>] ? SyS_shutdown+0x20/0x20
[ 97.778152] [<ffffffff87cd4370>] ? wake_up_q+0xe0/0xe0
[ 97.778155] [<ffffffff8902d670>] ? __sys_sendmsg+0x190/0x190
[ 97.778158] [<ffffffff8902d683>] SyS_sendmsg+0x13/0x20
[ 97.778162] [<ffffffff87a1600c>] do_syscall_64+0x2ac/0x430
[ 97.778166] [<ffffffff87c17515>] ? do_page_fault+0x35/0x3d0
[ 97.778171] [<ffffffff8960131f>] ? page_fault+0x2f/0x50
[ 97.778174] [<ffffffff89600071>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 97.778177] RIP: 0033:0x7f83fa36000d
[ 97.778178] RSP: 002b:00007f83ef9229e0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
[ 97.778180] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f83fa36000d
[ 97.778182] RDX: 0000000000004000 RSI: 00007f83ef922f00 RDI: 0000000000000036
[ 97.778183] RBP: 00007f83ef923040 R08: 00007f83ef9231f8 R09: 00007f83ef923168
[ 97.778184] R10: 0000000000000000 R11: 0000000000000293 R12: 00007f83f69c5b40
[ 97.778185] R13: 000000000000001c R14: 0000000000000001 R15: 0000000000004000
[ 97.779684] Allocated by task 5919:
[ 97.783185] save_stack+0x46/0xd0
[ 97.783187] kasan_kmalloc+0xad/0xe0
[ 97.783189] kmem_cache_alloc_trace+0xdf/0x580
[ 97.783190] ip6_convert_metrics.isra.79+0x7e/0x190
[ 97.783192] ip6_route_info_create+0x60a/0x2480
[ 97.783193] ip6_route_add+0x1d/0x80
[ 97.783195] inet6_rtm_newroute+0xdd/0xf0
[ 97.783198] rtnetlink_rcv_msg+0x641/0xb10
[ 97.783200] netlink_rcv_skb+0x27b/0x3e0
[ 97.783202] rtnetlink_rcv+0x15/0x20
[ 97.783203] netlink_unicast+0x4be/0x720
[ 97.783204] netlink_sendmsg+0x7bc/0xbf0
[ 97.783205] sock_sendmsg+0xba/0xf0
[ 97.783207] ___sys_sendmsg+0x6ca/0x8e0
[ 97.783208] __sys_sendmsg+0xe6/0x190
[ 97.783209] SyS_sendmsg+0x13/0x20
[ 97.783211] do_syscall_64+0x2ac/0x430
[ 97.783213] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 97.784709] Freed by task 0:
[ 97.785056] knetbase: Error: /proc/sys/net/core/txcs_enable does not exist
[ 97.794497] save_stack+0x46/0xd0
[ 97.794499] kasan_slab_free+0x71/0xc0
[ 97.794500] kfree+0x7c/0xf0
[ 97.794501] fib6_info_destroy_rcu+0x24f/0x310
[ 97.794504] rcu_process_callbacks+0x38b/0x1730
[ 97.794506] __do_softirq+0x1c8/0x5d0
Reported-by: John Sperbeck <jsperbeck@google.com>
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Two new tls tests added in parallel in both net and net-next.
Used Stephen Rothwell's linux-next resolution.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
DST_NOCOUNT in dst_entry::flags tracks whether the entry counts
toward route cache size (net->ipv6.sysctl.ip6_rt_max_size).
If the flag is NOT set, dst_ops::pcpuc_entries counter is incremented
in dist_init() and decremented in dst_destroy().
This flag is tied to allocation/deallocation of dst_entry and
should not be copied from another dst/route. Otherwise it can happen
that dst_ops::pcpuc_entries counter grows until no new routes can
be allocated because the counter reached ip6_rt_max_size due to
DST_NOCOUNT not set and thus no counter decrements on gc-ed routes.
Fixes: 3b6761d18bc1 ("net/ipv6: Move dst flags to booleans in fib entries")
Cc: David Ahern <dsahern@gmail.com>
Acked-by: Wei Wang <weiwan@google.com>
Signed-off-by: Peter Oskolkov <posk@google.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In the unlikely case ip6_xmit() has to call skb_realloc_headroom(),
we need to call skb_set_owner_w() before consuming original skb,
otherwise we risk a use-after-free.
Bring IPv6 in line with what we do in IPv4 to fix this.
Fixes: 1da177e4c3f41 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
I see no reason for them, label or timer cannot be NULL, and if they
were, we'll crash with null deref anyway.
For skb_header_pointer failure, just set hotdrop to true and toss
such packet.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Same as ip_gre, use gre_parse_header to parse gre header in gre error
handler code.
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently the UDPv6 early demux rx code path lacks some mandatory
checks, already implemented into the normal RX code path - namely
the checksum conversion and no_check6_rx check.
Similar to the previous commit, we move the common processing to
an UDPv6 specific helper and call it from both edemux code path
and normal code path. In respect to the UDPv4, we need to add an
explicit check for non zero csum according to no_check6_rx value.
Reported-by: Jianlin Shi <jishi@redhat.com>
Suggested-by: Xin Long <lucien.xin@gmail.com>
Fixes: c9f2c1ae123a ("udp6: fix socket leak on early demux")
Fixes: 2abb7cdc0dc8 ("udp: Add support for doing checksum unnecessary conversion")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When splitting a GSO segment that consists of encapsulated packets, the
skb->mac_len of the segments can end up being set wrong, causing packet
drops in particular when using act_mirred and ifb interfaces in
combination with a qdisc that splits GSO packets.
This happens because at the time skb_segment() is called, network_header
will point to the inner header, throwing off the calculation in
skb_reset_mac_len(). The network_header is subsequently adjust by the
outer IP gso_segment handlers, but they don't set the mac_len.
Fix this by adding skb_reset_mac_len() calls to both the IPv4 and IPv6
gso_segment handlers, after they modify the network_header.
Many thanks to Eric Dumazet for his help in identifying the cause of
the bug.
Acked-by: Dave Taht <dave.taht@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When splitting a GSO segment that consists of encapsulated packets, the
skb->mac_len of the segments can end up being set wrong, causing packet
drops in particular when using act_mirred and ifb interfaces in
combination with a qdisc that splits GSO packets.
This happens because at the time skb_segment() is called, network_header
will point to the inner header, throwing off the calculation in
skb_reset_mac_len(). The network_header is subsequently adjust by the
outer IP gso_segment handlers, but they don't set the mac_len.
Fix this by adding skb_reset_mac_len() calls to both the IPv4 and IPv6
gso_segment handlers, after they modify the network_header.
Many thanks to Eric Dumazet for his help in identifying the cause of
the bug.
Acked-by: Dave Taht <dave.taht@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In inet6_rtm_getroute, since Commit 93531c674315 ("net/ipv6: separate
handling of FIB entries from dst based routes"), it has used rt->from
to dump route info instead of rt.
However for some route like cache, some of its information like flags
or gateway is not the same as that of the 'from' one. It caused 'ip
route get' to dump the wrong route information.
In Jianlin's testing, the output information even lost the expiration
time for a pmtu route cache due to the wrong fib6_flags.
So change to use rt6_info members for dst addr, src addr, flags and
gateway when it tries to dump a route entry without fibmatch set.
v1->v2:
- not use rt6i_prefsrc.
- also fix the gw dump issue.
Fixes: 93531c674315 ("net/ipv6: separate handling of FIB entries from dst based routes")
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The socket option will be enabled by default to ensure current behaviour
is not changed. This is the same for the IPv4 version.
A socket bound to in6addr_any and a specific port will receive all traffic
on that port. Analogue to IP_MULTICAST_ALL, disable this behaviour, if
one or more multicast groups were joined (using said socket) and only
pass on multicast traffic from groups, which were explicitly joined via
this socket.
Without this option disabled a socket (system even) joined to multiple
multicast groups is very hard to get right. Filtering by destination
address has to take place in user space to avoid receiving multicast
traffic from other multicast groups, which might have traffic on the same
port.
The extension of the IP_MULTICAST_ALL socketoption to just apply to ipv6,
too, is not done to avoid changing the behaviour of current applications.
Signed-off-by: Andre Naujoks <nautsch2@gmail.com>
Acked-By: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|