summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2019-05-17lib: Correct comment of prandom_seedPhilippe Mazenauer1-2/+2
Variable 'entropy' was wrongly documented as 'seed', changed comment to reflect actual variable name. ../lib/random32.c:179: warning: Function parameter or member 'entropy' not described in 'prandom_seed' ../lib/random32.c:179: warning: Excess function parameter 'seed' description in 'prandom_seed' Signed-off-by: Philippe Mazenauer <philippe.mazenauer@outlook.de> Acked-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-17net: caif: fix the value of size argument of snprintfswkhack5-6/+5
Because the function snprintf write at most size bytes(including the null byte).So the value of the argument size need not to minus one. Signed-off-by: swkhack <swkhack@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-16ipv6: fix src addr routing with the exception tableWei Wang1-24/+27
When inserting route cache into the exception table, the key is generated with both src_addr and dest_addr with src addr routing. However, current logic always assumes the src_addr used to generate the key is a /128 host address. This is not true in the following scenarios: 1. When the route is a gateway route or does not have next hop. (rt6_is_gw_or_nonexthop() == false) 2. When calling ip6_rt_cache_alloc(), saddr is passed in as NULL. This means, when looking for a route cache in the exception table, we have to do the lookup twice: first time with the passed in /128 host address, second time with the src_addr stored in fib6_info. This solves the pmtu discovery issue reported by Mikael Magnusson where a route cache with a lower mtu info is created for a gateway route with src addr. However, the lookup code is not able to find this route cache. Fixes: 2b760fcf5cfb ("ipv6: hook up exception table to store dst cache") Reported-by: Mikael Magnusson <mikael.kernel@lists.m7n.se> Bisected-by: David Ahern <dsahern@gmail.com> Signed-off-by: Wei Wang <weiwan@google.com> Cc: Martin Lau <kafai@fb.com> Cc: Eric Dumazet <edumazet@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-16selftests: pmtu.sh: Remove quotes around commands in setup_xfrmDavid Ahern1-9/+9
The first command in setup_xfrm is failing resulting in the test getting skipped: + ip netns exec ns-B ip -6 xfrm state add src fd00:1::a dst fd00:1::b spi 0x1000 proto esp aead 'rfc4106(gcm(aes))' 0x0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f 128 mode tunnel + out=RTNETLINK answers: Function not implemented ... xfrm6 not supported TEST: vti6: PMTU exceptions [SKIP] xfrm4 not supported TEST: vti4: PMTU exceptions [SKIP] ... The setup command started failing when the run_cmd option was added. Removing the quotes fixes the problem: ... TEST: vti6: PMTU exceptions [ OK ] TEST: vti4: PMTU exceptions [ OK ] ... Fixes: 56490b623aa0 ("selftests: Add debugging options to pmtu.sh") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-16net: avoid weird emergency messageEric Dumazet1-1/+1
When host is under high stress, it is very possible thread running netdev_wait_allrefs() returns from msleep(250) 10 seconds late. This leads to these messages in the syslog : [...] unregister_netdevice: waiting for syz_tun to become free. Usage count = 0 If the device refcount is zero, the wait is over. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-16Merge branch 'aqc111-revert-endianess-fixes-and-cleanup-mtu-logic'David S. Miller1-25/+10
Igor Russkikh says: ==================== aqc111: revert endianess fixes and cleanup mtu logic This reverts no-op commits as it was discussed: https://lore.kernel.org/netdev/1557839644.11261.4.camel@suse.com/ First and second original patches are already dropped from stable, No need to stable-queue the third patch as it has no functional impact, just a logic cleanup. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-16aqc111: cleanup mtu related logicIgor Russkikh1-4/+2
Original fix b8b277525e9d was done under impression that invalid data could be written for mtu configuration higher that 16334. But the high limit will anyway be rejected my max_mtu check in caller. Thus, make the code cleaner and allow it doing the configuration without checking for maximum mtu value. Fixes: b8b277525e9d ("aqc111: fix endianness issue in aqc111_change_mtu") Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-16Revert "aqc111: fix writing to the phy on BE"Igor Russkikh1-17/+6
This reverts commit 369b46e9fbcfa5136f2cb5f486c90e5f7fa92630. The required temporary storage is already done inside of write32/16 helpers. Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-16Revert "aqc111: fix double endianness swap on BE"Igor Russkikh1-4/+2
This reverts commit 2cf672709beb005f6e90cb4edbed6f2218ba953e. The required temporary storage is already done inside of write32/16 helpers. Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-16xfrm: ressurrect "Fix uninitialized memory read in _decode_session4"Florian Westphal1-11/+13
This resurrects commit 8742dc86d0c7a9628 ("xfrm4: Fix uninitialized memory read in _decode_session4"), which got lost during a merge conflict resolution between ipsec-next and net-next tree. c53ac41e3720 ("xfrm: remove decode_session indirection from afinfo_policy") in ipsec-next moved the (buggy) _decode_session4 from net/ipv4/xfrm4_policy.c to net/xfrm/xfrm_policy.c. In mean time, 8742dc86d0c7a was applied to ipsec.git and fixed the problem in the "old" location. When the trees got merged, the moved, old function was kept. This applies the "lost" commit again, to the new location. Fixes: a658a3f2ecbab ("Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-16tipc: switch order of device registration to fix a crashJunwei Hu1-7/+7
When tipc is loaded while many processes try to create a TIPC socket, a crash occurs: PANIC: Unable to handle kernel paging request at virtual address "dfff20000000021d" pc : tipc_sk_create+0x374/0x1180 [tipc] lr : tipc_sk_create+0x374/0x1180 [tipc] Exception class = DABT (current EL), IL = 32 bits Call trace: tipc_sk_create+0x374/0x1180 [tipc] __sock_create+0x1cc/0x408 __sys_socket+0xec/0x1f0 __arm64_sys_socket+0x74/0xa8 ... This is due to race between sock_create and unfinished register_pernet_device. tipc_sk_insert tries to do "net_generic(net, tipc_net_id)". but tipc_net_id is not initialized yet. So switch the order of the two to close the race. This can be reproduced with multiple processes doing socket(AF_TIPC, ...) and one process doing module removal. Fixes: a62fbccecd62 ("tipc: make subscriber server support net namespace") Signed-off-by: Junwei Hu <hujunwei4@huawei.com> Reported-by: Wang Wang <wangwang2@huawei.com> Reviewed-by: Xiaogang Wang <wangxiaogang3@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-16ipv6: prevent possible fib6 leaksEric Dumazet3-4/+18
At ipv6 route dismantle, fib6_drop_pcpu_from() is responsible for finding all percpu routes and set their ->from pointer to NULL, so that fib6_ref can reach its expected value (1). The problem right now is that other cpus can still catch the route being deleted, since there is no rcu grace period between the route deletion and call to fib6_drop_pcpu_from() This can leak the fib6 and associated resources, since no notifier will take care of removing the last reference(s). I decided to add another boolean (fib6_destroying) instead of reusing/renaming exception_bucket_flushed to ease stable backports, and properly document the memory barriers used to implement this fix. This patch has been co-developped with Wei Wang. Fixes: 93531c674315 ("net/ipv6: separate handling of FIB entries from dst based routes") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Wei Wang <weiwan@google.com> Cc: David Ahern <dsahern@gmail.com> Cc: Martin Lau <kafai@fb.com> Acked-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-16net: test nouarg before dereferencing zerocopy pointersWillem de Bruijn1-3/+6
Zerocopy skbs without completion notification were added for packet sockets with PACKET_TX_RING user buffers. Those signal completion through the TP_STATUS_USER bit in the ring. Zerocopy annotation was added only to avoid premature notification after clone or orphan, by triggering a copy on these paths for these packets. The mechanism had to define a special "no-uarg" mode because packet sockets already use skb_uarg(skb) == skb_shinfo(skb)->destructor_arg for a different pointer. Before deferencing skb_uarg(skb), verify that it is a real pointer. Fixes: 5cd8d46ea1562 ("packet: copy user buffers before orphan or clone") Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-16net: usb: qmi_wwan: add Telit 0x1260 and 0x1261 compositionsDaniele Palmas1-0/+2
Added support for Telit LE910Cx 0x1260 and 0x1261 compositions. Signed-off-by: Daniele Palmas <dnlplm@gmail.com> Acked-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-16net: phy: aquantia: readd XGMII support for AQR107Madalin-cristian Bucur1-0/+1
XGMII interface mode no longer works on AQR107 after the recent changes, adding back support. Fixes: 570c8a7d5303 ("net: phy: aquantia: check for supported interface modes in config_init") Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-16net: bpfilter: fallback to netfilter if failed to load bpfilter kernel moduleKonstantin Khlebnikov1-4/+2
If bpfilter is not available return ENOPROTOOPT to fallback to netfilter. Function request_module() returns both errors and userspace exit codes. Just ignore them. Rechecking bpfilter_ops is enough. Fixes: d2ba09c17a06 ("net: add skeleton of bpfilter kernel module") Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-16hv_sock: Add support for delayed closeSunil Muthuswamy1-31/+77
Currently, hvsock does not implement any delayed or background close logic. Whenever the hvsock socket is closed, a FIN is sent to the peer, and the last reference to the socket is dropped, which leads to a call to .destruct where the socket can hang indefinitely waiting for the peer to close it's side. The can cause the user application to hang in the close() call. This change implements proper STREAM(TCP) closing handshake mechanism by sending the FIN to the peer and the waiting for the peer's FIN to arrive for a given timeout. On timeout, it will try to terminate the connection (i.e. a RST). This is in-line with other socket providers such as virtio. This change does not address the hang in the vmbus_hvsock_device_unregister where it waits indefinitely for the host to rescind the channel. That should be taken up as a separate fix. Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-16atm: iphase: Avoid copying pointers to user space.Fuqian Huang1-6/+0
Remove the MEMDUMP_DEV case in ia_ioctl to avoid copy pointers to user space. Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-16Merge branch 'flow_offload-fix-CVLAN-support'David S. Miller3-1/+10
Edward Cree says: ==================== flow_offload: fix CVLAN support When the flow_offload infrastructure was added, CVLAN matches weren't plumbed through, and flow_rule_match_vlan() was incorrectly called in the mlx5 driver when populating CVLAN match information. This series adds flow_rule_match_cvlan(), and uses it in the mlx5 code. Both patches should also go to 5.1 stable. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-16net/mlx5e: Fix calling wrong function to get inner vlan key and maskJianbo Liu1-1/+1
When flow_rule_match_XYZ() functions were first introduced, flow_rule_match_cvlan() for inner vlan is missing. In mlx5_core driver, to get inner vlan key and mask, flow_rule_match_vlan() is just called, which is wrong because it obtains outer vlan information by FLOW_DISSECTOR_KEY_VLAN. This commit fixes this by changing to call flow_rule_match_cvlan() after it's added. Fixes: 8f2566225ae2 ("flow_offload: add flow_rule and flow_match structures and use them") Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-16flow_offload: support CVLAN matchEdward Cree2-0/+9
Plumb it through from the flow_dissector. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-16Merge branch 'rhashtable-Fix-sparse-warnings'David S. Miller2-42/+49
Herbert Xu says: ==================== rhashtable: Fix sparse warnings This patch series fixes all the sparse warnings. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-16rhashtable: Fix cmpxchg RCU warningsHerbert Xu1-2/+3
As cmpxchg is a non-RCU mechanism it will cause sparse warnings when we use it for RCU. This patch adds explicit casts to silence those warnings. This should probably be moved to RCU itself in future. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-16rhashtable: Remove RCU marking from rhash_lock_headHerbert Xu2-40/+46
The opaque type rhash_lock_head should not be marked with __rcu because it can never be dereferenced. We should apply the RCU marking when we turn it into a pointer which can be dereferenced. This patch does exactly that. This fixes a number of sparse warnings as well as getting rid of some unnecessary RCU checking. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller13-35/+321
Daniel Borkmann says: ==================== pull-request: bpf 2019-05-16 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Fix a use after free in __dev_map_entry_free(), from Eric. 2) Several sockmap related bug fixes: a splat in strparser if it was never initialized, remove duplicate ingress msg list purging which can race, fix msg->sg.size accounting upon skb to msg conversion, and last but not least fix a timeout bug in tcp_bpf_wait_data(), from John. 3) Fix LRU map to avoid messing with eviction heuristics upon syscall lookup, e.g. map walks from user space side will then lead to eviction of just recently created entries on updates as it would mark all map entries, from Daniel. 4) Don't bail out when libbpf feature probing fails. Also various smaller fixes to flow_dissector test, from Stanislav. 5) Fix missing brackets for BTF_INT_OFFSET() in UAPI, from Gary. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-16bpf, tcp: correctly handle DONT_WAIT flags and timeo == 0John Fastabend1-1/+4
The tcp_bpf_wait_data() routine needs to check timeo != 0 before calling sk_wait_event() otherwise we may see unexpected stalls on receiver. Arika did all the leg work here I just formatted, posted and ran a few tests. Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface") Reported-by: Arika Chen <eaglesora@gmail.com> Suggested-by: Arika Chen <eaglesora@gmail.com> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-05-16selftests/bpf: add prog detach to flow_dissector testStanislav Fomichev1-0/+1
In case we are not running in a namespace (which we don't do by default), let's try to detach the bpf program that we use for eth_get_headlen tests. Fixes: 0905beec9f52 ("selftests/bpf: run flow dissector tests in skb-less mode") Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-05-16selftests/bpf: add missing \n to flow_dissector CHECK errorsStanislav Fomichev1-4/+4
Otherwise, in case of an error, everything gets mushed together. Fixes: a5cb33464e53 ("selftests/bpf: make flow dissector tests more extensible") Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-05-16libbpf: don't fail when feature probing failsStanislav Fomichev1-1/+1
Otherwise libbpf is unusable from unprivileged process with kernel.kernel.unprivileged_bpf_disabled=1. All I get is EPERM from the probes, even if I just want to open an ELF object and look at what progs/maps it has. Instead of dying on probes, let's just pr_debug the error and try to continue. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-05-15tcp: do not recycle cloned skbsEric Dumazet2-2/+2
It is illegal to change arbitrary fields in skb_shared_info if the skb is cloned. Before calling skb_zcopy_clear() we need to ensure this rule, therefore we need to move the test from sk_stream_alloc_skb() to sk_wmem_free_skb() Fixes: 4f661542a402 ("tcp: fix zerocopy and notsent_lowat issues") Signed-off-by: Eric Dumazet <edumazet@google.com> Diagnosed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-15enetc: Add missing link state info for ethtoolClaudiu Manoil1-0/+2
Just hook get_link to standard ethtool_op_get_link, nothing special needed at this point. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-15enetc: Allow to disable Tx SGClaudiu Manoil2-2/+2
The fact that the Tx SG flag is fixed to 'on' is only an oversight. Non-SG mode is also supported. Fix this by allowing to turn SG off. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-15enetc: Fix NULL dma address unmap for Tx BD extensionsClaudiu Manoil1-1/+3
For the unlikely case of TxBD extensions (i.e. ptp) the driver tries to unmap the tx_swbd corresponding to the extension, which is bogus as it has no buffer attached. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-14nfp: flower: add rcu locks when accessing netdev for tunnelsPieter Jansen van Vuuren1-6/+11
Add rcu locks when accessing netdev when processing route request and tunnel keep alive messages received from hardware. Fixes: 8e6a9046b66a ("nfp: flower vxlan neighbour offload") Fixes: 856f5b135758 ("nfp: flower vxlan neighbour keep-alive") Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-14ppp: deflate: Fix possible crash in deflate_initYueHaibing1-6/+14
BUG: unable to handle kernel paging request at ffffffffa018f000 PGD 3270067 P4D 3270067 PUD 3271063 PMD 2307eb067 PTE 0 Oops: 0000 [#1] PREEMPT SMP CPU: 0 PID: 4138 Comm: modprobe Not tainted 5.1.0-rc7+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014 RIP: 0010:ppp_register_compressor+0x3e/0xd0 [ppp_generic] Code: 98 4a 3f e2 48 8b 15 c1 67 00 00 41 8b 0c 24 48 81 fa 40 f0 19 a0 75 0e eb 35 48 8b 12 48 81 fa 40 f0 19 a0 74 RSP: 0018:ffffc90000d93c68 EFLAGS: 00010287 RAX: ffffffffa018f000 RBX: ffffffffa01a3000 RCX: 000000000000001a RDX: ffff888230c750a0 RSI: 0000000000000000 RDI: ffffffffa019f000 RBP: ffffc90000d93c80 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffa0194080 R13: ffff88822ee1a700 R14: 0000000000000000 R15: ffffc90000d93e78 FS: 00007f2339557540(0000) GS:ffff888237a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffa018f000 CR3: 000000022bde4000 CR4: 00000000000006f0 Call Trace: ? 0xffffffffa01a3000 deflate_init+0x11/0x1000 [ppp_deflate] ? 0xffffffffa01a3000 do_one_initcall+0x6c/0x3cc ? kmem_cache_alloc_trace+0x248/0x3b0 do_init_module+0x5b/0x1f1 load_module+0x1db1/0x2690 ? m_show+0x1d0/0x1d0 __do_sys_finit_module+0xc5/0xd0 __x64_sys_finit_module+0x15/0x20 do_syscall_64+0x6b/0x1d0 entry_SYSCALL_64_after_hwframe+0x49/0xbe If ppp_deflate fails to register in deflate_init, module initialization failed out, however ppp_deflate_draft may has been regiestred and not unregistered before return. Then the seconed modprobe will trigger crash like this. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-14net: macb: fix error format in dev_err()Luca Ceresoli1-8/+8
Errors are negative numbers. Using %u shows them as very large positive numbers such as 4294967277 that don't make sense. Use the %d format instead, and get a much nicer -19. Signed-off-by: Luca Ceresoli <luca@lucaceresoli.net> Fixes: b48e0bab142f ("net: macb: Migrate to devm clock interface") Fixes: 93b31f48b3ba ("net/macb: unify clock management") Fixes: 421d9df0628b ("net/macb: merge at91_ether driver into macb driver") Fixes: aead88bd0e99 ("net: ethernet: macb: Add support for rx_clk") Fixes: f5473d1d44e4 ("net: macb: Support clock management for tsu_clk") Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-14rtnetlink: always put IFLA_LINK for links with a link-netnsidSabrina Dubroca1-6/+10
Currently, nla_put_iflink() doesn't put the IFLA_LINK attribute when iflink == ifindex. In some cases, a device can be created in a different netns with the same ifindex as its parent. That device will not dump its IFLA_LINK attribute, which can confuse some userspace software that expects it. For example, if the last ifindex created in init_net and foo are both 8, these commands will trigger the issue: ip link add parent type dummy # ifindex 9 ip link add link parent netns foo type macvlan # ifindex 9 in ns foo So, in case a device puts the IFLA_LINK_NETNSID attribute in a dump, always put the IFLA_LINK attribute as well. Thanks to Dan Winship for analyzing the original OpenShift bug down to the missing netlink attribute. v2: change Fixes tag, it's been here forever, as Nicolas Dichtel said add Nicolas' ack v3: change Fixes tag fix subject typo, spotted by Edward Cree Analyzed-by: Dan Winship <danw@redhat.com> Fixes: d8a5ec672768 ("[NET]: netlink support for moving devices between network namespaces.") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-14net/mlx4_core: Change the error print to info printYunjian Wang1-1/+1
The error print within mlx4_flow_steer_promisc_add() should be a info print. Fixes: 592e49dda812 ('net/mlx4: Implement promiscuous mode with device managed flow-steering') Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-14NFC: Orphan the subsystemJohannes Berg1-4/+2
Samuel clearly hasn't been working on this in many years and patches getting to the wireless list are just being ignored entirely now. Mark the subsystem as orphan to reflect the current state and revert back to the netdev list so at least some fixes can be picked up by Dave. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: Samuel Ortiz <sameo@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-14net: Always descend into dsa/Florian Fainelli1-1/+1
Jiri reported that with a kernel built with CONFIG_FIXED_PHY=y, CONFIG_NET_DSA=m and CONFIG_NET_DSA_LOOP=m, we would not get to a functional state where the mock-up driver is registered. Turns out that we are not descending into drivers/net/dsa/ unconditionally, and we won't be able to link-in dsa_loop_bdinfo.o which does the actual mock-up mdio device registration. Reported-by: Jiri Pirko <jiri@resnulli.us> Fixes: 40013ff20b1b ("net: dsa: Fix functional dsa-loop dependency on FIXED_PHY") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com> Tested-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-14tcp: fix retrans timestamp on passive Fast OpenYuchung Cheng1-0/+3
Commit c7d13c8faa74 ("tcp: properly track retry time on passive Fast Open") sets the start of SYNACK retransmission time on passive Fast Open in "retrans_stamp". However the timestamp is not reset upon the handshake has completed. As a result, future data packet retransmission may not update it in tcp_retransmit_skb(). This may lead to socket aborting earlier unexpectedly by retransmits_timed_out() since retrans_stamp remains the SYNACK rtx time. This bug only manifests on passive TFO sender that a) suffered SYNACK timeout and then b) stalls on very first loss recovery. Any successful loss recovery would reset the timestamp to avoid this issue. Fixes: c7d13c8faa74 ("tcp: properly track retry time on passive Fast Open") Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-14Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds5-59/+40
Pull virtio updates from Michael Tsirkin: - enable packed ring support for s390 - several fixes * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: virtio/s390: enable packed ring virtio/s390: DMA support for virtio-ccw virtio/s390: use vring_create_virtqueue virtio/virtio_ring: do some comment fixes vhost-scsi: remove incorrect memory barrier tools/virtio/ringtest: Remove bogus definition of BUG_ON() virtio_ring: Fix potential mem leak in virtqueue_add_indirect_packed
2019-05-14Add gitignore file for samples/vfs/ generated filesLinus Torvalds1-0/+2
Commit f1b5618e013a ("vfs: Add a sample program for the new mount API") added sample programs that get built during the kernel build, but then cause 'git status' to worry about whether the resulting binaries should be managed by git. Tell git not to worry, and to ignore the sample binaries. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14Merge branch 'parisc-5.2-2' of ↵Linus Torvalds15-72/+73
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull more parisc updates from Helge Deller: "Two small enhancements, which I didn't included in the last pull request because I wanted to keep them a few more days in for-next before sending upstream: - Replace the ldcw barrier instruction by a nop instruction in the CAS code on uniprocessor machines. - Map variables read-only after init (enable ro_after_init feature)" * 'parisc-5.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: Use __ro_after_init in init.c parisc: Use __ro_after_init in unwind.c parisc: Use __ro_after_init in time.c parisc: Use __ro_after_init in processor.c parisc: Use __ro_after_init in process.c parisc: Use __ro_after_init in perf_images.h parisc: Use __ro_after_init in pci.c parisc: Use __ro_after_init in inventory.c parisc: Use __ro_after_init in head.S parisc: Use __ro_after_init in firmware.c parisc: Use __ro_after_init in drivers.c parisc: Use __ro_after_init in cache.c parisc: Enable the ro_after_init feature parisc: Drop LDCW barrier in CAS code when running UP
2019-05-14Merge tag 'kgdb-5.2-rc1' of ↵Linus Torvalds5-9/+8
git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux Pull kgdb updates from Daniel Thompson: "Mostly cleanups but there are also a couple of fixes for out-of-bounds accesses (including a potential write to the byte before a static buffer). The main changes are: - Fixes to those out-of-bounds access (empty string to configure test module could write the byte before a buffer, high cpu counts could read outside of per-cpu structures). - Improvements to string handling problems picked up by new compiler warnings and other static checks. Most are fixing benign issues that can't be tickled without code changes but still reduce the wtf factor a little. - Tidy up the terminal output" * tag 'kgdb-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux: kdb: Fix bound check compiler warning kdb: do a sanity check on the cpu in kdb_per_cpu() kdb: Get rid of broken attempt to print CCVERSION in kdb summary misc: kgdbts: fix out-of-bounds access in function param_set_kgdbts_var kdb: kdb_support: replace strcpy() by strscpy() gdbstub: Replace strcpy() by strscpy() gdbstub: mark expected switch fall-throughs
2019-05-14Merge tag 'modules-for-v5.2' of ↵Linus Torvalds4-9/+27
git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux Pull modules updates from Jessica Yu: - Use a separate table to store symbol types instead of hijacking fields in struct Elf_Sym - Trivial code cleanups * tag 'modules-for-v5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux: module: add stubs for within_module functions kallsyms: store type information in its own array vmlinux.lds.h: drop unused __vermagic
2019-05-14Merge branch 'lru-map-fix'Alexei Starovoitov4-20/+297
Daniel Borkmann says: ==================== This set fixes LRU map eviction in combination with map lookups out of system call side from user space. Main patch is the second one and test cases are adapted and added in the last one. Thanks! ==================== Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-05-14bpf: test ref bit from data path and add new tests for syscall pathDaniel Borkmann1-14/+274
The test_lru_map is relying on marking the LRU map entry via regular BPF map lookup from system call side. This is basically for simplicity reasons. Given we fixed marking entries in that case, the test needs to be fixed as well. Here we add a small drop-in replacement to retain existing behavior for the tests by marking out of the BPF program and transferring the retrieved value out via temporary map. This also adds new test cases to track the new behavior where two elements are marked, one via system call side and one via program side, where the next update then evicts the key looked up only from system call side. # ./test_lru_map nr_cpus:8 test_lru_sanity0 (map_type:9 map_flags:0x0): Pass test_lru_sanity1 (map_type:9 map_flags:0x0): Pass test_lru_sanity2 (map_type:9 map_flags:0x0): Pass test_lru_sanity3 (map_type:9 map_flags:0x0): Pass test_lru_sanity4 (map_type:9 map_flags:0x0): Pass test_lru_sanity5 (map_type:9 map_flags:0x0): Pass test_lru_sanity7 (map_type:9 map_flags:0x0): Pass test_lru_sanity8 (map_type:9 map_flags:0x0): Pass test_lru_sanity0 (map_type:10 map_flags:0x0): Pass test_lru_sanity1 (map_type:10 map_flags:0x0): Pass test_lru_sanity2 (map_type:10 map_flags:0x0): Pass test_lru_sanity3 (map_type:10 map_flags:0x0): Pass test_lru_sanity4 (map_type:10 map_flags:0x0): Pass test_lru_sanity5 (map_type:10 map_flags:0x0): Pass test_lru_sanity7 (map_type:10 map_flags:0x0): Pass test_lru_sanity8 (map_type:10 map_flags:0x0): Pass test_lru_sanity0 (map_type:9 map_flags:0x2): Pass test_lru_sanity4 (map_type:9 map_flags:0x2): Pass test_lru_sanity6 (map_type:9 map_flags:0x2): Pass test_lru_sanity7 (map_type:9 map_flags:0x2): Pass test_lru_sanity8 (map_type:9 map_flags:0x2): Pass test_lru_sanity0 (map_type:10 map_flags:0x2): Pass test_lru_sanity4 (map_type:10 map_flags:0x2): Pass test_lru_sanity6 (map_type:10 map_flags:0x2): Pass test_lru_sanity7 (map_type:10 map_flags:0x2): Pass test_lru_sanity8 (map_type:10 map_flags:0x2): Pass Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-05-14bpf, lru: avoid messing with eviction heuristics upon syscall lookupDaniel Borkmann1-5/+18
One of the biggest issues we face right now with picking LRU map over regular hash table is that a map walk out of user space, for example, to just dump the existing entries or to remove certain ones, will completely mess up LRU eviction heuristics and wrong entries such as just created ones will get evicted instead. The reason for this is that we mark an entry as "in use" via bpf_lru_node_set_ref() from system call lookup side as well. Thus upon walk, all entries are being marked, so information of actual least recently used ones are "lost". In case of Cilium where it can be used (besides others) as a BPF based connection tracker, this current behavior causes disruption upon control plane changes that need to walk the map from user space to evict certain entries. Discussion result from bpfconf [0] was that we should simply just remove marking from system call side as no good use case could be found where it's actually needed there. Therefore this patch removes marking for regular LRU and per-CPU flavor. If there ever should be a need in future, the behavior could be selected via map creation flag, but due to mentioned reason we avoid this here. [0] http://vger.kernel.org/bpfconf.html Fixes: 29ba732acbee ("bpf: Add BPF_MAP_TYPE_LRU_HASH") Fixes: 8f8449384ec3 ("bpf: Add BPF_MAP_TYPE_LRU_PERCPU_HASH") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-05-14bpf: add map_lookup_elem_sys_only for lookups from syscall sideDaniel Borkmann2-1/+5
Add a callback map_lookup_elem_sys_only() that map implementations could use over map_lookup_elem() from system call side in case the map implementation needs to handle the latter differently than from the BPF data path. If map_lookup_elem_sys_only() is set, this will be preferred pick for map lookups out of user space. This hook is used in a follow-up fix for LRU map, but once development window opens, we can convert other map types from map_lookup_elem() (here, the one called upon BPF_MAP_LOOKUP_ELEM cmd is meant) over to use the callback to simplify and clean up the latter. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>