summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2016-10-22ipv4: use the right lock for ping_group_rangeWANG Cong1-4/+4
This reverts commit a681574c99be23e4d20b769bf0e543239c364af5 ("ipv4: disable BH in set_ping_group_range()") because we never read ping_group_range in BH context (unlike local_port_range). Then, since we already have a lock for ping_group_range, those using ip_local_ports.lock for ping_group_range are clearly typos. We might consider to share a same lock for both ping_group_range and local_port_range w.r.t. space saving, but that should be for net-next. Fixes: a681574c99be ("ipv4: disable BH in set_ping_group_range()") Fixes: ba6b918ab234 ("ping: move ping_group_range out of CONFIG_SYSCTL") Cc: Eric Dumazet <edumazet@google.com> Cc: Eric Salo <salo@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-22Merge branch 'dsa-bcm_sf2-do-not-rely-on-kexec_in_progress'David S. Miller2-3/+3
Florian Fainelli says: ==================== net: dsa: bcm_sf2: Do not rely on kexec_in_progress These are the two patches following the discussing we had on kexec_in_progress. Feel free to apply or discard them, thanks! ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-22net: dsa: bcm_sf2: Do not rely on kexec_in_progressFlorian Fainelli1-2/+3
After discussing with Eric, it turns out that, while using kexec_in_progress is a nice optimization, which prevents us from always powering on the integrated PHY, let's just turn it on in the shutdown path. This removes a dependency on kexec_in_progress which, according to Eric should not be used by modules Fixes: 2399d6143f85 ("net: dsa: bcm_sf2: Prevent GPHY shutdown for kexec'd kernels") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-22Revert "kexec: Export kexec_in_progress to modules"Florian Fainelli1-1/+0
This reverts commit 97dcaa0fcfd24daa9a36c212c1ad1d5a97759212. Based on the review discussion with Eric, we will come up with a different fix for the bcm_sf2 driver which does not make it rely on the kexec_in_progress value. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-22netns: revert "netns: avoid disabling irq for netns id"Paul Moore1-15/+20
This reverts commit bc51dddf98c9 ("netns: avoid disabling irq for netns id") as it was found to cause problems with systems running SELinux/audit, see the mailing list thread below: * http://marc.info/?t=147694653900002&r=1&w=2 Eventually we should be able to reintroduce this code once we have rewritten the audit multicast code to queue messages much the same way we do for unicast messages. A tracking issue for this can be found below: * https://github.com/linux-audit/audit-kernel/issues/23 Reported-by: Stephen Smalley <sds@tycho.nsa.gov> Reported-by: Elad Raz <e@eladraz.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-21ipv6: fix a potential deadlock in do_ipv6_setsockopt()WANG Cong3-6/+15
Baozeng reported this deadlock case: CPU0 CPU1 ---- ---- lock([ 165.136033] sk_lock-AF_INET6); lock([ 165.136033] rtnl_mutex); lock([ 165.136033] sk_lock-AF_INET6); lock([ 165.136033] rtnl_mutex); Similar to commit 87e9f0315952 ("ipv4: fix a potential deadlock in mcast getsockopt() path") this is due to we still have a case, ipv6_sock_mc_close(), where we acquire sk_lock before rtnl_lock. Close this deadlock with the similar solution, that is always acquire rtnl lock first. Fixes: baf606d9c9b1 ("ipv4,ipv6: grab rtnl before locking the socket") Reported-by: Baozeng Ding <sploving1@gmail.com> Tested-by: Baozeng Ding <sploving1@gmail.com> Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller14-60/+70
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for your net tree, they are: 1) Fix compilation warning in xt_hashlimit on m68k 32-bits, from Geert Uytterhoeven. 2) Fix wrong timeout in set elements added from packet path via nft_dynset, from Anders K. Pedersen. 3) Remove obsolete nf_conntrack_events_retry_timeout sysctl documentation, from Nicolas Dichtel. 4) Ensure proper initialization of log flags via xt_LOG, from Liping Zhang. 5) Missing alias to autoload ipcomp, also from Liping Zhang. 6) Missing NFTA_HASH_OFFSET attribute validation, again from Liping. 7) Wrong integer type in the new nft_parse_u32_check() function, from Dan Carpenter. 8) Another wrong integer type declaration in nft_exthdr_init, also from Dan Carpenter. 9) Fix insufficient mode validation in nft_range. 10) Fix compilation warning in nft_range due to possible uninitialized value, from Arnd Bergmann. 11) Zero nf_hook_ops allocated via xt_hook_alloc() in x_tables to calm down kmemcheck, from Florian Westphal. 12) Schedule gc_worker() to run again if GC_MAX_EVICTS quota is reached, from Nicolas Dichtel. 13) Fix nf_queue() after conversion to single-linked hook list, related to incorrect bypass flag handling and incorrect hook point of reinjection. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20kexec: Export kexec_in_progress to modulesFlorian Fainelli1-0/+1
The bcm_sf2 driver uses kexec_in_progress to know whether it can power down an integrated PHY during shutdown, and can be built as a module. Other modules may be using this in the future, so export it. Fixes: 2399d6143f85 ("net: dsa: bcm_sf2: Prevent GPHY shutdown for kexec'd kernels") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20ipv4: disable BH in set_ping_group_range()Eric Dumazet1-2/+2
In commit 4ee3bd4a8c746 ("ipv4: disable BH when changing ip local port range") Cong added BH protection in set_local_port_range() but missed that same fix was needed in set_ping_group_range() Fixes: b8f1a55639e6 ("udp: Add function to make source port for UDP tunnels") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Eric Salo <salo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20udp: must lock the socket in udp_disconnect()Eric Dumazet8-8/+18
Baozeng Ding reported KASAN traces showing uses after free in udp_lib_get_port() and other related UDP functions. A CONFIG_DEBUG_PAGEALLOC=y kernel would eventually crash. I could write a reproducer with two threads doing : static int sock_fd; static void *thr1(void *arg) { for (;;) { connect(sock_fd, (const struct sockaddr *)arg, sizeof(struct sockaddr_in)); } } static void *thr2(void *arg) { struct sockaddr_in unspec; for (;;) { memset(&unspec, 0, sizeof(unspec)); connect(sock_fd, (const struct sockaddr *)&unspec, sizeof(unspec)); } } Problem is that udp_disconnect() could run without holding socket lock, and this was causing list corruptions. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Baozeng Ding <sploving1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20net: dsa: bcm_sf2: Prevent GPHY shutdown for kexec'd kernelsFlorian Fainelli1-0/+14
For a kernel that is being kexec'd we re-enable the integrated GPHY in order for the subsequent MDIO bus scan to succeed and properly bind to the bcm7xxx PHY driver. If we did not do that, the GPHY would be shut down by the time the MDIO driver is probing the bus, and it would fail to read the correct PHY OUI and therefore bind to an appropriate PHY driver. Later on, this would cause DSA not to be able to successfully attach to the PHY, and the interface would not be created at all. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20bpf, test: fix ld_abs + vlan push/pop stress testDaniel Borkmann1-1/+1
After commit 636c2628086e ("net: skbuff: Remove errornous length validation in skb_vlan_pop()") mentioned test case stopped working, throwing a -12 (ENOMEM) return code. The issue however is not due to 636c2628086e, but rather due to a buggy test case that got uncovered from the change in behaviour in 636c2628086e. The data_size of that test case for the skb was set to 1. In the bpf_fill_ld_abs_vlan_push_pop() handler bpf insns are generated that loop with: reading skb data, pushing 68 tags, reading skb data, popping 68 tags, reading skb data, etc, in order to force a skb expansion and thus trigger that JITs recache skb->data. Problem is that initial data_size is too small. While before 636c2628086e, the test silently bailed out due to the skb->len < VLAN_ETH_HLEN check with returning 0, and now throwing an error from failing skb_ensure_writable(). Set at least minimum of ETH_HLEN as an initial length so that on first push of data, equivalent pop will succeed. Fixes: 4d9c5c53ac99 ("test_bpf: add bpf_skb_vlan_push/pop() tests") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20net: add recursion limit to GROSabrina Dubroca11-11/+49
Currently, GRO can do unlimited recursion through the gro_receive handlers. This was fixed for tunneling protocols by limiting tunnel GRO to one level with encap_mark, but both VLAN and TEB still have this problem. Thus, the kernel is vulnerable to a stack overflow, if we receive a packet composed entirely of VLAN headers. This patch adds a recursion counter to the GRO layer to prevent stack overflow. When a gro_receive function hits the recursion limit, GRO is aborted for this skb and it is processed normally. This recursion counter is put in the GRO CB, but could be turned into a percpu counter if we run out of space in the CB. Thanks to Vladimír Beneš <vbenes@redhat.com> for the initial bug report. Fixes: CVE-2016-7039 Fixes: 9b174d88c257 ("net: Add Transparent Ethernet Bridging GRO support.") Fixes: 66e5133f19e9 ("vlan: Add GRO support for non hardware accelerated vlan") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Jiri Benc <jbenc@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20ipv6: properly prevent temp_prefered_lft sysctl raceJiri Bohac1-5/+4
The check for an underflow of tmp_prefered_lft is always false because tmp_prefered_lft is unsigned. The intention of the check was to guard against racing with an update of the temp_prefered_lft sysctl, potentially resulting in an underflow. As suggested by David Miller, the best way to prevent the race is by reading the sysctl variable using READ_ONCE. Signed-off-by: Jiri Bohac <jbohac@suse.cz> Reported-by: Julia Lawall <julia.lawall@lip6.fr> Fixes: 76506a986dc3 ("IPv6: fix DESYNC_FACTOR") Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20netfilter: fix nf_queue handlingPablo Neira Ayuso3-27/+36
nf_queue handling is broken since e3b37f11e6e4 ("netfilter: replace list_head with single linked list") for two reasons: 1) If the bypass flag is set on, there are no userspace listeners and we still have more hook entries to iterate over, then jump to the next hook. Otherwise accept the packet. On nf_reinject() path, the okfn() needs to be invoked. 2) We should not re-enter the same hook on packet reinjection. If the packet is accepted, we have to skip the current hook from where the packet was enqueued, otherwise the packets gets enqueued over and over again. This restores the previous list_for_each_entry_continue() behaviour happening from nf_iterate() that was dealing with these two cases. This patch introduces a new nf_queue() wrapper function so this fix becomes simpler. Fixes: e3b37f11e6e4 ("netfilter: replace list_head with single linked list") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-20netfilter: conntrack: restart gc immediately if GC_MAX_EVICTS is reachedNicolas Dichtel1-1/+1
When the maximum evictions number is reached, do not wait 5 seconds before the next run. CC: Florian Westphal <fw@strlen.de> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-20stmmac: display the descriptors if DES0 = 0Giuseppe CAVALLARO1-4/+3
It makes sense to display the descriptors even if DES0 is zero. This helps for example in case of it is needed to dump rx write-back descriptors to get timestamp status. Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: Alexandre TORGUE <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20Merge branch 'ncsi-fixes'David S. Miller3-34/+112
Gavin Shan says: ==================== net/ncsi: More bug fixes This series fixes 2 issues that were found during NCSI's availability testing on BCM5718 and improves HNCDSC AEN handler: * PATCH[1] refactors the code so that minimal code change is put to PATCH[2]. * PATCH[2] fixes the NCSI channel's stale link state before doing failover. * PATCH[3] chooses the hot channel, which was ever chosen as active channel, when the available channels are all in link-down state. * PATCH[4] improves Host Network Controller Driver Status Change (HNCDSC) AEN handler Changelog ========= v2: * Merged PATCH[v1 1/2] to PATCH[v2 1]. * Avoid if/else statements in ncsi_suspend_channel() as Joel suggested. * Added comments to explain why we need retrieve last link states in ncsi_suspend_channel(). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20net/ncsi: Improve HNCDSC AEN handlerGavin Shan1-3/+15
This improves AEN handler for Host Network Controller Driver Status Change (HNCDSC): * The channel's lock should be hold when accessing its state. * Do failover when host driver isn't ready. * Configure channel when host driver becomes ready. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20net/ncsi: Choose hot channel as active one if necessaryGavin Shan2-3/+20
The issue was found on BCM5718 which has two NCSI channels in one package: C0 and C1. C0 is in link-up state while C1 is in link-down state. C0 is chosen as active channel until unplugging and plugging C0's cable: On unplugging C0's cable, LSC (Link State Change) AEN packet received on C0 to report link-down event. After that, C1 is chosen as active channel. LSC AEN for link-up event is lost on C0 when plugging C0's cable back. We lose the network even C0 is usable. This resolves the issue by recording the (hot) channel that was ever chosen as active one. The hot channel is chosen to be active one if none of available channels in link-up state. With this, C0 is still the active one after unplugging C0's cable. LSC AEN packet received on C0 when plugging its cable back. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20net/ncsi: Fix stale link state of inactive channels on failoverGavin Shan2-1/+28
The issue was found on BCM5718 which has two NCSI channels in one package: C0 and C1. Both of them are connected to different LANs, means they are in link-up state and C0 is chosen as the active one until resetting BCM5718 happens as below. Resetting BCM5718 results in LSC (Link State Change) AEN packet received on C0, meaning LSC AEN is missed on C1. When LSC AEN packet received on C0 to report link-down, it fails over to C1 because C1 is in link-up state as software can see. However, C1 is in link-down state in hardware. It means the link state is out of synchronization between hardware and software, resulting in inappropriate channel (C1) selected as active one. This resolves the issue by sending separate GLS (Get Link Status) commands to all channels in the package before trying to do failover. The last link states of all channels in the package are retrieved. With it, C0 (not C1) is selected as active one as expected. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20net/ncsi: Avoid if statements in ncsi_suspend_channel()Gavin Shan1-28/+50
There are several if/else statements in the state machine implemented by switch/case in ncsi_suspend_channel() to avoid duplicated code. It makes the code a bit hard to be understood. This drops if/else statements in ncsi_suspend_channel() to improve the code readability as Joel Stanley suggested. Also, it becomes easy to add more states in the state machine without affecting current code. No logical changes introduced by this. Suggested-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20net/sched: act_mirred: Use passed lastuse argumentPaul Blakey1-1/+4
stats_update callback is called by NIC drivers doing hardware offloading of the mirred action. Lastuse is passed as argument to specify when the stats was actually last updated and is not always the current time. Fixes: 9798e6fe4f9b ('net: act_mirred: allow statistic updates from offloaded actions') Signed-off-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20mlxsw: pci: Fix reset wait for SwitchX2Jiri Pirko1-2/+8
SwitchX2 firmware does not implement reset done yet. Moreover, when busy-polled for ready magic, that slows down firmware and reset takes longer than the defined timeout, causing initialization to fail. So restore the previous behaviour and just sleep in this case. Fixes: 233fa44bd67a ("mlxsw: pci: Implement reset done check") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20mlxsw: switchx2: Fix ethernet port initializationElad Raz1-0/+1
When creating an ethernet port fails, we must move the port to disable, otherwise putting the port in switch partition 0 (ETH) or 1 (IB) will always fails. Fixes: 31557f0f9755 ("mlxsw: Introduce Mellanox SwitchX-2 ASIC support") Signed-off-by: Elad Raz <eladr@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20mlxsw: spectrum_router: Make mlxsw_sp_router_fib4_del return void and remove ↵Jiri Pirko1-8/+5
warn The function return value is not checked anywhere. Also, the warning causes huge slowdown when removing large number of FIB entries which were not offloaded, because of ordering issue. Ido's preparing a patchset to fix the ordering issue, but that is definitelly not net tree material. Fixes: b45f64d16d45 ("mlxsw: spectrum_router: Use FIB notifications instead of switchdev calls") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20mlxsw: spectrum_router: Use correct tree index for bindingJiri Pirko1-1/+2
By a mistake, there is tree index 0 passed to RALTB. Should be MLXSW_SP_LPM_TREE_MIN. Fixes: b45f64d16d45 ("mlxsw: spectrum_router: Use FIB notifications instead of switchdev calls") Reported-by: Yotam Gigi <yotamg@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-19stmmac: fix and review the ptp registration.Giuseppe CAVALLARO3-14/+9
The commit commit 7086605a6ab5 ("stmmac: fix error check when init ptp") breaks the procedure added by the commit efee95f42b5d ("ptp_clock: future-proofing drivers against PTP subsystem becoming optional") So this patch tries to re-import the logic added by the latest commit above: it makes sense to have the stmmac_ptp_register as void function and, inside the main, the stmmac_init_ptp can fails in case of the capability cannot be supported by the HW. Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: Alexandre TORGUE <alexandre.torgue@st.com> Cc: Rayagond Kokatanur <rayagond@vayavyalabs.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Nicolas Pitre <nico@linaro.org> Acked-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-19netfilter: x_tables: suppress kmemcheck warningFlorian Westphal1-1/+1
Markus Trippelsdorf reports: WARNING: kmemcheck: Caught 64-bit read from uninitialized memory (ffff88001e605480) 4055601e0088ffff000000000000000090686d81ffffffff0000000000000000 u u u u u u u u u u u u u u u u i i i i i i i i u u u u u u u u ^ |RIP: 0010:[<ffffffff8166e561>] [<ffffffff8166e561>] nf_register_net_hook+0x51/0x160 [..] [<ffffffff8166e561>] nf_register_net_hook+0x51/0x160 [<ffffffff8166eaaf>] nf_register_net_hooks+0x3f/0xa0 [<ffffffff816d6715>] ipt_register_table+0xe5/0x110 [..] This warning is harmless; we copy 'uninitialized' data from the hook ops but it will not be used. Long term the structures keeping run-time data should be disentangled from those only containing config-time data (such as where in the list to insert a hook), but thats -next material. Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Suggested-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Aaron Conole <aconole@bytheb.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-19tcp: do not export sysctl_tcp_low_latencyEric Dumazet1-1/+0
Since commit b2fb4f54ecd4 ("tcp: uninline tcp_prequeue()") we no longer access sysctl_tcp_low_latency from a module. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-19rtnetlink: Add rtnexthop offload flag to compare maskJiri Pirko1-1/+1
The offload flag is a status flag and should not be used by FIB semantics for comparison. Fixes: 37ed9493699c ("rtnetlink: add RTNH_F_EXTERNAL flag for fib offload") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-19switchdev: Execute bridge ndos only for bridge portsIdo Schimmel1-0/+9
We recently got the following warning after setting up a vlan device on top of an offloaded bridge and executing 'bridge link': WARNING: CPU: 0 PID: 18566 at drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c:81 mlxsw_sp_port_orig_get.part.9+0x55/0x70 [mlxsw_spectrum] [...] CPU: 0 PID: 18566 Comm: bridge Not tainted 4.8.0-rc7 #1 Hardware name: Mellanox Technologies Ltd. Mellanox switch/Mellanox switch, BIOS 4.6.5 05/21/2015 0000000000000286 00000000e64ab94f ffff880406e6f8f0 ffffffff8135eaa3 0000000000000000 0000000000000000 ffff880406e6f930 ffffffff8108c43b 0000005106e6f988 ffff8803df398840 ffff880403c60108 ffff880406e6f990 Call Trace: [<ffffffff8135eaa3>] dump_stack+0x63/0x90 [<ffffffff8108c43b>] __warn+0xcb/0xf0 [<ffffffff8108c56d>] warn_slowpath_null+0x1d/0x20 [<ffffffffa01420d5>] mlxsw_sp_port_orig_get.part.9+0x55/0x70 [mlxsw_spectrum] [<ffffffffa0142195>] mlxsw_sp_port_attr_get+0xa5/0xb0 [mlxsw_spectrum] [<ffffffff816f151f>] switchdev_port_attr_get+0x4f/0x140 [<ffffffff816f15d0>] switchdev_port_attr_get+0x100/0x140 [<ffffffff816f15d0>] switchdev_port_attr_get+0x100/0x140 [<ffffffff816f1d6b>] switchdev_port_bridge_getlink+0x5b/0xc0 [<ffffffff816f2680>] ? switchdev_port_fdb_dump+0x90/0x90 [<ffffffff815f5427>] rtnl_bridge_getlink+0xe7/0x190 [<ffffffff8161a1b2>] netlink_dump+0x122/0x290 [<ffffffff8161b0df>] __netlink_dump_start+0x15f/0x190 [<ffffffff815f5340>] ? rtnl_bridge_dellink+0x230/0x230 [<ffffffff815fab46>] rtnetlink_rcv_msg+0x1a6/0x220 [<ffffffff81208118>] ? __kmalloc_node_track_caller+0x208/0x2c0 [<ffffffff815f5340>] ? rtnl_bridge_dellink+0x230/0x230 [<ffffffff815fa9a0>] ? rtnl_newlink+0x890/0x890 [<ffffffff8161cf54>] netlink_rcv_skb+0xa4/0xc0 [<ffffffff815f56f8>] rtnetlink_rcv+0x28/0x30 [<ffffffff8161c92c>] netlink_unicast+0x18c/0x240 [<ffffffff8161ccdb>] netlink_sendmsg+0x2fb/0x3a0 [<ffffffff815c5a48>] sock_sendmsg+0x38/0x50 [<ffffffff815c6031>] SYSC_sendto+0x101/0x190 [<ffffffff815c7111>] ? __sys_recvmsg+0x51/0x90 [<ffffffff815c6b6e>] SyS_sendto+0xe/0x10 [<ffffffff817017f2>] entry_SYSCALL_64_fastpath+0x1a/0xa4 The problem is that the 8021q module propagates the call to ndo_bridge_getlink() via switchdev ops, but the switch driver doesn't recognize the netdev, as it's not offloaded. While we can ignore calls being made to non-bridge ports inside the driver, a better fix would be to push this check up to the switchdev layer. Note that these ndos can be called for non-bridged netdev, but this only happens in certain PF drivers which don't call the corresponding switchdev functions anyway. Fixes: 99f44bb3527b ("mlxsw: spectrum: Enable L3 interfaces on top of bridge devices") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Tamir Winetroub <tamirw@mellanox.com> Tested-by: Tamir Winetroub <tamirw@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-19net: core: Correctly iterate over lower adjacency listIdo Schimmel2-4/+8
Tamir reported the following trace when processing ARP requests received via a vlan device on top of a VLAN-aware bridge: NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [swapper/1:0] [...] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G W 4.8.0-rc7 #1 Hardware name: Mellanox Technologies Ltd. "MSN2100-CB2F"/"SA001017", BIOS 5.6.5 06/07/2016 task: ffff88017edfea40 task.stack: ffff88017ee10000 RIP: 0010:[<ffffffff815dcc73>] [<ffffffff815dcc73>] netdev_all_lower_get_next_rcu+0x33/0x60 [...] Call Trace: <IRQ> [<ffffffffa015de0a>] mlxsw_sp_port_lower_dev_hold+0x5a/0xa0 [mlxsw_spectrum] [<ffffffffa016f1b0>] mlxsw_sp_router_netevent_event+0x80/0x150 [mlxsw_spectrum] [<ffffffff810ad07a>] notifier_call_chain+0x4a/0x70 [<ffffffff810ad13a>] atomic_notifier_call_chain+0x1a/0x20 [<ffffffff815ee77b>] call_netevent_notifiers+0x1b/0x20 [<ffffffff815f2eb6>] neigh_update+0x306/0x740 [<ffffffff815f38ce>] neigh_event_ns+0x4e/0xb0 [<ffffffff8165ea3f>] arp_process+0x66f/0x700 [<ffffffff8170214c>] ? common_interrupt+0x8c/0x8c [<ffffffff8165ec29>] arp_rcv+0x139/0x1d0 [<ffffffff816e505a>] ? vlan_do_receive+0xda/0x320 [<ffffffff815e3794>] __netif_receive_skb_core+0x524/0xab0 [<ffffffff815e6830>] ? dev_queue_xmit+0x10/0x20 [<ffffffffa06d612d>] ? br_forward_finish+0x3d/0xc0 [bridge] [<ffffffffa06e5796>] ? br_handle_vlan+0xf6/0x1b0 [bridge] [<ffffffff815e3d38>] __netif_receive_skb+0x18/0x60 [<ffffffff815e3dc0>] netif_receive_skb_internal+0x40/0xb0 [<ffffffff815e3e4c>] netif_receive_skb+0x1c/0x70 [<ffffffffa06d7856>] br_pass_frame_up+0xc6/0x160 [bridge] [<ffffffffa06d63d7>] ? deliver_clone+0x37/0x50 [bridge] [<ffffffffa06d656c>] ? br_flood+0xcc/0x160 [bridge] [<ffffffffa06d7b14>] br_handle_frame_finish+0x224/0x4f0 [bridge] [<ffffffffa06d7f94>] br_handle_frame+0x174/0x300 [bridge] [<ffffffff815e3599>] __netif_receive_skb_core+0x329/0xab0 [<ffffffff81374815>] ? find_next_bit+0x15/0x20 [<ffffffff8135e802>] ? cpumask_next_and+0x32/0x50 [<ffffffff810c9968>] ? load_balance+0x178/0x9b0 [<ffffffff815e3d38>] __netif_receive_skb+0x18/0x60 [<ffffffff815e3dc0>] netif_receive_skb_internal+0x40/0xb0 [<ffffffff815e3e4c>] netif_receive_skb+0x1c/0x70 [<ffffffffa01544a1>] mlxsw_sp_rx_listener_func+0x61/0xb0 [mlxsw_spectrum] [<ffffffffa005c9f7>] mlxsw_core_skb_receive+0x187/0x200 [mlxsw_core] [<ffffffffa007332a>] mlxsw_pci_cq_tasklet+0x63a/0x9b0 [mlxsw_pci] [<ffffffff81091986>] tasklet_action+0xf6/0x110 [<ffffffff81704556>] __do_softirq+0xf6/0x280 [<ffffffff8109213f>] irq_exit+0xdf/0xf0 [<ffffffff817042b4>] do_IRQ+0x54/0xd0 [<ffffffff8170214c>] common_interrupt+0x8c/0x8c The problem is that netdev_all_lower_get_next_rcu() never advances the iterator, thereby causing the loop over the lower adjacency list to run forever. Fix this by advancing the iterator and avoid the infinite loop. Fixes: 7ce856aaaf13 ("mlxsw: spectrum: Add couple of lower device helper functions") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Tamir Winetroub <tamirw@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Acked-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-19flow_dissector: Check skb for VLAN only if skb specified.Eric Garver1-4/+2
Fixes a panic when calling eth_get_headlen(). Noticed on i40e driver. Fixes: d5709f7ab776 ("flow_dissector: For stripped vlan, get vlan info from skb->vlan_tci") Signed-off-by: Eric Garver <e@erig.me> Reviewed-by: Jakub Sitnicki <jkbs@redhat.com> Acked-by: Amir Vadai <amir@vadai.me> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-18qed: Use list_move_tail instead of list_del/list_add_tailWei Yongjun1-7/+4
Using list_move_tail() instead of list_del() + list_add_tail(). Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Acked-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-18rocker: fix maybe-uninitialized warningArnd Bergmann1-2/+2
In some rare configurations, we get a warning about the 'index' variable being used without an initialization: drivers/net/ethernet/rocker/rocker_ofdpa.c: In function ‘ofdpa_port_fib_ipv4.isra.16.constprop’: drivers/net/ethernet/rocker/rocker_ofdpa.c:2425:92: warning: ‘index’ may be used uninitialized in this function [-Wmaybe-uninitialized] This is a false positive, the logic is just a bit too complex for gcc to follow here. Moving the intialization of 'index' a little further down makes it clear to gcc that the function always returns an error if it is not initialized. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-18net/hyperv: avoid uninitialized variableArnd Bergmann1-1/+1
The hdr_offset variable is only if we deal with a TCP or UDP packet, but as the check surrounding its usage tests for skb_is_gso() instead, the compiler has no idea if the variable is initialized or not at that point: drivers/net/hyperv/netvsc_drv.c: In function ‘netvsc_start_xmit’: drivers/net/hyperv/netvsc_drv.c:494:42: error: ‘hdr_offset’ may be used uninitialized in this function [-Werror=maybe-uninitialized] This adds an additional check for the transport type, which tells the compiler that this path cannot happen. Since the get_net_transport_info() function should always be inlined here, I don't expect this to result in additional runtime checks. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-18net: bcm63xx: avoid referencing uninitialized variableArnd Bergmann1-1/+2
gcc found a reference to an uninitialized variable in the error handling of bcm_enet_open, introduced by a recent cleanup: drivers/net/ethernet/broadcom/bcm63xx_enet.c: In function 'bcm_enet_open' drivers/net/ethernet/broadcom/bcm63xx_enet.c:1129:2: warning: 'phydev' may be used uninitialized in this function [-Wmaybe-uninitialized] This makes the use of that variable conditional, so we only reference it here after it has been used before. Unlike my normal patches, I have not build-tested this one, as I don't currently have mips test in my randconfig setup. Fixes: 625eb8667d6f ("net: ethernet: broadcom: bcm63xx: use phydev from struct net_device") Cc: Philippe Reynes <tremyfr@gmail.com> Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-18soreuseport: do not export reuseport_add_sock()Eric Dumazet1-1/+0
reuseport_add_sock() is not used from a module, no need to export it. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-18ibmvnic: Update MTU after device initializationThomas Falcon1-0/+2
It is possible for the MTU to be changed during the initialization process with the VNIC Server. Ensure that the net device is updated to reflect the new MTU. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-18ibmvnic: Fix GFP_KERNEL allocation in interrupt contextThomas Falcon1-1/+1
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-18ibmvnic: Driver Version 1.0.1Thomas Falcon1-1/+1
Increment driver version to reflect features that have been added since release. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-18bridge: multicast: restore perm router ports on multicast enableNikolay Aleksandrov1-9/+14
Satish reported a problem with the perm multicast router ports not getting reenabled after some series of events, in particular if it happens that the multicast snooping has been disabled and the port goes to disabled state then it will be deleted from the router port list, but if it moves into non-disabled state it will not be re-added because the mcast snooping is still disabled, and enabling snooping later does nothing. Here are the steps to reproduce, setup br0 with snooping enabled and eth1 added as a perm router (multicast_router = 2): 1. $ echo 0 > /sys/class/net/br0/bridge/multicast_snooping 2. $ ip l set eth1 down ^ This step deletes the interface from the router list 3. $ ip l set eth1 up ^ This step does not add it again because mcast snooping is disabled 4. $ echo 1 > /sys/class/net/br0/bridge/multicast_snooping 5. $ bridge -d -s mdb show <empty> At this point we have mcast enabled and eth1 as a perm router (value = 2) but it is not in the router list which is incorrect. After this change: 1. $ echo 0 > /sys/class/net/br0/bridge/multicast_snooping 2. $ ip l set eth1 down ^ This step deletes the interface from the router list 3. $ ip l set eth1 up ^ This step does not add it again because mcast snooping is disabled 4. $ echo 1 > /sys/class/net/br0/bridge/multicast_snooping 5. $ bridge -d -s mdb show router ports on br0: eth1 Note: we can directly do br_multicast_enable_port for all because the querier timer already has checks for the port state and will simply expire if it's in blocking/disabled. See the comment added by commit 9aa66382163e7 ("bridge: multicast: add a comment to br_port_state_selection about blocking state") Fixes: 561f1103a2b7 ("bridge: Add multicast_snooping sysfs toggle") Reported-by: Satish Ashok <sashok@cumulusnetworks.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-18netfilter: nf_tables: avoid uninitialized variable warningArnd Bergmann1-6/+4
The newly added nft_range_eval() function handles the two possible nft range operations, but as the compiler warning points out, any unexpected value would lead to the 'mismatch' variable being used without being initialized: net/netfilter/nft_range.c: In function 'nft_range_eval': net/netfilter/nft_range.c:45:5: error: 'mismatch' may be used uninitialized in this function [-Werror=maybe-uninitialized] This removes the variable in question and instead moves the condition into the switch itself, which is potentially more efficient than adding a bogus 'default' clause as in my first approach, and is nicer than using the 'uninitialized_var' macro. Fixes: 0f3cd9b36977 ("netfilter: nf_tables: add range expression") Link: http://patchwork.ozlabs.org/patch/677114/ Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-18tcp: Remove unused but set variableTobias Klauser1-2/+0
Remove the unused but set variable icsk in listening_get_next to fix the following GCC warning when building with 'W=1': net/ipv4/tcp_ipv4.c: In function ‘listening_get_next’: net/ipv4/tcp_ipv4.c:1890:31: warning: variable ‘icsk’ set but not used [-Wunused-but-set-variable] Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-18cxgb4: Fix number of queue sets corssing the limitGanesh Goudar1-1/+1
Do not let number of offload queue sets to go more than MAX_OFLD_QSETS, which would otherwise crash the driver on machines with cores more than MAX_OFLD_QSETS. Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-18ipv4: Remove unused but set variableTobias Klauser1-3/+0
Remove the unused but set variable dev in ip_do_fragment to fix the following GCC warning when building with 'W=1': net/ipv4/ip_output.c: In function ‘ip_do_fragment’: net/ipv4/ip_output.c:541:21: warning: variable ‘dev’ set but not used [-Wunused-but-set-variable] Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-18dwc_eth_qos: enable flow control by defaultNiklas Cassel1-0/+1
Allow autoneg to enable flow control by default. The behavior when autoneg is off has not changed. Signed-off-by: Niklas Cassel <niklas.cassel@axis.com> Signed-off-by: Jesper Nilsson <jespern@axis.com> Acked-by: Lars Persson <larper@axis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-18dwc_eth_qos: do not clear pause flags from phy_device->supportedNiklas Cassel1-1/+2
phy_device->supported is originally set by the PHY driver. The ethernet driver should filter phy_device->supported to only contain flags supported by the IP. The IP supports setting rx and tx flow control independently, therefore SUPPORTED_Pause and SUPPORTED_Asym_Pause should not be cleared. If the flags are cleared, pause frames cannot be enabled (even if they are supported by the PHY). Signed-off-by: Niklas Cassel <niklas.cassel@axis.com> Signed-off-by: Jesper Nilsson <jespern@axis.com> Acked-by: Lars Persson <larper@axis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-18net/hsr: Remove unused but set variableTobias Klauser1-4/+0
Remove the unused but set variable master_dev in check_local_dest to fix the following GCC warning when building with 'W=1': net/hsr/hsr_forward.c: In function ‘check_local_dest’: net/hsr/hsr_forward.c:303:21: warning: variable ‘master_dev’ set but not used [-Wunused-but-set-variable] Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: David S. Miller <davem@davemloft.net>