summaryrefslogtreecommitdiffstats
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2015-11-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller7-37/+46
Minor overlapping changes in net/ipv4/ipmr.c, in 'net' we were fixing the "BH-ness" of the counter bumps whilst in 'net-next' the functions were modified to take an explicit 'net' parameter. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-03switchdev: respect SKIP_EOPNOTSUPP flag in case there is no recursionJiri Pirko1-5/+4
Caller passing down the SKIP_EOPNOTSUPP switchdev flag expects that -EOPNOTSUPP cannot be returned. But in case of direct op call without recurtion, this may happen. So fix this by checking it always on the end of __switchdev_port_attr_set function. Fixes: 464314ea6c11 ("switchdev: skip over ports returning -EOPNOTSUPP when recursing ports") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-03net: sched: kill dead code in sch_choke.cPhil Sutter1-59/+0
It looks like this has never been used at all. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-03irda: Delete an unnecessary check before the function call ↵Markus Elfring1-2/+1
"irlmp_unregister_service" The irlmp_unregister_service() function tests whether its argument is NULL and then returns immediately. Thus the test around the call is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-03Merge tag 'mac80211-for-davem-2015-11-03' of ↵David S. Miller17-210/+264
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== Another set of fixes: * remove a warning on a check that can trigger without any errors having happened (Andrei) * correctly handle deauth request while in the process of associating (Andrei) * fix TDLS HT operation (Arik) * allow changing AID/listen interval during client setup (Ayala) * be more forgiving with WMM parameters to get HT/VHT in case of broken APs with bad WMM settings (Emmanuel, myself) * a number of other fixes (some in documentation) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-03net/core: fix for_each_netdev_featureJarod Wilson1-2/+6
As pointed out by Nikolay and further explained by Geert, the initial for_each_netdev_feature macro was broken, as feature would get set outside of the block of code it was intended to run in, thus only ever working for the first feature bit in the mask. While less pretty this way, this is tested and confirmed functional with multiple feature bits set in NETIF_F_UPPER_DISABLES. [root@dell-per730-01 ~]# ethtool -K bond0 lro off ... [ 242.761394] bond0: Disabling feature 0x0000000000008000 on lower dev p5p2. [ 243.552178] bnx2x 0000:06:00.1 p5p2: using MSI-X IRQs: sp 74 fp[0] 76 ... fp[7] 83 [ 244.353978] bond0: Disabling feature 0x0000000000008000 on lower dev p5p1. [ 245.147420] bnx2x 0000:06:00.0 p5p1: using MSI-X IRQs: sp 62 fp[0] 64 ... fp[7] 71 [root@dell-per730-01 ~]# ethtool -K bond0 gro off ... [ 251.925645] bond0: Disabling feature 0x0000000000004000 on lower dev p5p2. [ 252.713693] bnx2x 0000:06:00.1 p5p2: using MSI-X IRQs: sp 74 fp[0] 76 ... fp[7] 83 [ 253.499085] bond0: Disabling feature 0x0000000000004000 on lower dev p5p1. [ 254.290922] bnx2x 0000:06:00.0 p5p1: using MSI-X IRQs: sp 62 fp[0] 64 ... fp[7] 71 Fixes: fd867d51f ("net/core: generic support for disabling netdev features down stack") CC: "David S. Miller" <davem@davemloft.net> CC: Eric Dumazet <edumazet@google.com> CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <gospo@cumulusnetworks.com> CC: Jiri Pirko <jiri@resnulli.us> CC: Nikolay Aleksandrov <razor@blackwall.org> CC: Michal Kubecek <mkubecek@suse.cz> CC: Alexander Duyck <alexander.duyck@gmail.com> CC: Geert Uytterhoeven <geert@linux-m68k.org> CC: netdev@vger.kernel.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-03vlan: Invoke driver vlan hooks only if device is presentPadmanabh Ratnakar1-2/+8
NIC drivers mark device as detached during error recovery. It expects no manangement hooks to be invoked in this state. Invoke driver vlan hooks only if device is present. Signed-off-by: Padmanabh Ratnakar <padmanabh.ratnakar@avagotech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-03ptp: Change ptp_class to a proper bitmaskStefan Sørensen1-8/+8
Change the definition of PTP_CLASS_L2 to not have any bits overlapping with the other defined protocol values, allowing the PTP_CLASS_* definitions to be for simple filtering on packet type. Signed-off-by: Stefan Sørensen <stefan.sorensen@spectralink.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-03ipv6: fix tunnel error handlingMichal Kubeček1-1/+11
Both tunnel6_protocol and tunnel46_protocol share the same error handler, tunnel6_err(), which traverses through tunnel6_handlers list. For ipip6 tunnels, we need to traverse tunnel46_handlers as we do e.g. in tunnel46_rcv(). Current code can generate an ICMPv6 error message with an IPv4 packet embedded in it. Fixes: 73d605d1abbd ("[IPSEC]: changing API of xfrm6_tunnel_register") Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-03cfg80211: allow AID/listen interval changes for unassociated stationAyala Beker1-9/+18
Currently, cfg80211 rejects updates of AID and listen interval parameters for existing entries. This information is known only at association stage and as a result it's impossible to update entries that were added unassociated. Fix this by allowing updates of these properies for stations that the driver (or mac80211) assigned unassociated state. This then fixes mac80211's use of NL80211_FEATURE_FULL_AP_CLIENT_STATE. Signed-off-by: Ayala Beker <ayala.beker@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-11-03mac80211: document sleep requirements for channel context opsChaitanya T K2-0/+12
Channel context driver operations can sleep, so add might_sleep() and document this. Signed-off-by: Chaitanya T K <chaitanya.mgit@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-11-03mac80211: further improve "no supported rates" warningJohannes Berg1-1/+1
Allow distinguishing the non-station case from the case of a station without rates, by using -1 for the non-station case. This value cannot be reached with a station since that many legacy rates don't exist. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-11-03mac80211: treat bad WMM parameters more gracefullyJohannes Berg1-94/+48
As WMM is required for HT/VHT operation, treat bad WMM parameters more gracefully by falling back to default parameters instead of not using WMM assocation. This makes it possible to still use HT or VHT, although potentially with reduced quality of service due to unintended WMM parameters. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-11-03mac80211: fixup AIFSN instead of disabling WMMEmmanuel Grumbach1-7/+7
Disabling WMM has a huge impact these days. It implies that HT and VHT will be disabled which means that the throughput will be drammatically reduced. Since the AIFSN is a transmission parameter, we can play a bit and fix it up to make it compliant with the 802.11 specification which requires it to be at least 2. Increasing it from 1 to 2 will slightly reduce the likelyhood to get a transmission opportunity compared to other clients that would accept to set AIFSN=1, but at least it will allow HT and VHT which is a huge gain. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-11-03mac80211: make enable_qos parameter to ieee80211_set_wmm_default()Johannes Berg5-16/+11
The function currently determines this value, for use in bss_info.qos, based on the interface type itself. Make it a parameter instead and set it with the same logic for now. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-11-03mac80211: fix crash on mesh local link ID generation with VIFsMatthias Schiffer1-0/+3
llid_in_use needs to be limited to stations of the same VIF, otherwise it will cause a NULL deref as the sta_info of non-mesh-VIFs don't have sta->mesh set. Steps to reproduce: modprobe mac80211_hwsim channels=2 iw phy phy0 interface add ibss0 type ibss iw phy phy0 interface add mesh0 type mp iw phy phy1 interface add ibss1 type ibss iw phy phy1 interface add mesh1 type mp ip link set ibss0 up ip link set mesh0 up ip link set ibss1 up ip link set mesh1 up iw dev ibss0 ibss join foo 2412 iw dev ibss1 ibss join foo 2412 # Ensure that ibss0 and ibss1 are actually associated; I often need to # leave and join the cell on ibss1 a second time. iw dev mesh0 mesh join bar iw dev mesh1 mesh join bar # crash Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-11-03mac80211: TDLS: add proper HT-oper IEArik Nemtsov5-7/+18
When 11n peers performs a TDLS connection on a legacy BSS, the HT operation IE must be specified according to IEEE802.11-2012 section 9.23.3.2. Otherwise HT-protection is compromised and the medium becomes noisy for both the TDLS and the BSS links. Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-11-03mac80211: don't reconfigure sched scan in case of wowlanEliad Peller5-35/+45
Scheduled scan has to be reconfigured only if wowlan wasn't configured, since otherwise it should continue to run (with the 'any' trigger) or be aborted. The current code will end up asking the driver to start a new scheduled scan without stopping the previous one, and leaking some memory (from the previous request.) Fix this by doing the abort/restart under the proper conditions. Signed-off-by: Eliad Peller <eliadx.peller@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-11-03mac80211: call drv_stop only if driver is startedEliad Peller3-31/+48
If drv_start() fails during hw_restart, all the running interfaces are being closed/stopped, which results in drv_stop() being called, although the driver was never started successfully. This might cause drivers to perform operations on uninitialized memory (as they assume it was initialized on drv_start) Consider the local->started flag, and call the driver's stop() op only if drv_start() succeeded before. Move drv_start() and drv_stop() to driver-ops.c, as they are no longer simple wrappers. Signed-off-by: Eliad Peller <eliadx.peller@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-11-03mac80211: Remove WARN_ON_ONCE in ieee80211_recalc_smpsAndrei Otcheretianski1-1/+7
The recalc_smps work can run after the station disassociates. At this stage we already released the channel, but the work will be cancelled only when the interface stops. In this scenario we can hit the warning in ieee80211_recalc_smps, so just remove it. Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-11-03mac80211: use freezable workqueue for restart workEliad Peller2-1/+12
Requesting hw restart during suspend might result in the restart work being executed after mac80211 and the hw are suspended. Solve the race by simply scheduling the restart work on a freezable workqueue. Note that there can be some cases of reconfiguration on resume (besides the hardware restart): * wowlan is not configured - All the interfaces removed were removed on suspend, and drv_stop() was called. At this point the driver shouldn't expect for hw_restart anyway, so we can simply cancel it (on resume). * wowlan is configured, drv_resume() == 1 There is no definitive expected behavior in this case, as each driver might have different expectations (e.g. setting some flags on suspend/restart vs. not handling spurious recovery). For now, simply let the hw_restart work run again after resume, and hope the driver will handle it well (or at least initiate another hw restart). Signed-off-by: Eliad Peller <eliadx.peller@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-11-03mac80211: Fix local deauth while associatingAndrei Otcheretianski1-0/+19
Local request to deauthenticate wasn't handled while associating, thus the association could continue even when the user space required to disconnect. Cc: stable@vger.kernel.org Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-11-03mac80211: allow null chandef in tracingArik Nemtsov1-5/+5
In TDLS channel-switch operations the chandef can sometimes be NULL. Avoid an oops in the trace code for these cases and just print a chandef full of zeros. Cc: stable@vger.kernel.org Fixes: a7a6bdd0670fe ("mac80211: introduce TDLS channel switch ops") Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-11-03nl80211: Fix potential memory leak from parse_acl_dataOla Olsson1-6/+6
If parse_acl_data succeeds but the subsequent parsing of smps attributes fails, there will be a memory leak due to early returns. Fix that by moving the ACL parsing later. Cc: stable@vger.kernel.org Fixes: 18998c381b19b ("cfg80211: allow requesting SMPS mode on ap start") Signed-off-by: Ola Olsson <ola.olsson@sonymobile.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-11-03mac80211: fix divide by zero when NOA updateJanusz.Dziedzic@tieto.com1-0/+7
In case of one shot NOA the interval can be 0, catch that instead of potentially (depending on the driver) crashing like this: divide error: 0000 [#1] SMP [...] Call Trace: <IRQ> [<ffffffffc08e891c>] ieee80211_extend_absent_time+0x6c/0xb0 [mac80211] [<ffffffffc08e8a17>] ieee80211_update_p2p_noa+0xb7/0xe0 [mac80211] [<ffffffffc069cc30>] ath9k_p2p_ps_timer+0x170/0x190 [ath9k] [<ffffffffc070adf8>] ath_gen_timer_isr+0xc8/0xf0 [ath9k_hw] [<ffffffffc0691156>] ath9k_tasklet+0x296/0x2f0 [ath9k] [<ffffffff8107ad65>] tasklet_action+0xe5/0xf0 [...] Cc: stable@vger.kernel.org [3.16+, due to d463af4a1c34 using it] Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-11-02net/core: generic support for disabling netdev features down stackJarod Wilson1-0/+50
There are some netdev features, which when disabled on an upper device, such as a bonding master or a bridge, must be disabled and cannot be re-enabled on underlying devices. This is a rework of an earlier more heavy-handed appraoch, which simply disables and prevents re-enabling of netdev features listed in a new define in include/net/netdev_features.h, NETIF_F_UPPER_DISABLES. Any upper device that disables a flag in that feature mask, the disabling will propagate down the stack, and any lower device that has any upper device with one of those flags disabled should not be able to enable said flag. Initially, only LRO is included for proof of concept, and because this code effectively does the same thing as dev_disable_lro(), though it will also activate from the ethtool path, which was one of the goals here. [root@dell-per730-01 ~]# ethtool -k bond0 |grep large large-receive-offload: on [root@dell-per730-01 ~]# ethtool -k p5p1 |grep large large-receive-offload: on [root@dell-per730-01 ~]# ethtool -K bond0 lro off [root@dell-per730-01 ~]# ethtool -k bond0 |grep large large-receive-offload: off [root@dell-per730-01 ~]# ethtool -k p5p1 |grep large large-receive-offload: off dmesg dump: [ 1033.277986] bond0: Disabling feature 0x0000000000008000 on lower dev p5p2. [ 1034.067949] bnx2x 0000:06:00.1 p5p2: using MSI-X IRQs: sp 74 fp[0] 76 ... fp[7] 83 [ 1034.753612] bond0: Disabling feature 0x0000000000008000 on lower dev p5p1. [ 1035.591019] bnx2x 0000:06:00.0 p5p1: using MSI-X IRQs: sp 62 fp[0] 64 ... fp[7] 71 This has been successfully tested with bnx2x, qlcnic and netxen network cards as slaves in a bond interface. Turning LRO on or off on the master also turns it on or off on each of the slaves, new slaves are added with LRO in the same state as the master, and LRO can't be toggled on the slaves. Also, this should largely remove the need for dev_disable_lro(), and most, if not all, of its call sites can be replaced by simply making sure NETIF_F_LRO isn't included in the relevant device's feature flags. Note that this patch is driven by bug reports from users saying it was confusing that bonds and slaves had different settings for the same features, and while it won't be 100% in sync if a lower device doesn't support a feature like LRO, I think this is a good step in the right direction. CC: "David S. Miller" <davem@davemloft.net> CC: Eric Dumazet <edumazet@google.com> CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <gospo@cumulusnetworks.com> CC: Jiri Pirko <jiri@resnulli.us> CC: Nikolay Aleksandrov <razor@blackwall.org> CC: Michal Kubecek <mkubecek@suse.cz> CC: Alexander Duyck <alexander.duyck@gmail.com> CC: netdev@vger.kernel.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-02sit: fix sit0 percpu double allocationsEric Dumazet1-22/+4
sit0 device allocates its percpu storage twice : - One time in ipip6_tunnel_init() - One time in ipip6_fb_tunnel_init() Thus we leak 48 bytes per possible cpu per network namespace dismantle. ipip6_fb_tunnel_init() can be much simpler and does not return an error, and should be called after register_netdev() Note that ipip6_tunnel_clone_6rd() also needs to be called after register_netdev() (calling ipip6_tunnel_init()) Fixes: ebe084aafb7e ("sit: Use ipip6_tunnel_init as the ndo_init function.") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-02net: fix percpu memory leaksEric Dumazet5-18/+35
This patch fixes following problems : 1) percpu_counter_init() can return an error, therefore init_frag_mem_limit() must propagate this error so that inet_frags_init_net() can do the same up to its callers. 2) If ip[46]_frags_ns_ctl_register() fail, we must unwind properly and free the percpu_counter. Without this fix, we leave freed object in percpu_counters global list (if CONFIG_HOTPLUG_CPU) leading to crashes. This bug was detected by KASAN and syzkaller tool (http://github.com/google/syzkaller) Fixes: 6d7b857d541e ("net: use lib/percpu_counter API for fragmentation mem accounting") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-02ipv6: fix crash on ICMPv6 redirects with prohibited/blackholed sourceMatthias Schiffer1-2/+1
There are other error values besides ip6_null_entry that can be returned by ip6_route_redirect(): fib6_rule_action() can also result in ip6_blk_hole_entry and ip6_prohibit_entry if such ip rules are installed. Only checking for ip6_null_entry in rt6_do_redirect() causes ip6_ins_rt() to be called with rt->rt6i_table == NULL in these cases, making the kernel crash. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-02net: make skb_set_owner_w() more robustEric Dumazet2-3/+23
skb_set_owner_w() is called from various places that assume skb->sk always point to a full blown socket (as it changes sk->sk_wmem_alloc) We'd like to attach skb to request sockets, and in the future to timewait sockets as well. For these kind of pseudo sockets, we need to take a traditional refcount and use sock_edemux() as the destructor. It is now time to un-inline skb_set_owner_w(), being too big. Fixes: ca6fb0651883 ("tcp: attach SYNACK messages to request sockets instead of listener") Signed-off-by: Eric Dumazet <edumazet@google.com> Bisected-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-02bridge: vlan: Use rcu_dereference instead of rtnl_dereferenceIdo Schimmel1-1/+1
br_should_learn() is protected by RCU and not by RTNL, so use correct flavor of nbp_vlan_group(). Fixes: 907b1e6e83ed ("bridge: vlan: use proper rcu for the vlgrp member") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-02ipmr: fix possible race resulting from improper usage of IP_INC_STATS_BH() ↵Ani Sinha1-3/+3
in preemptible context. Fixes the following kernel BUG : BUG: using __this_cpu_add() in preemptible [00000000] code: bash/2758 caller is __this_cpu_preempt_check+0x13/0x15 CPU: 0 PID: 2758 Comm: bash Tainted: P O 3.18.19 #2 ffffffff8170eaca ffff880110d1b788 ffffffff81482b2a 0000000000000000 0000000000000000 ffff880110d1b7b8 ffffffff812010ae ffff880007cab800 ffff88001a060800 ffff88013a899108 ffff880108b84240 ffff880110d1b7c8 Call Trace: [<ffffffff81482b2a>] dump_stack+0x52/0x80 [<ffffffff812010ae>] check_preemption_disabled+0xce/0xe1 [<ffffffff812010d4>] __this_cpu_preempt_check+0x13/0x15 [<ffffffff81419d60>] ipmr_queue_xmit+0x647/0x70c [<ffffffff8141a154>] ip_mr_forward+0x32f/0x34e [<ffffffff8141af76>] ip_mroute_setsockopt+0xe03/0x108c [<ffffffff810553fc>] ? get_parent_ip+0x11/0x42 [<ffffffff810e6974>] ? pollwake+0x4d/0x51 [<ffffffff81058ac0>] ? default_wake_function+0x0/0xf [<ffffffff810553fc>] ? get_parent_ip+0x11/0x42 [<ffffffff810613d9>] ? __wake_up_common+0x45/0x77 [<ffffffff81486ea9>] ? _raw_spin_unlock_irqrestore+0x1d/0x32 [<ffffffff810618bc>] ? __wake_up_sync_key+0x4a/0x53 [<ffffffff8139a519>] ? sock_def_readable+0x71/0x75 [<ffffffff813dd226>] do_ip_setsockopt+0x9d/0xb55 [<ffffffff81429818>] ? unix_seqpacket_sendmsg+0x3f/0x41 [<ffffffff813963fe>] ? sock_sendmsg+0x6d/0x86 [<ffffffff813959d4>] ? sockfd_lookup_light+0x12/0x5d [<ffffffff8139650a>] ? SyS_sendto+0xf3/0x11b [<ffffffff810d5738>] ? new_sync_read+0x82/0xaa [<ffffffff813ddd19>] compat_ip_setsockopt+0x3b/0x99 [<ffffffff813fb24a>] compat_raw_setsockopt+0x11/0x32 [<ffffffff81399052>] compat_sock_common_setsockopt+0x18/0x1f [<ffffffff813c4d05>] compat_SyS_setsockopt+0x1a9/0x1cf [<ffffffff813c4149>] compat_SyS_socketcall+0x180/0x1e3 [<ffffffff81488ea1>] cstar_dispatch+0x7/0x1e Signed-off-by: Ani Sinha <ani@arista.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-02bridge: vlan: Use correct flag name in commentIdo Schimmel1-3/+3
The flag used to indicate if a VLAN should be used for filtering - as opposed to context only - on the bridge itself (e.g. br0) is called 'brentry' and not 'brvlan'. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-02bridge: vlan: Prevent possible use-after-freeIdo Schimmel1-0/+2
When adding a port to a bridge we initialize VLAN filtering on it. We do not bail out in case an error occurred in nbp_vlan_init, as it can be used as a non VLAN filtering bridge. However, if VLAN filtering is required and an error occurred in nbp_vlan_init, we should set vlgrp to NULL, so that VLAN filtering functions (e.g. br_vlan_find, br_get_pvid) will know the struct is invalid and will not try to access it. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-02tcp/dccp: fix ireq->pktopts raceEric Dumazet2-17/+17
IPv6 request sockets store a pointer to skb containing the SYN packet to be able to transfer it to full blown socket when 3WHS is done (ireq->pktopts -> np->pktoptions) As explained in commit 5e0724d027f0 ("tcp/dccp: fix hashdance race for passive sessions"), we must transfer the skb only if we won the hashdance race, if multiple cpus receive the 'ack' packet completing 3WHS at the same time. Fixes: e994b2f0fb92 ("tcp: do not lock listener to process SYN packets") Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-02RDS: convert bind hash table to re-sizable hashtablesantosh.shilimkar@oracle.com3-86/+56
To further improve the RDS connection scalabilty on massive systems where number of sockets grows into tens of thousands of sockets, there is a need of larger bind hashtable. Pre-allocated 8K or 16K table is not very flexible in terms of memory utilisation. The rhashtable infrastructure gives us the flexibility to grow the hashtbable based on use and also comes up with inbuilt efficient bucket(chain) handling. Reviewed-by: David Miller <davem@davemloft.net> Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-02net: rds: changing the return type from int to voidSaurabh Sengar1-4/+2
as result of function rds_iw_flush_mr_pool is nowhere checked, changing its return type from int to void. also removing the unused variable rc as there is nothing to return Signed-off-by: Saurabh Sengar <saurabh.truth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-02ipv4: use l4 hash for locally generated multipath flowsPaolo Abeni1-1/+2
This patch changes how the multipath hash is computed for locally generated flows: now the hash comprises l4 information. This allows better utilization of the available paths when the existing flows have the same source IP and the same destination IP: with l3 hash, even when multiple connections are in place simultaneously, a single path will be used, while with l4 hash we can use all the available paths. v2 changes: - use get_hash_from_flowi4() instead of implementing just another l4 hash function Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-01Use 64-bit timekeepingTina Ruchandani1-5/+6
This patch changes the use of struct timespec in dccp_probe to use struct timespec64 instead. timespec uses a 32-bit seconds field which will overflow in the year 2038 and beyond. timespec64 uses a 64-bit seconds field. Note that the correctness of the code isn't changed, since the original code only uses the timestamps to compute a small elapsed interval. This patch is part of a larger attempt to remove instances of 32-bit timekeeping structures (timespec, timeval, time_t) from the kernel so it is easier to identify where the real 2038 issues are. Signed-off-by: Tina Ruchandani <ruchandani.tina@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-01ipv4: update RTNH_F_LINKDOWN flag on UP eventJulian Anastasov1-0/+7
When nexthop is part of multipath route we should clear the LINKDOWN flag when link goes UP or when first address is added. This is needed because we always set LINKDOWN flag when DEAD flag was set but now on UP the nexthop is not dead anymore. Examples when LINKDOWN bit can be forgotten when no NETDEV_CHANGE is delivered: - link goes down (LINKDOWN is set), then link goes UP and device shows carrier OK but LINKDOWN remains set - last address is deleted (LINKDOWN is set), then address is added and device shows carrier OK but LINKDOWN remains set Steps to reproduce: modprobe dummy ifconfig dummy0 192.168.168.1 up here add a multipath route where one nexthop is for dummy0: ip route add 1.2.3.4 nexthop dummy0 nexthop SOME_OTHER_DEVICE ifconfig dummy0 down ifconfig dummy0 up now ip route shows nexthop that is not dead. Now set the sysctl var: echo 1 > /proc/sys/net/ipv4/conf/dummy0/ignore_routes_with_linkdown now ip route will show a dead nexthop because the forgotten RTNH_F_LINKDOWN is propagated as RTNH_F_DEAD. Fixes: 8a3d03166f19 ("net: track link-status of ipv4 nexthops") Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-01ipv4: fix to not remove local route on link downJulian Anastasov2-9/+15
When fib_netdev_event calls fib_disable_ip on NETDEV_DOWN event we should not delete the local routes if the local address is still present. The confusion comes from the fact that both fib_netdev_event and fib_inetaddr_event use the NETDEV_DOWN constant. Fix it by returning back the variable 'force'. Steps to reproduce: modprobe dummy ifconfig dummy0 192.168.168.1 up ifconfig dummy0 down ip route list table local | grep dummy | grep host local 192.168.168.1 dev dummy0 proto kernel scope host src 192.168.168.1 Fixes: 8a3d03166f19 ("net: track link-status of ipv4 nexthops") Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-01net: dsa: use switchdev obj for VLAN add/del opsVivien Didelot1-20/+9
Simplify DSA by pushing the switchdev objects for VLAN add and delete operations down to its drivers. Currently only mv88e6xxx is affected. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-01VSOCK: define VSOCK_SS_LISTEN once onlyStefan Hajnoczi2-22/+19
The SS_LISTEN socket state is defined by both af_vsock.c and vmci_transport.c. This is risky since the value could be changed in one file and the other would be out of sync. Rename from SS_LISTEN to VSOCK_SS_LISTEN since the constant is not part of enum socket_state (SS_CONNECTED, ...). This way it is clear that the constant is vsock-specific. The big text reflow in af_vsock.c was necessary to keep to the maximum line length. Text is unchanged except for s/SS_LISTEN/VSOCK_SS_LISTEN/. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-01tipc: linearize arriving NAME_DISTR and LINK_PROTO buffersJon Paul Maloy1-0/+5
Testing of the new UDP bearer has revealed that reception of NAME_DISTRIBUTOR, LINK_PROTOCOL/RESET and LINK_PROTOCOL/ACTIVATE message buffers is not prepared for the case that those may be non-linear. We now linearize all such buffers before they are delivered up to the generic reception layer. In order for the commit to apply cleanly to 'net' and 'stable', we do the change in the function tipc_udp_recv() for now. Later, we will post a commit to 'net-next' moving the linearization to generic code, in tipc_named_rcv() and tipc_link_proto_rcv(). Fixes: commit d0f91938bede ("tipc: add ip/udp media type") Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-01ipv6: add defensive check for CHECKSUM_PARTIAL skbs in ip_fragmentHannes Frederic Sowa1-4/+4
CHECKSUM_PARTIAL skbs should never arrive in ip_fragment. If we get one of those warn about them once and handle them gracefully by recalculating the checksum. Fixes: commit 32dce968dd987 ("ipv6: Allow for partial checksums on non-ufo packets") See-also: commit 72e843bb09d45 ("ipv6: ip6_fragment() should check CHECKSUM_PARTIAL") Cc: Eric Dumazet <edumazet@google.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Benjamin Coddington <bcodding@redhat.com> Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-01ipv6: no CHECKSUM_PARTIAL on MSG_MORE corked socketsHannes Frederic Sowa1-37/+33
We cannot reliable calculate packet size on MSG_MORE corked sockets and thus cannot decide if they are going to be fragmented later on, so better not use CHECKSUM_PARTIAL in the first place. The IPv6 code also intended to protect and not use CHECKSUM_PARTIAL in the existence of IPv6 extension headers, but the condition was wrong. Fix it up, too. Also the condition to check whether the packet fits into one fragment was wrong and has been corrected. Fixes: commit 32dce968dd987 ("ipv6: Allow for partial checksums on non-ufo packets") See-also: commit 72e843bb09d45 ("ipv6: ip6_fragment() should check CHECKSUM_PARTIAL") Cc: Eric Dumazet <edumazet@google.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Benjamin Coddington <bcodding@redhat.com> Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-01ipv4: add defensive check for CHECKSUM_PARTIAL skbs in ip_fragmentHannes Frederic Sowa1-3/+5
CHECKSUM_PARTIAL skbs should never arrive in ip_fragment. If we get one of those warn about them once and handle them gracefully by recalculating the checksum. Cc: Eric Dumazet <edumazet@google.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Benjamin Coddington <bcodding@redhat.com> Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-01ipv4: no CHECKSUM_PARTIAL on MSG_MORE corked socketsHannes Frederic Sowa1-0/+1
We cannot reliable calculate packet size on MSG_MORE corked sockets and thus cannot decide if they are going to be fragmented later on, so better not use CHECKSUM_PARTIAL in the first place. Cc: Eric Dumazet <edumazet@google.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Benjamin Coddington <bcodding@redhat.com> Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller8-14/+38
2015-10-30Merge branch 'master' of ↵David S. Miller4-16/+45
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2015-10-30 1) The flow cache is limited by the flow cache limit which depends on the number of cpus and the xfrm garbage collector threshold which is independent of the number of cpus. This leads to the fact that on systems with more than 16 cpus we hit the xfrm garbage collector limit and refuse new allocations, so new flows are dropped. On systems with 16 or less cpus, we hit the flowcache limit. In this case, we shrink the flow cache instead of refusing new flows. We increase the xfrm garbage collector threshold to INT_MAX to get the same behaviour, independent of the number of cpus. 2) Fix some unaligned accesses on sparc systems. From Sowmini Varadhan. 3) Fix some header checks in _decode_session4. We may call pskb_may_pull with a negative value converted to unsigened int from pskb_may_pull. This can lead to incorrect policy lookups. We fix this by a check of the data pointer position before we call pskb_may_pull. 4) Reload skb header pointers after calling pskb_may_pull in _decode_session4 as this may change the pointers into the packet. 5) Add a missing statistic counter on inner mode errors. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>