summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2017-12-11mac80211: Add MIC space only for TX key optionDavid Spinadel4-9/+29
Add a key flag to indicates that the device only needs MIC space and not a real MIC. In such cases, keep the MIC zeroed for ease of debug. Signed-off-by: David Spinadel <david.spinadel@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-12-11mac80211: don't warn on AID field without top two MSBs setJohannes Berg1-4/+5
While the change between 802.11-2012 and 802.11-2016 to move from requiring APs to set the two top bits to now requiring them to be cleared was apparently unintentional and will be fixed, clients should either way assume that the top five bits are reserved and ignore them. Implement that in mac80211. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-12-11nl80211: add a few extended error strings to key parsingJohannes Berg1-20/+41
This mostly serves as an example for how to add error strings and erroneous attribute pointers. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-12-11cfg80211: cleanup signal strength units notationSergey Matyukevich3-11/+11
Both cfg80211_rx_mgmt and cfg80211_report_obss_beacon functions send reports to userspace using NL80211_ATTR_RX_SIGNAL_DBM attribute w/o any processing of their input signal values. Which means that in order to match userspace tools expectations, input signal values for those functions are supposed to be in dBm units. This patch cleans up comments, variable names, and trace reports for those functions, replacing confusing 'mBm' by 'dBm'. Signed-off-by: Sergey Matyukevich <sergey.matyukevich.os@quantenna.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-12-11cfg80211: IBSS: Add support for static WEP in driver for IBSSTova Mussai2-0/+10
Add support for drivers that implement static WEP internally for IBSS. Add the WEP keys to the IBSS params struct, that will allow the driver to use the keys in the join flow, and not only after the connection. Signed-off-by: Tova Mussai <tova.mussai@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-12-11mac80211: remove BUG() when interface type is invalidLuca Coelho1-1/+1
In the ieee80211_setup_sdata() we check if the interface type is valid and, if not, call BUG(). This should never happen, but if there is something wrong with the code, it will not be caught until the bug happens when an interface is being set up. Calling BUG() is too extreme for this and a WARN_ON() would be better used instead. Change that. Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-12-11mac80211: call synchronize_net once in the restart flowSara Sharon3-10/+15
Currently the restart flow enables RX back, and then proceeds to tear down RX and TX aggregations. The TX aggregation tear down calls synchronize_net(), which waits for packet receiving to be done. This is done for every session, while RX processing is already active, and in some reproductions it takes up to 3 seconds. Add a call once in the restart_work, before we have traffic active again, and remove the subsequent calls when tearing down the aggregation. This requires to move down the code that turns off the reconfig flag in order to be able to test it in _ieee80211_stop_tx_ba_session(). Signed-off-by: Sara Sharon <sara.sharon@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-12-11mac80211: always update the PM state of a peer on MGMT / DATA framesEmmanuel Grumbach1-12/+5
The 2016 version of the spec is more generic about when the AP should update the power management state of the peer: the AP shall update the state based on any management or data frames. This means that even non-bufferable management frames should be looked at to update to maintain the power management state of the peer. This can avoid problematic cases for example if a station disappears while being asleep and then re-appears. The AP would remember it as in power save, but the Authentication frame couldn't be used to set the peer as awake again. Note that this issues wasn't really critical since at some point (after the association) we would have removed the station and created another one with all the states cleared. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-12-11mac80211_hwsim: enforce PS_MANUAL_POLL to be set after PS_ENABLEDAdiel Aloni1-6/+11
Enforce using PS_MANUAL_POLL in ps hwsim debugfs to trigger a poll, only if PS_ENABLED was set before. This is required due to commit c9491367b759 ("mac80211: always update the PM state of a peer on MGMT / DATA frames") that enforces the ap to check only mgmt/data frames ps bit, and then update station's power save accordingly. When sending only ps-poll (control frame) the ap will not be aware that the station entered power save. Setting ps enable before triggering ps_poll, will send NDP with PM bit enabled first. Signed-off-by: Adiel Aloni <adiel.aloni@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-12-11mac80211: make __ieee80211_start_rx_ba_session staticJohannes Berg2-8/+5
The function is only used with the file, so make it static. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-12-11mac80211: enable TDLS peer buffer STA featureYingying Tang3-1/+9
Allow drivers to set the buffer station extended capability for TDLS links, with a new hardware flag indicating this. Signed-off-by: Yingying Tang <yintang@qti.qualcomm.com> [change commit log/documentation wording] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-12-11mac80211: avoid looking up tid_tx/tid_rx from timersJohannes Berg2-37/+11
There's no need to re-lookup the data structures now that we actually get them immediately with from_timer(), just avoid that. The struct has to be valid anyway, otherwise the timer object itself would no longer be valid, and we can't have a different version of the struct since only a single session per TID is permitted. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-12-11mac80211: mark expected switch fall-throughsGustavo A. R. Silva10-4/+14
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Notice that in some cases I replaced "fall through on else" and "otherwise fall through" comments with just a "fall through" comment, which is what GCC is expecting to find. Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-12-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller753-4574/+7219
Conflict was two parallel additions of include files to sch_generic.c, no biggie. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-08kmemcheck: rip it out for realMichal Hocko13-13/+0
Commit 4675ff05de2d ("kmemcheck: rip it out") has removed the code but for some reason SPDX header stayed in place. This looks like a rebase mistake in the mmotm tree or the merge mistake. Let's drop those leftovers as well. Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-12-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds113-367/+1052
Pull networking fixes from David Miller: 1) CAN fixes from Martin Kelly (cancel URBs properly in all the CAN usb drivers). 2) Revert returning -EEXIST from __dev_alloc_name() as this propagates to userspace and broke some apps. From Johannes Berg. 3) Fix conn memory leaks and crashes in TIPC, from Jon Malloc and Cong Wang. 4) Gianfar MAC can't do EEE so don't advertise it by default, from Claudiu Manoil. 5) Relax strict netlink attribute validation, but emit a warning. From David Ahern. 6) Fix regression in checksum offload of thunderx driver, from Florian Westphal. 7) Fix UAPI bpf issues on s390, from Hendrik Brueckner. 8) New card support in iwlwifi, from Ihab Zhaika. 9) BBR congestion control bug fixes from Neal Cardwell. 10) Fix port stats in nfp driver, from Pieter Jansen van Vuuren. 11) Fix leaks in qualcomm rmnet, from Subash Abhinov Kasiviswanathan. 12) Fix DMA API handling in sh_eth driver, from Thomas Petazzoni. 13) Fix spurious netpoll warnings in bnxt_en, from Calvin Owens. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (67 commits) net: mvpp2: fix the RSS table entry offset tcp: evaluate packet losses upon RTT change tcp: fix off-by-one bug in RACK tcp: always evaluate losses in RACK upon undo tcp: correctly test congestion state in RACK bnxt_en: Fix sources of spurious netpoll warnings tcp_bbr: reset long-term bandwidth sampling on loss recovery undo tcp_bbr: reset full pipe detection on loss recovery undo tcp_bbr: record "full bw reached" decision in new full_bw_reached bit sfc: pass valid pointers from efx_enqueue_unwind gianfar: Disable EEE autoneg by default tcp: invalidate rate samples during SACK reneging can: peak/pcie_fd: fix potential bug in restarting tx queue can: usb_8dev: cancel urb on -EPIPE and -EPROTO can: kvaser_usb: cancel urb on -EPIPE and -EPROTO can: esd_usb2: cancel urb on -EPIPE and -EPROTO can: ems_usb: cancel urb on -EPIPE and -EPROTO can: mcba_usb: cancel urb on -EPROTO usbnet: fix alignment for frames with no ethernet header tcp: use current time in tcp_rcv_space_adjust() ...
2017-12-08Merge tag 'media/v4.15-2' of ↵Linus Torvalds269-2951/+3169
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media fixes from Mauro Carvalho Chehab: "A series of fixes for the media subsytem: - The largest amount of fixes in this series is with regards to comments that aren't kernel-doc, but start with "/**". A new check added for 4.15 makes it to produce a *huge* amount of new warnings (I'm compiling here with W=1). Most of the patches in this series fix those. No code changes - just comment changes at the source files - rc: some fixed in order to better handle RC repetition codes - v4l-async: use the v4l2_dev from the root notifier when matching sub-devices - v4l2-fwnode: Check subdev count after checking port - ov 13858 and et8ek8: compilation fix with randconfigs - usbtv: a trivial new USB ID addition - dibusb-common: don't do DMA on stack on firmware load - imx274: Fix error handling, add MAINTAINERS entry - sir_ir: detect presence of port" * tag 'media/v4.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (50 commits) media: imx274: Fix error handling, add MAINTAINERS entry media: v4l: async: use the v4l2_dev from the root notifier when matching sub-devices media: v4l2-fwnode: Check subdev count after checking port media: et8ek8: select V4L2_FWNODE media: ov13858: Select V4L2_FWNODE media: rc: partial revert of "media: rc: per-protocol repeat period" media: dvb: i2c transfers over usb cannot be done from stack media: dvb-frontends: complete kernel-doc markups media: docs: add documentation for frontend attach info media: dvb_frontends: fix kernel-doc macros media: drivers: remove "/**" from non-kernel-doc comments media: lm3560: add a missing kernel-doc parameter media: rcar_jpu: fix two kernel-doc markups media: vsp1: add a missing kernel-doc parameter media: soc_camera: fix a kernel-doc markup media: mt2063: fix some kernel-doc warnings media: radio-wl1273: fix a parameter name at kernel-doc macro media: s3c-camif: add missing description at s3c_camif_find_format() media: mtk-vpu: add description for wdt fields at struct mtk_vpu media: vdec: fix some kernel-doc warnings ...
2017-12-08Merge tag 'drm-fixes-for-v4.15-rc3' of ↵Linus Torvalds60-128/+1087
git://people.freedesktop.org/~airlied/linux Pull drm fixes from Dave Airlie: "This pull is a bit larger than I'd like but a large bunch of it is license fixes, AMD wanted to fix the licenses for a bunch of files that were missing them, Otherwise a bunch of TTM regression fix since the hugepage support, some i915 and gvt fixes, a core connector free in a safe context fix, and one bridge fix" * tag 'drm-fixes-for-v4.15-rc3' of git://people.freedesktop.org/~airlied/linux: (26 commits) drm/bridge: analogix dp: Fix runtime PM state in get_modes() callback Revert "drm/i915: Display WA #1133 WaFbcSkipSegments:cnl, glk" drm/vc4: Fix false positive WARN() backtrace on refcount_inc() usage drm/i915: Call i915_gem_init_userptr() before taking struct_mutex drm/exynos: remove unnecessary function declaration drm/exynos: remove unnecessary descrptions drm/exynos: gem: Drop NONCONTIG flag for buffers allocated without IOMMU drm/exynos: Fix dma-buf import drm/ttm: swap consecutive allocated pooled pages v4 drm: safely free connectors from connector_iter drm/i915/gvt: set max priority for gvt context drm/i915/gvt: Don't mark vgpu context as inactive when preempted drm/i915/gvt: Limit read hw reg to active vgpu drm/i915/gvt: Export intel_gvt_render_mmio_to_ring_id() drm/i915/gvt: Emulate PCI expansion ROM base address register drm/ttm: swap consecutive allocated cached pages v3 drm/ttm: roundup the shrink request to prevent skip huge pool drm/ttm: add page order support in ttm_pages_put drm/ttm: add set_pages_wb for handling page order more than zero drm/ttm: add page order in page pool ...
2017-12-08Merge tag 'md/4.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/mdLinus Torvalds5-15/+21
Pull md fixes from Shaohua Li: "Some MD fixes. The notable one is a raid5-cache deadlock bug with dm-raid, others are not significant" * tag 'md/4.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md: md/raid1/10: add missed blk plug md: limit mdstat resync progress to max_sectors md/r5cache: move mddev_lock() out of r5c_journal_mode_set() md/raid5: correct degraded calculation in raid5_error
2017-12-08Merge tag 'devicetree-fixes-for-4.15-part2' of ↵Linus Torvalds77-151/+149
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull DeviceTree fixes from Rob Herring: "Another set of DT fixes: - Fixes from overlay code rework. A trifecta of fixes to the locking, an out of bounds access, and a memory leak in of_overlay_apply() - Clean-up at25 eeprom binding document - Remove leading '0x' in unit-addresses from binding docs" * tag 'devicetree-fixes-for-4.15-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: of: overlay: Make node skipping in init_overlay_changeset() clearer of: overlay: Fix out-of-bounds write in init_overlay_changeset() of: overlay: Fix (un)locking in of_overlay_apply() of: overlay: Fix memory leak in of_overlay_apply() error path dt-bindings: eeprom: at25: Document device-specific compatible values dt-bindings: eeprom: at25: Grammar s/are can/can/ dt-bindings: Remove leading 0x from bindings notation of: overlay: Remove else after goto of: Spelling s/changset/changeset/ of: unittest: Remove bogus overlay mutex release from overlay_data_add()
2017-12-08Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds2-11/+48
Pull virtio bugfixes from Michael Tsirkin: "A couple of minor bugfixes" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: virtio_net: fix return value check in receive_mergeable() virtio_mmio: add cleanup for virtio_mmio_remove virtio_mmio: add cleanup for virtio_mmio_probe
2017-12-08Merge tag 'for-linus-4.15-rc3-tag' of ↵Linus Torvalds1-1/+3
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: "Just two small fixes for the new pvcalls frontend driver" * tag 'for-linus-4.15-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/pvcalls: Fix a check in pvcalls_front_remove() xen/pvcalls: check for xenbus_read() errors
2017-12-08Merge tag 'powerpc-4.15-4' of ↵Linus Torvalds9-29/+54
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "One notable fix for kexec on Power9, where we were not clearing MMU PID properly which sometimes leads to hangs. Finally debugged to a root cause by Nick. A revert of a patch which tried to rework our panic handling to get more output on the console, but inadvertently broke reporting the panic to the hypervisor, which apparently people care about. Then a fix for an oops in the PMU code, and finally some s/%p/%px/ in xmon. Thanks to: David Gibson, Nicholas Piggin, Ravi Bangoria" * tag 'powerpc-4.15-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/xmon: Don't print hashed pointers in xmon powerpc/64s: Initialize ISAv3 MMU registers before setting partition table Revert "powerpc: Do not call ppc_md.panic in fadump panic notifier" powerpc/perf: Fix oops when grouping different pmu events
2017-12-08Merge tag 'linux-can-fixes-for-4.15-20171208' of ↵David S. Miller6-6/+12
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2017-12-08 this is a pull request of 6 patches for net/master. Martin Kelly provides 5 patches for various USB based CAN drivers, that properly cancel the URBs on adapter unplug, so that the driver doesn't end up in an endless loop. Stephane Grosjean provides a patch to restart the tx queue if zero length packages are transmitted. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-08macvlan: fix memory hole in macvlan_devGirish Moodalbail1-1/+1
Move 'macaddr_count' from after 'netpoll' to after 'nest_level' to pack and reduce a memory hole. Fixes: 88ca59d1aaf28c25 (macvlan: remove unused fields in struct macvlan_dev) Signed-off-by: Girish Moodalbail <girish.moodalbail@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-08Merge tag 'wireless-drivers-for-davem-2017-12-08' of ↵David S. Miller15-27/+122
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers Kalle Valo says: ==================== wireless-drivers fixes for 4.15 Second set of fixes for 4.15. This time a lot of iwlwifi patches and two brcmfmac patches. Most important here are the MIC and IVC fixes for iwlwifi to unbreak 9000 series. iwlwifi * fix rate-scaling to not start lowest possible rate * fix the TX queue hang detection for AP/GO modes * fix the TX queue hang timeout in monitor interfaces * fix packet injection * remove a wrong error message when dumping PCI registers * fix race condition with RF-kill * tell mac80211 when the MIC has been stripped (9000 series) * tell mac80211 when the IVC has been stripped (9000 series) * add 2 new PCI IDs, one for 9000 and one for 22000 * fix a queue hang due during a P2P Remain-on-Channel operation brcmfmac * fix a race which sometimes caused a crash during sdio unbind * fix a kernel-doc related build error ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-08slip: sl_alloc(): remove unused parameter "dev_t line"Marc Kleine-Budde1-2/+2
The first and only parameter of sl_alloc() is unused, so remove it. Fixes: 5342b77c4123 slip: ("Clean up create and destroy") Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-08net: mvpp2: fix the RSS table entry offsetAntoine Tenart1-1/+1
The macro used to access or set an RSS table entry was using an offset of 8, while it should use an offset of 0. This lead to wrongly configure the RSS table, not accessing the right entries. Fixes: 1d7d15d79fb4 ("net: mvpp2: initialize the RSS tables") Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-08Merge branch 'cxgb4-collect-hardware-logs-via-ethtool'David S. Miller12-324/+944
Rahul Lakkireddy says: ==================== cxgb4: collect hardware logs via ethtool Collect more hardware logs via ethtool --get-dump facility. Patch 1 collects on-chip memory layout information. Patch 2 collects on-chip MC memory dumps. Patch 3 collects HMA memory dump. Patch 4 evaluates and skips TX and RX payload regions in memory dumps. Patch 5 collects egress and ingress SGE queue contexts. Patch 6 collects PCIe configuration logs ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-08cxgb4: collect PCIe configuration logsRahul Lakkireddy5-0/+54
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-08cxgb4: collect egress and ingress SGE queue contextsRahul Lakkireddy6-24/+210
Use meminfo to identify the egress and ingress context regions and fetch all valid contexts from those regions. Also flush all contexts before attempting collection to prevent stale information. Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-08cxgb4: skip TX and RX payload regions in memory dumpsRahul Lakkireddy2-0/+148
Use meminfo to identify TX and RX payload regions and skip them in collection of EDC, MC, and HMA. Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-08cxgb4: collect HMA memory dumpRahul Lakkireddy9-4/+51
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-08cxgb4: collect MC memory dumpRahul Lakkireddy5-56/+108
Use meminfo to get base address and size of MC memory. Also use same meminfo for EDC memory dumps. Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-08cxgb4: collect on-chip memory informationRahul Lakkireddy7-241/+374
Collect memory layout of various on-chip memory regions. Move code for collecting on-chip memory information to cudbg_lib.c and update cxgb4_debugfs.c to use the common function. Also include cudbg_entity.h before cudbg_lib.h to avoid adding cudbg entity structure forward declarations in cudbg_lib.h. Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-08Merge branch 'veth-and-GSO-maximums'David S. Miller2-0/+37
Stephen Hemminger says: ==================== veth and GSO maximums This is the more general way to solving the issue of GSO limits not being set correctly for containers on Azure. If a GSO packet is sent to host that exceeds the limit (reported by NDIS), then the host is forced to do segmentation in software which has noticeable performance impact. The core rtnetlink infrastructure already has the messages and infrastructure to allow changing gso limits. With an updated iproute2 the following already works: # ip li set dev dummy0 gso_max_size 30000 These patches are about making it easier with veth. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-08veth: set peer GSO valuesStephen Hemminger1-0/+3
When new veth is created, and GSO values have been configured on one device, clone those values to the peer. For example: # ip link add dev vm1 gso_max_size 65530 type veth peer name vm2 This should create vm1 <--> vm2 with both having GSO maximum size set to 65530. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-08rtnetlink: allow GSO maximums to be set on device creationStephen Hemminger1-0/+34
Netlink device already allows changing GSO sizes with ip set command. The part that is missing is allowing overriding GSO settings on device creation. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-08net: stmmac: fix broken dma_interrupt handling for multi-queuesNiklas Cassel1-8/+46
There is nothing that says that number of TX queues == number of RX queues. E.g. the ARTPEC-6 SoC has 2 TX queues and 1 RX queue. This code is obviously wrong: for (chan = 0; chan < tx_channel_count; chan++) { struct stmmac_rx_queue *rx_q = &priv->rx_queue[chan]; priv->rx_queue has size MTL_MAX_RX_QUEUES, so this will send an uninitialized napi_struct to __napi_schedule(), causing us to crash in net_rx_action(), because napi_struct->poll is zero. [12846.759880] Unable to handle kernel NULL pointer dereference at virtual address 00000000 [12846.768014] pgd = (ptrval) [12846.770742] [00000000] *pgd=39ec7831, *pte=00000000, *ppte=00000000 [12846.777023] Internal error: Oops: 80000007 [#1] PREEMPT SMP ARM [12846.782942] Modules linked in: [12846.785998] CPU: 0 PID: 161 Comm: dropbear Not tainted 4.15.0-rc2-00285-gf5fb5f2f39a7 #36 [12846.794177] Hardware name: Axis ARTPEC-6 Platform [12846.798879] task: (ptrval) task.stack: (ptrval) [12846.803407] PC is at 0x0 [12846.805942] LR is at net_rx_action+0x274/0x43c [12846.810383] pc : [<00000000>] lr : [<80bff064>] psr: 200e0113 [12846.816648] sp : b90d9ae8 ip : b90d9ae8 fp : b90d9b44 [12846.821871] r10: 00000008 r9 : 0013250e r8 : 00000100 [12846.827094] r7 : 0000012c r6 : 00000000 r5 : 00000001 r4 : bac84900 [12846.833619] r3 : 00000000 r2 : b90d9b08 r1 : 00000000 r0 : bac84900 Since each DMA channel can be used for rx and tx simultaneously, the current code should probably be rewritten so that napi_struct is embedded in a new struct stmmac_channel. That way, stmmac_poll() can call stmmac_tx_clean() on just the tx queue where we got the IRQ, instead of looping through all tx queues. This is also how the xgbe driver does it (another driver for this IP). Fixes: c22a3f48ef99 ("net: stmmac: adding multiple napi mechanism") Signed-off-by: Niklas Cassel <niklas.cassel@axis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-08Merge branch 'tcp-RACK-loss-recovery-bug-fixes'David S. Miller2-15/+14
Yuchung Cheng says: ==================== tcp: RACK loss recovery bug fixes This patch set has four minor bug fixes in TCP RACK loss recovery. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-08tcp: evaluate packet losses upon RTT changeYuchung Cheng1-11/+8
RACK skips an ACK unless it advances the most recently delivered TX timestamp (rack.mstamp). Since RACK also uses the most recent RTT to decide if a packet is lost, RACK should still run the loss detection whenever the most recent RTT changes. For example, an ACK that does not advance the timestamp but triggers the cwnd undo due to reordering, would then use the most recent (higher) RTT measurement to detect further losses. Signed-off-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Priyaranjan Jha <priyarjha@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-08tcp: fix off-by-one bug in RACKYuchung Cheng1-3/+3
RACK should mark a packet lost when remaining wait time is zero. Signed-off-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Priyaranjan Jha <priyarjha@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-08tcp: always evaluate losses in RACK upon undoYuchung Cheng1-0/+1
When sender detects spurious retransmission, all packets marked lost are remarked to be in-flight. However some may be considered lost based on its timestamps in RACK. This patch forces RACK to re-evaluate, which may be skipped previously if the ACK does not advance RACK timestamp. Signed-off-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Priyaranjan Jha <priyarjha@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-08tcp: correctly test congestion state in RACKYuchung Cheng1-1/+2
RACK does not test the loss recovery state correctly to compute the reordering window. It assumes if lost_out is zero then TCP is not in loss recovery. But it can be zero during recovery before calling tcp_rack_detect_loss(): when an ACK acknowledges all packets marked lost before receiving this ACK, but has not yet to discover new ones by tcp_rack_detect_loss(). The fix is to simply test the congestion state directly. Signed-off-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Priyaranjan Jha <priyarjha@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-08net: dsa: lan9303: Protect ALR operations with mutexEgil Hjelmeland2-2/+13
ALR table operations are a sequence of related register operations which should be protected from concurrent access. The alr_cache should also be protected. Add alr_mutex doing that. Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-08net: sched: fix use-after-free in tcf_block_put_extJiri Pirko1-6/+7
Since the block is freed with last chain being put, once we reach the end of iteration of list_for_each_entry_safe, the block may be already freed. I'm hitting this only by creating and deleting clsact: [ 202.171952] ================================================================== [ 202.180182] BUG: KASAN: use-after-free in tcf_block_put_ext+0x240/0x390 [ 202.187590] Read of size 8 at addr ffff880225539a80 by task tc/796 [ 202.194508] [ 202.196185] CPU: 0 PID: 796 Comm: tc Not tainted 4.15.0-rc2jiri+ #5 [ 202.203200] Hardware name: Mellanox Technologies Ltd. "MSN2100-CB2F"/"SA001017", BIOS 5.6.5 06/07/2016 [ 202.213613] Call Trace: [ 202.216369] dump_stack+0xda/0x169 [ 202.220192] ? dma_virt_map_sg+0x147/0x147 [ 202.224790] ? show_regs_print_info+0x54/0x54 [ 202.229691] ? tcf_chain_destroy+0x1dc/0x250 [ 202.234494] print_address_description+0x83/0x3d0 [ 202.239781] ? tcf_block_put_ext+0x240/0x390 [ 202.244575] kasan_report+0x1ba/0x460 [ 202.248707] ? tcf_block_put_ext+0x240/0x390 [ 202.253518] tcf_block_put_ext+0x240/0x390 [ 202.258117] ? tcf_chain_flush+0x290/0x290 [ 202.262708] ? qdisc_hash_del+0x82/0x1a0 [ 202.267111] ? qdisc_hash_add+0x50/0x50 [ 202.271411] ? __lock_is_held+0x5f/0x1a0 [ 202.275843] clsact_destroy+0x3d/0x80 [sch_ingress] [ 202.281323] qdisc_destroy+0xcb/0x240 [ 202.285445] qdisc_graft+0x216/0x7b0 [ 202.289497] tc_get_qdisc+0x260/0x560 Fix this by holding the block also by chain 0 and put chain 0 explicitly, out of the list_for_each_entry_safe loop at the very end of tcf_block_put_ext. Fixes: efbf78973978 ("net_sched: get rid of rcu_barrier() in tcf_block_put_ext()") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-08bnxt_en: Fix sources of spurious netpoll warningsCalvin Owens1-2/+2
After applying 2270bc5da3497945 ("bnxt_en: Fix netpoll handling") and 903649e718f80da2 ("bnxt_en: Improve -ENOMEM logic in NAPI poll loop."), we still see the following WARN fire: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 1875170 at net/core/netpoll.c:165 netpoll_poll_dev+0x15a/0x160 bnxt_poll+0x0/0xd0 exceeded budget in poll <snip> Call Trace: [<ffffffff814be5cd>] dump_stack+0x4d/0x70 [<ffffffff8107e013>] __warn+0xd3/0xf0 [<ffffffff8107e07f>] warn_slowpath_fmt+0x4f/0x60 [<ffffffff8179519a>] netpoll_poll_dev+0x15a/0x160 [<ffffffff81795f38>] netpoll_send_skb_on_dev+0x168/0x250 [<ffffffff817962fc>] netpoll_send_udp+0x2dc/0x440 [<ffffffff815fa9be>] write_ext_msg+0x20e/0x250 [<ffffffff810c8125>] call_console_drivers.constprop.23+0xa5/0x110 [<ffffffff810c9549>] console_unlock+0x339/0x5b0 [<ffffffff810c9a88>] vprintk_emit+0x2c8/0x450 [<ffffffff810c9d5f>] vprintk_default+0x1f/0x30 [<ffffffff81173df5>] printk+0x48/0x50 [<ffffffffa0197713>] edac_raw_mc_handle_error+0x563/0x5c0 [edac_core] [<ffffffffa0197b9b>] edac_mc_handle_error+0x42b/0x6e0 [edac_core] [<ffffffffa01c3a60>] sbridge_mce_output_error+0x410/0x10d0 [sb_edac] [<ffffffffa01c47cc>] sbridge_check_error+0xac/0x130 [sb_edac] [<ffffffffa0197f3c>] edac_mc_workq_function+0x3c/0x90 [edac_core] [<ffffffff81095f8b>] process_one_work+0x19b/0x480 [<ffffffff810967ca>] worker_thread+0x6a/0x520 [<ffffffff8109c7c4>] kthread+0xe4/0x100 [<ffffffff81884c52>] ret_from_fork+0x22/0x40 This happens because we increment rx_pkts on -ENOMEM and -EIO, resulting in rx_pkts > 0. Fix this by only bumping rx_pkts if we were actually given a non-zero budget. Signed-off-by: Calvin Owens <calvinowens@fb.com> Acked-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-08Merge branch 'lockless-qdisc-series'David S. Miller10-176/+512
John Fastabend says: ==================== lockless qdisc series This series adds support for building lockless qdiscs. This is the result of noticing the qdisc lock is a common hot-spot in perf analysis of the Linux network stack, especially when testing with high packet per second rates. However, nothing is free and most qdiscs rely on the qdisc lock for their data structures so each qdisc must be converted on a case by case basis. In this series, to kick things off, we make pfifo_fast, mq, and mqprio lockless. Follow up series can address additional qdiscs as needed. For example sch_tbf might be useful. To allow this the lockless design is an opt-in flag. In some future utopia we convert all qdiscs and we get to drop this case analysis, but in order to make progress we live in the real-world. There are also a handful of optimizations I have behind this series and a few code cleanups that I couldn't figure out how to fit neatly into this series with out increasing the patch count. Once this is in additional patches can address this. The most notable is in skb_dequeue we can push the consumer lock out a bit and consume multiple skbs off the skb_array in pfifo fast per iteration. Ideally we could push arrays of packets at drivers as well but we would need the infrastructure for this. The other notable improvement is to do less locking in the overrun cases where bad tx queue list and gso_skb are being hit. Although, nice in theory in practice this is the error case and I haven't found a benchmark where this matters yet. For testing... My first test case uses multiple containers (via cilium) where multiple client containers use 'wrk' to benchmark connections with a server container running lighttpd. Where lighttpd is configured to use multiple threads, one per core. Additionally this test has a proxy agent running so all traffic takes an extra hop through a proxy container. In these cases each TCP packet traverses the egress qdisc layer at least four times and the ingress qdisc layer an additional four times. This makes for a good stress test IMO, perf details below. The other micro-benchmark I run is injecting packets directly into qdisc layer using pktgen. This uses the benchmark script, ./pktgen_bench_xmit_mode_queue_xmit.sh Benchmarks taken in two cases, "base" running latest net-next no changes to qdisc layer and "qdisc" tests run with qdisc lockless updates. Numbers reported in req/sec. All virtual 'veth' devices run with pfifo_fast in the qdisc test case. `wrk -t16 -c $conns -d30 "http://[$SERVER_IP4]:80"` conns 16 32 64 1024 ----------------------------------------------- base: 18831 20201 21393 29151 qdisc: 19309 21063 23899 29265 notice in all cases we see performance improvement when running with qdisc case. Microbenchmarks using pktgen are as follows, `pktgen_bench_xmit_mode_queue_xmit.sh -t 1 -i eth2 -c 20000000 base(mq): 2.1Mpps base(pfifo_fast): 2.1Mpps qdisc(mq): 2.6Mpps qdisc(pfifo_fast): 2.6Mpps notice numbers are the same for mq and pfifo_fast because only testing a single thread here. In both tests we see a nice bump in performance gain. The key with 'mq' is it is already per txq ring so contention is minimal in the above cases. Qdiscs such as tbf or htb which have more contention will likely show larger gains when/if lockless versions are implemented. Thanks to everyone who helped with this work especially Daniel Borkmann, Eric Dumazet and Willem de Bruijn for discussing the design and reviewing versions of the code. Changes from the RFC: dropped a couple patches off the end, fixed a bug with skb_queue_walk_safe not unlinking skb in all cases, fixed a lockdep splat with pfifo_fast_destroy not calling *_bh lock variant, addressed _most_ of Willem's comments, there was a bug in the bulk locking (final patch) of the RFC series. @Willem, I left out lockdep annotation for a follow on series to add lockdep more completely, rather than just in code I touched. Comments and feedback welcome. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-08net: sched: pfifo_fast use skb_arrayJohn Fastabend2-53/+92
This converts the pfifo_fast qdisc to use the skb_array data structure and set the lockless qdisc bit. pfifo_fast is the first qdisc to support the lockless bit that can be a child of a qdisc requiring locking. So we add logic to clear the lock bit on initialization in these cases when the qdisc graft operation occurs. This also removes the logic used to pick the next band to dequeue from and instead just checks a per priority array for packets from top priority to lowest. This might need to be a bit more clever but seems to work for now. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-08net: skb_array: expose peek APIJohn Fastabend1-0/+5
This adds a peek routine to skb_array.h for use with qdisc. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>