Age | Commit message (Collapse) | Author | Files | Lines |
|
Add a key flag to indicates that the device only needs
MIC space and not a real MIC.
In such cases, keep the MIC zeroed for ease of debug.
Signed-off-by: David Spinadel <david.spinadel@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
While the change between 802.11-2012 and 802.11-2016 to move from
requiring APs to set the two top bits to now requiring them to be
cleared was apparently unintentional and will be fixed, clients
should either way assume that the top five bits are reserved and
ignore them.
Implement that in mac80211.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
This mostly serves as an example for how to add error strings
and erroneous attribute pointers.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Both cfg80211_rx_mgmt and cfg80211_report_obss_beacon functions send
reports to userspace using NL80211_ATTR_RX_SIGNAL_DBM attribute w/o
any processing of their input signal values. Which means that in
order to match userspace tools expectations, input signal values
for those functions are supposed to be in dBm units.
This patch cleans up comments, variable names, and trace reports
for those functions, replacing confusing 'mBm' by 'dBm'.
Signed-off-by: Sergey Matyukevich <sergey.matyukevich.os@quantenna.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Add support for drivers that implement static WEP internally for IBSS.
Add the WEP keys to the IBSS params struct, that will allow the driver
to use the keys in the join flow, and not only after the connection.
Signed-off-by: Tova Mussai <tova.mussai@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
In the ieee80211_setup_sdata() we check if the interface type is valid
and, if not, call BUG(). This should never happen, but if there is
something wrong with the code, it will not be caught until the bug
happens when an interface is being set up. Calling BUG() is too
extreme for this and a WARN_ON() would be better used instead. Change
that.
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Currently the restart flow enables RX back, and then proceeds
to tear down RX and TX aggregations.
The TX aggregation tear down calls synchronize_net(), which
waits for packet receiving to be done.
This is done for every session, while RX processing is already
active, and in some reproductions it takes up to 3 seconds.
Add a call once in the restart_work, before we have traffic
active again, and remove the subsequent calls when tearing
down the aggregation.
This requires to move down the code that turns off the
reconfig flag in order to be able to test it in
_ieee80211_stop_tx_ba_session().
Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
The 2016 version of the spec is more generic about when the
AP should update the power management state of the peer:
the AP shall update the state based on any management or
data frames. This means that even non-bufferable management
frames should be looked at to update to maintain the power
management state of the peer.
This can avoid problematic cases for example if a station
disappears while being asleep and then re-appears. The AP
would remember it as in power save, but the Authentication
frame couldn't be used to set the peer as awake again.
Note that this issues wasn't really critical since at some
point (after the association) we would have removed the
station and created another one with all the states cleared.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Enforce using PS_MANUAL_POLL in ps hwsim debugfs to trigger a poll,
only if PS_ENABLED was set before.
This is required due to commit c9491367b759 ("mac80211: always update the
PM state of a peer on MGMT / DATA frames") that enforces the ap to
check only mgmt/data frames ps bit, and then update station's power save
accordingly.
When sending only ps-poll (control frame) the ap will not be aware that
the station entered power save.
Setting ps enable before triggering ps_poll, will send NDP with PM bit
enabled first.
Signed-off-by: Adiel Aloni <adiel.aloni@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
The function is only used with the file, so make it static.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Allow drivers to set the buffer station extended capability
for TDLS links, with a new hardware flag indicating this.
Signed-off-by: Yingying Tang <yintang@qti.qualcomm.com>
[change commit log/documentation wording]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
There's no need to re-lookup the data structures now that
we actually get them immediately with from_timer(), just
avoid that. The struct has to be valid anyway, otherwise
the timer object itself would no longer be valid, and we
can't have a different version of the struct since only a
single session per TID is permitted.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
Notice that in some cases I replaced "fall through on else" and
"otherwise fall through" comments with just a "fall through" comment,
which is what GCC is expecting to find.
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Conflict was two parallel additions of include files to sch_generic.c,
no biggie.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 4675ff05de2d ("kmemcheck: rip it out") has removed the code but
for some reason SPDX header stayed in place. This looks like a rebase
mistake in the mmotm tree or the merge mistake. Let's drop those
leftovers as well.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Pull networking fixes from David Miller:
1) CAN fixes from Martin Kelly (cancel URBs properly in all the CAN usb
drivers).
2) Revert returning -EEXIST from __dev_alloc_name() as this propagates
to userspace and broke some apps. From Johannes Berg.
3) Fix conn memory leaks and crashes in TIPC, from Jon Malloc and Cong
Wang.
4) Gianfar MAC can't do EEE so don't advertise it by default, from
Claudiu Manoil.
5) Relax strict netlink attribute validation, but emit a warning. From
David Ahern.
6) Fix regression in checksum offload of thunderx driver, from Florian
Westphal.
7) Fix UAPI bpf issues on s390, from Hendrik Brueckner.
8) New card support in iwlwifi, from Ihab Zhaika.
9) BBR congestion control bug fixes from Neal Cardwell.
10) Fix port stats in nfp driver, from Pieter Jansen van Vuuren.
11) Fix leaks in qualcomm rmnet, from Subash Abhinov Kasiviswanathan.
12) Fix DMA API handling in sh_eth driver, from Thomas Petazzoni.
13) Fix spurious netpoll warnings in bnxt_en, from Calvin Owens.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (67 commits)
net: mvpp2: fix the RSS table entry offset
tcp: evaluate packet losses upon RTT change
tcp: fix off-by-one bug in RACK
tcp: always evaluate losses in RACK upon undo
tcp: correctly test congestion state in RACK
bnxt_en: Fix sources of spurious netpoll warnings
tcp_bbr: reset long-term bandwidth sampling on loss recovery undo
tcp_bbr: reset full pipe detection on loss recovery undo
tcp_bbr: record "full bw reached" decision in new full_bw_reached bit
sfc: pass valid pointers from efx_enqueue_unwind
gianfar: Disable EEE autoneg by default
tcp: invalidate rate samples during SACK reneging
can: peak/pcie_fd: fix potential bug in restarting tx queue
can: usb_8dev: cancel urb on -EPIPE and -EPROTO
can: kvaser_usb: cancel urb on -EPIPE and -EPROTO
can: esd_usb2: cancel urb on -EPIPE and -EPROTO
can: ems_usb: cancel urb on -EPIPE and -EPROTO
can: mcba_usb: cancel urb on -EPROTO
usbnet: fix alignment for frames with no ethernet header
tcp: use current time in tcp_rcv_space_adjust()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
"A series of fixes for the media subsytem:
- The largest amount of fixes in this series is with regards to
comments that aren't kernel-doc, but start with "/**".
A new check added for 4.15 makes it to produce a *huge* amount of
new warnings (I'm compiling here with W=1). Most of the patches in
this series fix those.
No code changes - just comment changes at the source files
- rc: some fixed in order to better handle RC repetition codes
- v4l-async: use the v4l2_dev from the root notifier when matching
sub-devices
- v4l2-fwnode: Check subdev count after checking port
- ov 13858 and et8ek8: compilation fix with randconfigs
- usbtv: a trivial new USB ID addition
- dibusb-common: don't do DMA on stack on firmware load
- imx274: Fix error handling, add MAINTAINERS entry
- sir_ir: detect presence of port"
* tag 'media/v4.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (50 commits)
media: imx274: Fix error handling, add MAINTAINERS entry
media: v4l: async: use the v4l2_dev from the root notifier when matching sub-devices
media: v4l2-fwnode: Check subdev count after checking port
media: et8ek8: select V4L2_FWNODE
media: ov13858: Select V4L2_FWNODE
media: rc: partial revert of "media: rc: per-protocol repeat period"
media: dvb: i2c transfers over usb cannot be done from stack
media: dvb-frontends: complete kernel-doc markups
media: docs: add documentation for frontend attach info
media: dvb_frontends: fix kernel-doc macros
media: drivers: remove "/**" from non-kernel-doc comments
media: lm3560: add a missing kernel-doc parameter
media: rcar_jpu: fix two kernel-doc markups
media: vsp1: add a missing kernel-doc parameter
media: soc_camera: fix a kernel-doc markup
media: mt2063: fix some kernel-doc warnings
media: radio-wl1273: fix a parameter name at kernel-doc macro
media: s3c-camif: add missing description at s3c_camif_find_format()
media: mtk-vpu: add description for wdt fields at struct mtk_vpu
media: vdec: fix some kernel-doc warnings
...
|
|
git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"This pull is a bit larger than I'd like but a large bunch of it is
license fixes, AMD wanted to fix the licenses for a bunch of files
that were missing them,
Otherwise a bunch of TTM regression fix since the hugepage support,
some i915 and gvt fixes, a core connector free in a safe context fix,
and one bridge fix"
* tag 'drm-fixes-for-v4.15-rc3' of git://people.freedesktop.org/~airlied/linux: (26 commits)
drm/bridge: analogix dp: Fix runtime PM state in get_modes() callback
Revert "drm/i915: Display WA #1133 WaFbcSkipSegments:cnl, glk"
drm/vc4: Fix false positive WARN() backtrace on refcount_inc() usage
drm/i915: Call i915_gem_init_userptr() before taking struct_mutex
drm/exynos: remove unnecessary function declaration
drm/exynos: remove unnecessary descrptions
drm/exynos: gem: Drop NONCONTIG flag for buffers allocated without IOMMU
drm/exynos: Fix dma-buf import
drm/ttm: swap consecutive allocated pooled pages v4
drm: safely free connectors from connector_iter
drm/i915/gvt: set max priority for gvt context
drm/i915/gvt: Don't mark vgpu context as inactive when preempted
drm/i915/gvt: Limit read hw reg to active vgpu
drm/i915/gvt: Export intel_gvt_render_mmio_to_ring_id()
drm/i915/gvt: Emulate PCI expansion ROM base address register
drm/ttm: swap consecutive allocated cached pages v3
drm/ttm: roundup the shrink request to prevent skip huge pool
drm/ttm: add page order support in ttm_pages_put
drm/ttm: add set_pages_wb for handling page order more than zero
drm/ttm: add page order in page pool
...
|
|
Pull md fixes from Shaohua Li:
"Some MD fixes.
The notable one is a raid5-cache deadlock bug with dm-raid, others are
not significant"
* tag 'md/4.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
md/raid1/10: add missed blk plug
md: limit mdstat resync progress to max_sectors
md/r5cache: move mddev_lock() out of r5c_journal_mode_set()
md/raid5: correct degraded calculation in raid5_error
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull DeviceTree fixes from Rob Herring:
"Another set of DT fixes:
- Fixes from overlay code rework. A trifecta of fixes to the locking,
an out of bounds access, and a memory leak in of_overlay_apply()
- Clean-up at25 eeprom binding document
- Remove leading '0x' in unit-addresses from binding docs"
* tag 'devicetree-fixes-for-4.15-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
of: overlay: Make node skipping in init_overlay_changeset() clearer
of: overlay: Fix out-of-bounds write in init_overlay_changeset()
of: overlay: Fix (un)locking in of_overlay_apply()
of: overlay: Fix memory leak in of_overlay_apply() error path
dt-bindings: eeprom: at25: Document device-specific compatible values
dt-bindings: eeprom: at25: Grammar s/are can/can/
dt-bindings: Remove leading 0x from bindings notation
of: overlay: Remove else after goto
of: Spelling s/changset/changeset/
of: unittest: Remove bogus overlay mutex release from overlay_data_add()
|
|
Pull virtio bugfixes from Michael Tsirkin:
"A couple of minor bugfixes"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
virtio_net: fix return value check in receive_mergeable()
virtio_mmio: add cleanup for virtio_mmio_remove
virtio_mmio: add cleanup for virtio_mmio_probe
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
"Just two small fixes for the new pvcalls frontend driver"
* tag 'for-linus-4.15-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/pvcalls: Fix a check in pvcalls_front_remove()
xen/pvcalls: check for xenbus_read() errors
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"One notable fix for kexec on Power9, where we were not clearing MMU
PID properly which sometimes leads to hangs. Finally debugged to a
root cause by Nick.
A revert of a patch which tried to rework our panic handling to get
more output on the console, but inadvertently broke reporting the
panic to the hypervisor, which apparently people care about.
Then a fix for an oops in the PMU code, and finally some s/%p/%px/ in
xmon.
Thanks to: David Gibson, Nicholas Piggin, Ravi Bangoria"
* tag 'powerpc-4.15-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/xmon: Don't print hashed pointers in xmon
powerpc/64s: Initialize ISAv3 MMU registers before setting partition table
Revert "powerpc: Do not call ppc_md.panic in fadump panic notifier"
powerpc/perf: Fix oops when grouping different pmu events
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2017-12-08
this is a pull request of 6 patches for net/master.
Martin Kelly provides 5 patches for various USB based CAN drivers, that
properly cancel the URBs on adapter unplug, so that the driver doesn't
end up in an endless loop. Stephane Grosjean provides a patch to restart
the tx queue if zero length packages are transmitted.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Move 'macaddr_count' from after 'netpoll' to after 'nest_level' to pack
and reduce a memory hole.
Fixes: 88ca59d1aaf28c25 (macvlan: remove unused fields in struct macvlan_dev)
Signed-off-by: Girish Moodalbail <girish.moodalbail@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers
Kalle Valo says:
====================
wireless-drivers fixes for 4.15
Second set of fixes for 4.15. This time a lot of iwlwifi patches and
two brcmfmac patches. Most important here are the MIC and IVC fixes
for iwlwifi to unbreak 9000 series.
iwlwifi
* fix rate-scaling to not start lowest possible rate
* fix the TX queue hang detection for AP/GO modes
* fix the TX queue hang timeout in monitor interfaces
* fix packet injection
* remove a wrong error message when dumping PCI registers
* fix race condition with RF-kill
* tell mac80211 when the MIC has been stripped (9000 series)
* tell mac80211 when the IVC has been stripped (9000 series)
* add 2 new PCI IDs, one for 9000 and one for 22000
* fix a queue hang due during a P2P Remain-on-Channel operation
brcmfmac
* fix a race which sometimes caused a crash during sdio unbind
* fix a kernel-doc related build error
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The first and only parameter of sl_alloc() is unused, so remove it.
Fixes: 5342b77c4123 slip: ("Clean up create and destroy")
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The macro used to access or set an RSS table entry was using an offset
of 8, while it should use an offset of 0. This lead to wrongly configure
the RSS table, not accessing the right entries.
Fixes: 1d7d15d79fb4 ("net: mvpp2: initialize the RSS tables")
Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Rahul Lakkireddy says:
====================
cxgb4: collect hardware logs via ethtool
Collect more hardware logs via ethtool --get-dump facility.
Patch 1 collects on-chip memory layout information.
Patch 2 collects on-chip MC memory dumps.
Patch 3 collects HMA memory dump.
Patch 4 evaluates and skips TX and RX payload regions in memory dumps.
Patch 5 collects egress and ingress SGE queue contexts.
Patch 6 collects PCIe configuration logs
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use meminfo to identify the egress and ingress context regions and
fetch all valid contexts from those regions. Also flush all contexts
before attempting collection to prevent stale information.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use meminfo to identify TX and RX payload regions and skip them in
collection of EDC, MC, and HMA.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use meminfo to get base address and size of MC memory. Also use same
meminfo for EDC memory dumps.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Collect memory layout of various on-chip memory regions. Move code
for collecting on-chip memory information to cudbg_lib.c and update
cxgb4_debugfs.c to use the common function. Also include
cudbg_entity.h before cudbg_lib.h to avoid adding cudbg entity
structure forward declarations in cudbg_lib.h.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Stephen Hemminger says:
====================
veth and GSO maximums
This is the more general way to solving the issue of GSO limits
not being set correctly for containers on Azure. If a GSO packet
is sent to host that exceeds the limit (reported by NDIS), then
the host is forced to do segmentation in software which has noticeable
performance impact.
The core rtnetlink infrastructure already has the messages and
infrastructure to allow changing gso limits. With an updated iproute2
the following already works:
# ip li set dev dummy0 gso_max_size 30000
These patches are about making it easier with veth.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When new veth is created, and GSO values have been configured
on one device, clone those values to the peer.
For example:
# ip link add dev vm1 gso_max_size 65530 type veth peer name vm2
This should create vm1 <--> vm2 with both having GSO maximum
size set to 65530.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Netlink device already allows changing GSO sizes with
ip set command. The part that is missing is allowing overriding
GSO settings on device creation.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There is nothing that says that number of TX queues == number of RX
queues. E.g. the ARTPEC-6 SoC has 2 TX queues and 1 RX queue.
This code is obviously wrong:
for (chan = 0; chan < tx_channel_count; chan++) {
struct stmmac_rx_queue *rx_q = &priv->rx_queue[chan];
priv->rx_queue has size MTL_MAX_RX_QUEUES, so this will send an
uninitialized napi_struct to __napi_schedule(), causing us to
crash in net_rx_action(), because napi_struct->poll is zero.
[12846.759880] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[12846.768014] pgd = (ptrval)
[12846.770742] [00000000] *pgd=39ec7831, *pte=00000000, *ppte=00000000
[12846.777023] Internal error: Oops: 80000007 [#1] PREEMPT SMP ARM
[12846.782942] Modules linked in:
[12846.785998] CPU: 0 PID: 161 Comm: dropbear Not tainted 4.15.0-rc2-00285-gf5fb5f2f39a7 #36
[12846.794177] Hardware name: Axis ARTPEC-6 Platform
[12846.798879] task: (ptrval) task.stack: (ptrval)
[12846.803407] PC is at 0x0
[12846.805942] LR is at net_rx_action+0x274/0x43c
[12846.810383] pc : [<00000000>] lr : [<80bff064>] psr: 200e0113
[12846.816648] sp : b90d9ae8 ip : b90d9ae8 fp : b90d9b44
[12846.821871] r10: 00000008 r9 : 0013250e r8 : 00000100
[12846.827094] r7 : 0000012c r6 : 00000000 r5 : 00000001 r4 : bac84900
[12846.833619] r3 : 00000000 r2 : b90d9b08 r1 : 00000000 r0 : bac84900
Since each DMA channel can be used for rx and tx simultaneously,
the current code should probably be rewritten so that napi_struct is
embedded in a new struct stmmac_channel.
That way, stmmac_poll() can call stmmac_tx_clean() on just the tx queue
where we got the IRQ, instead of looping through all tx queues.
This is also how the xgbe driver does it (another driver for this IP).
Fixes: c22a3f48ef99 ("net: stmmac: adding multiple napi mechanism")
Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Yuchung Cheng says:
====================
tcp: RACK loss recovery bug fixes
This patch set has four minor bug fixes in TCP RACK loss recovery.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
RACK skips an ACK unless it advances the most recently delivered
TX timestamp (rack.mstamp). Since RACK also uses the most recent
RTT to decide if a packet is lost, RACK should still run the
loss detection whenever the most recent RTT changes. For example,
an ACK that does not advance the timestamp but triggers the cwnd
undo due to reordering, would then use the most recent (higher)
RTT measurement to detect further losses.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
RACK should mark a packet lost when remaining wait time is zero.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When sender detects spurious retransmission, all packets
marked lost are remarked to be in-flight. However some may
be considered lost based on its timestamps in RACK. This patch
forces RACK to re-evaluate, which may be skipped previously if
the ACK does not advance RACK timestamp.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
RACK does not test the loss recovery state correctly to compute
the reordering window. It assumes if lost_out is zero then TCP is
not in loss recovery. But it can be zero during recovery before
calling tcp_rack_detect_loss(): when an ACK acknowledges all
packets marked lost before receiving this ACK, but has not yet
to discover new ones by tcp_rack_detect_loss(). The fix is to
simply test the congestion state directly.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
ALR table operations are a sequence of related register operations which
should be protected from concurrent access. The alr_cache should also be
protected. Add alr_mutex doing that.
Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since the block is freed with last chain being put, once we reach the
end of iteration of list_for_each_entry_safe, the block may be
already freed. I'm hitting this only by creating and deleting clsact:
[ 202.171952] ==================================================================
[ 202.180182] BUG: KASAN: use-after-free in tcf_block_put_ext+0x240/0x390
[ 202.187590] Read of size 8 at addr ffff880225539a80 by task tc/796
[ 202.194508]
[ 202.196185] CPU: 0 PID: 796 Comm: tc Not tainted 4.15.0-rc2jiri+ #5
[ 202.203200] Hardware name: Mellanox Technologies Ltd. "MSN2100-CB2F"/"SA001017", BIOS 5.6.5 06/07/2016
[ 202.213613] Call Trace:
[ 202.216369] dump_stack+0xda/0x169
[ 202.220192] ? dma_virt_map_sg+0x147/0x147
[ 202.224790] ? show_regs_print_info+0x54/0x54
[ 202.229691] ? tcf_chain_destroy+0x1dc/0x250
[ 202.234494] print_address_description+0x83/0x3d0
[ 202.239781] ? tcf_block_put_ext+0x240/0x390
[ 202.244575] kasan_report+0x1ba/0x460
[ 202.248707] ? tcf_block_put_ext+0x240/0x390
[ 202.253518] tcf_block_put_ext+0x240/0x390
[ 202.258117] ? tcf_chain_flush+0x290/0x290
[ 202.262708] ? qdisc_hash_del+0x82/0x1a0
[ 202.267111] ? qdisc_hash_add+0x50/0x50
[ 202.271411] ? __lock_is_held+0x5f/0x1a0
[ 202.275843] clsact_destroy+0x3d/0x80 [sch_ingress]
[ 202.281323] qdisc_destroy+0xcb/0x240
[ 202.285445] qdisc_graft+0x216/0x7b0
[ 202.289497] tc_get_qdisc+0x260/0x560
Fix this by holding the block also by chain 0 and put chain 0
explicitly, out of the list_for_each_entry_safe loop at the very
end of tcf_block_put_ext.
Fixes: efbf78973978 ("net_sched: get rid of rcu_barrier() in tcf_block_put_ext()")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
After applying 2270bc5da3497945 ("bnxt_en: Fix netpoll handling") and
903649e718f80da2 ("bnxt_en: Improve -ENOMEM logic in NAPI poll loop."),
we still see the following WARN fire:
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1875170 at net/core/netpoll.c:165 netpoll_poll_dev+0x15a/0x160
bnxt_poll+0x0/0xd0 exceeded budget in poll
<snip>
Call Trace:
[<ffffffff814be5cd>] dump_stack+0x4d/0x70
[<ffffffff8107e013>] __warn+0xd3/0xf0
[<ffffffff8107e07f>] warn_slowpath_fmt+0x4f/0x60
[<ffffffff8179519a>] netpoll_poll_dev+0x15a/0x160
[<ffffffff81795f38>] netpoll_send_skb_on_dev+0x168/0x250
[<ffffffff817962fc>] netpoll_send_udp+0x2dc/0x440
[<ffffffff815fa9be>] write_ext_msg+0x20e/0x250
[<ffffffff810c8125>] call_console_drivers.constprop.23+0xa5/0x110
[<ffffffff810c9549>] console_unlock+0x339/0x5b0
[<ffffffff810c9a88>] vprintk_emit+0x2c8/0x450
[<ffffffff810c9d5f>] vprintk_default+0x1f/0x30
[<ffffffff81173df5>] printk+0x48/0x50
[<ffffffffa0197713>] edac_raw_mc_handle_error+0x563/0x5c0 [edac_core]
[<ffffffffa0197b9b>] edac_mc_handle_error+0x42b/0x6e0 [edac_core]
[<ffffffffa01c3a60>] sbridge_mce_output_error+0x410/0x10d0 [sb_edac]
[<ffffffffa01c47cc>] sbridge_check_error+0xac/0x130 [sb_edac]
[<ffffffffa0197f3c>] edac_mc_workq_function+0x3c/0x90 [edac_core]
[<ffffffff81095f8b>] process_one_work+0x19b/0x480
[<ffffffff810967ca>] worker_thread+0x6a/0x520
[<ffffffff8109c7c4>] kthread+0xe4/0x100
[<ffffffff81884c52>] ret_from_fork+0x22/0x40
This happens because we increment rx_pkts on -ENOMEM and -EIO, resulting
in rx_pkts > 0. Fix this by only bumping rx_pkts if we were actually
given a non-zero budget.
Signed-off-by: Calvin Owens <calvinowens@fb.com>
Acked-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
John Fastabend says:
====================
lockless qdisc series
This series adds support for building lockless qdiscs. This is
the result of noticing the qdisc lock is a common hot-spot in
perf analysis of the Linux network stack, especially when testing
with high packet per second rates. However, nothing is free and
most qdiscs rely on the qdisc lock for their data structures so
each qdisc must be converted on a case by case basis. In this
series, to kick things off, we make pfifo_fast, mq, and mqprio
lockless. Follow up series can address additional qdiscs as needed.
For example sch_tbf might be useful. To allow this the lockless
design is an opt-in flag. In some future utopia we convert all
qdiscs and we get to drop this case analysis, but in order to
make progress we live in the real-world.
There are also a handful of optimizations I have behind this
series and a few code cleanups that I couldn't figure out how
to fit neatly into this series with out increasing the patch
count. Once this is in additional patches can address this. The
most notable is in skb_dequeue we can push the consumer lock
out a bit and consume multiple skbs off the skb_array in pfifo
fast per iteration. Ideally we could push arrays of packets at
drivers as well but we would need the infrastructure for this.
The other notable improvement is to do less locking in the
overrun cases where bad tx queue list and gso_skb are being
hit. Although, nice in theory in practice this is the error
case and I haven't found a benchmark where this matters yet.
For testing...
My first test case uses multiple containers (via cilium) where
multiple client containers use 'wrk' to benchmark connections with
a server container running lighttpd. Where lighttpd is configured
to use multiple threads, one per core. Additionally this test has
a proxy agent running so all traffic takes an extra hop through a
proxy container. In these cases each TCP packet traverses the egress
qdisc layer at least four times and the ingress qdisc layer an
additional four times. This makes for a good stress test IMO, perf
details below.
The other micro-benchmark I run is injecting packets directly into
qdisc layer using pktgen. This uses the benchmark script,
./pktgen_bench_xmit_mode_queue_xmit.sh
Benchmarks taken in two cases, "base" running latest net-next no
changes to qdisc layer and "qdisc" tests run with qdisc lockless
updates. Numbers reported in req/sec. All virtual 'veth' devices
run with pfifo_fast in the qdisc test case.
`wrk -t16 -c $conns -d30 "http://[$SERVER_IP4]:80"`
conns 16 32 64 1024
-----------------------------------------------
base: 18831 20201 21393 29151
qdisc: 19309 21063 23899 29265
notice in all cases we see performance improvement when running
with qdisc case.
Microbenchmarks using pktgen are as follows,
`pktgen_bench_xmit_mode_queue_xmit.sh -t 1 -i eth2 -c 20000000
base(mq): 2.1Mpps
base(pfifo_fast): 2.1Mpps
qdisc(mq): 2.6Mpps
qdisc(pfifo_fast): 2.6Mpps
notice numbers are the same for mq and pfifo_fast because only
testing a single thread here. In both tests we see a nice bump
in performance gain. The key with 'mq' is it is already per
txq ring so contention is minimal in the above cases. Qdiscs
such as tbf or htb which have more contention will likely show
larger gains when/if lockless versions are implemented.
Thanks to everyone who helped with this work especially Daniel
Borkmann, Eric Dumazet and Willem de Bruijn for discussing the
design and reviewing versions of the code.
Changes from the RFC: dropped a couple patches off the end,
fixed a bug with skb_queue_walk_safe not unlinking skb in all
cases, fixed a lockdep splat with pfifo_fast_destroy not calling
*_bh lock variant, addressed _most_ of Willem's comments, there
was a bug in the bulk locking (final patch) of the RFC series.
@Willem, I left out lockdep annotation for a follow on series
to add lockdep more completely, rather than just in code I
touched.
Comments and feedback welcome.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This converts the pfifo_fast qdisc to use the skb_array data structure
and set the lockless qdisc bit. pfifo_fast is the first qdisc to support
the lockless bit that can be a child of a qdisc requiring locking. So
we add logic to clear the lock bit on initialization in these cases when
the qdisc graft operation occurs.
This also removes the logic used to pick the next band to dequeue from
and instead just checks a per priority array for packets from top priority
to lowest. This might need to be a bit more clever but seems to work
for now.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This adds a peek routine to skb_array.h for use with qdisc.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|