Age | Commit message (Collapse) | Author | Files | Lines |
|
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
Two more fixes:
* sometimes, not enough tailroom was allocated for
software-encrypted management frames in mac80211
* cfg80211 regulatory restore got an additional condition,
needs to rerun the checks after that condition changes
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The "p" buffer is 0x4000 bytes long. B3_RI_WTO_R1 is 0x190. The value
of "regs->len" is in the 1-0x4000 range. The bug here is that
"regs->len - B3_RI_WTO_R1" can be a negative value which would lead to
memory corruption and an abrupt crash.
Fixes: c3f8be961808 ("[PATCH] skge: expand ethtool debug register dump")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since we now prevent regulatory restore during STA disconnect
if concurrent AP interfaces are active, we need to reschedule
this check when the AP state changes. This fixes never doing
a restore when an AP is the last interface to stop. Or to put
it another way: we need to re-check after anything we check
here changes.
Cc: stable@vger.kernel.org
Fixes: 113f3aaa81bd ("cfg80211: Prevent regulatory restore during STA disconnect in concurrent interfaces")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Some drivers use IEEE80211_KEY_FLAG_SW_MGMT_TX to indicate that management
frames need to be software encrypted. Since normal data packets are still
encrypted by the hardware, crypto_tx_tailroom_needed_cnt gets decremented
after key upload to hw. This can lead to passing skbs to ccmp_encrypt_skb,
which don't have the necessary tailroom for software encryption.
Change the code to add tailroom for encrypted management packets, even if
crypto_tx_tailroom_needed_cnt is 0.
Cc: stable@vger.kernel.org
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Alexei Starovoitov says:
====================
v1->v2:
- reworded 2nd patch. It's a real dead lock. Not a false positive
- dropped the lockdep fix for up_read_non_owner in bpf_get_stackid
In addition to preempt_disable patch for socket filters
https://patchwork.ozlabs.org/patch/1032437/
First patch fixes lockdep false positive in percpu_freelist
Second patch fixes potential deadlock in bpf_prog_register
Third patch fixes another potential deadlock in stackmap access
from tracing bpf prog and from syscall.
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
The map_lookup_elem used to not acquiring spinlock
in order to optimize the reader.
It was true until commit 557c0c6e7df8 ("bpf: convert stackmap to pre-allocation")
The syscall's map_lookup_elem(stackmap) calls bpf_stackmap_copy().
bpf_stackmap_copy() may find the elem no longer needed after the copy is done.
If that is the case, pcpu_freelist_push() saves this elem for reuse later.
This push requires a spinlock.
If a tracing bpf_prog got run in the middle of the syscall's
map_lookup_elem(stackmap) and this tracing bpf_prog is calling
bpf_get_stackid(stackmap) which also requires the same pcpu_freelist's
spinlock, it may end up with a dead lock situation as reported by
Eric Dumazet in https://patchwork.ozlabs.org/patch/1030266/
The situation is the same as the syscall's map_update_elem() which
needs to acquire the pcpu_freelist's spinlock and could race
with tracing bpf_prog. Hence, this patch fixes it by protecting
bpf_stackmap_copy() with this_cpu_inc(bpf_prog_active)
to prevent tracing bpf_prog from running.
A later syscall's map_lookup_elem commit f1a2e44a3aec ("bpf: add queue and stack maps")
also acquires a spinlock and races with tracing bpf_prog similarly.
Hence, this patch is forward looking and protects the majority
of the map lookups. bpf_map_offload_lookup_elem() is the exception
since it is for network bpf_prog only (i.e. never called by tracing
bpf_prog).
Fixes: 557c0c6e7df8 ("bpf: convert stackmap to pre-allocation")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Lockdep found a potential deadlock between cpu_hotplug_lock, bpf_event_mutex, and cpuctx_mutex:
[ 13.007000] WARNING: possible circular locking dependency detected
[ 13.007587] 5.0.0-rc3-00018-g2fa53f892422-dirty #477 Not tainted
[ 13.008124] ------------------------------------------------------
[ 13.008624] test_progs/246 is trying to acquire lock:
[ 13.009030] 0000000094160d1d (tracepoints_mutex){+.+.}, at: tracepoint_probe_register_prio+0x2d/0x300
[ 13.009770]
[ 13.009770] but task is already holding lock:
[ 13.010239] 00000000d663ef86 (bpf_event_mutex){+.+.}, at: bpf_probe_register+0x1d/0x60
[ 13.010877]
[ 13.010877] which lock already depends on the new lock.
[ 13.010877]
[ 13.011532]
[ 13.011532] the existing dependency chain (in reverse order) is:
[ 13.012129]
[ 13.012129] -> #4 (bpf_event_mutex){+.+.}:
[ 13.012582] perf_event_query_prog_array+0x9b/0x130
[ 13.013016] _perf_ioctl+0x3aa/0x830
[ 13.013354] perf_ioctl+0x2e/0x50
[ 13.013668] do_vfs_ioctl+0x8f/0x6a0
[ 13.014003] ksys_ioctl+0x70/0x80
[ 13.014320] __x64_sys_ioctl+0x16/0x20
[ 13.014668] do_syscall_64+0x4a/0x180
[ 13.015007] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 13.015469]
[ 13.015469] -> #3 (&cpuctx_mutex){+.+.}:
[ 13.015910] perf_event_init_cpu+0x5a/0x90
[ 13.016291] perf_event_init+0x1b2/0x1de
[ 13.016654] start_kernel+0x2b8/0x42a
[ 13.016995] secondary_startup_64+0xa4/0xb0
[ 13.017382]
[ 13.017382] -> #2 (pmus_lock){+.+.}:
[ 13.017794] perf_event_init_cpu+0x21/0x90
[ 13.018172] cpuhp_invoke_callback+0xb3/0x960
[ 13.018573] _cpu_up+0xa7/0x140
[ 13.018871] do_cpu_up+0xa4/0xc0
[ 13.019178] smp_init+0xcd/0xd2
[ 13.019483] kernel_init_freeable+0x123/0x24f
[ 13.019878] kernel_init+0xa/0x110
[ 13.020201] ret_from_fork+0x24/0x30
[ 13.020541]
[ 13.020541] -> #1 (cpu_hotplug_lock.rw_sem){++++}:
[ 13.021051] static_key_slow_inc+0xe/0x20
[ 13.021424] tracepoint_probe_register_prio+0x28c/0x300
[ 13.021891] perf_trace_event_init+0x11f/0x250
[ 13.022297] perf_trace_init+0x6b/0xa0
[ 13.022644] perf_tp_event_init+0x25/0x40
[ 13.023011] perf_try_init_event+0x6b/0x90
[ 13.023386] perf_event_alloc+0x9a8/0xc40
[ 13.023754] __do_sys_perf_event_open+0x1dd/0xd30
[ 13.024173] do_syscall_64+0x4a/0x180
[ 13.024519] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 13.024968]
[ 13.024968] -> #0 (tracepoints_mutex){+.+.}:
[ 13.025434] __mutex_lock+0x86/0x970
[ 13.025764] tracepoint_probe_register_prio+0x2d/0x300
[ 13.026215] bpf_probe_register+0x40/0x60
[ 13.026584] bpf_raw_tracepoint_open.isra.34+0xa4/0x130
[ 13.027042] __do_sys_bpf+0x94f/0x1a90
[ 13.027389] do_syscall_64+0x4a/0x180
[ 13.027727] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 13.028171]
[ 13.028171] other info that might help us debug this:
[ 13.028171]
[ 13.028807] Chain exists of:
[ 13.028807] tracepoints_mutex --> &cpuctx_mutex --> bpf_event_mutex
[ 13.028807]
[ 13.029666] Possible unsafe locking scenario:
[ 13.029666]
[ 13.030140] CPU0 CPU1
[ 13.030510] ---- ----
[ 13.030875] lock(bpf_event_mutex);
[ 13.031166] lock(&cpuctx_mutex);
[ 13.031645] lock(bpf_event_mutex);
[ 13.032135] lock(tracepoints_mutex);
[ 13.032441]
[ 13.032441] *** DEADLOCK ***
[ 13.032441]
[ 13.032911] 1 lock held by test_progs/246:
[ 13.033239] #0: 00000000d663ef86 (bpf_event_mutex){+.+.}, at: bpf_probe_register+0x1d/0x60
[ 13.033909]
[ 13.033909] stack backtrace:
[ 13.034258] CPU: 1 PID: 246 Comm: test_progs Not tainted 5.0.0-rc3-00018-g2fa53f892422-dirty #477
[ 13.034964] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
[ 13.035657] Call Trace:
[ 13.035859] dump_stack+0x5f/0x8b
[ 13.036130] print_circular_bug.isra.37+0x1ce/0x1db
[ 13.036526] __lock_acquire+0x1158/0x1350
[ 13.036852] ? lock_acquire+0x98/0x190
[ 13.037154] lock_acquire+0x98/0x190
[ 13.037447] ? tracepoint_probe_register_prio+0x2d/0x300
[ 13.037876] __mutex_lock+0x86/0x970
[ 13.038167] ? tracepoint_probe_register_prio+0x2d/0x300
[ 13.038600] ? tracepoint_probe_register_prio+0x2d/0x300
[ 13.039028] ? __mutex_lock+0x86/0x970
[ 13.039337] ? __mutex_lock+0x24a/0x970
[ 13.039649] ? bpf_probe_register+0x1d/0x60
[ 13.039992] ? __bpf_trace_sched_wake_idle_without_ipi+0x10/0x10
[ 13.040478] ? tracepoint_probe_register_prio+0x2d/0x300
[ 13.040906] tracepoint_probe_register_prio+0x2d/0x300
[ 13.041325] bpf_probe_register+0x40/0x60
[ 13.041649] bpf_raw_tracepoint_open.isra.34+0xa4/0x130
[ 13.042068] ? __might_fault+0x3e/0x90
[ 13.042374] __do_sys_bpf+0x94f/0x1a90
[ 13.042678] do_syscall_64+0x4a/0x180
[ 13.042975] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 13.043382] RIP: 0033:0x7f23b10a07f9
[ 13.045155] RSP: 002b:00007ffdef42fdd8 EFLAGS: 00000202 ORIG_RAX: 0000000000000141
[ 13.045759] RAX: ffffffffffffffda RBX: 00007ffdef42ff70 RCX: 00007f23b10a07f9
[ 13.046326] RDX: 0000000000000070 RSI: 00007ffdef42fe10 RDI: 0000000000000011
[ 13.046893] RBP: 00007ffdef42fdf0 R08: 0000000000000038 R09: 00007ffdef42fe10
[ 13.047462] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
[ 13.048029] R13: 0000000000000016 R14: 00007f23b1db4690 R15: 0000000000000000
Since tracepoints_mutex will be taken in tracepoint_probe_register/unregister()
there is no need to take bpf_event_mutex too.
bpf_event_mutex is protecting modifications to prog array used in kprobe/perf bpf progs.
bpf_raw_tracepoints don't need to take this mutex.
Fixes: c4f6699dfcb8 ("bpf: introduce BPF_RAW_TRACEPOINT")
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Lockdep warns about false positive:
[ 12.492084] 00000000e6b28347 (&head->lock){+...}, at: pcpu_freelist_push+0x2a/0x40
[ 12.492696] but this lock was taken by another, HARDIRQ-safe lock in the past:
[ 12.493275] (&rq->lock){-.-.}
[ 12.493276]
[ 12.493276]
[ 12.493276] and interrupts could create inverse lock ordering between them.
[ 12.493276]
[ 12.494435]
[ 12.494435] other info that might help us debug this:
[ 12.494979] Possible interrupt unsafe locking scenario:
[ 12.494979]
[ 12.495518] CPU0 CPU1
[ 12.495879] ---- ----
[ 12.496243] lock(&head->lock);
[ 12.496502] local_irq_disable();
[ 12.496969] lock(&rq->lock);
[ 12.497431] lock(&head->lock);
[ 12.497890] <Interrupt>
[ 12.498104] lock(&rq->lock);
[ 12.498368]
[ 12.498368] *** DEADLOCK ***
[ 12.498368]
[ 12.498837] 1 lock held by dd/276:
[ 12.499110] #0: 00000000c58cb2ee (rcu_read_lock){....}, at: trace_call_bpf+0x5e/0x240
[ 12.499747]
[ 12.499747] the shortest dependencies between 2nd lock and 1st lock:
[ 12.500389] -> (&rq->lock){-.-.} {
[ 12.500669] IN-HARDIRQ-W at:
[ 12.500934] _raw_spin_lock+0x2f/0x40
[ 12.501373] scheduler_tick+0x4c/0xf0
[ 12.501812] update_process_times+0x40/0x50
[ 12.502294] tick_periodic+0x27/0xb0
[ 12.502723] tick_handle_periodic+0x1f/0x60
[ 12.503203] timer_interrupt+0x11/0x20
[ 12.503651] __handle_irq_event_percpu+0x43/0x2c0
[ 12.504167] handle_irq_event_percpu+0x20/0x50
[ 12.504674] handle_irq_event+0x37/0x60
[ 12.505139] handle_level_irq+0xa7/0x120
[ 12.505601] handle_irq+0xa1/0x150
[ 12.506018] do_IRQ+0x77/0x140
[ 12.506411] ret_from_intr+0x0/0x1d
[ 12.506834] _raw_spin_unlock_irqrestore+0x53/0x60
[ 12.507362] __setup_irq+0x481/0x730
[ 12.507789] setup_irq+0x49/0x80
[ 12.508195] hpet_time_init+0x21/0x32
[ 12.508644] x86_late_time_init+0xb/0x16
[ 12.509106] start_kernel+0x390/0x42a
[ 12.509554] secondary_startup_64+0xa4/0xb0
[ 12.510034] IN-SOFTIRQ-W at:
[ 12.510305] _raw_spin_lock+0x2f/0x40
[ 12.510772] try_to_wake_up+0x1c7/0x4e0
[ 12.511220] swake_up_locked+0x20/0x40
[ 12.511657] swake_up_one+0x1a/0x30
[ 12.512070] rcu_process_callbacks+0xc5/0x650
[ 12.512553] __do_softirq+0xe6/0x47b
[ 12.512978] irq_exit+0xc3/0xd0
[ 12.513372] smp_apic_timer_interrupt+0xa9/0x250
[ 12.513876] apic_timer_interrupt+0xf/0x20
[ 12.514343] default_idle+0x1c/0x170
[ 12.514765] do_idle+0x199/0x240
[ 12.515159] cpu_startup_entry+0x19/0x20
[ 12.515614] start_kernel+0x422/0x42a
[ 12.516045] secondary_startup_64+0xa4/0xb0
[ 12.516521] INITIAL USE at:
[ 12.516774] _raw_spin_lock_irqsave+0x38/0x50
[ 12.517258] rq_attach_root+0x16/0xd0
[ 12.517685] sched_init+0x2f2/0x3eb
[ 12.518096] start_kernel+0x1fb/0x42a
[ 12.518525] secondary_startup_64+0xa4/0xb0
[ 12.518986] }
[ 12.519132] ... key at: [<ffffffff82b7bc28>] __key.71384+0x0/0x8
[ 12.519649] ... acquired at:
[ 12.519892] pcpu_freelist_pop+0x7b/0xd0
[ 12.520221] bpf_get_stackid+0x1d2/0x4d0
[ 12.520563] ___bpf_prog_run+0x8b4/0x11a0
[ 12.520887]
[ 12.521008] -> (&head->lock){+...} {
[ 12.521292] HARDIRQ-ON-W at:
[ 12.521539] _raw_spin_lock+0x2f/0x40
[ 12.521950] pcpu_freelist_push+0x2a/0x40
[ 12.522396] bpf_get_stackid+0x494/0x4d0
[ 12.522828] ___bpf_prog_run+0x8b4/0x11a0
[ 12.523296] INITIAL USE at:
[ 12.523537] _raw_spin_lock+0x2f/0x40
[ 12.523944] pcpu_freelist_populate+0xc0/0x120
[ 12.524417] htab_map_alloc+0x405/0x500
[ 12.524835] __do_sys_bpf+0x1a3/0x1a90
[ 12.525253] do_syscall_64+0x4a/0x180
[ 12.525659] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 12.526167] }
[ 12.526311] ... key at: [<ffffffff838f7668>] __key.13130+0x0/0x8
[ 12.526812] ... acquired at:
[ 12.527047] __lock_acquire+0x521/0x1350
[ 12.527371] lock_acquire+0x98/0x190
[ 12.527680] _raw_spin_lock+0x2f/0x40
[ 12.527994] pcpu_freelist_push+0x2a/0x40
[ 12.528325] bpf_get_stackid+0x494/0x4d0
[ 12.528645] ___bpf_prog_run+0x8b4/0x11a0
[ 12.528970]
[ 12.529092]
[ 12.529092] stack backtrace:
[ 12.529444] CPU: 0 PID: 276 Comm: dd Not tainted 5.0.0-rc3-00018-g2fa53f892422 #475
[ 12.530043] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
[ 12.530750] Call Trace:
[ 12.530948] dump_stack+0x5f/0x8b
[ 12.531248] check_usage_backwards+0x10c/0x120
[ 12.531598] ? ___bpf_prog_run+0x8b4/0x11a0
[ 12.531935] ? mark_lock+0x382/0x560
[ 12.532229] mark_lock+0x382/0x560
[ 12.532496] ? print_shortest_lock_dependencies+0x180/0x180
[ 12.532928] __lock_acquire+0x521/0x1350
[ 12.533271] ? find_get_entry+0x17f/0x2e0
[ 12.533586] ? find_get_entry+0x19c/0x2e0
[ 12.533902] ? lock_acquire+0x98/0x190
[ 12.534196] lock_acquire+0x98/0x190
[ 12.534482] ? pcpu_freelist_push+0x2a/0x40
[ 12.534810] _raw_spin_lock+0x2f/0x40
[ 12.535099] ? pcpu_freelist_push+0x2a/0x40
[ 12.535432] pcpu_freelist_push+0x2a/0x40
[ 12.535750] bpf_get_stackid+0x494/0x4d0
[ 12.536062] ___bpf_prog_run+0x8b4/0x11a0
It has been explained that is a false positive here:
https://lkml.org/lkml/2018/7/25/756
Recap:
- stackmap uses pcpu_freelist
- The lock in pcpu_freelist is a percpu lock
- stackmap is only used by tracing bpf_prog
- A tracing bpf_prog cannot be run if another bpf_prog
has already been running (ensured by the percpu bpf_prog_active counter).
Eric pointed out that this lockdep splats stops other
legit lockdep splats in selftests/bpf/test_progs.c.
Fix this by calling local_irq_save/restore for stackmap.
Another false positive had also been worked around by calling
local_irq_save in commit 89ad2fa3f043 ("bpf: fix lockdep splat").
That commit added unnecessary irq_save/restore to fast path of
bpf hash map. irqs are already disabled at that point, since htab
is holding per bucket spin_lock with irqsave.
Let's reduce overhead for htab by introducing __pcpu_freelist_push/pop
function w/o irqsave and convert pcpu_freelist_push/pop to irqsave
to be used elsewhere (right now only in stackmap).
It stops lockdep false positive in stackmap with a bit of acceptable overhead.
Fixes: 557c0c6e7df8 ("bpf: convert stackmap to pre-allocation")
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Disabled preemption is necessary for proper access to per-cpu maps
from BPF programs.
But the sender side of socket filters didn't have preemption disabled:
unix_dgram_sendmsg->sk_filter->sk_filter_trim_cap->bpf_prog_run_save_cb->BPF_PROG_RUN
and a combination of af_packet with tun device didn't disable either:
tpacket_snd->packet_direct_xmit->packet_pick_tx_queue->ndo_select_queue->
tun_select_queue->tun_ebpf_select_queue->bpf_prog_run_clear_cb->BPF_PROG_RUN
Disable preemption before executing BPF programs (both classic and extended).
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Previously, bpf_num_possible_cpus() had a bug when calculating a
number of possible CPUs in the case of sparse CPU allocations, as
it was considering only the first range or element of
/sys/devices/system/cpu/possible.
E.g. in the case of "0,2-3" (CPU 1 is not available), the function
returned 1 instead of 3.
This patch fixes the function by making it parse all CPU ranges and
elements.
Signed-off-by: Martynas Pumputis <m@lambda.lt>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
When calling firmware to allocate a CP ring or NQ, an interrupt associated
with that ring may be generated immediately before the doorbell is even
setup after the firmware call returns. When servicing the interrupt, the
driver may crash when trying to access the doorbell.
Fix it by disabling interrupt on that vector until the doorbell is
set up.
Fixes: 697197e5a173 ("bnxt_en: Re-structure doorbells.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan
Stefan Schmidt says:
====================
pull-request: ieee802154 for net 2019-01-31
An update from ieee802154 for your *net* tree.
I waited a while to see if anything else comes up, but it seems this time
we only have one fixup patch for the -rc rounds.
Colin fixed some indentation in the mcr20a drivers. That's about it.
If there are any problems with taking these two before the final 5.0 let
me know.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
syzbot was able to catch a bug in rds [1]
The issue here is that the socket might be found in a hash table
but that its refcount has already be set to 0 by another cpu.
We need to use refcount_inc_not_zero() to be safe here.
[1]
refcount_t: increment on 0; use-after-free.
WARNING: CPU: 1 PID: 23129 at lib/refcount.c:153 refcount_inc_checked lib/refcount.c:153 [inline]
WARNING: CPU: 1 PID: 23129 at lib/refcount.c:153 refcount_inc_checked+0x61/0x70 lib/refcount.c:151
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 23129 Comm: syz-executor3 Not tainted 5.0.0-rc4+ #53
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1db/0x2d0 lib/dump_stack.c:113
panic+0x2cb/0x65c kernel/panic.c:214
__warn.cold+0x20/0x48 kernel/panic.c:571
report_bug+0x263/0x2b0 lib/bug.c:186
fixup_bug arch/x86/kernel/traps.c:178 [inline]
fixup_bug arch/x86/kernel/traps.c:173 [inline]
do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:290
invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:973
RIP: 0010:refcount_inc_checked lib/refcount.c:153 [inline]
RIP: 0010:refcount_inc_checked+0x61/0x70 lib/refcount.c:151
Code: 1d 51 63 c8 06 31 ff 89 de e8 eb 1b f2 fd 84 db 75 dd e8 a2 1a f2 fd 48 c7 c7 60 9f 81 88 c6 05 31 63 c8 06 01 e8 af 65 bb fd <0f> 0b eb c1 90 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 54 49
RSP: 0018:ffff8880a0cbf1e8 EFLAGS: 00010282
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffc90006113000
RDX: 000000000001047d RSI: ffffffff81685776 RDI: 0000000000000005
RBP: ffff8880a0cbf1f8 R08: ffff888097c9e100 R09: ffffed1015ce5021
R10: ffffed1015ce5020 R11: ffff8880ae728107 R12: ffff8880723c20c0
R13: ffff8880723c24b0 R14: dffffc0000000000 R15: ffffed1014197e64
sock_hold include/net/sock.h:647 [inline]
rds_sock_addref+0x19/0x20 net/rds/af_rds.c:675
rds_find_bound+0x97c/0x1080 net/rds/bind.c:82
rds_recv_incoming+0x3be/0x1430 net/rds/recv.c:362
rds_loop_xmit+0xf3/0x2a0 net/rds/loop.c:96
rds_send_xmit+0x1355/0x2a10 net/rds/send.c:355
rds_sendmsg+0x323c/0x44e0 net/rds/send.c:1368
sock_sendmsg_nosec net/socket.c:621 [inline]
sock_sendmsg+0xdd/0x130 net/socket.c:631
__sys_sendto+0x387/0x5f0 net/socket.c:1788
__do_sys_sendto net/socket.c:1800 [inline]
__se_sys_sendto net/socket.c:1796 [inline]
__x64_sys_sendto+0xe1/0x1a0 net/socket.c:1796
do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x458089
Code: 6d b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 3b b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fc266df8c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 0000000000458089
RDX: 0000000000000000 RSI: 00000000204b3fff RDI: 0000000000000005
RBP: 000000000073bf00 R08: 00000000202b4000 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007fc266df96d4
R13: 00000000004c56e4 R14: 00000000004d94a8 R15: 00000000ffffffff
Fixes: cc4dfb7f70a3 ("rds: fix two RCU related problems")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Cc: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Cc: rds-devel@oss.oracle.com
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The test_insert_dup() function from lib/test_rhashtable.c passes a
pointer to a stack object to rhltable_init(). Allocate the hash table
dynamically to avoid that the following is reported with object
debugging enabled:
ODEBUG: object (ptrval) is on stack (ptrval), but NOT annotated.
WARNING: CPU: 0 PID: 1 at lib/debugobjects.c:368 __debug_object_init+0x312/0x480
Modules linked in:
EIP: __debug_object_init+0x312/0x480
Call Trace:
? debug_object_init+0x1a/0x20
? __init_work+0x16/0x30
? rhashtable_init+0x1e1/0x460
? sched_clock_cpu+0x57/0xe0
? rhltable_init+0xb/0x20
? test_insert_dup+0x32/0x20f
? trace_hardirqs_on+0x38/0xf0
? ida_dump+0x10/0x10
? jhash+0x130/0x130
? my_hashfn+0x30/0x30
? test_rht_init+0x6aa/0xab4
? ida_dump+0x10/0x10
? test_rhltable+0xc5c/0xc5c
? do_one_initcall+0x67/0x28e
? trace_hardirqs_off+0x22/0xe0
? restore_all_kernel+0xf/0x70
? trace_hardirqs_on_thunk+0xc/0x10
? restore_all_kernel+0xf/0x70
? kernel_init_freeable+0x142/0x213
? rest_init+0x230/0x230
? kernel_init+0x10/0x110
? schedule_tail_wrapper+0x9/0xc
? ret_from_fork+0x19/0x24
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The size of L2TPv2 header with all optional fields is 14 bytes.
l2tp_udp_recv_core only moves 10 bytes to the linear part of a
skb. This may lead to l2tp_recv_common read data outside of a skb.
This patch make sure that there is at least 14 bytes in the linear
part of a skb to meet the maximum need of l2tp_udp_recv_core and
l2tp_recv_common. The minimum size of both PPP HDLC-like frame and
Ethernet frame is larger than 14 bytes, so we are safe to do so.
Also remove L2TP_HDR_SIZE_NOSEQ, it is unused now.
Fixes: fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
Suggested-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Jacob Wen <jian.w.wen@oracle.com>
Acked-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
WCN3990 is SNOC, not PCI. This prevents probing WCN3990.
Fixes: 367c899f622c ("ath10k: add bus type check in ath10k_init_hw_params")
Signed-off-by: Brian Norris <briannorris@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
|
|
Report correct eeprom per channel power value.
Fix chan_vs_power map in mt76x0_get_power_info routine
Fixes: f2a2e819d672 ("mt76x0: remove eeprom dependency from mt76x0_get_power_info")
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
|
|
The are a couple of statments that are one level too deep, fix this by
removing tabs.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
|
|
Jose Abreu says:
====================
net: stmmac: Misc fixes
Some misc fixes for stmmac targeting -net.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In stmmac xmit callback we use a different flow for TSO packets but TSO
xmit callback is not disabling the EEE mode.
Fix this by disabling earlier the EEE mode, i.e. before calling the TSO
xmit callback.
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The number of TSO enabled channels in HW can be different than the
number of total channels. There is no way to determined, at runtime, the
number of TSO capable channels and its safe to assume that if TSO is
enabled then at least channel 0 will be TSO capable.
Lets always send TSO packets from Queue 0.
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If we don't have DT then stmmac_clk will not be available. Let's add a
new Platform Data field so that we can specify the refclk by this mean.
This way we can still use the coalesce command in PCI based setups.
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
While implementing ipvlan l3 and l3s mode for kubernetes CNI plugin,
I ran into the issue that while l3 mode is working fine, l3s mode
does not have any connectivity to kube-apiserver and hence all pods
end up in Error state as well. The ipvlan master device sits on
top of a bond device and hostns traffic to kube-apiserver (also running
in hostns) is DNATed from 10.152.183.1:443 to 139.178.29.207:37573
where the latter is the address of the bond0. While in l3 mode, a
curl to https://10.152.183.1:443 or to https://139.178.29.207:37573
works fine from hostns, neither of them do in case of l3s. In the
latter only a curl to https://127.0.0.1:37573 appeared to work where
for local addresses of bond0 I saw kernel suddenly starting to emit
ARP requests to query HW address of bond0 which remained unanswered
and neighbor entries in INCOMPLETE state. These ARP requests only
happen while in l3s.
Debugging this further, I found the issue is that l3s mode is piggy-
backing on l3 master device, and in this case local routes are using
l3mdev_master_dev_rcu(dev) instead of net->loopback_dev as per commit
f5a0aab84b74 ("net: ipv4: dst for local input routes should use l3mdev
if relevant") and 5f02ce24c269 ("net: l3mdev: Allow the l3mdev to be
a loopback"). I found that reverting them back into using the
net->loopback_dev fixed ipvlan l3s connectivity and got everything
working for the CNI.
Now judging from 4fbae7d83c98 ("ipvlan: Introduce l3s mode") and the
l3mdev paper in [0] the only sole reason why ipvlan l3s is relying
on l3 master device is to get the l3mdev_ip_rcv() receive hook for
setting the dst entry of the input route without adding its own
ipvlan specific hacks into the receive path, however, any l3 domain
semantics beyond just that are breaking l3s operation. Note that
ipvlan also has the ability to dynamically switch its internal
operation from l3 to l3s for all ports via ipvlan_set_port_mode()
at runtime. In any case, l3 vs l3s soley distinguishes itself by
'de-confusing' netfilter through switching skb->dev to ipvlan slave
device late in NF_INET_LOCAL_IN before handing the skb to L4.
Minimal fix taken here is to add a IFF_L3MDEV_RX_HANDLER flag which,
if set from ipvlan setup, gets us only the wanted l3mdev_l3_rcv() hook
without any additional l3mdev semantics on top. This should also have
minimal impact since dev->priv_flags is already hot in cache. With
this set, l3s mode is working fine and I also get things like
masquerading pod traffic on the ipvlan master properly working.
[0] https://netdevconf.org/1.2/papers/ahern-what-is-l3mdev-paper.pdf
Fixes: f5a0aab84b74 ("net: ipv4: dst for local input routes should use l3mdev if relevant")
Fixes: 5f02ce24c269 ("net: l3mdev: Allow the l3mdev to be a loopback")
Fixes: 4fbae7d83c98 ("ipvlan: Introduce l3s mode")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Mahesh Bandewar <maheshb@google.com>
Cc: David Ahern <dsa@cumulusnetworks.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Martynas Pumputis <m@lambda.lt>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use pskb_may_pull() to make sure the optional fields are in skb linear
parts, so we can safely read them later.
It's easy to reproduce the issue with a net driver that supports paged
skb data. Just create a L2TPv3 over IP tunnel and then generates some
network traffic.
Once reproduced, rx err in /sys/kernel/debug/l2tp/tunnels will increase.
Changes in v4:
1. s/l2tp_v3_pull_opt/l2tp_v3_ensure_opt_in_linear/
2. s/tunnel->version != L2TP_HDR_VER_2/tunnel->version == L2TP_HDR_VER_3/
3. Add 'Fixes' in commit messages.
Changes in v3:
1. To keep consistency, move the code out of l2tp_recv_common.
2. Use "net" instead of "net-next", since this is a bug fix.
Changes in v2:
1. Only fix L2TPv3 to make code simple.
To fix both L2TPv3 and L2TPv2, we'd better refactor l2tp_recv_common.
It's complicated to do so.
2. Reloading pointers after pskb_may_pull
Fixes: f7faffa3ff8e ("l2tp: Add L2TPv3 protocol support")
Fixes: 0d76751fad77 ("l2tp: Add L2TPv3 IP encapsulation (no UDP) support")
Fixes: a32e0eec7042 ("l2tp: introduce L2TPv3 IP encapsulation support for IPv6")
Signed-off-by: Jacob Wen <jian.w.wen@oracle.com>
Acked-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Call tun_set_real_num_queues() after the increment of tun->numqueues
since the former depends on it. Otherwise, the number of queues is not
correctly accounted for, which results to warnings similar to:
"vnet0 selects TX queue 11, but real number of TX queues is 11".
Fixes: 0b7959b62573 ("tun: publish tfile after it's fully initialized")
Reported-and-tested-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
skb->cb may contain data from previous layers (in an observed case
IPv4 with L3 Master Device). In the observed scenario, the data in
IPCB(skb)->frags was misinterpreted as IP6CB(skb)->frag_max_size,
eventually caused an unexpected IPv6 fragmentation in ip6_fragment()
through ip6_finish_output().
This patch clears IP6CB(skb), which potentially contains garbage data,
on the SRH ip4ip6 encapsulation.
Fixes: 32d99d0b6702 ("ipv6: sr: add support for ip4ip6 encapsulation")
Signed-off-by: Yohei Kanemaru <yohei.kanemaru@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Toshiaki Makita says:
====================
virtio_net: Fix problems around XDP tx and napi_tx
While I'm looking into how to account standard tx counters on XDP tx
processing, I found several bugs around XDP tx and napi_tx.
Patch1: Fix oops on error path. Patch2 depends on this.
Patch2: Fix memory corruption on freeing xdp_frames with napi_tx enabled.
Patch3: Minor fix patch5 depends on.
Patch4: Fix memory corruption on processing xdp_frames when XDP is disabled.
Also patch5 depends on this.
Patch5: Fix memory corruption on processing xdp_frames while XDP is being
disabled.
Patch6: Minor fix patch7 depends on.
Patch7: Fix memory corruption on freeing sk_buff or xdp_frames when a normal
queue is reused for XDP and vise versa.
v2:
- patch5: Make rcu_assign_pointer/synchronize_net conditional instead of
_virtnet_set_queues.
- patch7: Use napi_consume_skb() instead of dev_consume_skb_any()
====================
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We do not reset or free up unused buffers when enabling/disabling XDP,
so it can happen that xdp_frames are freed after disabling XDP or
sk_buffs are freed after enabling XDP on xdp tx queues.
Thus we need to handle both forms (xdp_frames and sk_buffs) regardless
of XDP setting.
One way to trigger this problem is to disable XDP when napi_tx is
enabled. In that case, virtnet_xdp_set() calls virtnet_napi_enable()
which kicks NAPI. The NAPI handler will call virtnet_poll_cleantx()
which invokes free_old_xmit_skbs() for queues which have been used by
XDP.
Note that even with this change we need to keep skipping
free_old_xmit_skbs() from NAPI handlers when XDP is enabled, because XDP
tx queues do not aquire queue locks.
- v2: Use napi_consume_skb() instead of dev_consume_skb_any()
Fixes: 4941d472bf95 ("virtio-net: do not reset during XDP set")
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
put_page() can work as a fallback for freeing xdp_frames, but the
appropriate way is to use xdp_return_frame().
Fixes: cac320c850ef ("virtio_net: convert to use generic xdp_frame and xdp_return_frame API")
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 8dcc5b0ab0ec ("virtio_net: fix ndo_xdp_xmit crash towards dev not
ready for XDP") tried to avoid access to unexpected sq while XDP is
disabled, but was not complete.
There was a small window which causes out of bounds sq access in
virtnet_xdp_xmit() while disabling XDP.
An example case of
- curr_queue_pairs = 6 (2 for SKB and 4 for XDP)
- online_cpu_num = xdp_queue_paris = 4
when XDP is enabled:
CPU 0 CPU 1
(Disabling XDP) (Processing redirected XDP frames)
virtnet_xdp_xmit()
virtnet_xdp_set()
_virtnet_set_queues()
set curr_queue_pairs (2)
check if rq->xdp_prog is not NULL
virtnet_xdp_sq(vi)
qp = curr_queue_pairs -
xdp_queue_pairs +
smp_processor_id()
= 2 - 4 + 1 = -1
sq = &vi->sq[qp] // out of bounds access
set xdp_queue_pairs (0)
rq->xdp_prog = NULL
Basically we should not change curr_queue_pairs and xdp_queue_pairs
while someone can read the values. Thus, when disabling XDP, assign NULL
to rq->xdp_prog first, and wait for RCU grace period, then change
xxx_queue_pairs.
Note that we need to keep the current order when enabling XDP though.
- v2: Make rcu_assign_pointer/synchronize_net conditional instead of
_virtnet_set_queues.
Fixes: 186b3c998c50 ("virtio-net: support XDP_REDIRECT")
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When XDP is disabled, curr_queue_pairs + smp_processor_id() can be
larger than max_queue_pairs.
There is no guarantee that we have enough XDP send queues dedicated for
each cpu when XDP is disabled, so do not count drops on sq in that case.
Fixes: 5b8f3c8d30a6 ("virtio_net: Add XDP related stats")
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When _virtnet_set_queues() failed we did not restore real_num_rx_queues.
Fix this by placing the change of real_num_rx_queues after
_virtnet_set_queues().
This order is also in line with virtnet_set_channels().
Fixes: 4941d472bf95 ("virtio-net: do not reset during XDP set")
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When napi_tx is enabled, virtnet_poll_cleantx() called
free_old_xmit_skbs() even for xdp send queue.
This is bogus since the queue has xdp_frames, not sk_buffs, thus mangled
device tx bytes counters because skb->len is meaningless value, and even
triggered oops due to general protection fault on freeing them.
Since xdp send queues do not aquire locks, old xdp_frames should be
freed only in virtnet_xdp_xmit(), so just skip free_old_xmit_skbs() for
xdp send queues.
Similarly virtnet_poll_tx() called free_old_xmit_skbs(). This NAPI
handler is called even without calling start_xmit() because cb for tx is
by default enabled. Once the handler is called, it enabled the cb again,
and then the handler would be called again. We don't need this handler
for XDP, so don't enable cb as well as not calling free_old_xmit_skbs().
Also, we need to disable tx NAPI when disabling XDP, so
virtnet_poll_tx() can safely access curr_queue_pairs and
xdp_queue_pairs, which are not atomically updated while disabling XDP.
Fixes: b92f1e6751a6 ("virtio-net: transmit napi")
Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi")
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 4e09ff536284 ("virtio-net: disable NAPI only when enabled during
XDP set") tried to fix inappropriate NAPI enabling/disabling when
!netif_running(), but was not complete.
On error path virtio_net could enable NAPI even when !netif_running().
This can cause enabling NAPI twice on virtnet_open(), which would
trigger BUG_ON() in napi_enable().
Fixes: 4941d472bf95b ("virtio-net: do not reset during XDP set")
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Lorenzo Bianconi says:
====================
erspan: always reports output key to userspace
Erspan protocol relies on output key to set session id header field.
However TUNNEL_KEY bit is cleared in order to not add key field to
the external GRE header and so the configured o_key is not reported
to userspace.
Fix the issue adding TUNNEL_KEY bit to the o_flags parameter dumping
device info
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As Erspan_v4, Erspan_v6 protocol relies on o_key to configure
session id header field. However TUNNEL_KEY bit is cleared in
ip6erspan_tunnel_xmit since ERSPAN protocol does not set the key field
of the external GRE header and so the configured o_key is not reported
to userspace. The issue can be triggered with the following reproducer:
$ip link add ip6erspan1 type ip6erspan local 2000::1 remote 2000::2 \
key 1 seq erspan_ver 1
$ip link set ip6erspan1 up
ip -d link sh ip6erspan1
ip6erspan1@NONE: <BROADCAST,MULTICAST> mtu 1422 qdisc noop state DOWN mode DEFAULT
link/ether ba:ff:09:24:c3:0e brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 1500
ip6erspan remote 2000::2 local 2000::1 encaplimit 4 flowlabel 0x00000 ikey 0.0.0.1 iseq oseq
Fix the issue adding TUNNEL_KEY bit to the o_flags parameter in
ip6gre_fill_info
Fixes: 5a963eb61b7c ("ip6_gre: Add ERSPAN native tunnel support")
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Erspan protocol (version 1 and 2) relies on o_key to configure
session id header field. However TUNNEL_KEY bit is cleared in
erspan_xmit since ERSPAN protocol does not set the key field
of the external GRE header and so the configured o_key is not reported
to userspace. The issue can be triggered with the following reproducer:
$ip link add erspan1 type erspan local 192.168.0.1 remote 192.168.0.2 \
key 1 seq erspan_ver 1
$ip link set erspan1 up
$ip -d link sh erspan1
erspan1@NONE: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc pfifo_fast state UNKNOWN mode DEFAULT
link/ether 52:aa:99:95:9a:b5 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 1500
erspan remote 192.168.0.2 local 192.168.0.1 ttl inherit ikey 0.0.0.1 iseq oseq erspan_index 0
Fix the issue adding TUNNEL_KEY bit to the o_flags parameter in
ipgre_fill_info
Fixes: 84e54fe0a5ea ("gre: introduce native tunnel support for ERSPAN")
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
After a timeout event caused by for example a broadcast storm, when
the MAC and PHY are reset, the BQL TX queue needs to be reset as
well. Otherwise, the device will exhibit severe performance issues
even after the storm has ended.
Co-authored-by: David Gounaris <david.gounaris@infinera.com>
Signed-off-by: Mathias Thore <mathias.thore@infinera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Johannes Berg says:
====================
various compat ioctl fixes
Back a long time ago, I already fixed a few of these by passing
the size of the struct ifreq to do_sock_ioctl(). However, Robert
found more cases, and now it won't be as simple because we'd have
to pass that down all the way to e.g. bond_do_ioctl() which isn't
really feasible.
Therefore, restore the old code.
While looking at why SIOCGIFNAME was broken, I realized that Al
had removed that case - which had been handled in an explicit
separate function - as well, and looking through his work at the
time I saw that bond ioctls were also affected by the erroneous
removal.
I've restored SIOCGIFNAME and bond ioctls by going through the
(now renamed) dev_ifsioc() instead of reintroducing their own
helper functions, which I hope is correct but have only tested
with SIOCGIFNAME.
====================
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Same story as before, these use struct ifreq and thus need
to be read with the shorter version to not cause faults.
Cc: stable@vger.kernel.org
Fixes: f92d4fc95341 ("kill bond_ioctl()")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As reported by Robert O'Callahan in
https://bugzilla.kernel.org/show_bug.cgi?id=202273
reverting the previous changes in this area broke
the SIOCGIFNAME ioctl in compat again (I'd previously
fixed it after his previous report of breakage in
https://bugzilla.kernel.org/show_bug.cgi?id=199469).
This is obviously because I fixed SIOCGIFNAME more or
less by accident.
Fix it explicitly now by making it pass through the
restored compat translation code.
Cc: stable@vger.kernel.org
Fixes: 4cf808e7ac32 ("kill dev_ifname32()")
Reported-by: Robert O'Callahan <robert@ocallahan.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This reverts commit bf4405737f9f ("kill dev_ifsioc()").
This wasn't really unused as implied by the original commit,
it still handles the copy to/from user differently, and the
commit thus caused issues such as
https://bugzilla.kernel.org/show_bug.cgi?id=199469
and
https://bugzilla.kernel.org/show_bug.cgi?id=202273
However, deviating from a strict revert, rename dev_ifsioc()
to compat_ifreq_ioctl() to be clearer as to its purpose and
add a comment.
Cc: stable@vger.kernel.org
Fixes: bf4405737f9f ("kill dev_ifsioc()")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This reverts commit 1cebf8f143c2 ("socket: fix struct ifreq
size in compat ioctl"), it's a bugfix for another commit that
I'll revert next.
This is not a 'perfect' revert, I'm keeping some coding style
intact rather than revert to the state with indentation errors.
Cc: stable@vger.kernel.org
Fixes: 1cebf8f143c2 ("socket: fix struct ifreq size in compat ioctl")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Yonghong Song says:
====================
The current btf implementation disallows the typedef of
a func_proto type. This actually is allowed per C standard.
This patch fixed btf verification to permit such types.
Patch #1 fixed the kernel side and Patch #2 fixed
the tools test_btf test.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Fixed one test_btf raw test such that typedef func_proto
is permitted now.
Fixes: 78a2540e8945 ("tools/bpf: Add tests for BTF_KIND_FUNC_PROTO and BTF_KIND_FUNC")
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Current implementation does not allow typedef func_proto.
But it is actually allowed.
-bash-4.4$ cat t.c
typedef int (f) (int);
f *g;
-bash-4.4$ clang -O2 -g -c -target bpf t.c -Xclang -target-feature -Xclang +dwarfris
-bash-4.4$ pahole -JV t.o
File t.o:
[1] PTR (anon) type_id=2
[2] TYPEDEF f type_id=3
[3] FUNC_PROTO (anon) return=4 args=(4 (anon))
[4] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED
-bash-4.4$
This patch related btf verifier to allow such (typedef func_proto)
patterns.
Fixes: 2667a2626f4d ("bpf: btf: Add BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO")
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Pull networking fixes from David Miller:
1) Need to save away the IV across tls async operations, from Dave
Watson.
2) Upon successful packet processing, we should liberate the SKB with
dev_consume_skb{_irq}(). From Yang Wei.
3) Only apply RX hang workaround on effected macb chips, from Harini
Katakam.
4) Dummy netdev need a proper namespace assigned to them, from Josh
Elsasser.
5) Some paths of nft_compat run lockless now, and thus we need to use a
proper refcnt_t. From Florian Westphal.
6) Avoid deadlock in mlx5 by doing IRQ locking, from Moni Shoua.
7) netrom does not refcount sockets properly wrt. timers, fix that by
using the sock timer API. From Cong Wang.
8) Fix locking of inexact inserts of xfrm policies, from Florian
Westphal.
9) Missing xfrm hash generation bump, also from Florian.
10) Missing of_node_put() in hns driver, from Yonglong Liu.
11) Fix DN_IFREQ_SIZE, from Johannes Berg.
12) ip6mr notifier is invoked during traversal of wrong table, from Nir
Dotan.
13) TX promisc settings not performed correctly in qed, from Manish
Chopra.
14) Fix OOB access in vhost, from Jason Wang.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (52 commits)
MAINTAINERS: Add entry for XDP (eXpress Data Path)
net: set default network namespace in init_dummy_netdev()
net: b44: replace dev_kfree_skb_xxx by dev_consume_skb_xxx for drop profiles
net: caif: call dev_consume_skb_any when skb xmit done
net: 8139cp: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: macb: Apply RXUBR workaround only to versions with errata
net: ti: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: apple: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: amd8111e: replace dev_kfree_skb_irq by dev_consume_skb_irq
net: alteon: replace dev_kfree_skb_irq by dev_consume_skb_irq
net: tls: Fix deadlock in free_resources tx
net: tls: Save iv in tls_rec for async crypto requests
vhost: fix OOB in get_rx_bufs()
qed: Fix stack out of bounds bug
qed: Fix system crash in ll2 xmit
qed: Fix VF probe failure while FLR
qed: Fix LACP pdu drops for VFs
qed: Fix bug in tx promiscuous mode settings
net: i825xx: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
netfilter: ipt_CLUSTERIP: fix warning unused variable cn
...
|
|
Add multiple people as maintainers for XDP, sorted alphabetically.
XDP is also tied to driver level support and code, but we cannot add all
drivers to the list. Instead K: and N: match on 'xdp' in hope to catch some
of those changes in drivers.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Assign a default net namespace to netdevs created by init_dummy_netdev().
Fixes a NULL pointer dereference caused by busy-polling a socket bound to
an iwlwifi wireless device, which bumps the per-net BUSYPOLLRXPACKETS stat
if napi_poll() received packets:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000190
IP: napi_busy_loop+0xd6/0x200
Call Trace:
sock_poll+0x5e/0x80
do_sys_poll+0x324/0x5a0
SyS_poll+0x6c/0xf0
do_syscall_64+0x6b/0x1f0
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Fixes: 7db6b048da3b ("net: Commonize busy polling code to focus on napi_id instead of socket")
Signed-off-by: Josh Elsasser <jelsasser@appneta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The skb should be freed by dev_consume_skb_any() in b44_start_xmit()
when bounce_skb is used. The skb is be replaced by bounce_skb, so the
original skb should be consumed(not drop).
dev_consume_skb_irq() should be called in b44_tx() when skb xmit
done. It makes drop profiles(dropwatch, perf) more friendly.
Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
|