summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2021-02-08perf powerpc: Support exposing Performance Monitor Counter SPRs as part of ↵Athira Rajeev3-6/+34
extended regs To enable presenting of Performance Monitor Counter Registers (PMC1 to PMC6) as part of extended regsiters, this patch adds these to sample_reg_mask in the tool side (to use with -I? option). Simplified the PERF_REG_PMU_MASK_300/31 definition. Excluded the unsupported SPRs (MMCR3, SIER2, SIER3) from extended mask value for CPU_FTR_ARCH_300. Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-08perf probe: Add protection to avoid endless loopJianlin Lv1-2/+6
if dwarf_offdie() returns NULL, the continue statement forces the next iteration of the loop without updating the 'off' variable. It will cause an endless loop in the process of traversing the compile unit. So add exception protection for looping CUs. Signed-off-by: Jianlin Lv <Jianlin.Lv@arm.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: jianlin.lv@arm.com Link: http://lore.kernel.org/lkml/20210203145702.1219509-1-Jianlin.Lv@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-03perf trace-event-info: Rename for_each_event.Ian Rogers1-5/+5
Avoid a naming conflict with for_each_event with similar code in parse-events.c, rename to for_each_event_tps. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20210203052659.2975736-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-03Merge remote-tracking branch 'torvalds/master' into perf/coreArnaldo Carvalho de Melo368-1550/+3502
To pick up fixes Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-03perf powerpc: Fix gap between kernel end and module startAthira Rajeev2-0/+25
Running "perf mem report" in TUI mode fails with ENOMEM message in powerpc: failed to process sample Running with debug and verbose options points that issue is while allocating memory for sample histograms. The error path is: symbol__inc_addr_samples() -> __symbol__inc_addr_samples() -> annotated_source__histogram() symbol__inc_addr_samples() calls annotated_source__alloc_histograms () to allocate memory for sample histograms using calloc(). Here calloc() fails since the size of symbol is huge. The size of a symbol is calculated as difference between its start and end address. Example histogram allocation that fails is: sym->name is _end sym->start is 0xc0000000027a0000 sym->end is 0xc008000003890000 symbol__size(sym) is 0x80000010f0000 In the above case, the difference between sym->start (0xc0000000027a0000) and sym->end (0xc008000003890000) is huge. This is same problem as in s390 and arm64 which are fixed in commits: b9c0a64901d5 ("perf annotate: Fix s390 gap between kernel end and module start") 78886f3ed37e ("perf symbols: Fix arm64 gap between kernel start and module end") When this symbol was read first, its start and end address was set to address which matches with data from /proc/kallsyms. After symbol__new(): symbol__new: _end 0xc0000000027a0000-0xc0000000027a0000 From /proc/kallsyms: ... c000000002799370 b backtrace_flag c000000002799378 B radix_tree_node_cachep c000000002799380 B __bss_stop c0000000027a0000 B _end c008000003890000 t icmp_checkentry [ip_tables] c008000003890038 t ipt_alloc_initial_table [ip_tables] c008000003890468 T ipt_do_table [ip_tables] c008000003890de8 T ipt_unregister_table_pre_exit [ip_tables] ... Perf calls function symbols__fixup_end() which sets the end of symbol to 0xc008000003890000, which is the next address and this is the start address of first module (icmp_checkentry in above) which will make the huge symbol size of 0x80000010f0000. After symbols__fixup_end: symbols__fixup_end: sym->name: _end sym->start: 0xc0000000027a0000 sym->end: 0xc008000003890000 On powerpc, kernel text segment is located at 0xc000000000000000 whereas the modules are located at very high memory addresses, 0xc00800000xxxxxxx. Since the gap between end of kernel text segment and beginning of first module's address is high, histogram allocation using calloc fails. Fix this by detecting the kernel's last symbol and limiting the range of last kernel symbol to pagesize. Signed-off-by: Athira Rajeev<atrajeev@linux.vnet.ibm.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-By: Kajol Jain <kjain@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1609208054-1566-1-git-send-email-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-03perf inject jit: Add namespaces supportYonatan Goldschmidt5-22/+82
This patch fixes "perf inject --jit" to properly operate on namespaced/containerized processes: * jitdump files are generated by the process, thus they should be looked up in its mount NS. * DSOs of injected MMAP events will later be looked up in the process mount NS, so write them into its NS. * PIDs & TIDs from jitdump events need to be translated to the PID as seen by "perf record" before written into MMAP events. For a process in a different PID NS, the TID & PID given in the jitdump event are actually ignored; I use the TID & PID of the thread which mmap()ed the jitdump file. This is simplified and won't do for forks of the initial process, if they continue using the same jitdump file. Future patches might improve it. This was tested by recording a NodeJS process running with "--perf-prof", inside a Docker container, and by recording another NodeJS process running in the same namespaces as perf itself, to make sure it's not broken for non-containerized processes. Signed-off-by: Yonatan Goldschmidt <yonatan.goldschmidt@granulate.io> Acked-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20201105015604.1726943-1-yonatan.goldschmidt@granulate.io Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-03perf namespaces: Add 'in_pidns' to nsinfo structYonatan Goldschmidt2-2/+10
Provides an accurate mean to determine if the owner thread is in a different PID namespace. Signed-off-by: Yonatan Goldschmidt <yonatan.goldschmidt@granulate.io> Acked-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20201105015418.1725218-1-yonatan.goldschmidt@granulate.io Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-03perf tools: Use scandir() to iterate threads when synthesizing PERF_RECORD_ ↵Namhyung Kim1-11/+17
events Like in __event__synthesize_thread(), I think it's better to use scandir() instead of the readdir() loop. In case some malicious task continues to create new threads, the readdir() loop will run over and over to collect tids. The scandir() also has the problem but the window is much smaller since it doesn't do much work during the iteration. Also add filter_task() function as we only care the tasks. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20210202090118.2008551-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-03perf tools: Skip PERF_RECORD_MMAP event synthesis for kernel threadsNamhyung Kim1-9/+23
To synthesize information to resolve sample IPs, it needs to scan task and mmap info from the /proc filesystem. For each process, it opens (and reads) status and maps file respectively. But as kernel threads don't have memory maps so we can skip the maps file. To find kernel threads, check "VmPeak:" line in /proc/<PID>/status file. It's about the peak virtual memory usage so only user-level tasks have that. Note that it's possible to miss the line due to partial reads. So we should double-check if it's a really kernel thread when there's no VmPeak line. Thus check "Threads:" line (which follows the VmPeak line whether or not it exists) to be sure it's read enough data - just in case of deeply nested pid namespaces or large number of supplementary groups are involved. This is for user process: $ head -40 /proc/1/status Name: systemd Umask: 0000 State: S (sleeping) Tgid: 1 Ngid: 0 Pid: 1 PPid: 0 TracerPid: 0 Uid: 0 0 0 0 Gid: 0 0 0 0 FDSize: 256 Groups: NStgid: 1 NSpid: 1 NSpgid: 1 NSsid: 1 VmPeak: 234192 kB <-- here VmSize: 169964 kB VmLck: 0 kB VmPin: 0 kB VmHWM: 29528 kB VmRSS: 6104 kB RssAnon: 2756 kB RssFile: 3348 kB RssShmem: 0 kB VmData: 19776 kB VmStk: 1036 kB VmExe: 784 kB VmLib: 9532 kB VmPTE: 116 kB VmSwap: 2400 kB HugetlbPages: 0 kB CoreDumping: 0 THP_enabled: 1 Threads: 1 <-- and here SigQ: 1/62808 SigPnd: 0000000000000000 ShdPnd: 0000000000000000 SigBlk: 7be3c0fe28014a03 SigIgn: 0000000000001000 And this is for kernel thread: $ head -20 /proc/2/status Name: kthreadd Umask: 0000 State: S (sleeping) Tgid: 2 Ngid: 0 Pid: 2 PPid: 0 TracerPid: 0 Uid: 0 0 0 0 Gid: 0 0 0 0 FDSize: 64 Groups: NStgid: 2 NSpid: 2 NSpgid: 0 NSsid: 0 Threads: 1 <-- here SigQ: 1/62808 SigPnd: 0000000000000000 ShdPnd: 0000000000000000 Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20210202090118.2008551-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-03perf tools: Use /proc/<PID>/task/<TID>/status for PERF_RECORD_ event synthesisNamhyung Kim1-11/+14
To save memory usage, it needs to reduce the number of entries in the proc filesystem. It's using /proc/<PID>/task directory to traverse threads in the process and then kernel creates /proc/<PID>/task/<TID> entries. After that it checks the thread info using the /proc/<TID>/status file rather than /proc/<PID>/task/<TID>/status. As far as I can see, they are the same and contain all the info we need. Using the latter eliminates the unnecessary /proc/<TID> entry. This can be useful especially a large number of threads are used in the system. In my experiment around 1KB of memory on average was saved for each thread (which is not a thread group leader). To do this, pass both pid and tid to perf_event_prepare_comm() if it knows them. In case it doesn't know, passing 0 as pid will do the old way. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20210202090118.2008551-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-03perf vendor events arm64: Reference common and uarch events for A76John Garry8-150/+76
Reduce duplication in the JSONs by referencing standard events from armv8-common-and-microarch.json In general the "PublicDescription" fields are not modified when somewhat significantly worded differently than the standard. Apart from that, description and names for events slightly different to standard are changed (to standard) for consistency. Signed-off-by: John Garry <john.garry@huawei.com> Acked-by: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Nakamura, Shunsuke/中村 俊介 <nakamura.shun@fujitsu.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linuxarm@openeuler.org Link: https://lore.kernel.org/r/1611835236-34696-5-git-send-email-john.garry@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-03perf vendor events arm64: Reference common and uarch events for Ampere eMagJohn Garry7-97/+31
Reduce duplication in the JSONs by referencing standard events from armv8-common-and-microarch.json In general the "PublicDescription" fields are not modified when somewhat significantly worded differently than the standard. Apart from that, description and names for events slightly different to standard are changed (to standard) for consistency. Note that names for events 0x34 and 0x35 are non-standard and remain unchanged. Those events came from the following originally: https://github.com/AmpereComputing/ampere-centos-kernel/blob/4c2479c67bbcf35b35224db12a092b33682b181c/Documentation/arm64/eMAG-ARM-CoreImpDefined.pdf Signed-off-by: John Garry <john.garry@huawei.com> Acked-by: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Nakamura, Shunsuke/中村 俊介 <nakamura.shun@fujitsu.com> Cc: mathieu.poirier@linaro.org Cc: linux-arm-kernel@lists.infradead.org Cc: linuxarm@openeuler.org Link: https://lore.kernel.org/r/1611835236-34696-4-git-send-email-john.garry@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-03perf vendor events arm64: Add common and uarch event JSONJohn Garry1-0/+248
Add a common and microarch JSON, which can be referenced from CPU JSONs. For now, brief and public description are as event brief event description from the ARMv8 ARM [0], D7-11. The list of events is not complete, as not all events will be referenced yet. Reference document is at the following: [0] https://documentation-service.arm.com/static/5fa3bd1eb209f547eebd4141?token= Signed-off-by: John Garry <john.garry@huawei.com> Acked-by: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Nakamura, Shunsuke/中村 俊介 <nakamura.shun@fujitsu.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linuxarm@openeuler.org Link: https://lore.kernel.org/r/1611835236-34696-3-git-send-email-john.garry@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-03perf vendor events arm64: Fix Ampere eMag event typoJohn Garry1-1/+1
The "briefdescription" for event 0x35 has a typo - fix it. Fixes: d35c595bf005 ("perf vendor events arm64: Revise core JSON events for eMAG") Signed-off-by: John Garry <john.garry@huawei.com> Acked-by: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Nakamura, Shunsuke/中村 俊介 <nakamura.shun@fujitsu.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linuxarm@openeuler.org Link: https://lore.kernel.org/r/1611835236-34696-2-git-send-email-john.garry@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-03perf script: Support DSO filter like in other perf toolsJin Yao2-0/+5
Other perf tool builtins already supported a DSO filter. For example: $ perf report --dsos a,b,c which only considers symbols in these dsos. Now the DSO filter is supported in 'perf script': root@kbl-ppc:~# ./perf script --dsos "[kernel.kallsyms]" perf 18123 [000] 6142863.075104: 1 cycles: ffffffff9ca77308 native_write_msr+0x8 ([kernel.kallsyms]) perf 18123 [000] 6142863.075107: 1 cycles: ffffffff9ca77308 native_write_msr+0x8 ([kernel.kallsyms]) perf 18123 [000] 6142863.075108: 10 cycles: ffffffff9ca77308 native_write_msr+0x8 ([kernel.kallsyms]) perf 18123 [000] 6142863.075109: 273 cycles: ffffffff9ca7730a native_write_msr+0xa ([kernel.kallsyms]) perf 18123 [000] 6142863.075110: 7684 cycles: ffffffff9ca3c9c0 native_sched_clock+0x50 ([kernel.kallsyms]) perf 18123 [000] 6142863.075112: 213017 cycles: ffffffff9d765a92 syscall_exit_to_user_mode+0x32 ([kernel.kallsyms]) perf 18123 [001] 6142863.075156: 1 cycles: ffffffff9ca77308 native_write_msr+0x8 ([kernel.kallsyms]) perf 18123 [001] 6142863.075158: 1 cycles: ffffffff9ca77308 native_write_msr+0x8 ([kernel.kallsyms]) perf 18123 [001] 6142863.075159: 17 cycles: ffffffff9ca77308 native_write_msr+0x8 ([kernel.kallsyms]) Committer testing: $ perf script ls 2364888 29303.010949: 1 cycles:u: ffffffffa4bbc6a9 [unknown] ([unknown]) ls 2364888 29303.010957: 1 cycles:u: ffffffffa429ef48 [unknown] ([unknown]) ls 2364888 29303.010961: 1 cycles:u: ffffffffa4260133 [unknown] ([unknown]) ls 2364888 29303.010964: 5 cycles:u: ffffffffa429efad [unknown] ([unknown]) ls 2364888 29303.010967: 41 cycles:u: ffffffffa42a4586 [unknown] ([unknown]) ls 2364888 29303.010972: 435 cycles:u: ffffffffa429efe0 [unknown] ([unknown]) ls 2364888 29303.010978: 5142 cycles:u: 7f9b95bc2abf __GI___tunables_init+0x11f (/usr/lib64/ld-2.32.so) ls 2364888 29303.011006: 38551 cycles:u: ffffffffa4290f61 [unknown] ([unknown]) ls 2364888 29303.011486: 238234 cycles:u: 7f9b95bb7741 _dl_relocate_object+0xa71 (/usr/lib64/ld-2.32.so) ls 2364888 29303.011937: 415870 cycles:u: 7f9b95a1c80e __strcoll_l+0xe (/usr/lib64/libc-2.32.so) $ Before: $ perf script --dsos /usr/lib64/libc-2.32.so |& head -5 Error: unknown option `dsos' Usage: perf script [<options>] or: perf script [<options>] record <script> [<record-options>] <command> or: perf script [<options>] report <script> [script-args] $ After: $ perf script --dsos /usr/lib64/libc-2.32.so ls 2364888 29303.011937: 415870 cycles:u: 7f9b95a1c80e __strcoll_l+0xe (/usr/lib64/libc-2.32.so) $ Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20210124232750.19170-2-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-03perf tools: Fix DSO filtering when not finding a map for a sampled addressArnaldo Carvalho de Melo1-0/+2
When we lookup an address and don't find a map we should filter that sample if the user specified a list of --dso entries to filter on, fix it. Before: $ perf script sleep 274800 2843.556162: 1 cycles:u: ffffffffbb26bff4 [unknown] ([unknown]) sleep 274800 2843.556168: 1 cycles:u: ffffffffbb2b047d [unknown] ([unknown]) sleep 274800 2843.556171: 1 cycles:u: ffffffffbb2706b2 [unknown] ([unknown]) sleep 274800 2843.556174: 6 cycles:u: ffffffffbb2b0267 [unknown] ([unknown]) sleep 274800 2843.556176: 59 cycles:u: ffffffffbb2b03b1 [unknown] ([unknown]) sleep 274800 2843.556180: 691 cycles:u: ffffffffbb26bff4 [unknown] ([unknown]) sleep 274800 2843.556189: 9160 cycles:u: 7fa9550eeaa3 __GI___tunables_init+0xf3 (/usr/lib64/ld-2.32.so) sleep 274800 2843.556312: 86937 cycles:u: 7fa9550e157b _dl_lookup_symbol_x+0x4b (/usr/lib64/ld-2.32.so) $ So we have some samples we somehow didn't find in a map for, if we now do: $ perf report --stdio --dso /usr/lib64/ld-2.32.so # dso: /usr/lib64/ld-2.32.so # # Total Lost Samples: 0 # # Samples: 8 of event 'cycles:u' # Event count (approx.): 96856 # # Overhead Command Symbol # ........ ....... ........................ # 89.76% sleep [.] _dl_lookup_symbol_x 9.46% sleep [.] __GI___tunables_init 0.71% sleep [k] 0xffffffffbb26bff4 0.06% sleep [k] 0xffffffffbb2b03b1 0.01% sleep [k] 0xffffffffbb2b0267 0.00% sleep [k] 0xffffffffbb2706b2 0.00% sleep [k] 0xffffffffbb2b047d $ After this patch we get the right output with just entries for the DSOs specified in --dso: $ perf report --stdio --dso /usr/lib64/ld-2.32.so # dso: /usr/lib64/ld-2.32.so # # Total Lost Samples: 0 # # Samples: 8 of event 'cycles:u' # Event count (approx.): 96856 # # Overhead Command Symbol # ........ ....... ........................ # 89.76% sleep [.] _dl_lookup_symbol_x 9.46% sleep [.] __GI___tunables_init $ # Fixes: 96415e4d3f5fdf9c ("perf symbols: Avoid unnecessary symbol loading when dso list is specified") Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20210128131209.GD775562@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-03perf stat: Add Topdown metrics events as default eventsKan Liang5-0/+26
The Topdown Microarchitecture Analysis (TMA) Method is a structured analysis methodology to identify critical performance bottlenecks in out-of-order processors. From the Ice Lake and later platforms, the Topdown information can be retrieved from the dedicated "metrics" register, which isn't impacted by other events. Also, the Topdown metrics support both per thread/process and per core measuring. Adding Topdown metrics events as default events can enrich the default measuring information, and would not cost any extra multiplexing. Introduce arch_evlist__add_default_attrs() to allow architecture specific default events. Add the Topdown metrics events in the X86 specific arch_evlist__add_default_attrs(). Other architectures can add their own default events later separately. With the patch: $ perf stat sleep 1 Performance counter stats for 'sleep 1': 0.82 msec task-clock:u # 0.001 CPUs utilized 0 context-switches:u # 0.000 K/sec 0 cpu-migrations:u # 0.000 K/sec 61 page-faults:u # 0.074 M/sec 319,941 cycles:u # 0.388 GHz 242,802 instructions:u # 0.76 insn per cycle 54,380 branches:u # 66.028 M/sec 4,043 branch-misses:u # 7.43% of all branches 1,585,555 slots:u # 1925.189 M/sec 238,941 topdown-retiring:u # 15.0% retiring 410,378 topdown-bad-spec:u # 25.8% bad speculation 634,222 topdown-fe-bound:u # 39.9% frontend bound 304,675 topdown-be-bound:u # 19.2% backend bound 1.001791625 seconds time elapsed 0.000000000 seconds user 0.001572000 seconds sys Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/20210121133752.118327-1-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-03perf test: Add parse-metric memory bandwidth testcaseJohn Garry1-0/+24
Event duration_time in a metric expression requires special handling. Improve test coverage by including a metric whose expression includes duration_time. The actual metric is a copied from the L1D_Cache_Fill_BW metric on my broadwell machine. Signed-off-by: John Garry <john.garry@huawei.com> Acked-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linuxarm@openeuler.org Link: http://lore.kernel.org/lkml/1611578842-5749-1-git-send-email-john.garry@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-02Merge tag 'clang-format-for-linux-v5.11-rc7' of git://github.com/ojeda/linuxLinus Torvalds1-4/+7
Pull clang-format update from Miguel Ojeda: "Update with the latest for_each macro list" * tag 'clang-format-for-linux-v5.11-rc7' of git://github.com/ojeda/linux: clang-format: Update with the latest for_each macro list
2021-02-02Merge tag 'dma-mapping-5.11-1' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds1-1/+5
Pull dma-mapping fix from Christoph Hellwig: "Fix a kernel crash in the new dma-mapping benchmark test (Barry Song)" * tag 'dma-mapping-5.11-1' of git://git.infradead.org/users/hch/dma-mapping: dma-mapping: benchmark: fix kernel crash when dma_map_single fails
2021-02-02Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds2-16/+13
Pull vdpa fix from Michael Tsirkin: "A single mlx bugfix" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vdpa/mlx5: Fix memory key MTT population
2021-02-02Merge tag 'net-5.11-rc7' of ↵Linus Torvalds44-107/+328
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Networking fixes for 5.11-rc7, including fixes from bpf and mac80211 trees. Current release - regressions: - ip_tunnel: fix mtu calculation - mlx5: fix function calculation for page trees Previous releases - regressions: - vsock: fix the race conditions in multi-transport support - neighbour: prevent a dead entry from updating gc_list - dsa: mv88e6xxx: override existent unicast portvec in port_fdb_add Previous releases - always broken: - bpf, cgroup: two copy_{from,to}_user() warn_on_once splats for BPF cgroup getsockopt infra when user space is trying to race against optlen, from Loris Reiff. - bpf: add missing fput() in BPF inode storage map update helper - udp: ipv4: manipulate network header of NATed UDP GRO fraglist - mac80211: fix station rate table updates on assoc - r8169: work around RTL8125 UDP HW bug - igc: report speed and duplex as unknown when device is runtime suspended - rxrpc: fix deadlock around release of dst cached on udp tunnel" * tag 'net-5.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (36 commits) net: hsr: align sup_multicast_addr in struct hsr_priv to u16 boundary net: ipa: fix two format specifier errors net: ipa: use the right accessor in ipa_endpoint_status_skip() net: ipa: be explicit about endianness net: ipa: add a missing __iomem attribute net: ipa: pass correct dma_handle to dma_free_coherent() r8169: fix WoL on shutdown if CONFIG_DEBUG_SHIRQ is set net/rds: restrict iovecs length for RDS_CMSG_RDMA_ARGS net: mvpp2: TCAM entry enable should be written after SRAM data net: lapb: Copy the skb before sending a packet net/mlx5e: Release skb in case of failure in tc update skb net/mlx5e: Update max_opened_tc also when channels are closed net/mlx5: Fix leak upon failure of rule creation net/mlx5: Fix function calculation for page trees docs: networking: swap words in icmp_errors_use_inbound_ifaddr doc udp: ipv4: manipulate network header of NATed UDP GRO fraglist net: ip_tunnel: fix mtu calculation vsock: fix the race conditions in multi-transport support net: sched: replaced invalid qdisc tree flush helper in qdisc_replace ibmvnic: device remove has higher precedence over reset ...
2021-02-02net: hsr: align sup_multicast_addr in struct hsr_priv to u16 boundaryAndreas Oetken1-1/+4
sup_multicast_addr is passed to ether_addr_equal for address comparison which casts the address inputs to u16 leading to an unaligned access. Aligning the sup_multicast_addr to u16 boundary fixes the issue. Signed-off-by: Andreas Oetken <andreas.oetken@siemens.com> Link: https://lore.kernel.org/r/20210202090304.2740471-1-ennoerlangen@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02Merge tag 'mlx5-fixes-2021-02-01' of ↵Jakub Kicinski4-9/+20
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5 fixes 2021-02-01 Please note the first patch in this series ("Fix function calculation for page trees") is fixing a regression due to previous fix in net which you didn't include in your previous rc pr. So I hope this series will make it into your next rc pr, so mlx5 won't be broken in the next rc. * tag 'mlx5-fixes-2021-02-01' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5e: Release skb in case of failure in tc update skb net/mlx5e: Update max_opened_tc also when channels are closed net/mlx5: Fix leak upon failure of rule creation net/mlx5: Fix function calculation for page trees ==================== Link: https://lore.kernel.org/r/20210202070703.617251-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02Merge branch 'net-ipa-a-few-bug-fixes'Jakub Kicinski3-6/+6
Alex Elder says: ==================== net: ipa: a few bug fixes This series fixes four minor bugs. The first two are things that sparse points out. All four are very simple and each patch should explain itself pretty well. ==================== Link: https://lore.kernel.org/r/20210201232609.3524451-1-elder@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02net: ipa: fix two format specifier errorsAlex Elder1-2/+2
Fix two format specifiers that used %lu for a size_t in "ipa_mem.c". Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02net: ipa: use the right accessor in ipa_endpoint_status_skip()Alex Elder1-2/+2
When extracting the destination endpoint ID from the status in ipa_endpoint_status_skip(), u32_get_bits() is used. This happens to work, but it's wrong: the structure field is only 8 bits wide instead of 32. Fix this by using u8_get_bits() to get the destination endpoint ID. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02net: ipa: be explicit about endiannessAlex Elder1-1/+1
Sparse warns that the assignment of the metadata mask for a QMAP endpoint in ipa_endpoint_init_hdr_metadata_mask() is a bad assignment. We know we want the mask value to be big endian, even though the value we write is in host byte order. Use a __force tag to indicate we really mean it. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02net: ipa: add a missing __iomem attributeAlex Elder1-1/+1
The virt local variable in gsi_channel_state() does not have an __iomem attribute but should. Fix this. Signed-off-by: Alex Elder <elder@linaro.org> Reviewed-by: Amy Parker <enbyamy@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02net: ipa: pass correct dma_handle to dma_free_coherent()Dan Carpenter1-1/+1
The "ring->addr = addr;" assignment is done a few lines later so we can't use "ring->addr" yet. The correct dma_handle is "addr". Fixes: 650d1603825d ("soc: qcom: ipa: the generic software interface") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Alex Elder <elder@linaro.org> Link: https://lore.kernel.org/r/YBjpTU2oejkNIULT@mwanda Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02r8169: fix WoL on shutdown if CONFIG_DEBUG_SHIRQ is setHeiner Kallweit1-2/+2
So far phy_disconnect() is called before free_irq(). If CONFIG_DEBUG_SHIRQ is set and interrupt is shared, then free_irq() creates an "artificial" interrupt by calling the interrupt handler. The "link change" flag is set in the interrupt status register, causing phylib to eventually call phy_suspend(). Because the net_device is detached from the PHY already, the PHY driver can't recognize that WoL is configured and powers down the PHY. Fixes: f1e911d5d0df ("r8169: add basic phylib support") Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/fe732c2c-a473-9088-3974-df83cfbd6efd@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02net/rds: restrict iovecs length for RDS_CMSG_RDMA_ARGSSabyrzhan Tasbolatov1-0/+3
syzbot found WARNING in rds_rdma_extra_size [1] when RDS_CMSG_RDMA_ARGS control message is passed with user-controlled 0x40001 bytes of args->nr_local, causing order >= MAX_ORDER condition. The exact value 0x40001 can be checked with UIO_MAXIOV which is 0x400. So for kcalloc() 0x400 iovecs with sizeof(struct rds_iovec) = 0x10 is the closest limit, with 0x10 leftover. Same condition is currently done in rds_cmsg_rdma_args(). [1] WARNING: mm/page_alloc.c:5011 [..] Call Trace: alloc_pages_current+0x18c/0x2a0 mm/mempolicy.c:2267 alloc_pages include/linux/gfp.h:547 [inline] kmalloc_order+0x2e/0xb0 mm/slab_common.c:837 kmalloc_order_trace+0x14/0x120 mm/slab_common.c:853 kmalloc_array include/linux/slab.h:592 [inline] kcalloc include/linux/slab.h:621 [inline] rds_rdma_extra_size+0xb2/0x3b0 net/rds/rdma.c:568 rds_rm_size net/rds/send.c:928 [inline] Reported-by: syzbot+1bd2b07f93745fa38425@syzkaller.appspotmail.com Signed-off-by: Sabyrzhan Tasbolatov <snovitoll@gmail.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Link: https://lore.kernel.org/r/20210201203233.1324704-1-snovitoll@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02net: mvpp2: TCAM entry enable should be written after SRAM dataStefan Chulski1-5/+5
Last TCAM data contains TCAM enable bit. It should be written after SRAM data before entry enabled. Fixes: 3f518509dedc ("ethernet: Add new driver for Marvell Armada 375 network unit") Signed-off-by: Stefan Chulski <stefanc@marvell.com> Link: https://lore.kernel.org/r/1612172139-28343-1-git-send-email-stefanc@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02net: lapb: Copy the skb before sending a packetXie He1-1/+2
When sending a packet, we will prepend it with an LAPB header. This modifies the shared parts of a cloned skb, so we should copy the skb rather than just clone it, before we prepend the header. In "Documentation/networking/driver.rst" (the 2nd point), it states that drivers shouldn't modify the shared parts of a cloned skb when transmitting. The "dev_queue_xmit_nit" function in "net/core/dev.c", which is called when an skb is being sent, clones the skb and sents the clone to AF_PACKET sockets. Because the LAPB drivers first remove a 1-byte pseudo-header before handing over the skb to us, if we don't copy the skb before prepending the LAPB header, the first byte of the packets received on AF_PACKET sockets can be corrupted. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Xie He <xie.he.0141@gmail.com> Acked-by: Martin Schiller <ms@dev.tdt.de> Link: https://lore.kernel.org/r/20210201055706.415842-1-xie.he.0141@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02Merge tag 'mac80211-for-net-2021-02-02' of ↵Jakub Kicinski3-4/+8
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== Two fixes: - station rate tables were not updated correctly after association, leading to bad configuration - rtl8723bs (staging) was initializing data incorrectly after the previous fix and needed to move the init later * tag 'mac80211-for-net-2021-02-02' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211: staging: rtl8723bs: Move wiphy setup to after reading the regulatory settings from the chip mac80211: fix station rate table updates on assoc ==================== Link: https://lore.kernel.org/r/20210202143505.37610-1-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-01net/mlx5e: Release skb in case of failure in tc update skbMaor Dickman1-4/+12
In case of failure in tc update skb the packet is dropped without freeing the skb. Fixed by freeing the skb in case failure in tc update skb. Fixes: d6d27782864f ("net/mlx5: E-Switch, Restore chain id on miss") Fixes: c75690972228 ("net/mlx5e: Add tc chains offload support for nic flows") Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-02-01net/mlx5e: Update max_opened_tc also when channels are closedMaxim Mikityanskiy1-4/+2
max_opened_tc is used for stats, so that potentially non-zero stats won't disappear when num_tc decreases. However, mlx5e_setup_tc_mqprio fails to update it in the flow where channels are closed. This commit fixes it. The new value of priv->channels.params.num_tc is always checked on exit. In case of errors it will just be the old value, and in case of success it will be the updated value. Fixes: 05909babce53 ("net/mlx5e: Avoid reset netdev stats on configuration changes") Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-02-01net/mlx5: Fix leak upon failure of rule creationMaor Gottlieb1-0/+5
When creation of a new rule that requires allocation of an FTE fails, need to call to tree_put_node on the FTE in order to release its' resource. Fixes: cefc23554fc2 ("net/mlx5: Fix FTE cleanup") Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Reviewed-by: Alaa Hleihel <alaa@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-02-01net/mlx5: Fix function calculation for page treesDaniel Jurgens1-1/+1
The function calculation always results in a value of 0. This works generally, but when the release all pages feature is enabled it will result in crashes. Fixes: 0aa128475d33 ("net/mlx5: Maintain separate page trees for ECPF and PF functions") Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Reported-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-02-01Merge branch '1GbE' of ↵Jakub Kicinski5-17/+5
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2021-02-01 This series contains updates to igc and i40e drivers. Kai-Heng Feng fixes igc to report unknown speed and duplex during suspend as an attempted read will cause errors. Kevin Lo sets the default value to -IGC_ERR_NVM instead of success for writing shadow RAM as this could miss a timeout. Also propagates the return value for Flow Control configuration to properly pass on errors for igc. Aleksandr reverts commit 2ad1274fa35a ("i40e: don't report link up for a VF who hasn't enabled queues") as this can cause link flapping. * '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: i40e: Revert "i40e: don't report link up for a VF who hasn't enabled queues" igc: check return value of ret_val in igc_config_fc_after_link_up igc: set the default return value to -IGC_ERR_NVM in igc_write_nvm_srwr igc: Report speed and duplex as unknown when device is runtime suspended ==================== Link: https://lore.kernel.org/r/20210201214618.852831-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-01docs: networking: swap words in icmp_errors_use_inbound_ifaddr docVincent Bernat1-1/+1
Signed-off-by: Vincent Bernat <vincent@bernat.ch> Link: https://lore.kernel.org/r/20210130190518.854806-1-vincent@bernat.ch Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-01udp: ipv4: manipulate network header of NATed UDP GRO fraglistDongseok Yi3-7/+66
UDP/IP header of UDP GROed frag_skbs are not updated even after NAT forwarding. Only the header of head_skb from ip_finish_output_gso -> skb_gso_segment is updated but following frag_skbs are not updated. A call path skb_mac_gso_segment -> inet_gso_segment -> udp4_ufo_fragment -> __udp_gso_segment -> __udp_gso_segment_list does not try to update UDP/IP header of the segment list but copy only the MAC header. Update port, addr and check of each skb of the segment list in __udp_gso_segment_list. It covers both SNAT and DNAT. Fixes: 9fd1ff5d2ac7 (udp: Support UDP fraglist GRO/GSO.) Signed-off-by: Dongseok Yi <dseok.yi@samsung.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Link: https://lore.kernel.org/r/1611962007-80092-1-git-send-email-dseok.yi@samsung.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-01net: ip_tunnel: fix mtu calculationVadim Fedorenko1-9/+7
dev->hard_header_len for tunnel interface is set only when header_ops are set too and already contains full overhead of any tunnel encapsulation. That's why there is not need to use this overhead twice in mtu calc. Fixes: fdafed459998 ("ip_gre: set dev->hard_header_len and dev->needed_headroom properly") Reported-by: Slava Bacherikov <mail@slava.cc> Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru> Link: https://lore.kernel.org/r/1611959267-20536-1-git-send-email-vfedorenko@novek.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-01vsock: fix the race conditions in multi-transport supportAlexander Popov1-5/+12
There are multiple similar bugs implicitly introduced by the commit c0cfa2d8a788fcf4 ("vsock: add multi-transports support") and commit 6a2c0962105ae8ce ("vsock: prevent transport modules unloading"). The bug pattern: [1] vsock_sock.transport pointer is copied to a local variable, [2] lock_sock() is called, [3] the local variable is used. VSOCK multi-transport support introduced the race condition: vsock_sock.transport value may change between [1] and [2]. Let's copy vsock_sock.transport pointer to local variables after the lock_sock() call. Fixes: c0cfa2d8a788fcf4 ("vsock: add multi-transports support") Signed-off-by: Alexander Popov <alex.popov@linux.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Link: https://lore.kernel.org/r/20210201084719.2257066-1-alex.popov@linux.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-01net: sched: replaced invalid qdisc tree flush helper in qdisc_replaceAlexander Ovechkin1-1/+1
Commit e5f0e8f8e456 ("net: sched: introduce and use qdisc tree flush/purge helpers") introduced qdisc tree flush/purge helpers, but erroneously used flush helper instead of purge helper in qdisc_replace function. This issue was found in our CI, that tests various qdisc setups by configuring qdisc and sending data through it. Call of invalid helper sporadically leads to corruption of vt_tree/cf_tree of hfsc_class that causes kernel oops: Oops: 0000 [#1] SMP PTI CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.11.0-8f6859df #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014 RIP: 0010:rb_insert_color+0x18/0x190 Code: c3 31 c0 c3 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 48 8b 07 48 85 c0 0f 84 05 01 00 00 48 8b 10 f6 c2 01 0f 85 34 01 00 00 <48> 8b 4a 08 49 89 d0 48 39 c1 74 7d 48 85 c9 74 32 f6 01 01 75 2d RSP: 0018:ffffc900000b8bb0 EFLAGS: 00010246 RAX: ffff8881ef4c38b0 RBX: ffff8881d956e400 RCX: ffff8881ef4c38b0 RDX: 0000000000000000 RSI: ffff8881d956f0a8 RDI: ffff8881d956e4b0 RBP: 0000000000000000 R08: 000000d5c4e249da R09: 1600000000000000 R10: ffffc900000b8be0 R11: ffffc900000b8b28 R12: 0000000000000001 R13: 000000000000005a R14: ffff8881f0905000 R15: ffff8881f0387d00 FS: 0000000000000000(0000) GS:ffff8881f8b00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000008 CR3: 00000001f4796004 CR4: 0000000000060ee0 Call Trace: <IRQ> init_vf.isra.19+0xec/0x250 [sch_hfsc] hfsc_enqueue+0x245/0x300 [sch_hfsc] ? fib_rules_lookup+0x12a/0x1d0 ? __dev_queue_xmit+0x4b6/0x930 ? hfsc_delete_class+0x250/0x250 [sch_hfsc] __dev_queue_xmit+0x4b6/0x930 ? ip6_finish_output2+0x24d/0x590 ip6_finish_output2+0x24d/0x590 ? ip6_output+0x6c/0x130 ip6_output+0x6c/0x130 ? __ip6_finish_output+0x110/0x110 mld_sendpack+0x224/0x230 mld_ifc_timer_expire+0x186/0x2c0 ? igmp6_group_dropped+0x200/0x200 call_timer_fn+0x2d/0x150 run_timer_softirq+0x20c/0x480 ? tick_sched_do_timer+0x60/0x60 ? tick_sched_timer+0x37/0x70 __do_softirq+0xf7/0x2cb irq_exit+0xa0/0xb0 smp_apic_timer_interrupt+0x74/0x150 apic_timer_interrupt+0xf/0x20 </IRQ> Fixes: e5f0e8f8e456 ("net: sched: introduce and use qdisc tree flush/purge helpers") Signed-off-by: Alexander Ovechkin <ovov@yandex-team.ru> Reported-by: Alexander Kuznetsov <wwfq@yandex-team.ru> Acked-by: Dmitry Monakhov <dmtrmonakhov@yandex-team.ru> Acked-by: Dmitry Yakunin <zeil@yandex-team.ru> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Link: https://lore.kernel.org/r/20210201200049.299153-1-ovov@yandex-team.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-01ibmvnic: device remove has higher precedence over resetLijun Pan1-5/+0
Returning -EBUSY in ibmvnic_remove() does not actually hold the removal procedure since driver core doesn't care for the return value (see __device_release_driver() in drivers/base/dd.c calling dev->bus->remove()) though vio_bus_remove (in arch/powerpc/platforms/pseries/vio.c) records the return value and passes it on. [1] During the device removal precedure, checking for resetting bit is dropped so that we can continue executing all the cleanup calls in the rest of the remove function. Otherwise, it can cause latent memory leaks and kernel crashes. [1] https://lore.kernel.org/linuxppc-dev/20210117101242.dpwayq6wdgfdzirl@pengutronix.de/T/#m48f5befd96bc9842ece2a3ad14f4c27747206a53 Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Fixes: 7d7195a026ba ("ibmvnic: Do not process device remove during device reset") Signed-off-by: Lijun Pan <ljp@linux.ibm.com> Link: https://lore.kernel.org/r/20210129043402.95744-1-ljp@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-01net: dsa: mv88e6xxx: override existent unicast portvec in port_fdb_addDENG Qingfang1-1/+5
Having multiple destination ports for a unicast address does not make sense. Make port_db_load_purge override existent unicast portvec instead of adding a new port bit. Fixes: 884729399260 ("net: dsa: mv88e6xxx: handle multiple ports in ATU") Signed-off-by: DENG Qingfang <dqfext@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/20210130134334.10243-1-dqfext@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-01i40e: Revert "i40e: don't report link up for a VF who hasn't enabled queues"Aleksandr Loktionov2-13/+1
This reverts commit 2ad1274fa35ace5c6360762ba48d33b63da2396c VF queues were not brought up when PF was brought up after being downed if the VF driver disabled VFs queues during PF down. This could happen in some older or external VF driver implementations. The problem was that PF driver used vf->queues_enabled as a condition to decide what link-state it would send out which caused the issue. Remove the check for vf->queues_enabled in the VF link notify. Now VF will always be notified of the current link status. Also remove the queues_enabled member from i40e_vf structure as it is not used anymore. Otherwise VNF implementation was broken and caused a link flap. The original commit was a workaround to avoid breaking existing VFs though it's really a fault of the VF code not the PF. The commit should be safe to revert as all of the VFs we know of have been fixed. Also, since we now know there is a related bug in the workaround, removing it is preferred. Fixes: 2ad1274fa35a ("i40e: don't report link up for a VF who hasn't enabled") Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-02-01Merge tag 'media/v5.11-3' of ↵Linus Torvalds6-26/+114
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media fixes from Mauro Carvalho Chehab: "The rockship rkisp1 driver will be promoted from staging in 5.11. While not too late, do a few uAPI changes which are needed to better support its functionalities" * tag 'media/v5.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: media: rockchip: rkisp1: extend uapi array sizes media: rockchip: rkisp1: carry ip version information media: rockchip: rkisp1: reduce number of histogram grid elements in uapi media: rkisp1: stats: mask the hist_bins values media: rkisp1: stats: remove a wrong cast to u8 media: rkisp1: uapi: change hist_bins array type from __u16 to __u32
2021-02-01staging: rtl8723bs: Move wiphy setup to after reading the regulatory ↵Hans de Goede1-2/+2
settings from the chip Commit 81f153faacd0 ("staging: rtl8723bs: fix wireless regulatory API misuse") moved the wiphy_apply_custom_regulatory() call to earlier in the driver's init-sequence, so that it gets called before wiphy_register(). But at this point in time the eFuses which code the regulatory-settings for the chip have not been read by the driver yet, causing _rtw_reg_apply_flags() to set the IEEE80211_CHAN_DISABLED flag on *all* channels. On the device where I initially tested the fix, a Jumper EZpad 7 tablet, this does not cause any problems because shortly after init the rtw_reg_notifier() gets called fixing things up. I guess this happens into response to receiving a (broadcast) packet with regulatory info from the access-point ? But on another device with a RTL8723BS wifi chip, an Acer Switch 10E (SW3-016), the rtw_reg_notifier() never gets called. I assume that some fuse has been set on this device to ignore regulatory info received from access-points. This means that on the Acer the driver is stuck in a state with all channels disabled, leading to non working Wifi. We cannot move the wiphy_apply_custom_regulatory() call back, because that call must be made before the wiphy_register() call. Instead move the entire rtw_wdev_alloc() call to after the Efuses have been read, fixing all channels being disabled in the initial channel-map. Fixes: 81f153faacd0 ("staging: rtl8723bs: fix wireless regulatory API misuse") Cc: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20210201152956.370186-2-hdegoede@redhat.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>